Lead DevOps Engineer (m/f/d)
TLDR
Lead automated, self-healing infrastructure that runs massive distributed workloads at sub-millisecond latency across global data centers, in a role that blends systems engineering with technical leadership.
Responsibilities
- Lead the design and implementation of automated infrastructure provisioning pipelines to manage large-scale bare-metal environments across global data centers, ensuring consistency, reliability, and scalability.
- Build and maintain infrastructure automation systems using tools such as Terraform, Ansible, Puppet, or Chef to enable standardized and repeatable deployments.
- Oversee operations, monitoring, and incident response, including security monitoring, system recovery, and resolution of hardware and software failures in high-availability environments.
- Manage infrastructure maintenance activities such as OS patching, system upgrades, performance tuning, capacity optimization, and lifecycle management of distributed systems.
- Ensure observability and system health through monitoring and alerting platforms, leveraging tools such as Prometheus, Grafana, and related observability stacks.
- Collaborate with engineering teams to optimize system performance, including load balancing, network tuning, and low-latency architecture improvements.
- Provide technical leadership, mentor engineers, and contribute to scaling DevOps practices, operational excellence, and cross-team collaboration.
Requirements
- 8+ years of experience in DevOps, systems engineering, infrastructure operations, or site reliability engineering roles in large-scale distributed environments.
- At least 2 years of proven experience in technical leadership or team management roles.
- Strong hands-on expertise in Linux/Unix systems administration, scripting (Shell, Python, or Java), and SQL.
- Experience with infrastructure automation tools such as Terraform, Ansible, Puppet, or Chef for managing large-scale systems.
- Deep understanding of networking concepts including L4/L7 load balancing (HAProxy, Nginx) and TCP/IP performance optimization.
- Strong experience with CI/CD and orchestration tools such as Jenkins, Airflow, or similar platforms.
- Hands-on experience with observability tools such as Prometheus, Grafana, or InfluxDB.
- Exposure to Kubernetes is a strong advantage.
- Ability to work in rotational shifts and participate in on-call incident response when required.
- Strong analytical mindset with the ability to troubleshoot complex distributed systems under pressure.
Benefits
- Fully remote-first setup with flexibility to work from anywhere in India.
- Direct reporting line to senior leadership, including CTO-level exposure.
- High-impact role in a globally scaled, high-performance engineering environment.
- Annual learning and development budget to support continuous growth.
- Home-office setup allowance to enable productive remote work.
- Strong career growth opportunities in a rapidly scaling technical organization.
- Collaborative, international team across multiple geographies and engineering disciplines.
Jobgether runs the largest remote job platform, effectively linking job seekers with over 200,000 flexible and remote opportunities that match their unique skills and preferences. Our focus is on enhancing the hiring process, ensuring efficiency while prioritizing the candidate experience, particularly in the growing health and wellness sector.
- Founded: 2020
- Employees: 11-50
- Industry: Professional Services