Senior Site Reliability Engineer (SRE)
TLDR
Maintain and improve the reliability of large-scale AI and cloud-native services, while advancing CI/CD, observability, and automation across cross-functional teams.
Responsibilities:
- Maintain high system availability by ensuring fault tolerance, monitoring, and rapid incident response across production services.
- Design, implement, and optimize scalable infrastructure solutions using modern cloud-native technologies.
- Improve and evolve CI/CD pipelines to enable safe, efficient, and automated software delivery.
- Collaborate with engineering teams to troubleshoot complex system issues across compute, networking, and storage layers.
- Apply infrastructure-as-code practices using tools such as Terraform, Ansible, or similar to manage and standardize environments.
- Support containerized environments and orchestration platforms such as Docker, Kubernetes, and Helm.
- Contribute to operational best practices, including observability, alerting, and performance tuning.
Requirements:
- Strong programming skills in languages such as Go, Python, or C++, with a solid foundation in algorithms and data structures.
- Deep understanding of Unix/Linux systems, networking fundamentals, and distributed system behavior.
- Hands-on experience with containerization and orchestration tools such as Docker and Kubernetes.
- Practical experience with infrastructure-as-code and configuration management tools (Terraform, Ansible, Salt, or similar).
- Familiarity with CI/CD systems and modern DevOps practices.
- Experience working with or supporting high-load distributed systems in production environments.
- Strong problem-solving mindset with the ability to diagnose and resolve complex technical issues.
- Excellent communication and collaboration skills in cross-functional engineering teams.
Benefits:
- Competitive compensation package
- Career growth and continuous learning opportunities
- High degree of autonomy, flexibility, and ownership
- Collaborative and innovation-focused engineering culture
- Opportunity to work on large-scale, impactful cloud and AI infrastructure
- International environment with highly skilled engineering teams
- Flexible work hours
- Learning budget
- Remote-friendly
Jobgether runs the largest remote job platform, effectively linking job seekers with over 200,000 flexible and remote opportunities that match their unique skills and preferences. Our focus is on enhancing the hiring process, ensuring efficiency while prioritizing the candidate experience, particularly in the growing health and wellness sector.
- Founded: 2020
- Employees: 11-50
- Industry: Professional Services