Senior Software Engineer, Site Reliability Engineering
TLDR
Design and operate highly available, scalable infrastructure for large-scale distributed applications, shaping platform architecture and improving observability across cloud environments.
Responsibilities:
- Design, build, and maintain scalable and highly available infrastructure and systems that support large-scale distributed applications.
- Define and influence architectural direction for platform services, ensuring resilience, performance, and scalability across systems.
- Develop tools and automation for deployment, monitoring, configuration management, and infrastructure operations.
- Troubleshoot and resolve complex production issues across distributed systems, ensuring minimal downtime and rapid recovery.
- Improve observability, monitoring, and alerting systems to enhance system visibility and reliability.
- Participate in capacity planning, performance tuning, and forecasting to proactively address scaling challenges.
- Collaborate with engineering teams to improve developer experience and reduce operational toil through automation and platform improvements.
- Participate in on-call rotations and provide incident response support for critical systems.
Requirements:
- 5+ years of experience in Site Reliability Engineering, infrastructure engineering, or distributed systems roles.
- Strong expertise in AWS and Linux-based environments.
- Proficiency in programming languages such as Python, Go, JavaScript, or similar for automation and system development.
- Deep understanding of distributed systems and networking protocols including DNS, HTTP/S, TLS, and TCP/IP.
- Hands-on experience operating, monitoring, and debugging large-scale microservices architectures in production environments.
- Strong problem-solving skills with the ability to break down complex system challenges and evaluate technical trade-offs.
- Excellent communication skills with the ability to collaborate across engineering and non-engineering stakeholders.
- Strong focus on system reliability, scalability, and reducing operational overhead.
Benefits:
- Competitive base salary range aligned with experience and location
- Equity participation in a high-growth technology organization
- Comprehensive medical, dental, and vision insurance coverage
- 401(k) retirement plan and financial wellbeing support
- Flexible remote work options within North America
- Flexible paid time off and parental leave policies
- Professional development support and learning opportunities
- Inclusive, engineering-driven culture focused on reliability and innovation
Jobgether runs the largest remote job platform, effectively linking job seekers with over 200,000 flexible and remote opportunities that match their unique skills and preferences. Our focus is on enhancing the hiring process, ensuring efficiency while prioritizing the candidate experience, particularly in the growing health and wellness sector.
- Founded: 2020
- Employees: 11-50
- Industry: Professional Services