Jobgether

Senior Site Reliability Engineer - AWS

U.S.

Full-Time

Remote

$175,000 – $195,000 per year

TLDR

Lead automation and reliability efforts across cloud platforms to boost scalability, performance, and incident response in a fast-growing tech environment.

Accountabilities:

Serve as the reliability engineering lead within a cross-functional team, providing technical leadership, mentorship, and guidance on best practices.
Design, implement, and maintain highly automated systems that support software development, deployment, testing, monitoring, and operational workflows.
Act as the primary advocate for reliability, scalability, and operational excellence throughout the entire software development lifecycle.
Develop and maintain monitoring, logging, dashboarding, and alerting solutions that provide visibility into application and infrastructure health.
Continuously improve CI/CD pipelines, automation frameworks, deployment processes, and operational tooling to increase efficiency and reduce manual effort.
Identify and remediate reliability, performance, availability, and security risks across cloud infrastructure and production systems.
Create and maintain technical documentation, operational procedures, architecture standards, and engineering best practices.
Research, evaluate, and implement tools and technologies that improve system resilience and engineering productivity.
Collaborate with engineering teams to troubleshoot complex production issues and ensure rapid incident resolution.
Participate in on-call rotations and provide support during critical production incidents and emergency response situations.
Mentor junior engineers and contribute to the development of a strong reliability engineering culture.

Requirements:

8+ years of experience in software engineering, infrastructure engineering, cloud operations, or related technical disciplines.
Minimum of 7 years of dedicated experience in Site Reliability Engineering (SRE) or closely related reliability-focused roles.
Strong expertise in Python, Bash, PowerShell, and other scripting or automation technologies commonly used within SRE environments.
Extensive experience designing, building, and maintaining systems that automate deployment, testing, monitoring, and operational processes.
Advanced hands-on experience with AWS services, including EC2, EKS/Kubernetes, CloudWatch, Lambda, S3, IAM, and related cloud-native technologies.
Proven ability to implement and optimize monitoring, alerting, incident management, capacity planning, and performance optimization strategies.
Deep understanding of CI/CD pipelines, infrastructure automation, and modern DevOps and SRE practices.
Experience building and maintaining highly available, scalable, and resilient production systems in fast-paced environments.
Strong problem-solving skills with the ability to independently identify reliability challenges and drive long-term improvements.
Demonstrated success reducing operational toil through automation and process optimization.
Excellent communication, collaboration, and stakeholder management skills.
Bachelor’s degree in Computer Science, Information Systems, or a related field, equivalent certifications, or comparable professional experience.
AWS certifications or other cloud-related certifications are considered a plus.
Self-driven mindset with a passion for continuous learning, innovation, and operational excellence.

Benefits:

Competitive base salary ranging from $175,000 to $195,000, based on experience, skills, and location.
Comprehensive medical, dental, and vision insurance coverage for eligible employees.
Generous paid time off program.
Paid maternity and paternity leave.
Short-term and long-term disability coverage.
Opportunity to work within a dynamic, fast-growing technology organization.
Exposure to large-scale cloud infrastructure and cutting-edge engineering challenges.
Access to experienced leadership and mentorship opportunities.
Strong culture focused on innovation, collaboration, and professional growth.
Competitive compensation package designed to support pay equity.
Company-branded merchandise and employee recognition perks.
Opportunity to make a significant impact on highly scalable, mission-critical systems.

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.



Jobgether

Jobgether runs the largest remote job platform, effectively linking job seekers with over 200,000 flexible and remote opportunities that match their unique skills and preferences. Our focus is on enhancing the hiring process, ensuring efficiency while prioritizing the candidate experience, particularly in the growing health and wellness sector.

Founded: 2020
Employees: 11-50
Industry: Professional Services
