Collective Health is hiring a

Site Reliability Engineer (SRE)

Chicago, United States

We all depend on healthcare throughout our lifetimes, for ourselves and for our families and friends, yet it is notoriously difficult to navigate and understand. Healthcare comprises 20% of the US economy, and we think it should work better for all of us. At Collective Health we believe it’s time for a new day in healthcare, one where we as members are informed and empowered to make the right care choices when decisions are urgent and critical.

Site Reliability Engineering (SRE) at Collective Health is a discipline combining software and systems engineering skills. We apply modern infrastructure, systems, software, architecture, and development practices to give our customers a more reliable healthcare management experience.

By designing solutions for reliability, automating and simplifying to reduce toil, and normalizing a robust incident response procedure that resolves the problems we uncover, we unlock development velocity so that we can deliver reliable services that make a real difference in healthcare.

Embedded in an engineering team, SREs gain deep localized functional and technical domain knowledge, which they use to build solutions and improve outcomes for their embedded team. As a broader discipline, SREs collaborate and identify themes and solutions to benefit Collective Health at large, engage in regular knowledge sharing activities and retrospectives, and relentlessly support one another in order to gain knowledge, remove barriers, and grow as individuals and a team.

What you'll do: 

  • Measure and monitor availability, latency, and efficiency to build an overall picture of system health.
  • Scale systems through automation, and evolve systems by advocating for changes that improve reliability and velocity.
  • Engage in and improve the development lifecycle of applications—from concept and design, through commit to production deployment, and beyond into operation and iteration.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
  • Practice sustainable incident response and blameless postmortems.

Imposter syndrome is real. If you are hesitant to apply because you don't check all the boxes, or you’ve had a less-traditional pathway into Site Reliability Engineering, we encourage you to apply anyway and mention why you’re interested in the role.

To be successful in this role, you'll need:

  • A passion for solving challenging problems.
  • Experience in one or more of the following programming languages: Java, Go (golang), Python, C, C++, Perl, Ruby or shell scripting.
  • Expertise in the use of relational databases.
  • Experience diagnosing and resolving incidents that involve application, OS, network, infrastructure, partners, people, and process.
  • Experience with algorithms, data structures, complexity analysis and software design.
  • Experience with Linux and/or networks (e.g., filesystems, system calls, signals, process states, TCP/IP, routing, AWS VPCs, Firewalls, AWS Security Groups, IP Block Management).

At Collective Health, we care about creating a culture of diversity, openness, and transparency, while engaging our intellectual curiosity, problem solving and software engineering skills. This is vital to maintaining an agile engineering culture while putting a robust user experience front and center. We bring together people with a wide variety of backgrounds and perspectives, and create an environment where their passions are supported and they are mentored so they can learn and grow.

Founded in 2013, Collective Health has created an ecosystem of innovative partners across care and benefits delivery and built a powerful, flexible infrastructure that better enables employees and their families to understand, navigate, and pay for healthcare. By reducing the administrative lift of delivering health benefits, providing an intuitive member experience, and improving health outcomes, the company guides employees toward healthier lives and companies toward healthier bottom lines. Collective Health is headquartered in San Mateo, CA with locations in Chicago, IL, and Lehi, UT. For more information, please visit

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Collective Health is committed to providing support to candidates who require reasonable accommodation during the interview process. If you need assistance, please contact [email protected].
