Stripe is hiring a

Site Reliability Engineer

San Francisco, United States

Site Reliability Engineers work with our engineering teams to ensure we build services that work well at scale.

We're looking for Site Reliability Engineers (SREs) who can help us design, build, and maintain high-performance, scalable, reliable services. As an SRE at Stripe, you will work on one of our infrastructure teams to build and run the core components that power the rest of Stripe. You will also partner with our other engineering teams to help make their services more performant, scalable, observable, and reliable. We believe every engineering team at Stripe should be responsible for the software they build, and SREs play a critical part in providing the tools, practices, and expertise to make that happen.
You will:
  • Design, build, and maintain the core infrastructure used by all of Stripe's engineering teams
  • Develop and promote conventions on production readiness
  • Partner with engineering teams to ensure their products meet production standards
  • Participate in design reviews and production reviews for new features, products, or pieces of infrastructure
  • Debug production issues across services and levels of the stack
  • Participate in on-call rotations, along with every member of the engineering team
  • Build tooling to address common operational challenges
  • Plan for the growth of Stripe's infrastructure
You may be a fit for this role if you:
  • Think about systems — their edge cases, failure modes, and life cycles
  • Understand the importance of observability and have good intuitions about what to measure and how
  • Can identify toilsome manual tasks and automate them away
  • Know your way around a Unix shell
  • Think clearly under pressure and work quickly and correctly in a crisis
  • Can debug complex problems across the whole stack
  • Focus on the needs of our customers, both internal and external
  • Hold yourself and others to a high bar when working with production
Nice to haves:
  • 3+ years of experience designing, building, and operating large-scale production systems
  • Experience with Ruby, Go, Scala, or Puppet
  • Experience with AWS or other cloud providers
  • Experience with open-source databases (MySQL, Postgres, Mongo, or others)