Stripe is hiring a

Head of Global Incident Response and Management

New York, United States
Remote

Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world’s largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.

About the team

The Incident Ops team is a global 24/7 team responsible for driving incident response and management from detection to resolution. Stripe is proud of its five 9s reliability, and this team is at the forefront of ensuring we keep it that way, working hand-in-hand with Reliability Eng and across the Tech Org. This team of incident response managers (IRMs) is defined by our sense of ownership and how we drive incidents to resolution: marshaling the necessary cross-functional resources to respond to and resolve service outages, critical bugs, security attacks and anything else that significantly impacts the users of our products. The team is user-first and ensures appropriate external communications from Stripe and senior management to keep our users informed of any disruption to their experience of Stripe. The team combines program management, communications and incident handling skills with technical depth, because incidents can arise from anywhere and cut across products and orgs in Stripe.

What you’ll do

As the leader of Incident Ops, you’ll build a world-class incident response function and demand the high bar of reliability expected of Stripe. You’ll work hand-in-hand with leaders in Reliability Eng and Tech Orgs to transform how we detect and respond to incidents, communicate with users, improve related tooling and measure impact. You will lead and nurture a high-performing 24/7 global IRM team that has a strong sense of urgency, skilled program ownership of incidents and comms, the drive to rally engineers to their cause and the technical expertise to understand impact. As a result, you’ll be seen as the protector of our users, minimizing the impact of incidents on their business and ensuring that Stripe is always thinking of our users.

Responsibilities

  • Develop and own the strategy for Stripe’s incident response and management, commensurate with Stripe’s reputation for reliability
  • Lead the global 24/7 team of regional managers and IRMs, with the ability to be hands-on and support frontline on-call IRMs with speed, cross-functional collaboration and escalation
  • Partner closely with executive leadership, engineering and ops teams to drive large programs and transform workflows and metrics related to reliability and incident ops
  • Collaborate with Reliability Eng and Tech Org to improve incident tooling, reliability and user communications
  • Establish user-facing impact metrics and data to help Stripe make the right decisions about how we respond to incidents, our approach to reliability and user communications (incl. RCA)

Who you are

We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.

Minimum requirements

  • 10+ years of management experience, including multiple years of experience managing managers and global teams
  • Affinity for a fast-paced work environment, crafting strategic and rapid fixes to high-intensity problems with a keen eye for detail and a high bar for quality
  • Comfort navigating ambiguity, while identifying areas for process improvement and establishing best practices
  • Demonstrated ability to lead, influence other leaders and deliver complex strategic projects involving multiple stakeholders
  • Strong analytical skills, and the ability to use data to drive business decisions

Preferred qualifications

  • Proficiency in SQL, Splunk, or equivalent query languages
  • Experience using infrastructure and application monitoring tools such as SignalFx, Prometheus, Sentry and others
  • Experience at a high-growth technology company, especially in incident response within the payments or e-commerce space
  • Experience with managing user-facing communications strategy during sensitive situations such as outages

 
