As a Senior Site Reliability Engineer, you will be part of the Tanium Cloud Engineering team. We focus on solving cloud operations problems and keeping our services online. We are looking for individuals who are just as passionate about troubleshooting distributed systems as they are about automating, coding and collaborating to solve problems. Here you'll be responsible for identifying, troubleshooting and reporting platform problems to product engineers (or fixing the code yourself) to ensure that we provide a stable and reliable service.
What you’ll do:
- You will report and solve problems in Tanium infrastructure services and collaborate with product engineers on issues.
- You will participate in SRE software engineering, writing code that continually reduces human intervention in operational tasks and automates processes.
- You will monitor the Tanium Cloud platform and cloud infrastructure, respond to incidents, correct and improve systems to prevent future incidents, and plan capacity.
- You will manage cloud provider infrastructure, system deployments and product releases.
- You will be involved in resolving Tanium Cloud customer support issues.
- You will demonstrate and promote best practices for teams using cloud platforms.
- You will participate in 24/7/365 on-call schedules.
We’re looking for someone with:
- Bachelor's degree or equivalent experience.
- CS degree preferred.
- You have at least three years of experience creating public cloud-based services with AWS, GCP or Azure.
- You have at least five years of experience in a software development role.
- You have either a) helped lead the initial deployment of a new SaaS product to a public cloud (AWS, GCP or Azure) OR b) been an integral member of an established, high-functioning SRE team for a reputable cloud-hosted SaaS product.
- Proven track record of designing and building commercial software products in an Agile environment.
- You have used Ansible, Puppet, Chef or another config management suite, know where it's broken, and are open to trying new alternatives.
- Experience with modern software development and automation tools such as Git, Jenkins, Grunt and JIRA.
- Experience managing cloud-based infrastructure using infrastructure-as-code methodologies; experience with CloudFormation or Terraform preferred.
- Believes in the power of test-driven development and the need to write automated tests as part of development.
- Deliberate, with sound judgment in balancing rapid development against long-term code maintainability and supportability.
- Skilled debugger who can put out fires under pressure when things go wrong in production environments.
- Relentless desire to automate and build software tools.
- A customer-centric approach that drives positive experiences for Tanium customers.
- Proven ability to work effectively in cross-functional teams.
- Ability to work efficiently and effectively in a remote work setting.
- Motivated self-starter.
At Tanium, we empower the world's largest organizations to manage and protect their mission-critical networks. There's a reason why 6 of the top 10 retailers, 12 of the top 15 US banks, and 4 of the US Armed Forces use Tanium. We put lightning-fast capabilities at their fingertips to see everything and do anything across their computer networks, with unparalleled scale.
We pride ourselves on being unstoppable in the pursuit of our mission. We are diverse problem solvers driven to do the right thing and win as a team.
Join our team at tanium.com/careers/.