Okta is hiring a

Principal Site Reliability Engineer, OpsSec

Bellevue, United States

At Okta, our motto is "Always On", and nowhere do we embrace that more than in Technical Operations. We strive to build the most reliable and performant systems on the planet through the skillful use of automation. If you like to be challenged and have a passion for solving problems at scale with automation, testing, and tuning, then we would love to hear from you. The ideal candidate is someone who exemplifies the ethic of "If you have to do something more than once, automate it," and who can rapidly self-educate on new concepts and tools.

You will work on:

  • Designing, building, running and monitoring Okta's production infrastructure
  • Responding to production incidents and determining how we can prevent them in the future
  • Triaging and troubleshooting complex production issues to ensure reliability and performance
  • Identifying and automating manual processes
  • Continuously evolving our monitoring tools and platform
  • Promoting and applying best practices for building scalable and reliable services across engineering
  • Developing and maintaining technical documentation, runbooks, and procedures
  • Supporting a 24x7 online environment as part of an on-call rotation
  • Serving as a technical lead for a team that designs and builds Okta's production infrastructure with a focus on security at scale in the cloud

You are an ideal candidate if you:

  • Are always willing to go the extra mile: see a problem, fix the problem.
  • Are passionate about encouraging the development of engineering peers and leading by example.
  • Have experience automating, securing and running large-scale production Java/Tomcat and containerized services in AWS (EC2, ECS, KMS, Kinesis, RDS) or other cloud providers.
  • Have deep knowledge of CI/CD principles, Linux fundamentals, OS hardening, networking concepts and IP protocols.
  • Have a deep understanding of, and familiarity with, configuration management tools like Chef, Terraform and Ansible.
  • Have expert-level abilities in operational tooling languages such as Ruby, Python, Go and shell, and use of source control.
  • Have experience with industry-standard security tools like Nessus and OSQuery.
  • Understand MySQL including replication and clustering strategies and are familiar with data stores such as DynamoDB, Redis, Cassandra and Elasticsearch.

Bonus points for:

  • OpsSec production experience
  • Experience with Federal and DoD compliance requirements - FedRAMP, IL

Minimum Required Knowledge, Skills, Abilities, and Qualities:

  • 5+ years of experience architecting and running complex AWS or other cloud networking infrastructure resources
  • 5+ years of experience with Ansible, Chef and Terraform
  • Strong leadership skills
  • Strong Linux understanding and experience
  • Strong security background and knowledge
  • BS in Computer Science (or equivalent experience)

Education and Training

  • BS in Computer Science (a plus) or relevant experience

Okta is rethinking the traditional work environment, providing our employees with the flexibility to be their most creative and successful versions of themselves, no matter where they are located. We enable a flexible approach to work, meaning you can work from the office or home, regardless of where you live. Okta invests in the best technologies and provides flexible benefits and collaborative work environments and experiences, empowering employees to work productively in a setting that best and uniquely suits their needs. Find your place at Okta: https://www.okta.com/company/careers/.

Okta is an equal opportunity employer.

