Site Reliability Engineer
Site Reliability Engineers at edX help develop and maintain the AWS infrastructure for all services and systems required to run the edX website. We're seeking engineers with programming skills and a systems administration background. The team primarily focuses on the provisioning, configuration, deployment, and monitoring of services at edX. If you have a passion for automation, configuration as code, metric-based decision making, and continuous improvement, then we want to hear from you.
The edX production environment is hosted in AWS on servers running Ubuntu Linux. We are actively working on extending our use of containerization, specifically via Docker.
Our team of 4 participates in on-call and emergency support, and occasional work outside normal hours will be required.
Responsibilities:
- Help build tools to automate, deploy and monitor high-availability systems.
- Work with developers and staff to maintain and improve the infrastructure of edX.
- Rapidly diagnose and resolve faults with services, and communicate to users as appropriate.
- Help members of the Open edX community who are standing up the edX platform and contributing to our open-source code.
- Continuously improve and learn, take on varying projects and consistently evaluate new technology.
Requirements:
- 3-5 years of professional experience, with at least 2 years spent responsible for production systems.
- Must be adept in programming languages. We prefer Python experience, but can help you ramp up if you need it. Extra credit for strong polyglot skills.
- Must have a working knowledge of Linux, both as an end user and as an administrator.
- Must have experience running web applications in a production environment.
- Must have strong business communication skills as the position requires interfacing with technical and non-technical business stakeholders across edX.
Nice to Have:
- Experience with some of the following technologies: nginx, MySQL, MongoDB, Django environments, Splunk, Git, Jenkins.
- Familiarity with a configuration management system such as Ansible, Puppet, or Chef.
- Understanding of general CS concepts including performance and data structures.
Applicants must be able to work out of our Cambridge, MA office. Sorry, visa sponsorship is not available.