About the role:
The Site Reliability Engineer will work with an agile team of engineers and operations staff to build, manage, and scale highly resilient, performant cloud infrastructure in an automated and efficient manner.
What will you do?
- Continuously monitor infrastructure and application performance, and build monitors and alerts.
- Respond to incidents when an alert shows that a system is not behaving as expected: analyze and fix the problem in the short term, then design and implement a solution so that the same incident does not recur.
- Integrate, configure, deploy, and manage centrally provided common cloud services (e.g. IAM, networking, logging, operating systems, containers).
- Ensure compliance with centrally defined security standards (such as SOC 2) and regulatory requirements (such as HIPAA, FedRAMP, and PCI DSS).
- Ensure compliance with operational risk standards (e.g. network, firewall, OS, logging, monitoring, availability, resiliency).
- Build and support continuous integration (CI), continuous delivery (CD), and continuous testing activities.
- Update support and operational documentation as required.
What do you have?
- Expert understanding of DevOps principles and Infrastructure as Code concepts and techniques.
- Strong understanding of CI/CD and available tools.
- Good experience in automation scripting (Python, Go, Bash, etc.).
- Hands-on experience in configuration management and Infrastructure as Code.
- Good understanding of Linux operating systems.
- You have worked with at least one cloud provider and understand the principles and workings of its services in depth.
Experience: 5+ Years