Site Reliability Engineer (SRE)
TLDR
Help build reliable, observable, and secure systems that power mission-critical applications. The role spans cloud infrastructure, Kubernetes, observability, security, and automation.
Reliability & Observability
- Design and maintain monitoring, alerting, and dashboarding systems across cloud and edge environments.
- Build visibility into system health through metrics, logs, traces, and performance analytics.
- Define and manage SLIs, SLOs, and service reliability targets.
- Develop proactive monitoring and anomaly detection capabilities to identify issues before they impact users.
Cloud Infrastructure & Platform Operations
- Deploy, manage, and optimize containerized workloads running on Kubernetes.
- Maintain scalable cloud infrastructure across production environments.
- Improve system performance, availability, and operational efficiency.
- Support infrastructure provisioning through Infrastructure-as-Code practices.
Security & Access Management
- Implement secure access controls and audit mechanisms across infrastructure environments.
- Monitor for cybersecurity threats, unauthorized access attempts, and service disruptions.
- Develop alerting and response procedures for security-related incidents.
- Contribute to operational security best practices and governance initiatives.
Automation & Engineering Excellence
- Automate repetitive operational tasks to reduce manual effort and improve reliability.
- Build tooling and scripts to streamline infrastructure operations.
- Support CI/CD workflows and deployment automation.
- Promote documentation, operational standards, and continuous improvement.
Incident Response & Reliability Engineering
- Participate in on-call rotations and incident management.
- Lead troubleshooting efforts during production incidents.
- Conduct root-cause analysis and post-mortem reviews.
- Drive long-term improvements that enhance system resilience.
Cross-Functional Collaboration
- Work closely with software, AI, machine learning, hardware, and product teams.
- Ensure new services are production-ready with appropriate monitoring, security, and reliability measures.
- Support the operational needs of both cloud-based and distributed edge computing environments.
NAHC Limited builds AI-driven solutions for the fintech and consumer technology sectors, streamlining processes and enhancing user experiences. Its products range from automated trading platforms for cryptocurrency traders to interactive motion-driven play systems for active entertainment, catering to diverse user needs while prioritizing safety and data protection. NAHC aims to help users at different experience levels engage with complex systems easily and effectively.
- Founded: 2019
- Employees: 1-10
- Industry: Professional Services