Aghanim

SRE / High-Middle DevOps

Aghanim is an integrated commerce, liveops automation, community engagement, and payments platform for video games.

Mobile games have traditionally depended on app stores for distribution, payments, and player relationships. We believe there is a better way. Aghanim helps game studios build direct relationships with players, sell directly, and build their future on their own terms. Today, more than 100 games worldwide are already building this future with Aghanim.

 

Our team brings together people across Los Angeles, New York, Seoul, Beijing, London, Lisbon, Belgrade and other locations around the globe, with deep expertise in gaming, fintech and technology. We move quickly, keep communication direct, and focus on getting things done. We believe the best people thrive when they have autonomy, ownership, and a stake in the company's success.

We’re looking for a Middle/High-Middle DevOps / SRE Engineer to help run and improve our production platform on GCP and GKE, fronted by Cloudflare, with observability in Datadog and CI/CD in GitHub Actions.

You’ll work closely with Senior/Principal engineers, implementing reliability improvements, expanding monitoring coverage, and reducing operational toil, which is especially important in a high-load system with sudden traffic spikes.

Key Responsibilities

Platform Operations

  • Operate and improve production systems on GCP, GKE, and related managed services

  • Contribute to platform reliability, scalability, and operational improvements alongside Senior and Principal engineers

Infrastructure & Delivery

  • Implement infrastructure changes using Terraform and maintain Kubernetes configurations, Helm charts, and deployment tooling

  • Improve CI/CD automation and deployment reliability

Observability & Incident Management

  • Build and maintain monitoring, alerting, and observability in Datadog

  • Participate in incident response, troubleshooting, and postmortem activities

Security & Cost Optimization

  • Support security tooling, vulnerability remediation, and secure platform practices

  • Identify and implement cost optimization opportunities without compromising reliability

Required Qualifications

  • Hands-on production experience with Kubernetes (ideally GKE) and basic cluster operations.

  • Working experience with Terraform and Helm in PR-based workflows.

  • Familiarity with GCP services used in SaaS operations (e.g., Cloud SQL, BigQuery, Bigtable, Pub/Sub, Cloud Run, Memorystore).

  • Monitoring/alerting and troubleshooting skills (preferably Datadog).

  • Strong scripting/automation mindset to reduce manual work and prevent repetitive incidents.

  • Reliability awareness: understanding how changes affect availability/latency and how to operate under SLA constraints.

Preferred Qualifications

  • Cloudflare basics (WAF/DNS, edge concepts; Workers/CDN is a plus).

  • Experience writing/maintaining runbooks and participating in postmortems.

  • Exposure to SOC 2 / PCI-DSS requirements or willingness to learn.

  • Experience in high-load consumer products or game dev.

Why Join Us

  • World-class team – work alongside experienced professionals from around the globe who have built products used by millions of players

  • High growth, high impact – be part of a fast-growing company where ideas turn into products and reach customers in days, not months

  • Autonomy and ownership – we trust people to make decisions, take initiative, and drive results

  • Modern tools and technology – use AI, automation, and modern tools as part of your everyday work

  • Equity – participate in the company's growth and long-term success
