FUNNOW Group

【SRE】Site Reliability Engineer

TL;DR

Maintain scalable, resilient cloud infrastructure using Kubernetes, CI/CD pipelines, and AI-driven automation to support millions of users booking experiences.

【Capsule】
At FunNow, we're building joyful experiences at the speed of now. As a Site Reliability Engineer, you'll play a crucial role in keeping our platform fast, resilient, and secure for millions of users booking spontaneous fun across Asia. But here's the twist: we don't just monitor uptime; we build with AI and automation. From Kubernetes tuning and auto-healing infrastructure to CI/CD pipelines and incident response, you'll be hands-on in evolving our DevOps culture. If you love scalable systems, believe in developer efficiency, and treat infrastructure as code, welcome aboard.

【Typical Accountability】

  • Design robust architectures that improve system availability, scalability, and service quality
  • Ensure stable service operation, monitor core service status, and quickly troubleshoot issues
  • Analyze system performance bottlenecks in depth, then propose and implement improvements
  • Maintain and optimize Kubernetes clusters (EKS/GKE), handling resource pressure, node anomalies, and other cluster incidents
  • Maintain and improve CI/CD pipelines and automated deployment systems (GitHub Actions / ArgoCD) to boost the engineering team's development efficiency
  • Establish and continuously optimize system monitoring and alerting mechanisms (Prometheus / Grafana / Alertmanager)
  • Assist with incident response and problem investigation
  • Regularly participate in system inspections and audits, proactively proposing and implementing improvements
  • Help implement and maintain baseline security settings (e.g., IAM, resource permissions, storage encryption)
  • Share your experience to strengthen the team's engineering culture

【Essential Competencies】

  • Familiarity with container technologies such as Docker and orchestration with Kubernetes, including hands-on experience with Kubernetes operations (deployment, scheduling, resource management)
  • Familiarity with AWS services (e.g., ECS, EKS, S3, CloudFront, IAM, VPC), and hands-on experience maintaining AWS or GCP environments (we primarily use AWS)
  • Familiarity with at least one CI/CD tool (e.g., GitHub Actions, GitLab CI)
  • Proficiency in day-to-day MySQL administration and performance analysis
  • Familiarity with log analysis and monitoring tools (e.g., CloudWatch, ELK/EFK, Grafana), and hands-on experience with Prometheus/Grafana
  • Experience maintaining Elasticsearch clusters
  • Familiarity with Git and basic Git flow operations
  • Strong self-management; a proactive, responsible, and meticulous work attitude; and excellent communication and teamwork skills

【Desirable Competencies】

  • Exposure to or familiarity with the Golang ecosystem
  • Familiarity with Infrastructure-as-Code tools such as CDK or Terraform
  • Experience with IPO advisory or ISO audits
  • Security awareness

【Who You Are】

  • You enjoy solving real-world problems, investigate proactively, and act quickly
  • You value stability and data accuracy, and possess a high sense of responsibility
  • You are passionate about learning new tools and enjoy sharing ways to improve
  • You communicate clearly and keep good documentation habits when collaborating with the team