FUNNOW Group

【SRE】Site Reliability Engineer

Taiwan

Full-Time

TLDR

Maintains scalable, resilient cloud infrastructure with Kubernetes, CI/CD pipelines, and AI-driven automation to support millions of users booking experiences.

【Capsule】
At FunNow, we’re building joyful experiences, at the speed of now. As a Site Reliability Engineer, you’ll play a crucial role in ensuring our platform stays fast, resilient, and secure for millions of users booking spontaneous fun across Asia. But here’s the twist: we don’t just monitor uptime; we build with AI and automation. From Kubernetes tuning to auto-healing infrastructure, and from CI/CD pipelines to incident response, you’ll be hands-on in evolving our DevOps culture. If you love scalable systems, believe in developer efficiency, and treat infrastructure as code, welcome aboard.

【Typical Accountability】

Design robust architectures to comprehensively improve system availability, scalability, and service quality
Ensure stable service operation, monitor core service status, and quickly troubleshoot issues
Analyze system performance bottlenecks in depth, then propose and implement improvements
Maintain and optimize Kubernetes clusters (EKS/GKE), effectively handling resource pressure, node anomalies, and other situations
Maintain and improve CI/CD pipelines and automated deployment systems (GitHub Actions / ArgoCD) to significantly enhance engineering team development efficiency
Establish and continuously optimize system monitoring and alerting mechanisms (Prometheus / Grafana / Alertmanager)
Assist with incident response and problem investigation
Regularly participate in system inspections and audits, proactively proposing and implementing improvements
Assist in maintaining and implementing fundamental security settings (e.g., IAM, resource permissions, encrypted storage)
Actively share your experience to collectively enhance the team's engineering culture

【Essential Competencies】

Familiarity with container technologies such as Docker or Kubernetes, and practical experience with Kubernetes operations (deployment, scheduling, resource management)
Familiarity with AWS services (e.g., ECS, EKS, S3, CloudFront, IAM, VPC), and practical experience maintaining AWS or GCP environments (we primarily use AWS)
Familiarity with at least one CI/CD tool (e.g., GitHub Actions, GitLab CI)
Proficiency in day-to-day MySQL administration and performance analysis
Familiarity with service-related log analysis and monitoring tools (e.g., CloudWatch, ELK/EFK, Grafana), and practical experience with Prometheus/Grafana
Experience maintaining Elasticsearch clusters
Familiarity with Git and basic Git flow operations
High degree of self-management, proactive and responsible work attitude, meticulousness, and excellent communication and teamwork skills

【Desirable Competencies】

Exposure to or familiarity with the Golang ecosystem
Familiarity with Infrastructure-as-Code tools such as CDK or Terraform
Experience with IPO advisory work or ISO audits
Security awareness

【Who You Are】

You enjoy solving real-world problems, are proactive in investigation, and act quickly
You value stability and data accuracy, and possess a high sense of responsibility
You are passionate about learning new tools and enjoy sharing improvement methods
You maintain clear communication and good documentation habits in team collaboration
