Senior/Staff Platform Engineer
TLDR
Help improve the reliability and scalability of a production platform, using metrics such as SLIs, SLOs, and error budgets.
We are looking for a Senior/Staff Platform Engineer to help improve the reliability, performance, and scalability of our production platform.
This role focuses on operating reliable infrastructure, improving observability, driving incident response, and using data-driven reliability practices such as SLIs, SLOs, SLAs, error budgets, and DORA metrics. Database experience with MongoDB, Elasticsearch, or Redis is a must.
Help us run and secure our platform that allows our users to connect and create their part of the VRChat universe. If you’re interested in keeping the machinery behind the scenes humming and finely tuned, then this role could be right up your alley.
The role reports to the Head of Platform at VRChat. This engineer will work closely with the IT and Engineering teams, as well as the heads of various functions, to plan and deploy infrastructure.
Operate and improve production infrastructure with a focus on reliability, security, performance, and cost efficiency.
Define, measure, and improve reliability using SLIs, SLOs, SLAs, error budgets, and DORA metrics.
Build and improve monitoring, alerting, dashboards, logging, and incident response processes.
Participate in incident management, root cause analysis, postmortems, and follow-up remediation.
Automate infrastructure and operational workflows using modern IaC and scripting tools.
Work closely with engineering teams to improve service reliability, deployment quality, and operational readiness.
Turn ambiguous infrastructure, reliability, and operational problems into clear, scalable, and measurable solutions.
Engage with backend codebases through code reviews, pull requests, and occasional feature or tooling work to build shared context with product engineering teams.
8+ years of experience in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.
Strong experience operating high-availability production systems.
Experience with cloud or hybrid cloud environments and tools such as Terraform or OpenTofu.
Strong knowledge of Linux, networking, automation, observability, and incident management.
Strong communication skills and ability to work with technical and non-technical stakeholders.
Operational knowledge of databases such as MongoDB, Elasticsearch, or Redis.
Experience with AWS, including core infrastructure services, cost optimization, and multi-account architecture.
Experience with Kubernetes, including networking, service discovery, ingress, and workload reliability.
Experience with Cilium or other Kubernetes networking/security solutions.
Experience supporting large-scale storage systems.
Experience with CDNs, caching, distributed systems, or real-time platforms.
Work from anywhere! VRChat is a 100% remote company.
Health Benefits
401(k) for US employees and RRSP for Canadian employees
Stock Options
Generous paid holiday schedule
Unlimited/Flexible vacation time
Paid parental leave benefits
VRChat is a community-driven platform that empowers users to create and explore immersive virtual worlds, bringing their imaginations to life through user-generated content. Designed for anyone seeking social interaction and creative expression, VRChat offers a limitless collection of experiences, making it a key player in the evolution of the metaverse.
- Founded: 2014
- Employees: 51-200
- Industry: Internet Software & Services
- Total raised: $95M