Senior Production Engineer
TLDR
Supports reliable, scalable systems for Veeam's cloud platform by leading the incident lifecycle and collaborating across teams, with a focus on automation, reliability, and operational excellence.
Veeam is the Data and AI Trust Company, specializing in helping organizations ensure their data and AI are fully understood, secured, and resilient to enable the acceleration of safe AI at scale. As the market leader in both data resilience and data security posture management, Veeam is built for the convergence of identity, data, security, and AI risk. Headquartered in Seattle with offices in more than 30 countries, Veeam protects over 550,000 customers worldwide, who trust Veeam to keep their businesses running. Join us as we go fearlessly forward together, growing, learning, and making a real impact for some of the world’s biggest brands.
About the Role
As a Production Engineer, you will play a key role in supporting reliable, scalable systems for Veeam's Data Cloud platform. You will own production efficiency, automation and documentation projects, contribute to reliability and observability improvements, and own or participate in the full incident lifecycle — from on-call response, through mitigation, to leading post-incident reviews and driving improvements across support and development teams.
You will work as part of a team of skilled engineers, acting as a bridge between support and development and as a driving force for change. You will communicate with product managers and security professionals to ensure our services are production-ready, performant, and fault-tolerant, and that we rapidly incorporate user feedback into improvements.
What You Will Do
Production
- Own complex and escalated production issues from support, and drive long-term fixes in collaboration with engineering, including code, configuration, and architecture changes.
- Proactively identify and address risks uncovered during problem solving.
- Lead production efficiency initiatives, and develop and maintain processes, runbooks, and the integrity of the knowledge base.
Operational Excellence
- Define, build, and maintain production monitoring systems.
- Continuously improve alerting to minimize noise, and ensure runbooks are actionable and well documented.
- Define and maintain SLIs/SLOs for key services, and use error budgets to guide operational and product decisions.
- Turn manual processes into automation.
- Own and drive the post-mortem review process and the actions arising from incident analysis.
Team Collaboration
- Collaborate with the support organization as an escalation point, and feed back knowledge and improvement recommendations.
- Collaborate with developers throughout the lifecycle of changes, from design through rollout and patch delivery, ensuring safe deployments and efficient incident mitigation.
- Participate in design reviews to ensure services are operable with minimal manual intervention in production (automation, safe deployments, clear runbooks), and share learnings through documentation and feedback.
What We Are Looking For
Required
- 3–5 years of experience in software engineering, site reliability, production engineering, or senior technical support roles operating distributed systems.
- Experience with log analysis and advanced troubleshooting.
- Basic programming experience (e.g., JavaScript, Go, TypeScript, Java, or C#).
- Experience deploying and troubleshooting systems on public cloud platforms (Azure preferred).
- Familiarity with observability tooling (e.g., Elastic, Prometheus, Grafana, OpenTelemetry).
- Understanding of distributed systems, networking, automation, and CI/CD.
Preferred
- Prior on-call or incident response experience.
- Background in automation, performance testing, or service scalability.
- Familiarity with compliance or security best practices.
Why Join Veeam?
- Make a high-impact contribution to the architecture and reliability of Veeam's first global SaaS product suite.
- Help shape a modern SRE organization from the ground up, influencing best practices, tooling, and culture.
- Collaborate with highly skilled teams across product, cloud engineering, security, and support.
- Access professional development resources including internal mentorship, technical training platforms, and volunteer days.
- Enjoy competitive compensation and benefits tailored to local markets in the US, Czechia, India, and Australia.
Join us and help define the future of cloud-native data protection.
What you'll get
- Unlimited paid time off, 12 paid holidays (including 4 global VeeaMe Days for self-care), and 24 paid volunteer hours annually through Veeam Cares
- Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents
- Medical, dental, and vision coverage starting on your first day
- Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program
- 401(k) retirement plan with company matching contributions
- Fertility, adoption, and surrogacy support through Maven, plus paid volunteer time
- AirVet: 24/7 virtual veterinary care at no cost
- Legal services, identity protection, and supplemental health insurance options
- Tax-advantaged spending accounts for healthcare, dependent care, and commuting
- Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning
Compensation Transparency
Veeam is committed to pay transparency and equitable compensation. For this role, the compensation range below reflects the expected total target compensation (TTC), inclusive of base pay and a competitive performance-based bonus. For roles with a commission plan, the compensation range represents On Target Earnings (OTE), which includes base salary plus variable commission. When determining compensation, Veeam takes into consideration factors such as experience, education, skills, and geographic zone. Offers are typically made below the midpoint of the range.
In addition to compensation, Veeam provides a comprehensive benefits package, including health coverage, retirement plans, and unlimited time off.
Veeam Software is an equal opportunity employer and does not tolerate discrimination in any form on the basis of race, color, religion, gender, age, national origin, citizenship, disability, veteran status or any other classification protected by federal, state or local law. All your information will be kept confidential.
Personal data collected during the recruitment process will be processed in accordance with our Recruiting Privacy Notice, which explains how your information is collected, used, and handled in connection with hiring activities. By applying for this position, you consent to this processing.
By submitting your application, you confirm that the information provided, including any supporting documents, is complete and accurate to the best of your knowledge. Any misrepresentation, omission, or falsification may result in disqualification from consideration or, if discovered after employment begins, termination of employment.
Veeam Software leads the market in data resilience, offering robust solutions for data backup, recovery, portability, security, and intelligence. Our platform supports a wide range of environments—including cloud, virtual, physical, SaaS, and Kubernetes—empowering organizations to maintain control over their data, ensuring it’s always protected and available. Trusted by over 550,000 customers globally, Veeam is dedicated to helping businesses not only recover from data loss but thrive beyond it.
- Founded: 2006
- Employees: 500+
- Industry: Internet Software & Services
- Total raised: $500M