SRE Engineer (Cloud Infrastructure) m/f/d
We are looking for an SRE Engineer to join one of our Cloud Infrastructure teams. Our Cloud Infrastructure department consists of 4 specialized SRE teams. Each team is dedicated to a specific area, supporting its own group of services and development units.
In this role, you will be part of a team responsible for an area that supports over 200 services running in GCP/GKE. As an SRE, you will balance maintaining high system availability for your domain with engineering new solutions to enhance our global infrastructure.
Responsibilities
- Service Ownership: Act as the primary point of contact for developers within your domain, handling service-related queries in chats and managing SRE-specific tasks.
- Infrastructure Evolution: Maintain and improve current cloud infrastructure, ensuring high availability and scalability.
- Embedded DevOps: Integrate SRE/DevOps best practices into the development lifecycle, from architecture planning to deployment.
- Innovation & PoC: Research, develop, and implement new infrastructure tools; conduct Proof of Concept (PoC) projects to drive technical excellence.
- Automation: Partner with the Automation team to build efficient CI/CD pipelines and custom automated workflows.
- Reliability & Metrics: Participate in developing quality metrics (SLIs/SLOs) and maintain comprehensive project documentation.
- On-call Support: Join the on-call rotation to ensure 24/7 stability of our mission-critical services.
Requirements
- 3+ years of experience as an SRE/DevOps Engineer.
- Proven experience with containerization and orchestration tools; Kubernetes is a must (GKE is preferred).
- Knowledge of SRE/DevOps methodologies, such as CI/CD, IaC, GitOps, etc.
- Knowledge of at least one GitOps tool (FluxCD is preferred).
- Experience with cloud-based infrastructure (GCP is preferred).
- Research and troubleshooting skills.
- Experience in administering and tuning relational and columnar databases, specifically PostgreSQL, MySQL, and ClickHouse.
- Experience in deployment and maintenance of distributed high-load systems.
- Experience in developing fault-tolerance mechanisms (clustering, replication, scaling approaches, etc.).
- Experience configuring monitoring solutions (Grafana and the VictoriaMetrics operator are preferred).
- Good scripting skills (Bash and Python are preferred).
Benefits
- Competitive salaries based on your professional experience
- Fast-growing international company with stable employment
- Annual vacation of 25 working days and 1 additional day off on your birthday
- Day off on Municipality day and Carnaval
- Meal Allowance (8 euros per working day)
- Fidelidade Healthcare Plan (comprehensive health coverage with the option to include your partner and children at discounted rates)
- Mental Wellbeing Program through the OpenUp platform
- AUTODOC corporate discount for purchasing car parts at special rates for personal use
- Exclusive retail discounts via our ‘Benefits at Work’ portal
- Learning & Development (over 650 courses on soft and hard skills on our e-learning platform)
- Free English and German language classes (after probation period)
- Flexible working hours and hybrid work
The position is available for candidates based in Portugal, Poland, the Czech Republic, Moldova or Kazakhstan.
Join us today and let’s create a success story together!