Zynga is hiring a

Site Reliability Engineer

Bengaluru, India
Job Description

Monitoring & Incident Management:

-   Improve the studio’s reliability through monitoring, rapid response, communication and coordination.

-   Develop and manage the deployment architecture for the application, develop the monitoring architecture and implement monitoring agents, dashboards, escalations and alerts.

-   Routinely identifies operational problems by observing and studying system architecture, functionality, and performance results. Troubleshoots procedures with the overall studio architect, investigates surfaced issues, and handles incidents.

-   Identifies operational priorities by assessing operational objectives; determining project objectives such as efficiency, cost savings, energy conservation, operator convenience, safety, and environmental quality; and estimating relevance, time, and costs.


Development & Data Analysis:

-   Develops operational solutions by defining, studying, estimating, and screening alternative solutions; calculating economics; determining impact on total system.

-   Create new tools to facilitate automated monitoring of the studio’s operational environment.

-   Anticipates operational problems by studying operating targets, modes of operation, unit limitations; monitoring unit performance.

-   Improves operational quality results by studying, evaluating, and recommending process re-architecting; implementing changes; and contributing information and opinion to unit design and modification teams.

-   Provides operational management information by collecting, analyzing, and summarizing operating and engineering data and trends.

-   Updates job knowledge by participating in educational opportunities; reading professional publications; maintaining personal networks; participating in professional organizations.

-   Accomplishes the engineering and organization mission by completing related tasks as needed.


Operations Engineer Skills and Qualifications:

Mastery of Linux Systems and Networking Administration

  • Strong systems engineering and troubleshooting skills
  • Scripting (Bash/PHP)
  • Strong TCP/IP understanding and ability to produce detailed documentation
  • Write new technical documentation and maintain existing documentation
  • Ability to administer networking firewalls, routers, and switches
  • S3 maintenance, Apache maintenance, load balancer management
  • Puppet Management


Cloud Management

  • AWS expertise (VPC, RDS, Route 53 DNS integration)


Database fundamentals

  • Administer and maintain MySQL and other open-source databases
  • Write and perform basic queries to evaluate database stability, integrity, and performance
  • Large/Big Data Management
  • Administer and maintain Aurora infrastructure


Monitoring Systems

  • System Level (Nagios, Munin, Check_MK)
  • Writing checks & scripts
  • Log/Application Level (Splunk, Elasticsearch, Apache)
  • Ability to diagnose the infrastructure as a whole


Nice to have (extra credit):

  • Java
  • C++
  • ElastiCache
  • Vertica