We are looking for a high-energy, experienced, customer-focused Site Reliability Engineer (SRE) to join our Managed Services team. This individual will support the team that manages our clients’ hosted cloud applications and accounts. The position reports to the head of Managed Services and works with a team of Engineers and Atlassian Administrators to support a variety of customer requests, automate scalable and supportable client solutions, help the team solve new problems and technical roadblocks, drive operational efficiencies and processes, follow proper governance and procedures, and expand our Hosting and Support practice.
Responsibilities:
- Contribute to the success of our Managed Hosting and Atlassian Support teams by:
- Maintaining application availability, stability, and ability to scale.
- Implementing changes to applications and infrastructure as needed.
- Monitoring architecture through automation and custom solutions.
- Working directly with clients to facilitate requests, proactively address issues before they become critical, speak to best practices, and make recommendations.
- Working towards team deliverables that keep our clients well supported and maintain customer satisfaction.
- Working within our practice to standardize best practices, create highly scalable processes, and document detailed specifications.
- Collaborating when needed with solution architects and other experts on design and implementation across various application and system architectures.
Qualifications & Skills:
- At least 3 years of experience with Linux OS (Ubuntu/CentOS) required
- Well-versed in the Atlassian stack (Jira, Confluence, Bitbucket, Bamboo), including functional, application, and infrastructure administration
- Ability to provide both server-side and application-side support
- At least 2 years of experience in AWS system administration and/or application support
- Solid technical understanding of the latest AWS services and hands-on experience architecting solutions using EC2, RDS, EBS, S3, etc.
- Ability to analyze and review current environments to determine potential areas of improvement
- Knowledge of infrastructure load and performance metrics, and experience with tools like ELK, Splunk, New Relic, and CloudWatch
- Familiar with IT change management processes, with hands-on proficiency in implementing DevOps (CI/CD) tools and solutions
- Familiar with information security requirements and regulatory compliance frameworks such as SOC, PCI, and HIPAA
- Familiar with automation tools like Terraform, Ansible, and/or CloudFormation
- Comfortable with diverse technical problem sets across the entire technology stack
- Familiar with database technologies such as MSSQL, MySQL, PostgreSQL, and/or NoSQL
- History of support/queue work with direct customer interactions
- Ability to help provide 24/7/365 support coverage, including on-call weekend rotations
- Thrive in a team atmosphere with daily stand-ups and a customer-focused work environment
- Maintain an excellent, professional, friendly, timely, and empathetic approach to customer communications
- Passion for technology and solving challenging problems
- Strong work ethic, good time management, and the ability to work with diverse teams
- Strong attention to quality and detail, with a focus on customer satisfaction