We all depend on healthcare throughout our lifetimes, for ourselves and for our families and friends, but it is notoriously difficult to navigate and understand. Healthcare comprises 20% of the US economy, and we think it should work better for all of us. At Collective Health, we believe it’s time for a new day in healthcare, where as members we are informed and empowered to make the right care choices when the decisions are urgent and critical.
The Application Support team provides technical support to our customer- and partner-facing teams. The team’s operational duties include addressing questions and problems, assisting in the execution of backend changes, coordinating change and release processes, identifying issues for escalation to development teams, and coordinating multi-team efforts to remediate issues. Additionally, the team uses its unique perspective on the operational patterns of our products to continuously improve Collective Health’s services, pay off technical debt, and push for greater automation and reliability.
The Application Support Engineering Manager’s responsibilities are divided between providing day-to-day operational support, leading and executing projects, and leading and managing team members to enable their success.
While we are embracing a remote-flexible work week, employees are expected to be within commuting distance of an office. The frequency of in-office days will be determined on a team-by-team basis closer to the reopening of our offices.
What you'll do:
- Provide first- and second-tier support to internal stakeholders:
- Apply runbooks to address common problems or requests.
- Troubleshoot, replicate, and triage errors.
- Review log messages and clearly characterize and document issues for escalation.
- Triage automatically-generated service alerts.
- Generate bug and enhancement request tickets for development teams.
- Work on projects in collaboration with other teams, including:
- Onboarding new services to be supported by the team.
- Designing new services.
- Refining documentation.
- Speccing out and creating code and tools that automatically detect or remediate common concerns.
- Assist engineering and SRE teams as part of a robust incident response process:
- Fill the communications, scribe, or tech lead role in any given incident.
- Contribute to Collective Health’s blameless postmortem process.
- Coordinate releases to production:
- Collaborate with engineers to ensure deployments have appropriate support, documentation, rollback plans, and testing.
- Actively manage contracts and relationships with engineering and customer teams to ensure that the product remains supportable by the Application Support team.
- Craft SLAs in supported products, and negotiate:
- Infrastructure SLAs.
- Engineering SLAs for escalations.
- Third party partner SLAs.
- Take ownership of multiple issues of differing urgency, ensuring that expectations are effectively communicated and work is delegated where necessary.
- Inspire excellence in your team through hands-on people management.
- Work cross-functionally with the Customer Experience, Product, and Engineering teams to identify and fix problems and scale our operations.
- Cultivate a culture of constant improvement and excellence.
- Identify and measure key performance indicators and generate KPI reporting.
- Set goals, standards and expectations for individual and team performance.
- Work independently and autonomously.
To be successful in this role, you'll need:
Imposter syndrome is real. If you are hesitant to apply because of not checking all the boxes, or you’ve had a less-traditional career or educational background, we encourage you to still apply and mention why you believe you’d be a fit for the role.
- 10+ years of work experience in Support, DevOps, Site Reliability Engineering, or Software Engineering.
- Ability to understand and troubleshoot problems in broad technical domains, including:
- Distributed Systems.
- Third-party Services and APIs.
- Troubleshooting using logs, diagnostic tools, etc.
- Ability to troubleshoot legacy software by stepping through existing code and configuration, even in unfamiliar languages.
- Relational databases and SQL.
- 5+ years of experience being part of a production on-call rotation.
- 5+ years of experience in change and release processes.
- 5+ years of experience building and leading teams.
- Strong written and verbal communication skills.
- Resourceful, eager to solve new and challenging problems and drive measurable impact on business metrics.
- Demonstrated experience leading business and mission-critical problems to resolution.
- Fast learner.
- Ability to prioritize different tasks under pressure and ask for backup as needed.
- Experience managing teams remotely is a plus.
- High degree of ownership and attention to detail.
- Ability to drive projects that involve multiple internal and external stakeholders to completion.
- Proven technical domain leadership: decomposing tasks, setting priorities, triaging incoming bugs and requests.
- Knowledge of data structures, algorithms, distributed systems, and information retrieval.
- Methodical problem-solving approach, coupled with strong communication skills and an ability to own and drive projects to completion.
- Demonstrated technical mentorship and ability to increase the abilities of those on and outside the team.
- 15+ years of work experience in Support, DevOps, Site Reliability Engineering, or Software Engineering.
- Experience with HIPAA compliance is a plus.
- Experience leading projects that demonstrate strong competencies in infrastructure, architecture, and software. Examples may include 3rd party API integrations, disaster recovery plans, cloud migrations, automation of manual or fragmented operations processes, containerization of services.
- Experience with at least one of the following or similar technologies: Kubernetes, Postgres, etcd, Elasticsearch, or related scheduling and persistence services; Apache Kafka or related eventing systems.
- Familiarity with Infrastructure as Code technologies (e.g., Terraform, Ansible).
- Good understanding of private and public cloud design considerations and limitations in the areas of infrastructure, distributed systems, data storage, Linux-based operating systems, and security.
At Collective Health, we care about creating a culture of diversity, openness, and transparency, while engaging our intellectual curiosity, problem-solving, and software engineering skills. This is vital to maintaining an agile engineering culture while putting a robust user experience front and center. We bring together people with a wide variety of backgrounds and perspectives, while creating an environment where their passions can be supported and mentored so they can learn and grow.
Founded in 2013, Collective Health has created an ecosystem of innovative partners across care and benefits delivery, as well as built a powerful and flexible infrastructure to better enable employees and their families to understand, navigate, and pay for healthcare. By reducing the administrative lift of delivering health benefits, providing an intuitive member experience, and improving health outcomes, the company guides employees toward healthier lives and companies toward healthier bottom lines. Collective Health is headquartered in San Mateo, CA with locations in Chicago, IL, and Lehi, UT. For more information, please visit collectivehealth.com.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. Collective Health is committed to providing support to candidates who require reasonable accommodation during the interview process. If you need assistance, please contact [email protected].