Drive.ai is hiring a

DevOps and Reliability Engineer

Mountain View, United States

Drive.ai is shaping the self-driving car revolution. Our goal is to improve people's lives by transforming mobility in a cost-effective way that can benefit everyone, not just those buying automobiles at the very high end of the market. We currently have autonomous vehicles on city streets.

As a DevOps and Reliability Engineer, you will be part of the team working toward our vision of autonomous vehicles. Your role will be to help design and implement the build and release infrastructure for our large-scale deep learning systems. You will be responsible for designing scalable solutions and tooling that can grow as the company expands.

Responsibilities:

Scaling infrastructure: Recommend and implement solutions for scaling compute, storage, and networking resources. Identify and mitigate bottlenecks and points of failure.

Cluster administration:

Coordinate with engineering and IT to develop policies for cluster utilization

Develop and utilize tools to monitor and diagnose cluster performance

Actively probe and monitor the cluster for security issues

Qualifications:

Highly proficient in infrastructure design

Highly proficient in Linux resource management and administration tooling

Proficient in Python or another scripting language

Comfortable in C/C++ and SQL

Familiarity with Git and repository management

Preferred:

Experience working with GPU servers in a high performance computing environment