AWS MLOps Engineer
TLDR
Build and scale end-to-end ML pipelines on AWS, operationalize models from training to monitoring, and collaborate with data scientists and engineers in a cloud-native, fast-paced environment.
Responsibilities:
- Design, build, and maintain end-to-end ML pipelines covering training, evaluation, deployment, versioning, and monitoring of models in production.
- Develop and optimize MLOps workflows using tools such as MLflow, Spark ML, and Python-based ML frameworks.
- Implement CI/CD pipelines for machine learning systems using GitHub Actions and other automation tools.
- Deploy and manage scalable ML services on AWS using ECS, ECR, API Gateway, S3, RDS, and Application Load Balancer.
- Build and maintain backend services and APIs using FastAPI and REST-based architectures.
- Work with SQL and PostgreSQL databases, including schema design and ORM-based data modeling (SQLAlchemy).
- Monitor model performance in production and implement alerting, logging, and optimization strategies.
- Collaborate with cross-functional teams to ensure reliability, scalability, and security of ML systems.
Requirements:
- 2+ years of experience in MLOps or Machine Learning Engineering roles focused on production ML systems.
- Strong hands-on experience with MLflow, Spark ML, Python, and common ML libraries.
- Proven experience in model lifecycle management including training, versioning, deployment, and monitoring.
- Experience building CI/CD pipelines and using GitHub Actions for automated deployments.
- Solid AWS experience with services such as ECS, ECR, API Gateway, S3, RDS, and Application Load Balancer.
- 1+ year of backend development experience using FastAPI and REST APIs.
- Strong knowledge of SQL, relational databases, PostgreSQL, and ORM frameworks such as SQLAlchemy.
- Familiarity with production-grade system design for scalable ML applications.
- Exposure to Databricks (Unity Catalog, Jobs, Workflows) and/or Agentic AI is a plus.
- Strong problem-solving, communication, and collaboration skills in distributed engineering environments.
- Bachelor’s degree in Computer Science, Engineering, Data Science, or related field (or equivalent experience).
Benefits:
- Competitive base salary ranging from $92,250 to $120,000, plus performance-based incentives.
- Comprehensive medical, dental, and vision insurance coverage.
- 401(k) retirement savings plan with employer participation.
- Paid time off, holidays, and dedicated paid learning days.
- Flexible remote work structure within the United States.
- Employee assistance programs supporting well-being and work-life balance.
- Career development opportunities in advanced AI and cloud-native engineering environments.
- Exposure to large-scale, production AI/ML systems and modern cloud technologies.
Jobgether runs the largest remote job platform, effectively linking job seekers with over 200,000 flexible and remote opportunities that match their unique skills and preferences. Our focus is on enhancing the hiring process, ensuring efficiency while prioritizing the candidate experience, particularly in the growing health and wellness sector.
- Founded: 2020
- Employees: 11-50
- Industry: Professional Services