Jobgether

Senior Machine Learning Engineer (Inference Platform)

TL;DR

Own and optimize production ML inference infrastructure for a high-scale, AI-driven conversational shopping experience, delivering scalable, cost-efficient serving with strong cross-team ownership.

Accountabilities:

You will be responsible for building and scaling the core infrastructure that serves machine learning models in production, ensuring reliability, efficiency, and observability across all inference workflows.

  • Own and evolve a multi-engine inference platform supporting LLMs, embedding models, and other ML workloads in production environments
  • Build and maintain production-grade ML serving pipelines, from model packaging and deployment to monitoring and lifecycle management
  • Define and enforce SLAs for latency, throughput, availability, GPU utilization, and token-level performance metrics such as TTFT and ITL
  • Design and implement model versioning, rollout, rollback, and reproducibility strategies for safe and scalable deployments
  • Develop observability, monitoring, alerting, and debugging tools for production inference systems
  • Optimize inference performance through batching strategies, GPU utilization, quantization, and hardware-aware system design
  • Ensure secure, scalable, and cost-efficient ML serving infrastructure across cloud environments
  • Partner cross-functionally with ML, data, product, and DevOps teams to translate research into production-ready systems

Requirements:

The ideal candidate brings deep experience in production ML systems, strong software engineering fundamentals, and hands-on expertise with large-scale inference infrastructure.

  • 5–8+ years of experience in ML engineering, software engineering, or platform/infrastructure roles with ownership of production ML systems
  • Hands-on experience operating LLM serving frameworks such as vLLM, TGI, TensorRT-LLM, or SGLang in real production environments
  • Strong Python skills and solid understanding of distributed systems and backend engineering principles
  • Experience with cloud platforms (AWS, GCP, or Azure) and ML lifecycle tooling, including model registries and deployment systems
  • Deep understanding of inference optimization concepts such as KV caching, batching strategies, GPU memory behavior, and latency bottlenecks
  • Experience supporting heterogeneous ML workloads including LLMs, embeddings, and extraction models
  • Strong ability to balance latency, throughput, reliability, and infrastructure cost trade-offs
  • Experience working in fast-paced, high-growth environments with evolving technical requirements
  • Excellent problem-solving, communication, and collaboration skills across technical and non-technical teams

Benefits:

  • Competitive compensation aligned with experience and impact
  • Remote-first flexibility within the United States
  • Opportunity to shape core AI infrastructure powering a large-scale consumer-facing product
  • High ownership role with influence over architecture and technical direction
  • Collaborative, cross-functional engineering environment
  • Exposure to cutting-edge LLM and AI inference technologies
  • Fast-paced startup culture with strong autonomy and technical depth

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!
 
Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.
 
 

Jobgether runs the largest remote job platform, effectively linking job seekers with over 200,000 flexible and remote opportunities that match their unique skills and preferences. Our focus is on enhancing the hiring process, ensuring efficiency while prioritizing the candidate experience, particularly in the growing health and wellness sector.

Founded: 2020
Employees: 11-50
Industry: Professional Services