Senior ML Engineer (Token Factory)
TLDR
Develop and optimize large-scale inference and training pipelines for foundation models, blending ML expertise with systems engineering to boost throughput and efficiency.
Responsibilities:
- Drive inference optimization efforts by identifying bottlenecks and implementing performance improvements across diverse LLM architectures, improving throughput and reducing latency and cost per token.
- Contribute to the design and evolution of inference engines, including techniques such as speculative decoding, KV-cache optimization, and support for dense and MoE models.
- Develop and productionize low-precision training and inference pipelines (e.g., FP8, MXFP4) to maximize efficiency on large GPU clusters.
- Profile and analyze GPU workloads using modern tooling to identify performance constraints and guide architectural improvements.
- Collaborate on scalable distributed training and inference systems, including sharding strategies, custom kernels, and hardware-aware optimizations.
- Contribute to engineering best practices including testing, CI/CD, and maintainable production-grade ML systems.
Requirements:
- Strong understanding of machine learning fundamentals, particularly transformer architectures and large language models.
- Hands-on experience profiling and optimizing GPU workloads using tools such as Nsight or PyTorch Profiler.
- Deep knowledge of GPU architecture, including memory hierarchy and compute vs. memory trade-offs.
- Familiarity with key LLM concepts such as attention mechanisms, RoPE, KV-cache, Flash Attention, and quantization techniques.
- Experience with large-scale deep learning training, including distributed systems, sharding strategies, and custom kernel development.
- Strong software engineering skills, with advanced proficiency in Python and modern ML frameworks.
- Solid understanding of software engineering practices such as version control, CI/CD pipelines, and unit testing.
- Strong communication skills with the ability to collaborate effectively in highly technical, cross-functional teams.
Benefits:
- Competitive compensation package
- Strong career development and continuous learning opportunities
- Flexible work environment with high autonomy and ownership
- Collaborative, innovation-driven engineering culture
- Opportunity to work on frontier AI systems at massive scale
- International, highly skilled, and diverse team environment
Jobgether runs the largest remote job platform, effectively linking job seekers with over 200,000 flexible and remote opportunities that match their unique skills and preferences. Our focus is on enhancing the hiring process, ensuring efficiency while prioritizing the candidate experience, particularly in the growing health and wellness sector.
- Founded: 2020
- Employees: 11-50
- Industry: Professional Services