AI Research Scientist (Multimodal post-training)
TLDR
Drive multimodal model research and post-training to power autonomous clinical care at scale, bridging science and production.
AI fluency is a core expectation at Sword Health. Every candidate is assessed against our three-level framework — be ready to share real examples of how AI is already part of how you work.
- Explorer (Level 1): Uses AI daily to boost personal productivity
- Builder (Level 2): Creates workflows and tools that elevate the whole team
- Integrator (Level 3): Embeds AI into products and processes at scale
Every hire must demonstrate at least Level 1. The expected level will vary depending on the seniority of the role.
Design and execute research on multimodal model training — with a primary focus on vision-language models and, increasingly, speech-language models — including fine-tuning, alignment, and post-training methods (SFT, RLHF) tailored for clinical domains;
Develop and improve models that enable our AI agents to perceive and understand patients through video, language, and speech, building towards unified multimodal patient understanding;
Contribute to the full model development cycle: multimodal dataset curation and annotation, architecture design, cross-modal training strategies, evaluation, and iteration;
Collaborate across AI Engineering, Product, and Clinical teams to translate multimodal research breakthroughs into production systems that deliver patient care;
Work towards long-term ambitious research goals — such as real-time multimodal patient state estimation, clinical memory, and safety validation — while identifying and delivering immediate milestones;
Advance the field by publishing in top-tier AI venues and clinical journals, contributing to Sword's growing body of peer-reviewed research.
A PhD in Computer Science, Machine Learning, Natural Language Processing, Computer Vision, or a closely related AI field;
Hands-on experience training or fine-tuning large language models or large multimodal models (e.g., vision-language models, speech-language models), including pre-training and post-training techniques such as SFT and RLHF;
Experience training or fine-tuning models that operate across multiple modalities (e.g., video + language, image + text, speech + text);
A strong publication track record in peer-reviewed AI conferences or journals;
Proficiency in Python and deep experience with modern ML frameworks (e.g., PyTorch, JAX);
Demonstrated ability to design rigorous experiments and interpret their results.
First-author publications in top-tier AI conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ACL, EMNLP, COLM, Interspeech);
Deep expertise in one or more of: vision-language models, video understanding, speech-language models, multimodal representation learning, or cross-modal fusion architectures;
Experience with video-based or image-based model training in applied settings (e.g., human pose estimation, action recognition, medical imaging, or biological signal processing);
Experience building or contributing to LLM-based agents, including prompt engineering, memory orchestration, or agentic workflows;
A track record of taking research ideas from conception to working systems, including developing and debugging complex multimodal ML pipelines;
Industry experience during or after the PhD (e.g., research internships at leading AI labs);
Comfort with ambiguity and a track record of delivering results in fast-moving, high-uncertainty environments where research and product development happen in parallel;
Strong communication skills and a history of effective cross-functional collaboration;
A broader record of research excellence demonstrated through grants, fellowships, patents, or impactful open-source contributions.
These compensation bands are just the starting point. Once someone joins and proves they’re outlier talent, we adjust quickly to ensure their compensation aligns with their impact.
Our job titles may span more than one career level. Actual pay is determined by skills, qualifications, experience, location, market demand, and other factors. Compensation details listed in this posting reflect the base salary, any potential variable pay, bonus, or sales incentives, and the Company’s estimate of the value of private company stock options, if applicable. The pay range is subject to change, the future value of company stock options is not guaranteed, and compensation may be modified in the future. In addition to our total compensation, Sword offers a number of benefits as listed below.
Benefits
- Equity Compensation: Equity shares
- Flexible Work Hours: Flexible working hours
- Free Meals & Snacks: Snacks and beverages
- Health Insurance: Health, dental and vision insurance
- Paid Time Off: Discretionary vacation
- Remote-Friendly: Work from home
Sword Health is transforming healthcare with its AI Care platform, making healthcare more accessible while drastically lowering costs for payers and organizations. Initially focused on pain management, Sword has expanded into women's health, movement health, and mental health, serving over 700,000 members across three continents and helping enterprise clients save over $1 billion in unnecessary healthcare expenses.
- Founded: 2015
- Employees: 201-500
- Total raised: $130M