Lead Data Engineer with AI experience
TLDR
Hands-on role building scalable AI-enabled data pipelines, RAG and retrieval systems, and semantic layers powering enterprise AI applications.
- Data Pipeline Engineering: Build, optimize, and maintain robust batch and streaming data pipelines using modern cloud-native tools such as Snowflake, PySpark, Delta Lake, and Kafka, ensuring reliability, scalability, and performance.
- RAG & Retrieval Infrastructure: Design and implement end-to-end retrieval systems including embedding pipelines, vector databases, hybrid search, chunking strategies, and ranking mechanisms to optimize AI context relevance.
- Semantic & Knowledge Layer Development: Develop ontologies, entity mappings, and knowledge graphs while maintaining semantic contracts, metadata systems, and lineage tracking for AI and ML use cases.
- ML/LLMOps Enablement: Support ML and LLM lifecycle workflows including dataset curation, feature engineering, model evaluation, experiment tracking, and production monitoring.
- Agentic Data Systems: Build APIs, context stores, and tool interfaces that enable autonomous agents, including observability for reasoning traces, tool calls, and contextual outputs.
- Governance & Data Quality: Implement robust data governance frameworks including RBAC, PII handling, schema validation, data quality monitoring, and compliance-ready audit logging systems.
- 7+ years of experience in data engineering with strong exposure to cloud-based data platforms.
- 2+ years of experience building production AI/ML or LLM-related data infrastructure at scale.
- Strong expertise in Python, SQL, PySpark, Snowflake, Delta Lake, Kafka, and Spark Structured Streaming.
- Hands-on experience with vector databases, embedding pipelines, and retrieval systems in production RAG environments.
- Solid understanding of MLOps practices and tooling, including MLflow, CI/CD for ML systems, and automated evaluation frameworks.
- Strong knowledge of data governance, security, compliance, and data quality frameworks.
- Experience working with cloud ecosystems such as AWS or Azure and containerized environments (Docker, Kubernetes).
- Familiarity with AI/LLM tooling such as LangChain, LlamaIndex, OpenAI/Claude/Bedrock APIs, and FastAPI is a plus.
- Strong problem-solving mindset with the ability to design scalable systems and operate in fast-moving AI environments.
- Competitive compensation package aligned with experience and market standards
- Remote-friendly or hybrid work flexibility depending on team structure
- Opportunity to work on cutting-edge AI, LLM, and agentic systems
- Exposure to global engineering teams and enterprise-scale AI transformation projects
- Health insurance and wellness benefits (as per policy and location)
- Learning and development support for advanced AI and data engineering skills
- Access to modern cloud-native and AI-first technology stacks
- Collaborative, engineering-driven culture focused on innovation and impact
Requirements
This role requires a highly experienced data engineering professional with strong cloud, distributed systems, and AI infrastructure expertise. The ideal candidate combines deep technical execution with architectural thinking and hands-on experience building production-grade AI-enabled data systems.
Jobgether runs the largest remote job platform, effectively linking job seekers with over 200,000 flexible and remote opportunities that match their unique skills and preferences. Our focus is on enhancing the hiring process, ensuring efficiency while prioritizing the candidate experience, particularly in the growing health and wellness sector.
- Founded: 2020
- Employees: 11-50
- Industry: Professional Services