Multimodal AI Systems Architect (AI Engineering)

Hong Kong

TLDR

Develop and optimize AI systems that integrate vision and audio models, focusing on voice-to-voice interactions and multimodal retrieval.

We are seeking a Multimodal AI Systems Architect to develop and optimize AI systems that integrate vision and audio models. The role focuses on improving our voice-to-voice interactions and multimodal retrieval capabilities, with an emphasis on efficiency.

Responsibilities:

Integrate vision encoders and audio-native models into core agent reasoning loops.
Optimize streaming latency for voice-to-voice AI interactions.
Architect multimodal RAG systems capable of retrieving insights from videos and PDFs.

Qualifications:

Experience with Whisper, CLIP, and multimodal LLM integration.
Knowledge of streaming architectures and WebRTC.
Expertise in cross-modal alignment.

Hyphen Connect Limited

Hyphen Connect Limited is your go-to partner in the Web3 talent acquisition landscape, connecting top talent with opportunities across infrastructure, DeFi, NFTs, and gaming. We leverage data-driven insights and extensive resources to facilitate meaningful connections within the vibrant Web3 community, ensuring that local expertise meets global potential.



This job is no longer available