We are seeking an experienced GenAI Engineer to design and implement on-premise LLM and vector database solutions. The ideal candidate will have hands-on expertise in deploying open-source large language models and building scalable Retrieval-Augmented Generation (RAG) pipelines in secure, enterprise environments.
Key Responsibilities
- Deploy and optimize open-source LLMs such as Llama 3 and Mistral/Mixtral in on-prem or private cloud environments
- Develop, test, and integrate LLM applications using Python (prompt engineering, inference workflows)
- Implement and optimize CPU-based inference, including model quantization and performance tuning
- Design and build scalable RAG pipelines using vector databases
- Manage embeddings, indexing, and metadata filtering strategies
Required Skills & Experience
- Strong Python programming skills for AI/ML application development
- Hands-on experience with vector databases such as Qdrant, Chroma, Milvus, or pgvector
- Proven experience building and deploying Retrieval-Augmented Generation (RAG) solutions
- Solid understanding of embedding generation and semantic search techniques
- Experience working in secure, air-gapped, or enterprise-controlled environments
- Knowledge of access control, audit logging, and data governance
Nice to Have
- Experience with LangChain or LlamaIndex
- Familiarity with Docker and Kubernetes for containerized deployments
- Exposure to high-performance languages like Rust, Go, or C++
- Experience with inference frameworks such as vLLM, llama.cpp, or Hugging Face Transformers
- Background in regulated or enterprise environments
Key Deliverables
- Define a reference architecture for on-prem LLM + vector DB solutions
- Build a working prototype (LLM + vector DB + RAG pipeline)
- Provide clear documentation and knowledge transfer to internal teams
