GenAI Engineer – Final Round (In-Person)

This job has expired

We are seeking an experienced GenAI Engineer to design and implement on-premise LLM and vector database solutions. The ideal candidate will have hands-on expertise in deploying open-source large language models and building scalable Retrieval-Augmented Generation (RAG) pipelines in secure, enterprise environments.

Key Responsibilities

  • Deploy and optimize open-source LLMs such as Llama 3 and Mistral / Mixtral in on-prem or private cloud environments
  • Develop, test, and integrate LLM applications using Python (prompt engineering, inference workflows)
  • Implement and optimize CPU-based inference, including model quantization and performance tuning
  • Design and build scalable RAG pipelines using vector databases
  • Manage embeddings, indexing, and metadata filtering strategies

Required Skills & Experience

  • Strong Python programming skills for AI/ML application development
  • Hands-on experience with vector databases such as Qdrant, Chroma, Milvus, or pgvector
  • Proven experience building and deploying Retrieval-Augmented Generation (RAG) solutions
  • Solid understanding of embeddings generation and semantic search techniques
  • Experience working in secure, air-gapped, or enterprise-controlled environments
  • Knowledge of access control, audit logging, and data governance

Nice to Have

  • Experience with LangChain or LlamaIndex
  • Familiarity with Docker and Kubernetes for containerized deployments
  • Exposure to high-performance languages like Rust, Go, or C++
  • Experience with inference frameworks such as vLLM, llama.cpp, or Hugging Face Transformers
  • Background in regulated or enterprise environments

Key Deliverables

  • Define a reference architecture for on-prem LLM + vector DB solutions
  • Build a working prototype (LLM + vector DB + RAG pipeline)
  • Provide clear documentation and knowledge transfer to internal teams
