This job has expired.

September 2, 2025

C2C (corp-to-corp)

Harry@virtuallabsus.com

Job Title: Data Engineer – Databricks & Machine Learning
Location: Sunnyvale, CA (Onsite)

Job Summary
We are seeking a highly skilled Data Engineer with strong experience in Databricks and Machine Learning pipelines to design, build, and optimize scalable data solutions. The ideal candidate will collaborate with data scientists, analysts, and business stakeholders to ensure high-quality data delivery for analytics, predictive modeling, and AI-driven initiatives.

Key Responsibilities
Design, develop, and maintain data pipelines on Databricks for ingestion, transformation, and processing of large datasets.

Implement ETL/ELT workflows to ensure reliable and efficient movement of structured and unstructured data.

Collaborate with data science teams to operationalize machine learning models and integrate them into production environments.

Optimize Spark-based data processing jobs for performance and cost efficiency.

Manage and monitor Delta Lake tables, ensuring data quality, reliability, and compliance.

Develop CI/CD workflows for data pipelines, model deployment, and versioning.

Ensure adherence to data governance, security, and compliance standards.

Troubleshoot production issues, perform root cause analysis, and implement permanent fixes.

Work closely with stakeholders to understand data requirements and deliver business-ready datasets.

Required Skills & Qualifications
Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.

3+ years of experience as a Data Engineer, with a focus on cloud platforms (Azure, AWS, or GCP).

Strong expertise in Databricks, PySpark, and Delta Lake.

Proficiency in Python and SQL for data transformation and pipeline development.

Experience integrating machine learning models into production pipelines (MLflow or similar tools preferred).

Knowledge of data warehousing concepts and modern architectures (e.g., Lakehouse).

Experience with orchestration tools (Airflow, Azure Data Factory (ADF), or similar).

Familiarity with Git-based workflows, DevOps practices, and CI/CD pipelines.

Strong understanding of data governance, security, and compliance (e.g., GDPR, HIPAA).

Preferred Qualifications
Exposure to ML model monitoring and feature store management.

Experience with streaming data pipelines (Kafka, Spark Streaming).

Certifications in Databricks, cloud platforms, or data engineering.