Lead Data Engineer - AI/ML Focus

Michael Baker International

The Role

Overview

Lead data engineer building AI/ML platforms and scalable data pipelines

Key Responsibilities

  • Cloud optimization
  • Data engineering
  • MLOps pipelines
  • Feature store
  • Data governance
  • Real-time analytics

Tasks

  • Optimize cloud cost, performance, and reliability for large-scale AI/ML workloads.
  • Drive innovation in AI/ML data engineering and real-time analytics.
  • Ensure compliance with SOC 2, GDPR, and PII standards based on company needs.
  • Guide data engineers on best practices, code quality, and scalable data engineering patterns.
  • Lead and mentor data engineering teams.
  • Implement model monitoring, data quality, feature versioning, and automated retraining.
  • Drive standards for cloud data infrastructure and reusable data engineering components.
  • Build ML-ready data environments, feature stores, and training pipelines.
  • Translate business requirements into scalable data strategies.
  • Partner with data scientists to productionize ML models with CI/CD/CT.
  • Define and execute enterprise data strategies aligned with AI/ML initiatives while championing best practices in data engineering, MLOps, and cloud optimization.
  • Build high-quality data models (dimensional, data vault, lakehouse).
  • Architect MLOps pipelines using Docker, Kubernetes, Terraform, MLflow, SageMaker, or Vertex AI.
  • Lead design of scalable data pipelines, ingestion frameworks, and distributed processing systems.
  • Collaborate with data scientists, ML engineers, and business stakeholders to deliver impactful solutions.
  • Implement observability, lineage, and data quality frameworks across all pipelines.
  • Maintain metadata, cataloging, and governance processes (Collibra, Alation, Unity Catalog).
  • Develop scalable ELT/ETL pipelines using Spark, PySpark, SQL, Airflow, dbt, Kafka, and Kinesis.
  • Ensure quality, compliance, and security across all data platforms while implementing observability, lineage, and governance frameworks.
  • Support real-time and batch feature engineering and inference pipelines.
  • Implement secure data-sharing, encryption, IAM, tokenization, and access patterns.
  • Champion emerging technologies including GenAI, vector databases, and LLM-based pipelines.

Requirements

  • Python
  • Spark
  • Databricks
  • Snowflake
  • AWS
  • MLOps

What You Bring

  • Background in real-time analytics and low-latency ML inference.
  • Strong programming skills in Python and SQL; deep expertise in Spark/Databricks.
  • Experience architecting enterprise data lake, lakehouse, and warehouse solutions (Databricks, Snowflake, BigQuery, Redshift).
  • Bachelor’s degree in Computer Science or a related field, or equivalent experience.
  • Experience implementing vector databases (Pinecone, FAISS, Milvus) and LLM-based pipelines including RAG.
  • Any data or AI/ML-related certifications.
  • 6–12+ years of data engineering experience with 2–5+ years in a lead role.
  • Ability to own end-to-end execution of data engineering initiatives, including estimation, delivery, and performance optimization.
  • Expertise with cloud platforms (AWS, Azure, or GCP).
  • Experience building ML-ready architectures, feature stores, and MLOps pipelines.
  • Proven ability to lead engineering teams, mentor junior engineers, and drive architectural decisions.
  • Experience in highly regulated industries (healthcare, fintech, retail, AEC, manufacturing).

Benefits

  • Flexible Spending Account (FSA)
  • Health Savings Account (HSA)
  • Life, AD&D, short-term, and long-term disability
  • Generous paid time off
  • Professional and personal development
  • 401(k) Retirement Plan
  • Medical, dental, vision insurance
  • Commuter and wellness benefits

The Company

About Michael Baker International

  • Specializes in transportation, water resources, energy, and government-related projects, delivering high-quality solutions.
  • Known for its expertise in complex, multi-disciplinary projects that shape cities, regions, and transportation networks.
  • Works on a diverse range of projects including roadways, bridges, airports, water systems, and energy infrastructure.
  • Regularly supports local, state, and federal government agencies with crucial public infrastructure projects.
  • The firm’s legacy is built on delivering practical, sustainable, and forward-thinking solutions to challenging engineering problems.
  • Notable projects include major transportation systems, environmental remediation, and disaster recovery operations.
  • Employs cutting-edge technology and innovative approaches to solve the most pressing engineering challenges of today.

Sector Specialisms

Industrial

Defense Contractors

Data Centers

Aerospace

Life Sciences

Healthcare

Higher Education

Aviation

Rail and Transit

Federal Government