Visualize results in Sigma Computing and Power BI (including DAX measures), and deliver lightweight custom React web apps for interactive diagnostics.
Orchestrate and productionize workloads with Databricks Workflows, implement CI/CD (Git/GitHub Actions/Azure DevOps), and write unit/integration tests (pytest).
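For concreteness, a minimal sketch of the pytest pattern this implies; the transform, fixture, and schema below are illustrative assumptions, not Scout Motors code:

```python
# Hypothetical pytest unit test for a PySpark transform; names and schema are
# illustrative only.
import pytest
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window


@pytest.fixture(scope="session")
def spark():
    # Small local session so tests run without a cluster.
    return SparkSession.builder.master("local[2]").appName("unit-tests").getOrCreate()


def dedupe_latest(df, key_col: str, ts_col: str):
    # Transform under test: keep only the newest record per key.
    w = Window.partitionBy(key_col).orderBy(F.col(ts_col).desc())
    return df.withColumn("_rn", F.row_number().over(w)).filter("_rn = 1").drop("_rn")


def test_dedupe_latest_keeps_newest_record(spark):
    df = spark.createDataFrame(
        [("A", "2024-01-01", 1), ("A", "2024-01-02", 2)],
        ["serial_number", "event_ts", "value"],
    )
    rows = dedupe_latest(df, "serial_number", "event_ts").collect()
    assert len(rows) == 1 and rows[0]["value"] == 2
```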
Build robust ELT pipelines with Python, SQL, and PySpark; model data for analytics (Delta Lake, medallion architecture) and enforce governance with Unity Catalog.
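A minimal bronze-to-silver sketch of the medallion pattern, assuming placeholder Unity Catalog names:

```python
# Bronze -> silver step of the medallion pattern; catalog/schema/table names
# are placeholders, not actual Scout Motors objects.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.table("main.bronze.plant_events")  # raw, append-only ingest

silver = (
    bronze
    .filter(F.col("serial_number").isNotNull())          # basic validity rule
    .withColumn("event_ts", F.to_timestamp("event_ts"))  # type normalization
    .dropDuplicates(["serial_number", "event_ts"])       # idempotent re-runs
)

# Unity Catalog three-level namespace: catalog.schema.table
silver.write.format("delta").mode("overwrite").saveAsTable("main.silver.plant_events")
```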
Create KPI calculations for manufacturing/quality (e.g., FPY, PPM, Cp/Cpk, SPC trends) and document them as a single source of truth.
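To make those KPI definitions unambiguous, a short reference sketch using the standard quality-engineering formulas (the function names are ours, not the posting's):

```python
# Reference implementations of the quality KPIs named above; definitions follow
# standard SPC / quality-engineering conventions.

def fpy(units_passed_first_time: int, units_started: int) -> float:
    # First Pass Yield: share of units passing with no rework or repair.
    return units_passed_first_time / units_started


def ppm(defective_units: int, total_units: int) -> float:
    # Defective Parts Per Million.
    return 1_000_000 * defective_units / total_units


def cp_cpk(mean: float, stdev: float, lsl: float, usl: float) -> tuple[float, float]:
    # Process capability: Cp ignores centering, Cpk penalizes an off-center mean.
    cp = (usl - lsl) / (6 * stdev)
    cpk = min(usl - mean, mean - lsl) / (3 * stdev)
    return cp, cpk
```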
Develop forecasting, anomaly detection, and root-cause analytics using Databricks ML, MLflow, and the Model Registry; operationalize models via batch jobs or inference endpoints.
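A hedged sketch of the MLflow log-and-register flow; the data, model choice, and registered-model name are placeholders:

```python
# Train, log, and register an anomaly-detection model with MLflow; the features,
# model, and registry name below are illustrative assumptions.
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import IsolationForest

X_train = np.random.default_rng(0).normal(size=(1000, 4))  # stand-in sensor features

with mlflow.start_run() as run:
    model = IsolationForest(contamination=0.01, random_state=0).fit(X_train)
    mlflow.log_param("contamination", 0.01)
    mlflow.sklearn.log_model(model, artifact_path="model")

# Register the logged model so batch jobs or inference endpoints load a pinned,
# reviewable version instead of ad hoc artifacts.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "quality_anomaly_detector")
```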
Implement automated data quality checks and reconciliation (e.g., Great Expectations/Deequ) and design lineage/observability for pipelines and jobs.
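Since Great Expectations and Deequ are named only as examples, here is a framework-agnostic sketch in plain PySpark of the kinds of checks such tools encode; table and column names are placeholders:

```python
# Hand-rolled data quality checks and a rowcount reconciliation; an expectations
# framework would declare the same rules, so treat this as a pattern sketch.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
silver = spark.read.table("main.silver.plant_events")

checks = {
    "no_null_serials": silver.filter(F.col("serial_number").isNull()).count() == 0,
    "ppm_in_range": silver.filter(~F.col("defect_ppm").between(0, 1_000_000)).count() == 0,
}

# Reconciliation: the cleaned table should never exceed the raw source count.
source_count = spark.read.table("main.bronze.plant_events").count()
checks["rowcount_reconciles"] = silver.count() <= source_count

failed = [name for name, ok in checks.items() if not ok]
if failed:
    raise ValueError(f"Data quality checks failed: {failed}")
```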
Ingest data from legacy and modern IT systems (e.g., JIRA, Confluence, on-prem/plant systems, SQL/NoSQL sources, flat files, APIs) into a Databricks lakehouse.
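A minimal Auto Loader sketch of this kind of ingestion; the paths and table names are placeholders:

```python
# Incrementally ingest landed files into a bronze Delta table with Auto Loader.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

(spark.readStream
    .format("cloudFiles")                      # Databricks Auto Loader
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/Volumes/main/bronze/_schemas/plant_events")
    .load("/Volumes/main/landing/plant_events")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/bronze/_checkpoints/plant_events")
    .trigger(availableNow=True)                # batch-style incremental run
    .toTable("main.bronze.plant_events"))
```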
The responsibilities of this role require daily in-office attendance, with regular in-person meetings and events.
Cross-functional data & LLM alignment: Proactively coordinate with Production/Manufacturing (shop floor, MES/MOM), Manufacturing IT, data owners/stewards, and the enterprise LLM/OpenAI platform team to ensure data availability, access, SLAs, and safe model behavior.
Requirements
Key skills: Databricks, Spark, Python, SQL, Power BI, MLOps.
Residing in San Francisco: Pursuant to the San Francisco Fair Chance Ordinance, Scout Motors will consider for employment qualified applicants with arrest and conviction records.
Residing in Los Angeles: Scout Motors will consider for employment qualified applicants with criminal histories in a manner consistent with the Los Angeles Fair Chance Initiative for Hiring Ordinance.
Residing in New York City: This role is not eligible for remote work in New York City.
Applicants should expect that the role will require the ability to convene with Scout colleagues in person and travel to participate in events on behalf of the company from time to time.
Data integration: REST/JDBC connectors, OAuth/service accounts, batch & streaming (e.g., Auto Loader; Kafka/Kinesis a plus).
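As an illustration of the JDBC piece, a minimal Spark read against a hypothetical MES database; the host, table, and secret names are assumptions, and `dbutils` is provided by the Databricks runtime:

```python
# JDBC pull from a relational plant system into Spark; credentials come from a
# Databricks secret scope rather than being hardcoded. All names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

work_orders = (spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://mes-db.example.com:1433;databaseName=mes")
    .option("dbtable", "dbo.work_orders")
    .option("user", "svc_lakehouse")
    .option("password", dbutils.secrets.get(scope="mes", key="jdbc-password"))
    .load())
```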
BI: Power BI (including DAX), Sigma Computing (including Sigma web apps).
Experience: 4+ years in data engineering/analytics or ML engineering, including 2+ years hands-on with Databricks/Spark.
Ability to translate complex data and model outputs into clear business decisions for technical and non-technical audiences.
Nice to have: Databricks Certified Data Engineer Professional and/or Machine Learning Professional certification.
Strong Python (pandas, PySpark) and SQL; performance tuning for Spark jobs and Delta Lake tables.
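A few illustrative tuning moves of the kind this implies; the table names and Z-order column are assumptions:

```python
# Common Delta/Spark performance levers, sketched with placeholder names.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Compact small files and co-locate rows on a frequent filter/join key.
spark.sql("OPTIMIZE main.silver.plant_events ZORDER BY (serial_number)")

# Filter early so Delta can skip files, and broadcast the small dimension side
# to avoid shuffling the large fact table.
events = spark.read.table("main.silver.plant_events").filter("event_ts >= '2024-01-01'")
stations = spark.read.table("main.silver.stations")
joined = events.join(F.broadcast(stations), "station_id")
```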
Web applications: React + TypeScript, FastAPI/Flask, OAuth2, Docker, GitHub Actions CI/CD; integration with Databricks SQL & ML inference endpoints.
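A hedged sketch of that stack's server side: a thin FastAPI endpoint over Databricks SQL that a React client could call. The environment variable names, gold table, and route are assumptions; it relies on the `databricks-sql-connector` package:

```python
# Minimal API over a gold table; one connection per request keeps the sketch
# simple (a production service would pool connections).
import os
from databricks import sql
from fastapi import FastAPI

app = FastAPI()

@app.get("/api/fpy/{line_id}")
def line_fpy(line_id: str):
    with sql.connect(
        server_hostname=os.environ["DATABRICKS_HOST"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT day, fpy FROM main.gold.line_fpy WHERE line_id = :line_id",
            {"line_id": line_id},
        )
        return [{"day": str(day), "fpy": fpy} for day, fpy in cur.fetchall()]
```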
Databricks lakehouse expertise: Unity Catalog, Delta Live Tables (or equivalent pattern), MLflow, Feature Store, Model Registry.
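A minimal Delta Live Tables sketch of the pattern referenced here; the dataset names and expectation rule are placeholders, and the code runs only inside a DLT pipeline:

```python
# DLT table with a declarative data quality expectation; bad rows are dropped
# and counted in the pipeline's event log.
import dlt
import pyspark.sql.functions as F

@dlt.table(comment="Cleaned plant events")
@dlt.expect_or_drop("valid_serial", "serial_number IS NOT NULL")
def silver_plant_events():
    return (
        dlt.read_stream("bronze_plant_events")
        .withColumn("event_ts", F.to_timestamp("event_ts"))
    )
```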
A Bachelor’s or Master’s degree in Computer Science, Data/Industrial/Mechanical Engineering, or a related field.
Data quality & governance: expectations/validation frameworks, lineage, access controls, secrets management, and auditability.
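Two small illustrative touchpoints for the secrets and access-control pieces, assuming placeholder scope, key, and object names (`dbutils` and `spark` are provided by the Databricks runtime):

```python
# Pull credentials from a secret scope instead of hardcoding them.
token = dbutils.secrets.get(scope="lakehouse", key="mes-api-token")

# Unity Catalog grant: auditable, group-based access to a governed table.
spark.sql("GRANT SELECT ON TABLE main.silver.plant_events TO `quality-analysts`")
```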