
Parallel Systems
Developing autonomous electric freight vehicles for more efficient and sustainable transportation.
Senior ML Ops Engineer (Machine Learning Infrastructure)
Design and implement a scalable MLOps platform for autonomous rail vehicle AI.
Job Highlights
About the Role
The Senior ML Ops Engineer will lead the design and development of scalable systems that power the company's autonomy and perception pipelines. This role owns the end-to-end ML infrastructure stack, from distributed training environments and experiment tracking to deployment and monitoring, in both R&D and real-world settings, and requires at least one day per month onsite at the Los Angeles office. In the first three months, the engineer is expected to quickly master product goals and existing infrastructure, produce a detailed MLOps architecture, and deliver core pipeline capabilities using tools such as MLflow, SageMaker, or Kubeflow, establishing a foundation for repeatable model experimentation and deployment at scale (a minimal experiment-tracking sketch follows this section).

Responsibilities:
• Design and implement automated MLOps pipelines for data management, model training, deployment, and monitoring.
• Architect, deploy, and manage scalable infrastructure for distributed training and inference.
• Collaborate with ML engineers to gather requirements and devise data and model development strategies.
• Build and operate cloud-based systems (AWS, GCP) optimized for ML workloads in R&D and production.
• Enable continuous integration/deployment, experiment management, and governance of models and datasets.
• Automate model evaluation, selection, and deployment workflows.

First 90 days:
• Within 30 days, develop deep product understanding and propose a preliminary MLOps architecture with evaluated tools and trade-offs.
• Within 60 days, deliver a detailed end-to-end ML pipeline design, iterate based on feedback, and build a proof-of-concept core workflow.
• Within 90 days, implement core MLOps pipeline features, integrate key tools (e.g., MLflow, SageMaker, Kubeflow), and begin remaining feature development for scalable experimentation and deployment.
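For illustration, a capability like the MLflow-based experiment tracking mentioned above can start as simple as the sketch below. This is a minimal, hedged example, not the company's actual pipeline: it assumes MLflow is installed with a local ./mlruns store or a configured tracking server, and the experiment name, parameters, and metric value are placeholders.

```python
# Minimal MLflow experiment-tracking sketch (illustrative only; assumes a
# local ./mlruns store or a configured tracking server).
import mlflow

mlflow.set_experiment("perception-baseline")  # hypothetical experiment name

with mlflow.start_run(run_name="example-run"):
    # Log hyperparameters so runs are comparable across engineers.
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 64)

    # ... model training would happen here ...

    # Log evaluation results to support later model selection and promotion.
    mlflow.log_metric("val_accuracy", 0.90)
```

In a production setting, the same logging pattern would typically run inside an orchestrated pipeline (Airflow, Kubeflow Pipelines, or similar) rather than by hand, which is the kind of automation this role owns.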
Key Responsibilities
- MLOps pipelines
- Scalable infrastructure
- Cloud systems
- Experiment management
- Model automation
- Tool integration
What You Bring
• Bachelor's (or higher) in Computer Science, Machine Learning, or a related field.
• 5+ years building large-scale, reliable systems; 2+ years focused on ML infrastructure/MLOps.
• Proven experience architecting production-grade ML pipelines and platforms.
• Strong knowledge of the full ML lifecycle: data ingestion, training, evaluation, packaging, and deployment.
• Hands-on experience with MLOps tools such as MLflow, Kubeflow, SageMaker, Airflow, and Metaflow.
• Deep understanding of CI/CD practices for ML workflows.
• Proficiency in Python and Git, with solid software engineering fundamentals.
• Experience designing ML architectures on cloud platforms (AWS, GCP, Azure).
• Experience with deep learning architectures (CNNs, RNNs, Transformers) or computer vision.
• Hands-on experience with distributed training tools such as PyTorch DDP, Horovod, or Ray (see the sketch after this list).
• Background in real-time ML systems and batch inference with CPU/GPU-aware orchestration.
• Prior work in autonomous vehicles, robotics, or other real-time ML-driven systems.
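For context on the distributed-training requirement, a bare-bones PyTorch DDP job might look like the sketch below. This is a hedged illustration under stated assumptions, not a reflection of Parallel Systems' stack: it uses a toy model, random data, and the CPU "gloo" backend, and assumes launch via `torchrun --nproc_per_node=2 train.py`.

```python
# Minimal single-node PyTorch DDP sketch (toy model and random data;
# torchrun supplies RANK / LOCAL_RANK / WORLD_SIZE to each worker).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes

    model = torch.nn.Linear(10, 1)   # placeholder model
    ddp_model = DDP(model)           # wraps the model so ranks stay in sync

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

    loss = torch.nn.functional.mse_loss(ddp_model(inputs), targets)
    loss.backward()      # DDP averages gradients across ranks here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```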
Requirements
- Bachelor's degree
- Python
- MLflow
- Kubeflow
- CI/CD
- AWS
Benefits
Parallel Systems offers a target salary range of $150,000 to $240,000 USD. The company is an equal-opportunity employer committed to diversity and inclusion, and provides reasonable accommodations for applicants with disabilities.
Work Environment
Hybrid