Cloud Orchestration & Scheduling Architect (AI Infrastructure)

Sustainability Economics.Ai

The Role

Overview

Design and implement AI workload orchestration & scheduling on cloud platforms.

Key Responsibilities

  • orchestration
  • scheduling
  • automation
  • resource allocation
  • Kubernetes
  • forecasting

Tasks

We are seeking a Cloud Orchestration & Scheduling Architect to design and implement intelligent systems that dynamically balance AI workload demand, supply, and compute capacity across cloud data centers. The role focuses on building orchestration, scheduling, and optimization frameworks for large-scale AI inference workloads, ensuring efficient use of compute, energy, and cost resources.

  • Design and implement workload orchestration and scheduling systems for AI inference pipelines across distributed data centers.
  • Design the orchestration layer for AI workloads across distributed data centers.
  • Build automation pipelines for dynamic scaling and job scheduling.
  • Monitor system performance and drive continuous improvements in utilization and efficiency.
  • Develop mechanisms to match workload demand with available compute and supply, optimizing for performance, cost, and sustainability.
  • Shape a first-of-its-kind AI + clean energy platform.
  • Continuously evaluate new frameworks and optimization algorithms to enhance system scalability and resilience.
  • Optimize resource allocation by balancing compute demand, supply, and economics.
  • Manage multi-region orchestration using Kubernetes, Ray, or similar distributed compute frameworks.
  • Integrate predictive demand forecasting and energy availability data into scheduling decisions.
  • Build automation to dynamically scale GPU/CPU clusters based on real-time and forecasted AI inference workloads.
  • Build dashboards and observability tools to track compute utilization and cost efficiency.
  • Implement cost-aware and latency-aware scheduling policies for optimal workload placement.

Requirements

  • CKA
  • Kubernetes
  • Python
  • Docker
  • Terraform
  • AWS

What You Bring

The ideal candidate will have strong experience in Kubernetes, distributed systems, and scheduling frameworks, with a deep understanding of how to align AI compute workloads with real-time demand and supply.

  • Proactive, ownership-driven approach, with the ability to improve systems end-to-end.
  • Proven experience in AI/ML infrastructure orchestration, workflow scheduling, or HPC environments.
  • Strong proficiency in Kubernetes, EKS, or Ray for distributed workload orchestration.
  • Experience with inference batching, request routing, and autoscaling strategies using frameworks like Ray Serve, Triton Inference Server, or KServe.
  • Experience in AIOps: automated monitoring, anomaly detection, root-cause analysis, and predictive operations for AI workloads.
  • Experience building or tuning custom schedulers for optimization.
  • Proficiency in Python, Go, or Bash scripting for automation.
  • Strong systems thinking and the ability to design control mechanisms for complex, distributed workloads.
  • Agile mindset, adaptability, and eagerness to learn emerging tools and technologies.
  • Bachelor's or master's degree in Computer Science, Information Technology, or a related field.
  • Certifications (preferred): Certified Kubernetes Administrator (CKA).
  • Exposure to monitoring and observability tools (Prometheus, Grafana, CloudWatch).
  • Hands-on experience with container orchestration, automation, and observability.
  • Familiarity with containerization (Docker), IaC tools (Terraform/CloudFormation), and GitOps (Argo CD).
  • Understanding of GPU utilization optimization and CUDA profiling for efficient model execution.
  • Experience building demand forecasting and queue-based scheduling systems (Redis, Kafka, or similar) to balance compute load and supply.
  • Experience with workflow orchestration tools like Airflow, Argo, or Prefect.
  • Knowledge of GPU orchestration, NVIDIA Triton, or KServe for model serving at scale.
  • Understanding of scheduling algorithms, cluster autoscaling, and load balancing in large systems.
  • Startup DNA: bias to action, comfort with ambiguity, love of fast iteration, and a flexible growth mindset.
  • 2–5 years of proven experience in cloud infrastructure, distributed systems, or large-scale DevOps environments.
  • Curiosity about distributed scheduling and cloud optimization.
  • Hands-on experience managing workloads on AWS or hybrid cloud platforms.

Benefits

  • Work with a small, mission-driven team obsessed with impact.
  • A chance to leave your mark at the intersection of AI and sustainability.

The Company

About Sustainability Economics.Ai

  • Founded with a vision to drive the transition to a sustainable future through advanced AI technology.
  • Specializes in leveraging artificial intelligence to optimize economic sustainability across various industries.
  • Works on projects that aim to reduce environmental impact while promoting growth and resilience in key sectors.
  • Has provided consulting for energy, water resources, infrastructure, and transport industries globally.
  • Their AI models help businesses make data-driven decisions for sustainable operations, improving efficiency and profitability.
  • Notable for their interdisciplinary approach, combining economic modeling with cutting-edge AI and machine learning techniques.
  • Has collaborated with government agencies, corporations, and NGOs to shape policies and strategies for sustainable development.

Sector Specialisms

Industrial

Energy

Infrastructure

Buildings

Residential

Commercial

Water Resources

Heavy Civil

Marine

Transport

Utilities

Solar

Wind

Nuclear

Government