Inference Optimization Engineer (LLM and Runtime)

Sustainability Economics.Ai

The Role

Overview

Optimize LLM inference and runtime for scalable, low-latency AI services

Key Responsibilities

  • library development
  • inference pipelines
  • model optimization
  • fine‑tuning
  • model serving
  • performance profiling

Tasks

- Build and maintain internal libraries, wrappers, and benchmarking suites for continuous performance evaluation.
- Develop custom inference pipelines capable of high throughput and low latency under real-world traffic.
- Apply and evaluate advanced model optimization techniques, such as quantization, pruning, distillation, tensor parallelism, and caching strategies, to improve model efficiency, throughput, and inference performance.
- Implement custom fine-tuning pipelines using parameter-efficient methods (LoRA, QLoRA, adapters, etc.) to achieve task-specific goals while minimizing compute overhead.
- Work closely with platform and infrastructure teams to reduce latency, memory footprint, and cost-per-token during production inference.
- Design and implement scalable model-serving architectures on GPU clusters and cloud infrastructure (AWS, GCP, or Azure).
- Shape a first-of-its-kind AI + clean energy platform.
- Monitor and profile performance using tools such as Nsight, PyTorch Profiler, and Triton Metrics to drive continuous improvement.
- Optimize runtime performance of inference stacks using frameworks like vLLM, TensorRT-LLM, DeepSpeed-Inference, and Hugging Face Accelerate.
- Evaluate hardware–software co-optimization strategies across GPUs (NVIDIA A100/H100), TPUs, or custom accelerators.
- Optimize and customize large-scale generative models (LLMs) for efficient inference and serving.
- Take ownership of the end-to-end optimization lifecycle, from profiling bottlenecks to delivering production-optimized LLMs.

Requirements

  • Ph.D.
  • 2-3 years
  • LLM
  • deep learning
  • inference optimization
  • collaboration

What You Bring

We are seeking a highly skilled and innovative Inference Optimization Engineer (LLM and Runtime) to design, develop, and optimize cutting-edge AI systems that power intelligent, scalable, and agent-driven workflows. This role blends the frontier of generative AI research with robust engineering, requiring expertise in machine learning, deep learning, and large language models (LLMs), along with awareness of the latest industry trends. The ideal candidate will collaborate with cross-functional teams to build production-ready AI solutions that address real-world business challenges while keeping our platforms at the forefront of AI innovation.

- A builder’s mindset: bias toward action, comfort with experimentation, and enthusiasm for solving complex, open-ended challenges.
- Curiosity-driven attitude: keeps up with emerging model compression and inference technologies.
- Pragmatic problem-solver who values efficiency, reproducibility, and maintainable code over theoretical exploration.
- 2–3 years of hands-on experience in large language model (LLM) or deep learning optimization, gained through academic or industry work.
- Collaborative mindset, with the ability to work across research, engineering, and product teams.
- Ph.D. in Computer Science or a related field, with a specialization in Deep Learning, Generative AI, or Artificial Intelligence and Machine Learning (AI/ML).
- Hands-on experience in building and optimizing machine learning or agentic systems at scale.
- Startup DNA: bias to action, comfort with ambiguity, love of fast iteration, and a flexible, growth-oriented mindset.
- Strong analytical and mathematical reasoning ability with a focus on measurable performance gains.

Benefits

- Work with a small, mission-driven team obsessed with impact.
- A chance to leave your mark at the intersection of AI and sustainability.

The Company

About Sustainability Economics.Ai

- Founded with a vision to drive the transition to a sustainable future through advanced AI technology.
- Specializes in leveraging artificial intelligence to optimize economic sustainability across various industries.
- Works on projects that aim to reduce environmental impact while promoting growth and resilience in key sectors.
- Has provided consulting for energy, water resources, infrastructure, and transport industries globally.
- Their AI models help businesses make data-driven decisions for sustainable operations, improving efficiency and profitability.
- Notable for their interdisciplinary approach, combining economic modeling with cutting-edge AI and machine learning techniques.
- Has collaborated with government agencies, corporations, and NGOs to shape policies and strategies for sustainable development.

Sector Specialisms

Industrial

Energy

Infrastructure

Buildings

Residential

Commercial

Water Resources

Heavy Civil

Marine

Transport

Utilities

Solar

Wind

Nuclear

Government