Autodesk

Design and make software for architecture, engineering, construction, and entertainment industries.

Industries: Building Design, Construction, Automotive, Building Product Manufacturing, 3D Animation, Architecture, Engineering, Construction Professionals, Mechanical Engineering, Mechanical CAD, Thermal Simulation, Electronic Design Automation, Printed Circuit Board Design, Mechanical, Electrical, and Plumbing (MEP), HVAC, Fabrication, Estimation, Infrastructure, Civil Engineering, Genetic Engineering (Life Sciences)

Intern, Research Foundational Models

Research intern exploring spatial reasoning in vision-language models.

Toronto, Ontario, Canada

Internship

Entry-level

Job Highlights

Environment

Office Full-Time

About the Role

During the internship you will work closely with research mentors to investigate new modeling and training paradigms that move beyond one-shot visual reasoning. The project will focus on approaches such as reinforcement learning, test-time computation, and “thinking with images,” where models iteratively attend to visual evidence, reason over intermediate representations, and verify hypotheses through visual feedback. The goal is to advance state-of-the-art methods for spatially grounded reasoning and contribute insights relevant to both the research community and Autodesk’s long-term vision for intelligent design tools.

Over the course of the internship you will define and drive a focused research project, including model development, experimental validation, and analysis, with the opportunity to publish results and present findings internally and externally. You will be expected to produce high-quality research deliverables and collaborate with a team of researchers.

• Define and execute a research project on geometric reasoning in vision-language models
• Conduct literature reviews to identify limitations of existing VLMs and related multimodal reasoning work
• Design and implement novel training or inference strategies using reinforcement learning, test-time computation, or iterative visual reasoning
• Develop model architectures, training pipelines, and evaluation benchmarks for spatial and geometric tasks
• Run large-scale experiments, analyze results, and iterate on model designs based on empirical findings
• Compare proposed approaches against strong baselines and state-of-the-art methods
• Collaborate closely with research mentors and other researchers, sharing progress and incorporating feedback
• Author a research paper suitable for submission to a top-tier machine learning or computer vision conference
• Present research results internally at Autodesk and externally at academic venues

Key Responsibilities

▸model development
▸training pipelines
▸benchmark evaluation
▸large-scale experiments
▸result analysis
▸paper publication

What You Bring

We are seeking a research intern to explore fundamental challenges in geometry, design understanding, and relative spatial reasoning for vision-language models (VLMs). Modern VLMs excel at captioning, semantic understanding, and segmentation, yet they struggle with geometric reasoning, layout understanding, and precise relative positioning, all capabilities essential for design, engineering, and creation workflows.

Candidates must be currently enrolled in a PhD program in Computer Science, Machine Learning, Computer Vision, or a closely related field, with at least one academic semester remaining after the internship. A strong publication record in top-tier conferences, hands-on experience with vision-language models and reinforcement learning, and solid implementation skills in modern deep learning frameworks are required.

• Currently enrolled in a PhD program in CS, ML, CV, or a related field with at least one semester remaining
• Strong publication record in top-tier conferences (e.g., ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV)
• Hands-on experience training vision-language models and reinforcement learning algorithms
• Proficiency with modern deep learning frameworks such as PyTorch, TRL, and Ray
• Solid background in machine learning fundamentals and experimental research methodology
• Ability to work independently on open-ended research problems and communicate results clearly
• Experience with multimodal or embodied reasoning, test-time optimization, or iterative inference methods (preferred)
• Familiarity with geometric vision, spatial reasoning benchmarks, or synthetic visual datasets (preferred)
• Experience scaling experiments on distributed systems or large compute clusters (preferred)
• Strong written and verbal communication skills

Requirements

▸phd
▸publications
▸vision-language
▸reinforcement learning
▸pytorch
▸distributed

Benefits

Autodesk offers a competitive compensation package, a culture of belonging, and the chance to work on technology that shapes the built environment, from green buildings to smart factories and blockbuster movies.
