Welcome: The 2nd edition of the 3D-LLM/VLA Workshop will be held at CVPR 2026!

Bridging Language, Vision and Action in 3D Environments

2nd Workshop on 3D-LLM/VLA | CVPR 2026 | June 3 | Room 1CD | Denver, CO, USA

Workshop Overview

This workshop addresses a critical gap in current AI research: the integration of language and 3D perception. This integration is essential for developing embodied agents and robots, particularly given the recent rise of multimodal LLMs and vision-language-action (VLA) models.

Building on the momentum and insights from our first workshop at CVPR 2025, this 2nd edition will continue to provide a dedicated platform for discussing how language and 3D perception can be combined. The workshop will deepen the collaboration established in the first edition and further advance the state of the art in 3D-LLM/VLA research.

Topics Include:

  • Integration of language and 3D perception
  • Large language models (LLMs) in 3D environments
  • 3D vision-language-action (VLA) models
  • Embodied agents that integrate language, vision, and action
  • 3D scene understanding and generative world models
  • Robot control and navigation using natural language
  • Multimodal learning for embodied AI

Important Dates

  • Paper Submission: April 26, 2026
  • Notification: May 10, 2026
  • Camera Ready: May 24, 2026
  • Workshop Date: June 3, 2026 (Half-day)

Call for Papers

Overview

We invite submissions of papers related to the integration of language and 3D perception, with a focus on developing embodied agents and robots. Topics of interest include, but are not limited to:

  • Language-guided 3D perception and understanding
  • Large language models (LLMs) for 3D environment understanding
  • 3D vision-language-action (VLA) models
  • Embodied agents that integrate language, vision, and action
  • 3D scene understanding and generative world models
  • Robot control and navigation using natural language
  • Multimodal learning for embodied AI
  • Datasets and benchmarks for 3D-LLMs and 3D-VLAs
  • Applications of 3D-LLMs and 3D-VLAs in real-world scenarios
  • Ethical considerations in developing embodied AI systems

Awards

Congratulations to our seven spotlight papers, including the best paper and runner-up award recipients:

  • Best Paper Award:
    Ψ0: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation
  • Best Paper Runner-up Awards:
    Do 3D Large Language Models Really Understand 3D Spatial Relationships?
    PA3FF: Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation
    VLS: Steering Pretrained Robot Policies via Vision–Language Models
    Robot Learning from a Physical World Model
  • Spotlight Papers:
    GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation
    LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans

Submission Guidelines

Papers can be submitted in any major conference's format, with a length of 2-8 pages (excluding references). All submissions will be peer-reviewed, and accepted papers will be presented at the workshop as posters.

Submissions should be made through the OpenReview submission system.

Important Dates

  • Paper Submission Deadline: April 26, 2026
  • Decision Notification: May 10, 2026
  • Camera Ready Deadline: May 24, 2026
  • Workshop Date: June 3, 2026

Publication

The workshop will be non-archival. Authors of accepted papers retain the full copyright of their work and are free to submit extended versions to conferences or journals.

Schedule

June 3, 2026 | Room 1CD | CVPR 2026, Denver, CO, USA

12:30pm - 1:00pm

Poster Setup & Poster Session

Authors set up posters and early attendees can browse

1:00pm - 1:45pm

Keynote 1: Ziwei Liu

Nanyang Technological University

Computer Vision, Multimodal AI

1:45pm - 2:30pm

Keynote 2: Yue Wang (Remote)

University of Southern California

Vision, Robotics

2:30pm - 3:15pm

Keynote 3: Leonidas Guibas

Stanford University

3D Vision, Robotics

3:15pm - 3:30pm

Coffee Break

3:30pm - 4:15pm

Keynote 4: Angela Dai

Technical University of Munich

3D Vision

4:15pm - 5:00pm

Keynote 5: Ranjay Krishna

University of Washington

Vision, NLP, Robotics

5:00pm - 5:45pm

Keynote 6: Marc Pollefeys

ETH Zurich

3D Vision

5:45pm - 6:15pm

Closing Poster Session, Best Paper Announcement & Remarks

Poster presentations, best paper announcement, and workshop closing discussion

Accepted Papers

Stress-Aware Reasoning for Robust Vision-Language-Action Agents
LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans (Spotlight)
Ψ0: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation (Best Paper, Spotlight)
LightSplat: Fast and Memory-Efficient Open-Vocabulary 3D Scene Understanding in Five Seconds
Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation
Chasing Ghosts: A Simulation-to-Real Olfactory Navigation Stack with Optional Vision Augmentation
FunFact: Building Probabilistic Functional 3D Scene Graphs via Factor-Graph Reasoning
EthosVLA: A Constitutional Safety Framework for Vision-Language-Action Models in Physical 3D Environments
CausalScene: Learning Causal 3D Scene Graphs for Counterfactual Reasoning in Embodied Agents
What Do VLAs Actually Learn through In-Context Failure Conditioning?
Do 3D Large Language Models Really Understand 3D Spatial Relationships? (Runner-up, Spotlight)
Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering
Code3DBench: Single-Image to Executable Low-Poly 3D Code Generation
Video2Assets: Extracting 3D Object Assets from Unconstrained Video
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation
GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation (Spotlight)
Multimodal Causal Subtask Modeling for Scalable VLA Pipelines in Long-Horizon Manipulation
SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning
Reference-Free Assessment of Physical Consistency in World Model-based Video Generation
Robot Learning from a Physical World Model (Runner-up, Spotlight)
Boosting MLLM Spatial Reasoning with Geometrically Referenced 3D Scene Representations
AbsVLA: Learning Robust Primitive Manipulation Skills for VLA Models in Object-Centric Abstracted States
Autonomous Frontier-Based Exploration with VLM Guidance
SPATIOROUTE: Dynamic Prompt Routing for Zero-Shot Spatial Reasoning
LychSim: A Controllable and Interactive Simulation Framework for Multimodal LLM
LangPose: SE(3)-Equivariant Language Grounding for 3D Vision-Language-Action Models
A Taxonomy-Driven Modular Defense against Non-Canonical Language in Vision-Language-Action Models
Semantic-Aware Guided Drone Exploration for Language-Conditioned 3D Indoor Mapping
Flat-Pack Bench: Evaluating Spatio-Temporal Understanding in Large Vision-Language Models through Furniture Assembly
VLS: Steering Pretrained Robot Policies via Vision–Language Models (Runner-up, Spotlight)
Token Warping Helps MLLMs Look from Nearby Viewpoints
Explicit Token-Based Adapters for Frozen Vision-Language-Action Models
SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization
Compressing 3D Scene Context for LLMs: Spatial Enriched Graph Attention
Can Embodied Agents Remember What You Said? Evaluating Dialogue-Grounded Embodied Memory in 3D Environments
Dynamic Anchors for Closed-Loop Language-Guided Camera Control in Basketball Scenes
BIT-Nav: Brain-Inspired Trajectory Memory for Embodied Navigation
PA3FF: Learning Part-Aware Dense 3D Feature Field For Generalizable Articulated Object Manipulation (Runner-up, Spotlight)
ConstructGPT: BIM-Conditioned 3D Diffusion and World Models for Automated Construction Site Monitoring and Compliance Checking

Contact Us

Have questions? We're here to help

Email

For general inquiries:

yinihong@cs.stanford.edu

whu@cs.ucla.edu

Paper Submissions

Submit your papers via:

OpenReview Submission System

Workshop Location

CVPR 2026

Room 1CD

Denver, CO, USA

Frequently Asked Questions

What is the paper submission deadline?

The paper submission deadline is April 26, 2026.

Is the workshop in person or virtual?

The workshop will be held in person on June 3, 2026, at CVPR 2026 in Room 1CD, Denver, CO, USA.

Are the workshop papers archival?

No, the workshop will be non-archival. Authors of accepted papers retain the full copyright of their work and are free to submit extended versions to conferences or journals.

What is the maximum page length for submissions?

Papers can be submitted in any major conference's format, with a length of 2-8 pages (excluding references).

Sponsors

Figure AI is building the world's first commercially viable autonomous humanoid robot. We are looking for world-class researchers and engineers to join our team and help deploy AI into the real world.