Long Chen is a distinguished research lead with a proven track record in developing disruptive AI technologies. He is currently a Staff Scientist at Wayve, where he is at the forefront of building vision-language-action (VLA) models for the next wave of autonomous driving, including Driving-with-LLMs and LINGO. Previously, he was a research engineer at Lyft Level 5, where he led the development of data-driven planning models trained on crowd-sourced data for Lyft’s self-driving cars. His extensive experience also includes applying AI technologies in domains such as mixed reality, surgical robotics, and healthcare.
PhD in Computer Vision / Machine Learning, 2015 - 2018
Bournemouth University, UK
MSc in Medical Image Computing, 2013 - 2014
University College London (UCL), UK
BSc in Biomedical Engineering, 2009 - 2013
Dalian University of Technology (DUT), China
- Sep 2024: Keynote talk at ECCV 2024 Workshop: Autonomous Vehicles meet Multimodal Foundation Models.
- Sep 2024: Keynote talk at IEEE ITSC 2024 Workshop: Large Language and Vision Models for Autonomous Driving.
- Sep 2024: Keynote talk at IEEE ITSC 2024 Workshop: Foundation Models for Autonomous Driving.
- July 2024: Paper LingoQA: Video Question Answering for Autonomous Driving was accepted to ECCV 2024!
- June 2024: Keynote talk at CVPR 2024 Workshop: Vision and Language for Autonomous Driving and Robotics.
- June 2024: Organized the CVPR 2024 Tutorial: End-to-End Autonomy: A New Era of Self-Driving in Seattle, US.
- June 2024: CarLLaVA won 1st place in the CARLA Autonomous Driving Challenge!
- May 2024: Presented the ICRA 2024 Paper: Driving-with-LLMs in Yokohama, Japan.
- June 2023: Organized the ICRA 2023 Workshop on Scalable Autonomous Driving in London, UK.
- June 2021: Co-organized the CVPR 2021 Tutorial: Frontiers in Data-driven Autonomous Driving.
- June 2021: Two papers, Data-driven Planner and SimNet, were accepted to ICRA 2021.
- Feb 2021: Granted a US patent for Guided Batching, a method for building city-scale HD maps for autonomous driving.
- June 2020: We released the Lyft Level 5 Prediction Dataset.
AV2.0 - building the next generation of self-driving cars with End-to-End (E2E) Machine Learning and Vision-Language-Action (VLA) models.
[CVPR 2024: End-to-End Tutorial] [ICRA 2023: End-to-End Workshop]
Autonomy 2.0 - Data-Driven Planning models for Lyft’s self-driving vehicles.
[CVPR 2021: Autonomy 2.0 Tutorial] [ICRA 2021: Crowd-sourced Data-Driven Planner]
Autonomous driving has long faced a challenge with public acceptance due to the lack of explainability in its decision-making process. Video question answering (QA) in natural language offers an opportunity to bridge this gap. Nonetheless, evaluating the performance of Video QA models has proven particularly difficult due to the absence of comprehensive benchmarks. To fill this gap, we introduce LingoQA, a benchmark specifically for Video QA in autonomous driving. The LingoQA trainable metric demonstrates a 0.95 Spearman correlation coefficient with human evaluations. We also introduce a Video QA dataset of central London driving consisting of 419k samples, which we release with the paper. We establish a baseline vision-language model and run extensive ablation studies to understand its performance.
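As a rough illustration of how a trainable QA metric can be validated against human judgement via rank correlation (the kind of Spearman check cited above), here is a minimal sketch; the scores below are invented for the example and this is not the released LingoQA evaluation code.

```python
# Sketch: compare a learned metric's per-answer scores with human ratings
# using Spearman rank correlation. All numbers here are made up.
from scipy.stats import spearmanr

metric_scores = [0.91, 0.35, 0.78, 0.12, 0.66, 0.84, 0.23, 0.57]  # hypothetical metric outputs
human_scores = [0.95, 0.30, 0.70, 0.20, 0.60, 0.90, 0.25, 0.50]   # hypothetical human ratings

rho, p_value = spearmanr(metric_scores, human_scores)
print(f"Spearman correlation: {rho:.2f} (p = {p_value:.3f})")
```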
Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, paired with high-quality control commands collected with an RL agent and question-answer pairs generated by a teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align numeric vector modalities with static LLM representations using vector captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver’s proficiency in interpreting driving scenarios, answering questions, and decision-making. Our findings highlight the potential of LLM-based driving action generation compared to traditional behavioral cloning. We make our benchmark, datasets, and model available for further exploration.
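A minimal sketch of the general fusion idea, assuming a simple two-layer projector and made-up dimensions (not the published Driving-with-LLMs architecture): object-level numeric vectors are projected into the LLM's token-embedding space so they can be consumed alongside the text prompt.

```python
# Sketch: project vectorized object-level scene features into an LLM's
# embedding space. Module names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class VectorProjector(nn.Module):
    def __init__(self, vector_dim: int = 64, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vector_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, object_vectors: torch.Tensor) -> torch.Tensor:
        # object_vectors: (batch, num_objects, vector_dim) numeric scene description
        return self.proj(object_vectors)  # (batch, num_objects, llm_dim)

projector = VectorProjector()
scene = torch.randn(1, 32, 64)        # 32 objects with 64 features each (invented)
vector_tokens = projector(scene)      # embeddings to prepend to the prompt tokens
print(vector_tokens.shape)            # torch.Size([1, 32, 4096])
```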
Mixed reality (MR) is a powerful interactive technology for new types of user experience. We present a semantic-based interactive MR framework that goes beyond current geometry-based approaches, offering a step change in generating high-level, context-aware interactions. Our key insight is that by building semantic understanding into MR, we can develop a system that not only greatly enhances the user experience through object-specific behaviours, but also paves the way for solving complex interaction design challenges. In this paper, our proposed framework generates semantic properties of the real-world environment through a dense scene reconstruction and deep image understanding scheme. We demonstrate our approach by developing a material-aware prototype system for context-aware physical interactions between real and virtual objects. Quantitative and qualitative evaluation results show that the framework delivers accurate and consistent semantic information in an interactive MR environment, providing effective real-time semantic-level interactions.
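A toy sketch of the material-aware idea, with an invented label set and parameter values (not the paper's implementation): semantic labels predicted for real-world surfaces are mapped to physics parameters that drive how virtual objects interact with them.

```python
# Sketch: map semantic material labels to physics parameters for MR interactions.
# Labels and values are illustrative assumptions only.
MATERIAL_PHYSICS = {
    "wood": {"friction": 0.5, "restitution": 0.4},
    "metal": {"friction": 0.3, "restitution": 0.6},
    "fabric": {"friction": 0.8, "restitution": 0.1},
}

DEFAULT_PHYSICS = {"friction": 0.5, "restitution": 0.3}

def physics_for(label: str) -> dict:
    """Return physics parameters for a recognised material, with a safe default."""
    return MATERIAL_PHYSICS.get(label, DEFAULT_PHYSICS)

print(physics_for("metal"))   # {'friction': 0.3, 'restitution': 0.6}
print(physics_for("glass"))   # unrecognised label falls back to the default
```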