I am a highly driven individual with over 10 years of research experience, deeply invested in the research, development, and deployment of cutting-edge Computer Vision (CV) and Machine Learning (ML) technologies. My work centers on creating disruptive future applications, with key focus areas including Surgical Robotics, Mixed Reality, and Autonomous Vehicles.
PhD in Computer Vision / Machine Learning, 2015 - 2018
Bournemouth University, UK
MSc in Medical Image Computing, 2013 - 2014
University College London (UCL), UK
BSc in Biomedical Engineering, 2009 - 2013
Dalian University of Technology (DUT), China
- June 2023: Organized the ICRA 2023 Workshop on Scalable Autonomous Driving
- June 2021: Co-organized the CVPR 2021 Tutorial: Frontiers in Data-driven Autonomous Driving
- June 2021: Two papers, Data-driven Planner and SimNet, were accepted at ICRA 2021
- Feb 2021: Granted a US patent on Guided Batching, a method for building city-scale HD maps
- June 2020: We released the Lyft Level 5 Prediction Dataset
Large Language Models (LLMs) have shown promise in the autonomous driving sector, particularly in generalization and interpretability. We introduce a unique object-level multimodal LLM architecture that merges vectorized numeric modalities with a pre-trained LLM to improve context understanding in driving situations. We also present a new dataset of 160k QA pairs derived from 10k driving scenarios, pairing high-quality control commands collected with an RL agent with question-answer pairs generated by a teacher LLM (GPT-3.5). A distinct pretraining strategy is devised to align the numeric vector modality with static LLM representations using vector-captioning language data. We also introduce an evaluation metric for Driving QA and demonstrate our LLM-driver's proficiency in interpreting driving scenarios, answering questions, and making decisions. Our findings highlight the potential of LLM-based driving action generation in comparison to traditional behavioral cloning. We make our benchmark, datasets, and model available for further exploration.
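The core fusion idea can be pictured with a minimal NumPy sketch. This is not the paper's code: the dimensions, the random linear projection, and the variable names are all hypothetical stand-ins for the learned vector encoder that, after pretraining on vector-captioning data, aligns object-level numeric vectors with the LLM's embedding space.

```python
import numpy as np

EMBED_DIM = 32   # LLM embedding size (hypothetical)
VEC_DIM = 6      # per-object numeric features, e.g. x, y, vx, vy, w, h

rng = np.random.default_rng(0)

# Frozen LLM token embeddings for a tokenized driving question (hypothetical).
token_embeddings = rng.normal(size=(5, EMBED_DIM))

# Perceived objects as numeric vectors (hypothetical scene with 3 objects).
object_vectors = rng.normal(size=(3, VEC_DIM))

# Stand-in for the learned projection that aligns the vector modality with
# the LLM embedding space; here it is just a random linear map.
W = rng.normal(size=(VEC_DIM, EMBED_DIM))
object_embeddings = object_vectors @ W

# Fused input sequence: object tokens prepended to the text tokens,
# then consumed by the LLM as one sequence.
fused = np.concatenate([object_embeddings, token_embeddings], axis=0)
print(fused.shape)  # (3 + 5, EMBED_DIM) -> (8, 32)
```

The point of the sketch is only the interface: numeric scene vectors become pseudo-tokens in the same space as language tokens, so the frozen LLM can attend over both.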
Mixed reality (MR) is a powerful interactive technology for new types of user experience. We present a semantic-based interactive MR framework that goes beyond current geometry-based approaches, offering a step change in generating high-level context-aware interactions. Our key insight is that by building semantic understanding into MR, we can develop a system that not only greatly enhances user experience through object-specific behaviours, but also paves the way for solving complex interaction design challenges. In this paper, our proposed framework generates semantic properties of the real-world environment through a dense scene reconstruction and deep image understanding scheme. We demonstrate our approach by developing a material-aware prototype system for context-aware physical interactions between real and virtual objects. Quantitative and qualitative evaluation results show that the framework delivers accurate and consistent semantic information in an interactive MR environment, providing effective real-time semantic-level interactions.
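To make "material-aware interaction" concrete, here is a minimal sketch, not the framework's actual API: once semantic understanding assigns a material label to a real surface, a virtual object's physical response can be looked up per material. The labels, coefficients, and function names below are hypothetical.

```python
# Hypothetical per-material physics parameters keyed by semantic label.
MATERIAL_PHYSICS = {
    "wood":   {"restitution": 0.5, "friction": 0.4},
    "metal":  {"restitution": 0.7, "friction": 0.2},
    "fabric": {"restitution": 0.1, "friction": 0.8},
}

def bounce_speed(impact_speed: float, material: str) -> float:
    """Rebound speed of a virtual ball hitting a real surface with the
    given semantic material label (simple coefficient-of-restitution model).
    Unknown materials fall back to a default coefficient."""
    props = MATERIAL_PHYSICS.get(material, {"restitution": 0.3})
    return impact_speed * props["restitution"]

# Same impact, different semantic labels, different physical behaviour.
print(bounce_speed(2.0, "metal"))   # 1.4
print(bounce_speed(2.0, "fabric"))  # 0.2
```

The design choice this illustrates is that interaction logic is driven by semantic labels rather than raw geometry, so a ball dropped onto a sofa and onto a table can behave differently even if both surfaces are flat.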