Mudit Jain — Senior Deep Learning Engineer, Multimodal AI

👨‍💻 About Me

Hi, I'm Mudit Jain, an engineer at heart with expertise in 3D/2D Machine Learning, SLAM, Computer Vision, GPU programming, and Embedded Systems.

Currently, my work centers on Multimodal AI, particularly fusing LiDAR and camera data for detection and tracking of static and dynamic objects in 2D and 3D. My interests include applied ML and CV research, real-time SLAM systems, and advancing computer vision capabilities for autonomous vehicles.

📰 News & Updates

2026: Judge and mentor for the Qualcomm Innovation Fund 2026

2026: Paper accepted at ECCV 2026
BeyondSight: Object Permanence for End-to-End Autonomous Driving

2026: Paper accepted at OmniCV (CVPR Workshop 2026)
FishRoPE: Projective Rotary Position Embeddings for Omnidirectional Visual Perception

2025: Received 2 Qualcomm Distinguished Innovation Awards

2025: 6 patents submitted

2025: Mentor for Google Summer of Code (GSoC) with OpenCV

2025: Judge and mentor for the Qualcomm Innovation Fund 2025

2025: Reviewer for IEEE ITSC 2025 (Intelligent Transportation Systems Conference)

2025: Reviewer for IEEE IV 2025 (Intelligent Vehicles Symposium)

2024: Joined Qualcomm as Senior Deep Learning Engineer in Multimodal AI.

2024: Mentored a project at Google Summer of Code on 3D Reconstruction.

2024: Reviewer for WACV 2025.

2021: Joined Qualcomm as Senior Machine Learning Engineer, XR Research.

2021: Graduated with Master's degree from University of California San Diego.

2019: Started Master's in Electrical and Computer Engineering at UCSD.

2019: Joined DroneLab at UCSD as Graduate Student Researcher.

2017: Promoted to Embedded System Software Engineer II at NVIDIA.

2016: Joined NVIDIA as Embedded System Software Engineer I.

2016: Graduated with Bachelor's degree from BITS Pilani.

2016: Selected for Google Summer of Code as Developer for RTEMS.

2015: Joined NVIDIA as an Intern.

2014: Joined Srujana Innovation Center as an Intern.

2012: Started Bachelor's in Electronics and Communication at BITS Pilani.

💼 Work History

Senior Deep Learning Engineer - Multimodal AI • Qualcomm • Jun 2024 - Present

Building and scaling multimodal perception, auto-labeling, planning, and Vision-Language-Action architectures for autonomous driving. Mentoring innovation fund projects in end-to-end autonomy and fisheye-BEV perception.

Managers: Senthil Yogamani, Dr. Varun Ravi Kumar, Pranav Desai

San Diego, California

Developer & Mentor - 3D Reconstruction • Google Summer of Code • May 2024 - Sep 2024

Mentored a project on converting unconstrained video to Gaussian splats with the OpenCV organization. Worked with Gary Bradski, founder and president of the OpenCV foundation.

Manager: Gary Bradski

San Diego, California

Senior Machine Learning Engineer • Qualcomm • Aug 2021 - Jun 2024

Led the development of optimized visual odometry solutions for AR/VR/MR use cases as part of the XR Research Computer Vision team, which focuses on spatial computing technologies.

Manager: Vasudev Bhaskaran

San Diego, California

Graduate Student Researcher • DroneLab, University of California San Diego • Sep 2019 - May 2021

Designed and deployed an attention-based CNN on 600 cameras for wildfire detection (ALERTWildfire initiative). Developed a 4K video processing pipeline on NVIDIA AGX Xavier using DeepStream SDK and TensorRT.

Manager: Dr. Falko Kuester

San Diego, California

Graduate Teaching Assistant • University of California San Diego • Jan 2020 - Mar 2020

Taught Art of Product Engineering (ECE 140A) covering end-to-end software development and hardware integration.

San Diego, California

Embedded System Software Engineer II • NVIDIA • Oct 2017 - Jul 2019

Designed I2C virtualization per the ISO 26262 functional safety standard for ARM-based SoCs (Xavier/Parker). Optimized the bootloader and implemented an OS-agnostic GPCDMA library for automotive applications.

Bengaluru, India

Embedded System Software Engineer I • NVIDIA • Jul 2016 - Oct 2017

Bengaluru, India

Developer • Google Summer of Code • Apr 2016 - Jul 2016

Ported the FreeBSD SDMMC driver to RTEMS and added a DMA library for the Raspberry Pi BSP.

Bengaluru, India · Remote

Intern • NVIDIA • Jul 2015 - Dec 2015

Developed production tools for automotive customers to create OS firmware and boot targets.

Bengaluru, India

Intern • Srujana Innovation Center • Dec 2014 - Apr 2015

Developed a low-cost wearable VR headset and the Pupil+ platform for eye diagnosis (MIT Media Lab collaboration).

Hyderabad, India

🎓 Education

Master's degree in Electrical and Computer Engineering • University of California San Diego (UCSD) • 2019-2021

Specialization in Machine Learning and Data Science

Courses: Linear Algebra, Probability and Statistics, Statistical Learning, Visual Learning, Computer Vision I & III, GPU Programming, Deep Learning and Applications

Bachelor of Engineering (B.E.) in Electronics and Communication • Birla Institute of Technology and Science, Pilani • 2012-2016

🛠️ Skills

💻 Programming Languages

C++ [8+ years] • Python [8+ years]

🧠 Technical Knowledge Domains

Multimodal Large Scale Deep Learning [Ray, Kubernetes, PyTorch]
2D/3D Object Detection & Tracking [LiDAR + Camera]
BEV Modeling
Prediction & Planning
End-to-End Autonomous Driving
Vision Language Action Models
Transformers [PEFT, HuggingFace]
Classical Computer Vision [C++, OpenCV]
3D Computer Vision
Machine Learning [PyTorch, JAX]
SLAM [ORB-SLAM, VINS Mono]
Non-linear Optimization [Eigen, g2o, ceres, GTSAM]
Bundle Adjustment
Camera Calibration
Pose Graph Optimization
IMU Preintegration
Bayesian Inference
Embedded Systems
SIMD Programming [CUDA]
Model & Data Parallelism
Model Optimization [TensorRT]
3D Reconstruction [NeRFs, Gaussian Splatting]

📄 Publications

BeyondSight: Object Permanence for End-to-End Autonomous Driving

Sandro Papais, Letian Wang, Mudit Jain, Behnaz Rezaei, Steven L. Waslander

Maintains persistent representations of vehicles and obstacles even while occluded, introducing the nuScenes-Permanence dataset and improving detection of hidden actors and downstream planning.

Accepted at ECCV 2026 · arXiv:2607.09138

FishRoPE: Projective Rotary Position Embeddings for Omnidirectional Visual Perception

R. Ahuja, Mudit Jain, B. M. M. S. Sudhakar, V. Narayanan, P. Likhar, V. R. Kumar, et al.

Adapts frozen vision foundation models to fisheye geometry via spherical-coordinate rotary position embeddings and LoRA, reaching state-of-the-art WoodScape 2D detection and SynWoodScapes BEV segmentation.

Accepted at OmniCV, CVPR Workshop 2026 · arXiv:2604.10391

🚀 Projects

Autonomous Driving Wiki

A curated knowledge base for autonomous driving research, covering perception, planning, end-to-end models, and multimodal architectures with structured summaries and cross-references across papers and methods.

Minefield Navigator RL

Procedural partially-observable navigation where a recurrent agent sees only a 9×9 fog-of-war window and must reason around walls and mines. Trained with a staged pipeline (imitation → recurrent PPO → MCTS-guided GRPO) over a multi-stage curriculum, reaching 90% success on dense 20×20 maps under instant-death conditions.

Algorithm Visualizer

Interactive step-through visualizer for 42 algorithm problems covering graphs, grids, union-find, tries, and more. Features 6 renderer types, weighted graph support, and auxiliary panels for queue/stack visualization.

Snake RL from Pixels

Trains Snake agents directly from rendered board images, comparing plain DQN, DQN with configurable MCTS rollouts, and GRPO. Includes TensorBoard logging and video generation hooks.

Deduplication Embedding

Local near-duplicate question detection on the Quora Question Pairs dataset using sentence-transformers and HNSW-based approximate nearest neighbor search, with a retrieval-first pattern for semantic similarity and hard example mining.

Tic-Tac-Toe RL

Neural agents for Tic-Tac-Toe using plain DQN and DQN with MCTS-guided action selection, trained against a mixed minimax/random opponent. Includes a live browser demo with exported model weights.

DINOv2 Object Detection

Implementation of object detection using DINOv2 self-supervised vision transformers, enabling state-of-the-art zero-shot detection capabilities.

Bundle Adjustment in the Large

Optimization methods for large-scale bundle adjustment in 3D reconstruction challenges, focusing on efficiency and scalability.

Depth Estimation

Monocular depth estimation techniques for 3D scene understanding from 2D images using deep learning approaches.

Custom CUDA Implementation for Multi-Agent Reinforcement Learning

Accelerated Q-table updates and reward policies for multi-agent Q-learning using CUDA, achieving 100% training accuracy and 99.8% test accuracy in under 4 minutes on a 46×46 grid with 512 agents.

University of California San Diego • Jan 2021 - Mar 2021

Speeding up Mario RL with Custom Torch C++ Extensions

Developed custom CUDA kernels for linear, pooling, ReLU, and convolutional layers to accelerate the training and inference of a CNN for a Double Q-learning based RL agent playing Mario.

University of California San Diego • Jan 2021 - Mar 2021

ALERTWildfire Plume Detection

Deployed an ensemble neural network model across 610 cameras in California for early wildfire detection, using a modified Mask R-CNN with focal loss and an EfficientNet-based segmentation model with SCSE attention.

DroneLab, UCSD • Jul 2020 - Jan 2021

Domain Adaptation for Semantic Segmentation

Trained OCNet on the Cityscapes dataset and used CycleGAN-based domain adaptation to make game-rendered data resemble real-world imagery, expanding the training set and improving model performance.

University of California San Diego • Sep 2019 - Dec 2019

Image Denoising using Deep CNNs

Implemented and compared DnCNN, UDnCNN, and DUDnCNN architectures for image denoising, achieving up to 99.85% accuracy using a U-Net with dilated convolutions.

University of California San Diego • Sep 2019 - Dec 2019

3D Reconstruction of the Anterior Segment of the Eye

Developed an image processing pipeline and GUI for 3D eye model reconstruction by projecting patterns on the anterior segment and applying PCA and object tracking techniques.

Srujana Innovation Center and MIT Media Lab

🏆 Honors & Awards

2 Qualcomm Distinguished Innovation Awards

Qualcomm

Recognized for outstanding contributions to innovation in multimodal AI and autonomous driving technologies.

100% Tuition Scholarship

DroneLab, University of California San Diego

Full tuition coverage for exceptional research contributions and academic merit.

Telangana Overseas Scholarship

Government of Telangana • University of California San Diego

Prestigious government scholarship awarded to exceptional students for pursuing graduate studies abroad.

MCN Scholarship

Birla Institute of Technology and Science, Pilani

Merit-based scholarship recognizing academic excellence and leadership potential.