Multimodal Object Detection Project

Overview

For the course Intro to Deep Learning at CMU, I completed a project re-implementing and analyzing the method presented in the paper Multimodal Object Detection by Channel Switching and Spatial Attention. The objective was to validate the paper's findings and explore enhancements to multimodal object detection, particularly in dimly lit environments. We built the first publicly available implementation of the proposed fusion pipeline from scratch in PyTorch, integrating RGB and infrared (IR) data with two ResNet-50 backbones and a Faster R-CNN architecture. The key innovation is the Channel Switching and Spatial Attention (CSSA) module, which fuses the two modalities while keeping computational overhead low. Our experiments on the LLVIP dataset confirmed that multimodal fusion improves detection accuracy. We also explored hyperparameter tuning, data augmentation techniques, and parameter-sharing strategies to optimize performance. Our codebase is available on GitHub.
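To give a feel for the fusion idea, here is a minimal PyTorch sketch of a CSSA-style block. It is an illustration under stated assumptions, not the code from our repository and not the paper's exact formulation. The class names, the ECA-style channel attention, the fixed switching threshold, and the way the average- and max-pooled spatial maps are combined are all choices made for this example.

```python
import torch
import torch.nn as nn


class ChannelSwitching(nn.Module):
    """Swap low-importance channels between two modalities (illustrative)."""

    def __init__(self, kernel_size: int = 3, threshold: float = 0.5):
        super().__init__()
        # Assumed ECA-style attention: global average pool + 1D conv across channels.
        self.pool = nn.AdaptiveAvgPool2d(1)
        pad = kernel_size // 2
        self.conv_rgb = nn.Conv1d(1, 1, kernel_size, padding=pad, bias=False)
        self.conv_ir = nn.Conv1d(1, 1, kernel_size, padding=pad, bias=False)
        self.threshold = threshold  # hypothetical hyperparameter

    def _weights(self, x: torch.Tensor, conv: nn.Conv1d) -> torch.Tensor:
        w = self.pool(x).squeeze(-1).transpose(1, 2)      # (B, 1, C)
        w = torch.sigmoid(conv(w))                        # (B, 1, C)
        return w.transpose(1, 2).unsqueeze(-1)            # (B, C, 1, 1)

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor):
        w_rgb = self._weights(rgb, self.conv_rgb)
        w_ir = self._weights(ir, self.conv_ir)
        # Channels scored below the threshold are replaced by the other modality.
        rgb_out = torch.where(w_rgb < self.threshold, ir, rgb)
        ir_out = torch.where(w_ir < self.threshold, rgb, ir)
        return rgb_out, ir_out


class SpatialAttention(nn.Module):
    """Fuse two feature maps with per-pixel weights (illustrative)."""

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        stacked = torch.stack([rgb, ir], dim=1)           # (B, 2, C, H, W)
        avg = stacked.mean(dim=2, keepdim=True)           # (B, 2, 1, H, W)
        mx = stacked.amax(dim=2, keepdim=True)            # (B, 2, 1, H, W)
        # Softmax over the modality axis gives weights that sum to 1 per pixel.
        w_avg = torch.softmax(avg, dim=1)
        w_max = torch.softmax(mx, dim=1)
        fused_avg = (w_avg * stacked).sum(dim=1)          # (B, C, H, W)
        fused_max = (w_max * stacked).sum(dim=1)          # (B, C, H, W)
        return 0.5 * (fused_avg + fused_max)


class CSSA(nn.Module):
    """Channel switching followed by spatial attention (illustrative)."""

    def __init__(self, kernel_size: int = 3, threshold: float = 0.5):
        super().__init__()
        self.switch = ChannelSwitching(kernel_size, threshold)
        self.attend = SpatialAttention()

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        rgb, ir = self.switch(rgb, ir)
        return self.attend(rgb, ir)


# Usage: fuse same-shaped feature maps from the RGB and IR backbones.
rgb_feat = torch.rand(2, 16, 8, 8)
ir_feat = torch.rand(2, 16, 8, 8)
fused = CSSA()(rgb_feat, ir_feat)
print(fused.shape)
```

In a full detector, one such block would sit at each backbone stage whose features feed the detection head, so the fused map has the same shape a single-modality backbone would produce. The block adds only two small 1D convolutions per stage, which is what keeps the overhead low in this sketch.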

Video Summary

Below is a video of us summarizing the project. Check it out!

Paper

Below is the final paper for our project.

CMU Robotics Master's Graduate

MIT Mechanical Engineering Graduate
