Projects

Designing and Training a Fully Attentive Multimodal Transformer Network for Medical Visual Question Answering Task

December 2023

Team: Md Mesbahur Rahman

Summary: Medical Visual Question Answering is an important and impactful application of multimodal learning. It can contribute to the interpretability of machine learning models in medical applications, reduce the workload of medical professionals, and form part of a fully automated healthcare system. In this project, we surveyed the state of the art in Medical Visual Question Answering research. Building on several recent, well-performing papers, we propose our own fully attention-based, Transformer-only network that solves the medical visual question answering task by treating it as a multi-class classification problem. We also present an analysis of the model's hyperparameter tuning, compare its performance with models from other notable papers, and suggest future improvements to our model.

My contribution: Conducted background research on the state of the art in Medical Visual Question Answering. Building on several recent, well-performing papers, proposed a novel, fully attention-based, Transformer-only network that solves the medical visual question answering task by treating it as a multi-class classification problem. Trained the proposed model on the training set of the VQA-RAD dataset; the model showed encouraging results on the corresponding test set. Also presented an analysis of the model's hyperparameter tuning, compared its performance with models from other notable papers, and suggested future improvements, including pre-training the model on much larger medical vision-language datasets.
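To illustrate what "treating it as a multi-class classification problem" means here, the following is a minimal sketch, not the project's actual code. It assumes every distinct (lower-cased) training answer becomes one class and that the model emits one score per class; the function names and the normalization rule are hypothetical.

```python
def build_answer_classes(train_answers):
    """Map each distinct training answer to a class index.

    Assumption: answers are normalized by stripping whitespace and lower-casing.
    """
    classes = sorted({answer.strip().lower() for answer in train_answers})
    return {answer: idx for idx, answer in enumerate(classes)}


def predict_answer(scores, answer_to_idx):
    """Return the answer whose class score is highest (argmax over classes)."""
    if len(scores) != len(answer_to_idx):
        raise ValueError("expected one score per answer class")
    idx_to_answer = {idx: answer for answer, idx in answer_to_idx.items()}
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return idx_to_answer[best_idx]


# Toy usage with made-up answers and scores.
answer_classes = build_answer_classes(["Yes", "no", "yes", "Left lung"])
predicted = predict_answer([0.1, 0.2, 0.7], answer_classes)
```

Under this framing, an answer that never appears in the training set cannot be predicted, which is one reason pre-training on larger datasets is a natural next step.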

Analyzing and Mitigating Dataset Artifacts

December 2022

Team: Md Mesbahur Rahman

Resources: [Technical report]

Summary: In NLP research, benchmark datasets are often used to compare the performance of different SOTA models. But a high held-out accuracy neither tells the whole story about a model's strengths and weaknesses nor guarantees that the model has meaningfully solved the dataset. A model can learn spurious correlations in the dataset and still achieve high accuracy. This phenomenon is known as dataset artifacts. In this project, we tried to identify some cases of it for the ELECTRA-small (Clark et al., 2020) model in the SQuAD problem setting using the Checklist and Adversarial Dataset frameworks, and attempted to mitigate some of the dataset artifacts using the inoculation by fine-tuning strategy.

My contribution: Trained the ELECTRA-small model on the SQuAD dataset. Generated this model's predictions on the Checklist sets and Adversarial SQuAD using our own scripts, then used the Checklist and Adversarial frameworks to identify some of the artifacts in the model’s learning. Implemented the inoculation by fine-tuning method for mitigating dataset artifacts by taking our original ELECTRA-small model, trained on the training set of the SQuAD dataset, and fine-tuning it on a small subset sampled from the training set of the ‘Adversarial SQuAD’ dataset. Then evaluated the dataset artifacts of this fine-tuned model using the Checklist sets and Adversarial SQuAD and compared them with the original results.
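The subset-sampling step of inoculation by fine-tuning can be sketched as follows. This is an illustrative sketch only: the function name, the example record layout, and the seed are assumptions, and the fine-tuning of ELECTRA-small itself is not shown.

```python
import random


def sample_inoculation_subset(adversarial_train, k, seed=0):
    """Draw k distinct examples from the adversarial training set.

    A fixed seed keeps the subset reproducible across runs.
    """
    if k > len(adversarial_train):
        raise ValueError("k cannot exceed the size of the adversarial training set")
    rng = random.Random(seed)
    return rng.sample(adversarial_train, k)


# Toy stand-in for adversarial examples (hypothetical record layout).
adversarial_examples = [{"id": i, "question": f"q{i}"} for i in range(100)]
inoculation_subset = sample_inoculation_subset(adversarial_examples, k=10, seed=42)
```

The already-trained model is then fine-tuned on this small subset only, and re-evaluated on both the original and the adversarial test sets.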

Autonomous agents for realtime multiplayer ice-hockey

December 2020

Team: Md Mesbahur Rahman, Mohammad Aljubran, Nivethi Krithika, Shubham Bhardwaj

Resources: [Technical report]

Summary: We designed an agent to play SuperTuxKart, and particularly compete with the AI oracle (and other classmate AI agents) in a 2v2 hockey game. Our strategy was to maximize puck possession and minimize puck distance to the opponent’s goal. Imitation Learning and DAgger could not perform sufficiently well when trained using the AI oracle of the game. Instead, an internal state controller was built and found superior to the AI, where it wins 70% of the time and scores an average of 3.1 goals per game when competing in 2v2 against the AI oracle. Based on supervised learning, a planner was trained to detect puck presence and location. Playing 10 2v2 games, this agent wins 30% of the games and scores an average of 1.2 goals per game. Future work can involve training a DAgger learner on the internal state controller.

My contribution: Designed, coded, and trained a multi-task, fully convolutional CNN for the vision stage of the pipeline; wrote sections of the report.

Unsupervised Anomaly Detection Using Convolutional Autoencoder

May 2020

Team: Md Mesbahur Rahman

Resources: [Code]

Summary: Anomaly detection is a common and important problem in industrial settings. Several approaches exist for anomaly detection using deep learning. One of the most effective (in terms of both performance and model training cost) is unsupervised anomaly detection using a convolutional autoencoder. In this project, I designed and trained a convolutional autoencoder model to detect anomalous images (images of the digit 3 in the MNIST dataset) while treating images of the digit 1 as regular images.

My contribution: Built an unsupervised dataset from the labeled MNIST dataset by removing its labels. Then defined a convolutional autoencoder network in PyTorch and trained it on the unsupervised dataset, so that the network learned to reconstruct training images consisting of regular images with a small percentage of anomalous images. During inference, calculated a reconstruction error (MSE) threshold from a given percent quantile and declared an input image an anomaly when its reconstruction error was above that threshold.
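The thresholding step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the error values are made up, the function names are hypothetical, and the quantile uses linear interpolation between neighboring sorted values (one common convention among several).

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two flattened images of equal length."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def quantile_threshold(errors, q):
    """Return the q-quantile (0 <= q <= 1) of errors using linear interpolation."""
    if not errors:
        raise ValueError("errors must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(errors)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def flag_anomalies(errors, threshold):
    """Mark an image as anomalous when its reconstruction error exceeds the threshold."""
    return [error > threshold for error in errors]


# Made-up reconstruction errors: four regular images and one poorly reconstructed one.
reconstruction_errors = [0.01, 0.02, 0.015, 0.012, 0.30]
threshold = quantile_threshold(reconstruction_errors, q=0.8)
flags = flag_anomalies(reconstruction_errors, threshold)
```

The quantile acts as an assumed contamination rate: choosing q = 0.8 flags roughly the worst-reconstructed 20% of images, so it should be set with the expected share of anomalies in mind.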

Image Caption Generation using CNN LSTM Encoder Decoder

May 2020

Team: Md Mesbahur Rahman

Resources: [Code]

Summary: Image caption generation is a widely used application of sequential generative models. In this project, I designed and trained a CNN-LSTM encoder-decoder architecture for generating a caption from an input image. I did this project as part of the requirements for graduating from Udacity's 'Computer Vision Nanodegree' program.

My contribution: Pre-processed the images in the MS COCO dataset using PyTorch Transforms and converted the captions in the training set into sequences of integers using a BOW vocabulary dictionary with a vocabulary threshold of 5. Defined and trained a CNN encoder and an LSTM decoder on top of a time-distributed embedding layer: a pretrained ResNet-50 model served as a feature extractor that encodes an input image into a fixed-size embedding vector, and the LSTM decoder generated captions from that embedding vector. The configurations of the data pre-processing, CNN encoder, and LSTM decoder were inspired by this paper. Inference was then run on the ‘test’ portion of the MS COCO dataset.
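The caption-to-integer conversion with a vocabulary threshold can be sketched as below. This is an illustrative sketch, not the project's code. It assumes whitespace tokenization, that a word is kept when it occurs at least `threshold` times, and hypothetical `<start>`, `<end>`, and `<unk>` special tokens.

```python
from collections import Counter

START, END, UNK = "<start>", "<end>", "<unk>"


def build_vocab(captions, threshold=5):
    """Build a word-to-index dictionary, keeping words seen at least `threshold` times."""
    counts = Counter(word for caption in captions for word in caption.lower().split())
    vocab = {START: 0, END: 1, UNK: 2}
    for word in sorted(counts):
        if counts[word] >= threshold:
            vocab[word] = len(vocab)
    return vocab


def encode_caption(caption, vocab):
    """Convert a caption into integers, mapping rare or unseen words to <unk>."""
    tokens = caption.lower().split()
    body = [vocab.get(token, vocab[UNK]) for token in tokens]
    return [vocab[START]] + body + [vocab[END]]


# Toy corpus: "cat" and "sleeps" fall below the threshold and become <unk>.
toy_captions = ["a dog runs"] * 5 + ["a cat sleeps"]
toy_vocab = build_vocab(toy_captions, threshold=5)
encoded = encode_caption("a cat runs", toy_vocab)
```

Dropping rare words keeps the decoder's output layer small, at the cost of never generating those words in a caption.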

Facial Keypoint Detection using CNN Haar Cascade Classifier

September 2017

Team: Md Mesbahur Rahman

Resources: [Code]

Summary: Facial keypoint detection is an important computer vision problem that can be solved effectively by treating it as an image regression task and training a CNN to predict the image locations of the keypoints. In this project, I trained a CNN to predict important facial keypoints given an image of a human face. I did this project as part of the requirements for graduating from Udacity's Computer Vision Nanodegree program.

My contribution: Defined and trained a CNN on a facial keypoint dataset derived from the YouTube Faces Dataset, using custom transformations in PyTorch, to perform a regression task that predicts the locations of 68 facial keypoints, as inspired by this paper. During inference, detected all the faces in an image using OpenCV’s pre-trained Haar Cascade classifiers and predicted the locations of the 68 facial keypoints on those detected faces using the trained CNN.