Hi! I’m Mesbahur, a Senior Computer Vision Engineer at DeepX, Inc.
DeepX, Inc. is a Tokyo-based startup that aims to automate all kinds of machines and innovate global industries. I research and develop Computer Vision solutions to various perception problems in robotic software development.
Bio: From 2017 to 2019, I worked as an AI Engineer at Hiperdyne Corporation, a Tokyo-based startup that provides Deep Learning and Machine Learning solutions to clients' problems in commercial and industrial settings. At Hiperdyne, I worked on projects for renowned Japanese companies, namely Sony Computer Science Laboratories (Sony CSL) and Sony Network Communications Inc. (Sonet). In the meantime, in Fall 2022, I started an online Master's program in Computer Science to learn the fundamentals of Computer Science and Machine Learning by studying their theory and doing hands-on projects. Then, in 2022, I joined DeepX, Inc. as a Computer Vision Engineer to contribute to its quest to automate heavy machinery.
Impact: The outcome of my Machine Learning research project on music at Sony CSL has been successfully deployed on the Sony ATV Music server, where it has since been used to predict the genre and mood of a piece of music from its audio-based feature set. At DeepX, I built an anomaly detection system based on classical Computer Vision algorithms. I also developed several point cloud filters and an image information retrieval system based on bird's-eye-view (BEV) images of point clouds, which have been tested and deployed in automated excavators at real sites.
Research Interests: I am interested in research on improving Machine Learning and Deep Learning algorithms for pattern recognition in image, point cloud, and text data, and in real-world applications of Machine Learning and Deep Learning in urban mobility, healthcare informatics, geo-satellite, and agriculture. I am also interested in label-efficient learning, reducing the domain gap between simulation and the real world for Computer Vision models, and fusing LLMs with Computer Vision algorithms to bring the power of LLMs to the Computer Vision domain.
Projects
Designing and Training a Fully Attentive Multimodal Transformer Network for Medical Visual Question Answering Task
Md Mesbahur Rahman
Analyzing and Mitigating Dataset Artifacts
Md Mesbahur Rahman
Autonomous Agents for Real-Time Multiplayer Ice Hockey
Md Mesbahur Rahman, Mohammad Aljubran, Nivethi Krithika, Shubham Bhardwaj
Unsupervised Anomaly Detection Using Convolutional Autoencoder
Md Mesbahur Rahman
[Code]
Image Caption Generation Using a CNN-LSTM Encoder-Decoder
Md Mesbahur Rahman
[Code]
Facial Keypoint Detection Using a CNN and Haar Cascade Classifier
Md Mesbahur Rahman
[Code]