Innovating at the intersection of AI and generative multimedia.
Currently researching advanced lipreading and speech synthesis technologies at IIIT Hyderabad.
Education
MS by Research, CSE
IIIT Hyderabad | Grade: 8.5
2024-2026
Courses taken: Statistical Methods in AI, Digital Image Processing, Advanced NLP (LLMs), Computer Vision, Technology, Product and Entrepreneurship
Research focus: multimodal AI, speech technologies, LLMs, and representation learning.
Working with the Audio Visual team at CVIT Lab, guided by Professor CV Jawahar and Professor Vinay Namboodiri.
BTech, Applied Electronics and Instrumentation
HITK, Kolkata | Grade: 8.12
2019-2023
Explored the beauty of interdisciplinary education.
Final-year thesis on IoT and edge ML for patients with epilepsy.
Major projects in harassment detection and women's safety. Runners-up at Nasscom Lab 2 Market.
Experience
Technical Expertise
AI & ML
Multimodal Generative AI
LipSync & Speech Technologies
Computer Vision & LLMs
MLOps
AWS, Azure, Google Cloud
Docker
FastAPI & Flask
Frameworks
PyTorch
TensorFlow
Hugging Face
Publications
reSenseNet: Ensemble Early Fusion Deep Learning Architecture
IHCI 2021
Published in the International Conference on Intelligent Human Computer Interaction
Speech@SCIS: Annotated Indian Video Dataset
Smart Intelligent Computing and Applications
Published in Smart Intelligent Computing and Applications, Volume 1