Innovating at the intersection of AI and generative multimedia. Currently researching advanced lip-reading and speech synthesis technologies at IIIT Hyderabad.
IIIT Hyderabad | Grade: 8.5
2024-2026
HITK, Kolkata | Grade: 8.12
2019-2023
Under Submission
A generic and efficient modular framework that adapts an existing frozen pre-trained Text-to-Speech model into a lip-synchronized speech generator.
Under Submission
A mobile app for silent video-to-speech communication, leveraging an adapted visual speech recognition (VSR) model with a novel constrained beam search strategy.
ICDAR 2025
The first single-stage, multi-domain OCR model for Indian languages, outperforming industry-grade OCR systems from Google, AWS and Azure as well as open-source SOTA models. In keeping with our commitment to open source, the model and code have been publicly released.
HindiOCR-VLM: Adapting Vision-Language Models for OCR in Indian Languages
CLIP based domain adaptation via Residual Hypernetworks
Began my research journey at IIIT Hyderabad, focusing on multimodal AI and speech technologies under the guidance of Professor C. V. Jawahar.
Excited to join Sync Labs as a Research Engineer, working on cutting-edge lip-sync and facial animation technologies.