Innovating at the intersection of AI and generative multimedia. Currently researching advanced lipreading and speech synthesis technologies at IIIT Hyderabad.
IIIT Hyderabad | Grade: 8.5
2024-2026
HITK, Kolkata | Grade: 8.12
2019-2023
ICDAR 2025
The first single-stage, multi-domain OCR for Indian languages, showing better results than industry-grade OCR systems such as those from Google, AWS, and Azure, as well as open-source SOTA models. In keeping with our commitment to open source, the model and code have been released publicly.
HindiOCR-VLM: Adapting Vision-Language Models for OCR in Indian Languages
CLIP-based domain adaptation via Residual Hypernetworks
Began my research journey at IIIT Hyderabad, focusing on multimodal AI and speech technologies under the guidance of Professor CV Jawahar.
Excited to join Sync Labs as a Research Engineer, working on cutting-edge lip-sync and facial animation technologies.