AI Researcher & Data Scientist

Souvik Ghosh

Innovating at the intersection of AI and generative multimedia. Currently researching advanced lipreading and speech synthesis technologies at IIIT Hyderabad.

Education

MS by Research, CSE

IIIT Hyderabad | Grade: 8.5

2024-2026

  • Courses taken: Statistical Methods in AI, Digital Image Processing, Advanced NLP (LLMs), Computer Vision, Technology, Product and Entrepreneurship
  • Research focus on multimodal AI, speech technologies, LLMs, and representation learning.
  • Working with the CVIT Lab in the Audio Visual Team, guided by Professor CV Jawahar and Professor Vinay Namboodiri

BTech, Applied Electronics and Instrumentation

HITK, Kolkata | Grade: 8.12

2019-2023

  • Explored the beauty of interdisciplinary education.
  • Final-year thesis on IoT and edge ML for patients with epilepsy
  • Major projects in harassment detection and women's safety. Runner-up at Nasscom Lab 2 Market

Experience

Technical Expertise

AI & ML

  • Multimodal Generative AI
  • LipSync & Speech Technologies
  • Computer Vision & LLMs

MLOps

  • AWS, Azure, Google Cloud
  • Docker
  • FastAPI & Flask

Frameworks

  • PyTorch
  • TensorFlow
  • Hugging Face

Publications

HindiOCR-VLM: Adapting Vision-Language Models for OCR in Indian Languages

ICDAR 2025

The first single-stage, multi-domain OCR for Indian languages, showing better results than industry-grade OCR systems such as those from Google, AWS, and Azure, as well as open-source SOTA models. In keeping with our commitment to open source, the model and code have been released publicly.

ReSenseNet: Ensemble Early Fusion Deep Learning Architecture for Multimodal Sentiment Analysis

IHCI 2021

Explores multimodal sentiment analysis using a novel ensemble early fusion deep learning architecture. Opens the door to targeted sentiment analysis in constrained environments.

Speech@SCIS: Annotated Indian Video Dataset

SCI 2021

With the advent of AI-based content creation, clean and annotated datasets for Indian languages are necessary. This work proposes a dataset with a balanced set of male and female speakers for Indian languages.

Recent Updates

June 2025

Paper Accepted at ICDAR 2025

HindiOCR-VLM: Adapting Vision-Language Models for OCR in Indian Languages

April 2025

Paper Accepted at DG-EBF@CVPR 2025

CLIP based domain adaptation via Residual Hypernetworks

August 2024

Started MS by Research at IIIT Hyderabad

Began my research journey at IIIT Hyderabad, focusing on multimodal AI and speech technologies under the guidance of Professor CV Jawahar.

February 2024

Joined Sync Labs (YC W24)

Excited to join Sync Labs as a Research Engineer, working on cutting-edge lip-sync and facial animation technologies.