Computer Vision and Digital Twin

Our Generative Computer Vision research advances algorithms that create and interpret images and videos with high realism and precision. This work supports next-generation applications in security, healthcare, and creative industries.

Research Areas

Vision Foundation Models

Research on large-scale visual models that learn general-purpose image representations, enabling tasks like classification, detection, and transfer learning across diverse vision applications.

Digital Twin and 3D Reconstruction

Researching AI methods to build accurate virtual replicas of physical systems through 3D modeling. We use multi-modal sensor data from street-view imagery, drones, and satellites to achieve city-scale 3D reconstruction for smart city and infrastructure applications.

Vision Language Models

Investigating AI models that integrate visual and textual information for image captioning, visual question answering, and multimodal reasoning.

Vision Reasoning Models

Research on AI models capable of logical, causal, and spatial reasoning over visual data, supporting tasks such as scene understanding, visual commonsense, and decision-making.

Video Large Language Models

Studying models that understand and generate content from video data, combining temporal reasoning, multimodal understanding, and predictive modeling for dynamic environments.

Published Research Papers

1. PMODE: Prototypical Mask based Object Dimension Estimation

2. Ego Vehicle Speed Estimation using 3D Convolution with Masked Attention

3. Leveraging Multi-Modal Saliency and Fusion for Gaze Target Detection

4. SAM-CD: Change Detection in Remote Sensing Using Segment Anything Model

5. GESCAM: Gaze Estimation Method and Dataset

6. CMFPN: Context Modeling Meets Feature Pyramid Network

Granted Patents

1. Document Verification - Method & Apparatus of Authenticating Documents Having Embedded Landmarks

2. Ego Vehicle Speed Estimation

3. Multi-model Attention Estimation and Gaze Target Detection

4. Shop Signage Dimension Estimation