Computer Vision and Digital Twin

We focus our Generative Computer Vision research on advancing algorithms that create and interpret images and videos with exceptional realism and precision. These innovations empower next-generation applications in security, healthcare, and creative industries, driving smarter and more immersive visual experiences.

Research Areas:​

Vision Foundation Models
Research on large-scale visual models that learn general-purpose image representations, enabling tasks like classification, detection, and transfer learning across diverse vision applications.​

Digital Twin and 3D Reconstruction
Researching AI methods that build accurate virtual replicas of physical systems through 3D modeling. Multi-modal sensor data from street-view imagery, drones, and satellites is combined to achieve city-scale 3D reconstruction for smart city and infrastructure applications.

Vision Language Models​
Investigating AI models that integrate visual and textual information, enabling research in image captioning, visual question answering, and multimodal reasoning.​

Vision Reasoning Models
Research on AI models capable of logical, causal, and spatial reasoning over visual data, supporting tasks such as scene understanding, visual commonsense, and decision-making.​

Video Large Language Models​
Studying models that understand and generate content from video data, combining temporal reasoning, multimodal understanding, and predictive modeling for dynamic environments.


Published Research Papers​

1. PMODE: Prototypical Mask based Object Dimension Estimation​
2. Ego Vehicle Speed Estimation using 3D Convolution with Masked Attention
3. Leveraging Multi-Modal Saliency and Fusion for Gaze Target Detection​
4. SAM-CD: Change Detection in Remote Sensing Using Segment Anything Model​
5. GESCAM: Gaze Estimation Method and Dataset​
6. CMFPN: Context Modeling Meets Feature Pyramid Network​


Granted Patents​

1. Document Verification - Method & Apparatus of Authenticating Documents Having Embedded Landmarks
2. Ego Vehicle Speed Estimation
3. Multi-Model Attention Estimation and Gaze Target Detection
4. Shop Signage Dimension Estimation