Taking video and multimodal AI from research to production.
Vision-Language Models · Edge Streaming · Inference Optimization
AI architect with 5+ years taking video and multimodal AI from research to production. Shipped deep learning systems to thousands of cameras across schools, banks, warehouses, and moving metro trains.
Founding team member at BLUE, where I built perceptual video compression for machine consumers (detectors, VLMs) achieving a 40% median bitrate reduction (up to 69%) over H.265 with under 1% impact on downstream detection accuracy.
Deep, hands-on stack: Vision-Language Models (LLaVA, Qwen-VL, InternVL), TensorRT and CUDA inference optimization, and GStreamer/RTSP streaming infrastructure on edge and GPU hardware. Track winner at HackZurich and NASA Space Apps; 2 publications and 1 patent application.
Led development of a multi-camera safety system with firearm detection and person re-ID, deployed in real time across 100+ cameras in US schools.
Real-time ski coaching from action camera footage alone: analyzes ski angles, gap, and technique. Won the Sunrise GMBH, Huawei & Swisski track.
CV & deep learning platform on satellite/UAV imagery that locates and classifies ocean debris and predicts its trajectory for cleanup coordination.
Autonomous drone system generating critical disaster-zone data (locality density, population estimates, and depth maps) for responder coordination.
Drone with custom YOLO model detecting whales from aerial footage and autonomously collecting snot samples for algal bloom prediction research.
AI writing assistant using Hugging Face Transformers + GPT-2 for contextual text generation and sentence completion.
SOTA 2021 ALPR module with a minimal resource footprint and hybrid CPU/GPU inference paths, ready for real-time deployment.
Arduino-based wearable with ultrasonic sensors providing haptic feedback to assist visually impaired individuals in navigation.
Research on transformer-based models for writing assistance.
Application of YOLOv3 object detection for classifying severity of skin burns.
Patent application associated with the transformer writing assistance system.
I'm interested in senior AI/ML engineering roles, technical collaborations, and research that pushes what's possible at the intersection of vision and language. If you have something worth discussing, reach out.
Response time: usually within 24 hours.
Based in Bengaluru, India. Open to remote-first roles globally.