Bengaluru, India · Senior AI Architect

Shubham Baid

Taking video and multimodal AI from research to production.
Vision-Language Models · Edge Streaming · Inference Optimization

5+ Years Exp.
Thousands of Cameras Deployed
40% Median Bitrate Cut
2 Publications
1 Patent App
Shubham Baid
shubhambaid99@gmail.com
PyTorch · TensorFlow · OpenCV · TensorRT · CUDA · Python · C++ · YOLO · InternVL · Qwen-VL · LLaVA · CrewAI · GStreamer · DeepStream · OpenVINO · NVIDIA Jetson · ARM / NEON · RAG · Hugging Face · Docker

AI architect with 5+ years taking video and multimodal AI from research to production. Shipped deep learning systems to thousands of cameras across schools, banks, warehouses, and moving metro trains.

Founding team member at BLUE, where I built perceptual video compression for machine consumers (detectors, VLMs) achieving a 40% median bitrate reduction (up to 69%) over H.265 with under 1% impact on downstream detection accuracy.

Deep, hands-on stack: Vision-Language Models (LLaVA, Qwen-VL, InternVL), TensorRT and CUDA inference optimization, and GStreamer/RTSP streaming infrastructure on edge and GPU hardware. Track winner at HackZurich and NASA Space Apps; 2 publications and 1 patent application.

Vision-Language Models · TensorRT & CUDA · GStreamer / RTSP · Edge Inference · PyTorch · Model Quantization · Python / C++ · Multi-Agent Systems
Quick Stats
Experience: 5+ years
Cameras Deployed: Thousands
Publications: 2
Patents: 1 App
Hackathon wins: 2 tracks
Location: Bengaluru, India
BLUE
myblue.ai · Bengaluru
Apr 2025 – Present
Senior AI Architect, Founding Team
  • Built perceptual video compression optimized for downstream AI (object detection, VLM inference) rather than human viewing: 40% median and up to 69% bitrate reduction beyond H.265 on a public surveillance benchmark, with detection precision/recall deltas under 1% and VLM output deviation under 0.1%.
  • Architected the end-to-end video pipeline (RTSP ingest, encode, stream, decode, VLM forward pass) with CPU/GPU fallback; running in production across offices, banks, warehouses, and moving metro trains while preserving source resolution and frame rate.
  • Benchmarked LLaVA, Qwen-VL, and InternVL on throughput-per-watt, latency, and task accuracy to drive model selection for compute-constrained edge deployments.
  • Built vision analytics on live CCTV for quick service restaurants (QSR): VLM-driven order validation verifying tray items against POS orders in real time, with structured outputs and fuzzy matching to handle menu variants.
Perceptual Compression · VLMs · Edge Streaming · LLaVA · Qwen-VL
CAFU
Dubai, UAE
Jan 2025 – Apr 2025
AI Engineer
  • Optimized the ML-based ETA prediction system with real-time analytics, cutting delayed fulfillment from 13% to 8% over 3 months and reducing SLA breaches over 5 minutes by 42%.
  • Engineered an agentic LLM workflow for marketing content generation, lifting CTR by 15% and cutting copywriting turnaround time by 50%.
  • Designed autonomous B2B lead acquisition with multi-modal LLM agents, processing 1,400 prospects per month and reducing manual prospecting hours by 90%.
Agentic AI · LLMs · ML Optimization · Python
Avathon
Formerly SparkCognition
May 2022 – Jan 2025
Senior AI Engineer
  • Led AI for the VAIA School Safety Suite: real-time video understanding across 100+ cameras in live, safety-critical US school deployments; owned model deployment, pipeline reliability, and inference optimization.
  • Trained and shipped detection and classification models serving thousands of production cameras across enterprise and industrial sites over my tenure.
  • Built a gamified active-learning annotation tool that materially accelerated dataset curation and continuous retraining loops for production CV models.
  • Integrated Vision-Language Models into classical CV pipelines for complex scene understanding beyond fixed-class detection.
  • Cut inference CPU usage by 50% at sustained real-time throughput via ARM/NEON-optimized pipelines with INT8 quantization.
Computer Vision · TensorRT · NEON · Multimodal LLMs · Active Learning
Integration Wizards
Acquired by SparkCognition
Sep 2020 – May 2022
Senior AI Engineer
  • Built a production ALPR system (YOLO detection plus OCR) with hybrid CPU/GPU inference, optimized for real-time video streams on a minimal resource footprint.
  • Migrated legacy CV models to NVIDIA DeepStream and TensorRT GPU pipelines, establishing the team's production video inference architecture.
  • Delivered PPE detection, fall-arrester detection, and vehicle classification models end to end, from dataset curation and training through real-world deployment; this core CV technology was central to the company's acquisition.
ALPR · DeepStream · TensorRT · YOLO · Edge AI

Multimodal & Deep Learning

PyTorch, YOLO family: 95%
VLMs (LLaVA, Qwen-VL, InternVL): 90%
Multi-agent (CrewAI, LangGraph) & RAG: 85%

Video & Streaming

GStreamer & NVIDIA DeepStream: 92%
RTSP/SRT, FFmpeg, HW encode/decode: 88%
OpenCV & Optical Flow: 90%

Inference & Edge

TensorRT & CUDA: 95%
Quantization (INT8/FP16): 88%
NVIDIA Jetson, ARM/NEON: 90%

Languages & Infrastructure

Python: 95%
C++: 80%
Docker, Linux, FastAPI, Vector DBs: 90%
Production

VAIA School Safety Suite

Led development of a multi-camera safety system with firearm detection and person re-ID, running in real time across 100+ cameras in US schools.

Object Detection · Person Re-ID · Multi-camera
HackZurich 2021 🏆

YetiCoach

Real-time ski coaching from action-camera footage alone: analyzes ski angles, gap, and technique. Won the Sunrise GmbH, Huawei & Swiss-Ski track.

Computer Vision · Sports Tech · Python
NASA 2021 🏆

Aegir — Ocean Debris AI

CV and deep learning platform that uses satellite and UAV imagery to locate and classify ocean debris and predict its trajectory for cleanup coordination.

Deep Learning · Satellite · Edge AI

Argus

Autonomous drone system that generates critical disaster-zone data (locality density, population estimates, and depth maps) for responder coordination.

Deep Learning · UAV · Disaster Tech

AutoSnotBot

Drone with a custom YOLO model that detects whales in aerial footage and autonomously collects snot samples for algal bloom prediction research.

YOLO · UAV · DeepStream

Hailey — AI Writing Assistant

AI writing assistant using Hugging Face Transformers + GPT-2 for contextual text generation and sentence completion.

Transformers · GPT-2 · NLP
Production

Automatic License Plate Recognition

State-of-the-art (2021) ALPR module with a minimal resource footprint and hybrid CPU/GPU inference paths, ready for real-time deployment.

OCR · TensorRT · GPU Inference

Cap for Blind

Arduino-based wearable with ultrasonic sensors providing haptic feedback to assist visually impaired individuals in navigation.

Arduino · IoT · Assistive Tech
01
JAZ, Vol 44, 2023 · Submitted Aug 2020

Text Generation Tool for Writing Assistance using Transformer

Research on transformer-based models for writing assistance.

NLP · Transformers · Generative AI
02
TEST Engineering and Management · May 2020

Detection of Different Degrees of Skin Burn using YOLOv3

Application of YOLOv3 object detection to classifying the severity of skin burns.

Computer Vision · YOLOv3 · Medical AI
Patent Application (India)

System and Method for Text Generation Tool for Writing Assistance Using Transformer

Patent application associated with the transformer writing assistance system.

Patent Application
Education
B.Tech, Computer Science
REVA University, Bengaluru · 2017 – 2021
Best Outgoing Student. Founded GDSC REVA (Google Developer Student Club).
Certifications
NVIDIA Jetson AI Specialist
Intel Edge AI Specialist
TensorFlow Developer Certificate (Google)
Awards & Recognition
🏆
HackZurich 2021, Track Winner
Europe's largest hackathon: YetiCoach, a real-time sports computer vision system built for Sunrise, Huawei, and Swiss-Ski; recognized by the Swiss-Ski federation.
🚀
NASA Space Apps 2021, Track Winner
Project Aegir, satellite and UAV computer vision for ocean debris detection.

I'm interested in senior AI/ML engineering roles, technical collaborations, and research that pushes what's possible at the intersection of vision and language. If you have something worth discussing, reach out.

Response time: usually within 24 hours.
Based in Bengaluru, India; open to remote-first roles globally.
