Shubham Baid — Senior AI Architect

About

AI architect with 5+ years of experience taking video and multimodal AI from research to production. Shipped deep learning systems to thousands of cameras across schools, banks, warehouses, and moving metro trains.

Founding team member at BLUE, where I built perceptual video compression for machine consumers (detectors, VLMs) achieving a 40% median bitrate reduction (up to 69%) over H.265 with under 1% impact on downstream detection accuracy.

Deep, hands-on stack: Vision-Language Models (LLaVA, Qwen-VL, InternVL), TensorRT and CUDA inference optimization, and GStreamer/RTSP streaming infrastructure on edge and GPU hardware. Track winner at HackZurich and NASA Space Apps; 2 publications and 1 patent application.

Quick Stats

Experience: 5+ years

Cameras deployed: Thousands

Publications: 2

Patents: 1 application

Hackathon wins: 2 tracks

Location: Bengaluru, India

Links

LinkedIn · GitHub · Twitter · Medium · Email

Experience

BLUE

myblue.ai · Bengaluru

Apr 2025 – Present

Senior AI Architect, Founding Team

Built perceptual video compression optimized for downstream AI (object detection, VLM inference) rather than human viewing: a 40% median (up to 69%) bitrate reduction relative to H.265 on a public surveillance benchmark, with detection precision/recall deltas under 1% and VLM output deviation under 0.1%.
Architected the end-to-end video pipeline (RTSP ingest, encode, stream, decode, VLM forward pass) with CPU/GPU fallback; it now runs in production across offices, banks, warehouses, and moving metro trains while preserving source resolution and frame rate.
Benchmarked LLaVA, Qwen-VL, and InternVL on throughput-per-watt, latency, and task accuracy to drive model selection for compute-constrained edge deployments.
Built vision analytics on live CCTV for quick-service restaurants (QSRs): VLM-driven order validation that verifies tray items against POS orders in real time, using structured outputs and fuzzy matching to handle menu variants.

Perceptual Compression · VLMs · Edge Streaming · LLaVA · Qwen-VL

CAFU

Dubai, UAE

Jan 2025 – Apr 2025

AI Engineer

Optimized the ML-based ETA prediction system with real-time analytics, cutting delayed fulfillment from 13% to 8% over 3 months and reducing SLA breaches of more than 5 minutes by 42%.
Engineered an agentic LLM workflow for marketing content generation, lifting CTR by 15% and cutting copywriting turnaround time by 50%.
Designed autonomous B2B lead acquisition with multimodal LLM agents, processing 1,400 prospects per month and reducing manual prospecting hours by 90%.

Agentic AI · LLMs · ML Optimization · Python

Avathon

Formerly SparkCognition

May 2022 – Jan 2025

Senior AI Engineer

Led AI for the VAIA School Safety Suite: real-time video understanding across 100+ cameras in live, safety-critical US school deployments; owned model deployment, pipeline reliability, and inference optimization.
Trained and shipped detection and classification models serving thousands of production cameras across enterprise and industrial sites over my tenure.
Built a gamified active-learning annotation tool that materially accelerated dataset curation and continuous retraining loops for production CV models.
Integrated Vision-Language Models into classical CV pipelines for complex scene understanding beyond fixed-class detection.
Cut inference CPU usage by 50% at sustained real-time throughput via ARM/NEON-optimized pipelines with INT8 quantization.

Computer Vision · TensorRT · NEON · Multimodal LLMs · Active Learning

Integration Wizards

Acquired by SparkCognition

Sep 2020 – May 2022

Senior AI Engineer

Built a production ALPR system (YOLO detection plus OCR) with hybrid CPU/GPU inference, optimized for real-time video streams on a minimal resource footprint.
Migrated legacy CV models to NVIDIA DeepStream and TensorRT GPU pipelines, establishing the team's production video inference architecture.
Delivered PPE detection, fall-arrester detection, and vehicle classification models end to end, from dataset curation and training through real-world deployment; this core CV technology was central to the company's acquisition.

ALPR · DeepStream · TensorRT · YOLO · Edge AI

Skills

Multimodal & Deep Learning

PyTorch, YOLO family: 95%

VLMs (LLaVA, Qwen-VL, InternVL): 90%

Multi-agent (CrewAI, LangGraph) & RAG: 85%

Video & Streaming

GStreamer & NVIDIA DeepStream: 92%

RTSP/SRT, FFmpeg, HW encode/decode: 88%

OpenCV & Optical Flow: 90%

Inference & Edge

TensorRT & CUDA: 95%

Quantization (INT8/FP16): 88%

NVIDIA Jetson, ARM/NEON: 90%

Languages & Infrastructure

Python: 95%

C++: 80%

Docker, Linux, FastAPI, Vector DBs: 90%

Projects

Production

VAIA School Safety Suite

Led development of a multi-camera safety system with firearm detection and person re-ID, running in real time across 100+ cameras in US schools.

Object Detection · Person Re-ID · Multi-camera

HackZurich 2021 🏆

YetiCoach

Real-time ski coaching from action-camera footage alone: analyzes ski angles, gap, and technique. Won the Sunrise GmbH, Huawei & Swiss-Ski track.

Computer Vision · Sports Tech · Python

NASA Space Apps 2021 🏆

Aegir — Ocean Debris AI

CV and deep learning platform that uses satellite/UAV imagery to locate and classify ocean debris and predict its trajectory for cleanup coordination.

Deep Learning · Satellite · Edge AI

Argus

Autonomous drone system that generates critical disaster-zone data (locality density, population estimates, depth maps) for responder coordination.

Deep Learning · UAV · Disaster Tech

AutoSnotBot

Drone with a custom YOLO model that detects whales in aerial footage and autonomously collects snot samples for algal bloom prediction research.

YOLO · UAV · DeepStream

Hailey — AI Writing Assistant

AI writing assistant built on Hugging Face Transformers and GPT-2 for contextual text generation and sentence completion.

Transformers · GPT-2 · NLP

Production

Automatic License Plate Recognition

State-of-the-art (as of 2021) ALPR module with a minimal resource footprint and hybrid CPU/GPU inference paths, ready for real-time deployment.

OCR · TensorRT · GPU Inference

Cap for Blind

Arduino-based wearable with ultrasonic sensors providing haptic feedback to assist visually impaired individuals in navigation.

Arduino · IoT · Assistive Tech

Publications & Patents

JAZ, Vol 44, 2023 · Submitted Aug 2020

Text Generation Tool for Writing Assistance using Transformer

Research on transformer-based models for writing assistance.

NLP · Transformers · Generative AI

TEST Engineering and Management · May 2020

Detection of Different Degrees of Skin Burn using YOLOv3

Application of YOLOv3 object detection to classifying the severity of skin burns.

Computer Vision · YOLOv3 · Medical AI

Patent Application (India)

System and Method for Text Generation Tool for Writing Assistance Using Transformer

Patent application associated with the transformer writing assistance system.

Patent Application

Education & Recognition

Education

B.Tech, Computer Science, REVA University, Bengaluru · 2017 – 2021

Best Outgoing Student. Founded GDSC REVA (Google Developer Student Club).

Certifications

NVIDIA Jetson AI Specialist

Intel Edge AI Specialist

TensorFlow Developer Certificate (Google)

Awards & Recognition

🏆 HackZurich 2021, Track Winner: At Europe's largest hackathon, won with YetiCoach, a real-time sports computer vision system built for Sunrise, Huawei, and Swiss-Ski; recognized by the Swiss-Ski federation.

🚀 NASA Space Apps 2021, Track Winner: Project Aegir, satellite and UAV computer vision for ocean debris detection.
