K Sai Sri Teja

MS in EE (CV-AI)

Deep Learning Engineer - II

Research-oriented and versatile Deep Learning Engineer specializing in Computer Vision and AI. Proficient in end-to-end deep learning pipelines, model optimization, real-time deployment, and image/video generation. Experienced in the software development life cycle (SDLC) with exposure to Agile methodologies. Passionate about research-driven innovation and works efficiently both independently and collaboratively in dynamic environments.

Deep Learning Frameworks and Libraries

Experienced with major deep learning and machine learning frameworks and tools including PyTorch, TensorFlow, Keras, scikit-learn, and WandB, alongside scientific computing libraries such as SciPy, NumPy, OpenCV, and Pandas for building, training, and evaluating models.

Developer Tools

Proficient in VS Code, Jupyter Notebook, Google Colab, and Git-based platforms such as GitHub and GitLab for code development, collaboration, and experiment tracking.

Platforms and Technologies

Linux (Ubuntu), AWS Cloud Services, Docker, ROS (Robot Operating System), PyBullet, NVIDIA DeepStream

Programming Languages

Strong programming expertise in Python, with experience in building production-grade deep learning pipelines, automation scripts, and backend services.

ML Engineer — Image Generation
Feb 2025 —

Eros GenAI is building Large Cultural Models — generative AI systems trained on Indian cultural context — spanning text-to-image and text-to-video (world generation) at scale.

ML Engineer — Image Generation Mar 2025 — Present

Core contributor to the Image Generation pillar of Large Cultural Models (LCMs). Owned the full pipeline from Indic dataset curation and captioning to pre-training, fine-tuning, and architectural research on large-scale diffusion models.

  • Dataset Pipeline: Scraped Indic visual content from stock imagery sources using Playwright/Selenium; processed metadata and imagery for 12,000+ movies to build culturally grounded datasets.
  • Built automated curation pipeline with a neural network–based image quality checker, actor/object auto-tagging, and deduplication using NVIDIA NeMo and Ray.
  • Evaluated VLMs (Qwen 2.5-VL, Gemma) for captioning quality; designed in-house indicness scoring metrics to measure cultural relevance of generated prompts and captions.
  • Sole owner of the prompt generation system: engineered prompts to ensure diverse, culturally aware coverage across a wide range of users and scenarios.
  • Pre-trained and fine-tuned large-scale diffusion models (PixArt, SANA, Qwen Image, Qwen Image Edit, Flux 2.0, Lumina) using DreamBooth, IP-Adapter, and LoRA-based parameter-efficient fine-tuning (PEFT).
  • Explored DiT architectural modifications (MaskDiT and variants) and alternative text encoders to improve Indic language and cultural alignment.
  • Project RefCompose: Developed a multi-reference image generation system that conditions on multiple input images to compose coherent output generations.

ML Engineer — World Models Feb 2025 — Present

Contributor to the World Generation (video) pillar of Large Cultural Models — building temporally consistent video generation from unstructured real-world footage.

  • Built camera pose estimation pipelines from unstructured videos to enable structured 3D-aware training data for world models.
  • Implemented person tracking and re-identification (ReID) pipelines for large-scale video dataset curation.
  • Contributed to video model training for world generation, targeting temporally consistent and culturally grounded scene synthesis.

Deep Learning Engineer - II
Jul 2022 — Jan 2025

Leading provider of AI-driven solutions for industrial automation, specializing in surveillance, inspection, and monitoring technologies.

Developed and tested deep learning models across Classification, Detection, and Segmentation tasks, achieving a system-wide accuracy of 90%. Specialized in real-time deployment, synthetic data generation, and automation tools to enhance training and operational efficiency.

  • Established and managed a DLOps pipeline using NVIDIA DeepStream for scalable real-time model deployment.
  • Implemented data quantization techniques that halved model training time while maintaining accuracy, doubling the pace of experimentation.
  • Used diffusion models and GANs to generate synthetic training data, improving model performance.
  • Developed an auto-annotation tool that reduced labeling time for detection and classification tasks by 75%.
  • Optimized image restoration models (NAFNet, Uformer, DiffIR) to enhance image quality and ground truth data for internal evaluations.
  • Integrated Vision-Language Models (Llama 3.2, LLaVA, Molmo) with LLMs to build a custom Fishbone Analysis system for Root Cause Analysis (RCA) and Correction and Protection Analysis (CPA) using multi-modal data.

Project Associate
Feb 2021 — Jun 2022

Research and innovation center at IIT Madras focused on advanced healthcare solutions leveraging robotics, AI, and medical technology.

Focused on robotics applications for medical surgery, including path planning, kinematics, and real-time system integration with industrial and collaborative robots.

  • Developed a complete Python software package for robot-assisted, needle-based spine surgery, including kinematics, path planning, singularity detection, and collision avoidance.
  • Worked with KUKA industrial robots and Han’s Elfin collaborative robots, integrating real-time systems for medical applications.
  • Gained experience with ROS (Robot Operating System) for robotic software development and Intel RealSense cameras for depth sensing and 3D scanning.

Indian Institute of Technology, Madras
Master of Science in Electrical Engineering
2023 — 2025

Specialization in Computer Vision and Artificial Intelligence. Current CGPA: 8.00/10. Published papers at ECCVW 2024, CVPRW 2024, and ICCP 2025.

SASTRA Deemed University
Bachelor of Technology in Electronics and Communication Engineering
2017 — 2021

Graduated with a CGPA of 8.08/10.

KUKA Robotics Challenge 2021 - World Top 5 Team
KUKA • 2021

Selected as one of the top 5 teams globally (Team Aroki) in the prestigious KUKA Robotics Challenge 2021.

MIPI Flare Removal Challenge - Top 5 (CVPR 2024)
CVPR • 2024

Achieved a Top 5 ranking in the MIPI Flare Removal Challenge at CVPR 2024 for work on nighttime video flare removal.

In my free time, I enjoy listening to music to unwind, playing guitar as a beginner, and exploring calligraphy. I also stay engaged with AI research papers, visit new places to refresh my mind, and take part in creative activities to broaden my horizons.