Applied ML Engineer & Researcher

Tanveer Hossain Munim

Applied ML Engineer · GeoAI Researcher · Incoming Erasmus Mundus MSc, Lund (2026)

I ship production LLM and VLM systems under real constraints, e.g., 7B models on 4GB GPUs, multi-stage VLM cascades with conformal prediction, and full fine-tuning via frontier-model distillation. Sole architect of country-scale platforms: a UN GCF-funded national meteorological system (Timor-Leste) and CAP v1.2 emergency alert infrastructure across two countries.

Who I Am

About Me

I am an applied ML engineer and researcher with a track record of shipping production systems at national scale. My work sits at the intersection of machine learning, geospatial science, and infrastructure: it ranges from fine-tuning vision-language models on a 396,802-image government corpus to architecting multi-server data pipelines that process 1TB+/day across five NWP models and two satellites.

As Senior Software Engineer at RIMES (July 2024–Aug 2026), I was the sole architect and engineer of the GCF Timor-Leste CDIS, the country's national meteorological platform, which is part of UNEP's USD 21.7M Green Climate Fund project. I also designed CAP v1.2 emergency alert infrastructure now deployed in Timor-Leste and Bangladesh (via Grameenphone for population-scale alerting).

My current research formalizes Sparse-Critical Dense Regression (SCDR) as a new ML problem class, with impossibility theorems and a dual-decoder solution (DART) validated across atmospheric ML, crowd density, and radar nowcasting. First-author preprint on arXiv; targeting ICLR 2027.

I am the Founder, CEO & Lead Architect of MIRA AI (NVIDIA Inception 2025) and a Senior Research Engineer at AI GeoLAB Ltd. I am an incoming Erasmus Mundus Geoinformatics & AI MSc student at Lund University (autumn 2026, fully funded). I hold a B.Sc. in CSE from BUET. Recognition includes the Bangladesh National Science and Technology Fellowship (2026), selection for the NVIDIA Inception Program (2025), and the Accelerating Asia prize (top 9 of 500+ global startups, $100K, 2023).

Research Interests

Sparse-Critical Dense Regression · Vision-Language Models · Hybrid RAG & Retrieval · Geospatial ML · Atmospheric & Climate ML · Document AI & OCR · LLM Serving & Inference

Academic Work

Research & Publications

Peer-reviewed papers, conference proceedings, and preprints.

Selected Work

Projects & Case Studies

Technical projects, open-source tools, and applied research.

SCDR / DART — Sparse-Critical Dense Regression

First-author ML theory + multi-domain empirical research formalizing SCDR as a new problem class. Proves no single-decoder loss can simultaneously achieve high CSI and Bias ≈ 1 (Theorem 1). DART (dual-decoder + gradient isolation) delivers 32–74% bias reduction at statistically equivalent CSI on Himawari atmospheric data (n=5 seeds, |t| up to 23.3); +0.049 CSI@10 on ShanghaiTech crowd density; beats pretrained RainNet on its own DWD radar data with ~7.6× less compute. Targeting ICLR 2027.
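
For readers outside forecast verification, the two metrics named in Theorem 1 can be made concrete with a small sketch. It assumes "CSI" is the Critical Success Index and "Bias" is the frequency bias, both computed from a thresholded contingency table, which is the usual convention. The function names, sample values, and threshold are illustrative and are not taken from the SCDR/DART code.

```python
def contingency(pred, obs, threshold):
    """Count hits, misses, and false alarms for events defined as value >= threshold."""
    hits = misses = false_alarms = 0
    for p, o in zip(pred, obs):
        p_event, o_event = p >= threshold, o >= threshold
        if p_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif p_event:
            false_alarms += 1
    return hits, misses, false_alarms


def csi(hits, misses, false_alarms):
    """Critical Success Index: hits / (hits + misses + false alarms)."""
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")


def frequency_bias(hits, misses, false_alarms):
    """Frequency bias: predicted events / observed events (1.0 means unbiased)."""
    observed = hits + misses
    return (hits + false_alarms) / observed if observed else float("nan")


# Illustrative values only (not from any dataset mentioned on this page).
pred = [0, 12, 15, 3, 11, 0]
obs = [0, 14, 2, 13, 10, 0]
h, m, f = contingency(pred, obs, threshold=10)
print(h, m, f)                  # 2 1 1
print(csi(h, m, f))             # 0.5
print(frequency_bias(h, m, f))  # 1.0
```

In this toy case the prediction is unbiased in frequency (Bias = 1), yet hits make up only half of all predicted-or-observed events (CSI = 0.5), which is why the two metrics are reported separately.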

Khatian VLM Cascade — Bengali Land-Record Extraction

Production VLM system for extracting JL and plot numbers from Bangladesh government land-record scans at scale. Fine-tuned Qwen3-VL-2B/8B with sibling lookup + symbolic verifier + conformal prediction. Achieved 98.9% JL precision @ 94% auto-accept and 96.7% plot precision @ 92% auto-accept at 1.83s/image on a 396,802-image corpus across 6 regions. Targeting ICDAR 2027.
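
As a rough illustration of how conformal prediction can gate auto-acceptance, the sketch below uses standard split conformal classification: a threshold is calibrated on held-out nonconformity scores (here assumed to be 1 minus the probability of the true label), and a prediction is auto-accepted only when exactly one candidate label falls inside the resulting prediction set. This is not the production cascade; the `route` helper, the candidate labels, and the calibration scores are all hypothetical.

```python
import math


def conformal_quantile(cal_scores, alpha):
    """Finite-sample split-conformal threshold for target miscoverage alpha."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return float("inf")  # too few calibration points: accept every label
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, qhat):
    """Keep every label whose nonconformity score (1 - p) is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def route(probs, qhat):
    """Auto-accept only singleton prediction sets; send everything else to review."""
    labels = prediction_set(probs, qhat)
    if len(labels) == 1:
        return "auto_accept", next(iter(labels))
    return "human_review", None


# Hypothetical calibration scores: 1 - p(true label) on held-out labelled scans.
cal_scores = [0.02, 0.05, 0.01, 0.10, 0.03, 0.30, 0.04, 0.08, 0.06]
qhat = conformal_quantile(cal_scores, alpha=0.1)
print(qhat)                                                 # 0.3
print(route({"12": 0.97, "72": 0.02, "17": 0.01}, qhat))    # ('auto_accept', '12')
print(route({"12": 0.55, "72": 0.45}, qhat))                # ('human_review', None)
```

Lowering `alpha` widens the prediction sets, which trades a lower auto-accept rate for a stronger coverage guarantee.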

GCF CDIS — Timor-Leste National Meteorological Platform

Sole architect & engineer of the country's national meteorological platform serving DNMG, part of UNEP's USD 21.7M Green Climate Fund project. Live production at 97.64% task / 96.20% DAG success rate over 1.4+ years, processing 1TB+/day across 5 NWP models (GFS, ECMWF-IFS/AIFS, ICON, UK Met) and 2 satellites (Himawari-9, GK2A). Key optimizations: NetCDF reads 30s → 80ms (375×); observation queries 4s → 800ms (5×).

CAP v1.2 Multi-Country Emergency Alert Infrastructure

Sole architect of full CAP v1.2 stack: pycap-validator (sole-author Python package for OASIS schema enforcement + digital signature verification), Django RSS backend, MQTT pub/sub, and n8n multi-channel dissemination (email, Facebook, Twitter/X). Deployed in production at DNMG Timor-Leste and Bangladesh Meteorological Department via Grameenphone for population-scale automated alerting.

NLAS — Multilingual Livestock Advisory RAG System

Production hybrid RAG chatbot for Bangladesh Dept. of Livestock Services. Qwen2.5-7B Q4_K_M on a 4GB GPU; BGE-M3 dense + BM25 sparse + RRF fusion + bge-reranker-v2-m3 cross-encoder; Redis-backed sliding-window session memory; Bangla/English/Banglish transliteration-before-retrieval. Achieved 71% user satisfaction during beta, measured via the production rating API.
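
The RRF step in a dense + sparse pipeline like this one can be sketched in a few lines. The example below is the generic Reciprocal Rank Fusion formula, score(d) = sum over rankers of 1 / (k + rank), with the commonly used default k = 60. It is an illustration rather than the NLAS implementation, and the document IDs and the `rrf_fuse` name are hypothetical.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 results from a dense retriever and a BM25 retriever.
dense = ["d3", "d1", "d2"]
sparse = ["d1", "d4", "d3"]
print(rrf_fuse([dense, sparse]))  # ['d1', 'd3', 'd4', 'd2']
```

Because RRF uses only ranks, it needs no score normalisation between the dense and sparse retrievers; the fused list would then go to the cross-encoder for reranking.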

MIRA AI — Conversational Sales Agent

Multi-channel conversational sales AI on LangGraph + FastAPI + RAG over 10K+ products. Designed for 100K end-users; currently serving 4 paying enterprise customers. NVIDIA Inception Program (2025). On course for Bangladesh Innovation Grant.

wis2downloader — WMO Open-Source Contribution

Refactored the World Meteorological Organization's wis2downloader from a monolithic architecture to a distributed Celery + async + spatial-filtering pipeline. Reduced data ingestion latency from ~4 hours to 130ms. Merged upstream; adopted by Timor-Leste's national meteorological agency and other NMHSs.

Technical Expertise

Skills & Tools

LLMs & Generation

  • Qwen3-VL Fine-tuning (Full FT + QLoRA)
  • Frontier Model Distillation
  • vLLM / Ollama / llama.cpp
  • Q4_K_M Quantization
  • KV Cache Management
  • Streaming SSE Inference

RAG & Retrieval

  • BGE-M3 Dense Retrieval
  • BM25 Sparse Retrieval
  • RRF Fusion
  • Cross-Encoder Reranking
  • ChromaDB / Vector Stores
  • Session Memory Design

Calibration & Evaluation

  • Conformal Prediction
  • Multi-seed Protocols
  • Paired t-tests
  • Ablation Design
  • Capacity / Floor Probes

ML Frameworks

  • PyTorch
  • HuggingFace Transformers
  • LangGraph / LangChain
  • PyMC (Bayesian ML)
  • DeepSpeed

ML Infrastructure

  • Apache Airflow (multi-server)
  • Celery
  • Redis
  • Docker
  • Kubernetes (RKE2 bare-metal)
  • HPC / NVIDIA GH200

Backend & Serving

  • Django / DRF
  • FastAPI
  • uvicorn + async SSE
  • PostgreSQL / PostGIS
  • MQTT

Languages

  • Python
  • SQL
  • JavaScript / TS
  • Java
  • C++

Geospatial

  • xarray
  • GeoPandas / GDAL
  • Rasterio / NetCDF
  • Cloud-Optimised GeoTIFF
  • Martin / TiTiler
  • Deck.gl

Background

Education & Experience

Education

  • 2018 – 2023

    B.Sc. in Computer Science and Engineering

    Bangladesh University of Engineering and Technology (BUET)

    GRE 329 (Q170, V159) · IELTS 8.5

  • 2026 – 2028

    Erasmus Mundus MSc — Geoinformatics & AI (GEM)

    Lund University → ITC Twente → thesis (Track 3: Geospatial Developer)

    Fully funded (€1,050/month × 24 months). Awarded Bangladesh National Science and Technology Fellowship (2026) covering tuition and living expenses. Incoming autumn 2026.

Experience & Service

  • April 2026 – Present

    Senior Research Engineer

    AI GeoLAB Ltd

    Designed and shipped the Khatian VLM cascade for Bengali land-record extraction at Bangladesh government scale: fine-tuned Qwen3-VL-2B/8B with sibling lookup + symbolic verifier + conformal prediction, achieving 98.9% JL precision @ 94% auto-accept on 396,802 images across 6 regions. Targeting ICDAR 2027.

  • July 2024 – August 2026

    Senior Software Engineer

    Regional Integrated Multi-Hazard Early Warning System (RIMES)

    Sole architect & engineer of GCF Timor-Leste CDIS — the country's national meteorological platform (UNEP, USD 21.7M GCF project). Live production at 97.64% task / 96.20% DAG success rate over 1.4+ years, 1TB+/day, 5 NWP models, 2 satellites, ~8K task executions/day across a 3-node Airflow cluster. Built NLAS (71% beta satisfaction), Cirrus AI (full fine-tune via frontier distillation), and Heatshield (ICDDR,B). Sole architect of CAP v1.2 infrastructure deployed in Timor-Leste and Bangladesh (Grameenphone). Open-source: wis2downloader refactor (4hr → 130ms, merged upstream, WMO).

  • June 2021 – August 2024

    Tech Lead (prev. Backend Developer)

    Interactive Cares

    Scaled platform 13× (8K → 100K users) and 8× DAU. Designed geo-distributed CDN, load balancer, and API consolidation. Co-led Accelerating Asia victory (Top 9 of 500+ global startups, $100K investment, 2023). Grew engineering team 2 → 8; introduced CI/CD cutting deployment time 60%.

  • September 2023 – January 2024

    Software Engineer

    Shipday, Inc. (Remote)

    Integrated AI assistant chatbot into production platform. Built internal analytics dashboard improving marketing campaign ROI by 25%.

  • September 2022 – August 2023

    Data Engineer (Part-time)

    Survey of Bangladesh, Ministry of Defence — NSDI Project

    Engineered geospatial automation pipeline reducing manual processing by 80% (saved 2,400+ person-hours/year). Built validation system processing 50GB+/month for 43 government departments.

Let's Connect

Get in Touch

Whether you're a hiring manager, research collaborator, or prospective PhD supervisor, I'd love to hear from you.

Connect Online

Open To

Research Internships (2027) · ML Research Collaborations · Senior ML Engineering Roles · Consulting & Advisory · PhD Supervision (Post-MSc)