Latest insights & developments from the world of Artificial Intelligence (AI).
Six interesting Indian AI models from 2024
India's 2024 AI landscape saw six breakthrough models: BharatGen's e-vikrAI for e-commerce, Sarvam-1 supporting 10 Indian languages, NVIDIA's Nemotron-4-Mini-Hindi-4B, AI4Bharat's Chitralekha for video transcreation, Everest 1.0 covering 35 languages, and Surya OCR for document processing. These models integrate local languages and cultural context, setting global AI benchmarks.
RBI's AI initiative MuleHunter.ai: AI solution to tackle digital fraud in India
The Reserve Bank Innovation Hub (RBIH) has developed MuleHunter.AI, an AI/ML-powered tool to detect mule accounts used in financial fraud. Analysing 19 behavioural patterns across banking data, it outperforms traditional rule-based detection methods. Successfully piloted with two public sector banks, it aims for wider rollout to secure India's digital financial ecosystem.
New AI Method Combines Physics and Machine Learning to Predict Floods
MIT scientists developed the "Earth Intelligence Engine," combining a GAN-based generative AI model with a physics-based flood model to generate realistic future flood satellite imagery. Tested on Houston using Hurricane Harvey data, the physics-reinforced method reduces AI hallucinations and accurately maps flood extents pixel by pixel, helping communities visualize risks and make evacuation decisions.
India's AI-powered data centre boom - $100 billion investment forecast by 2027: CBRE
India's data centre colocation market is projected to grow at a CAGR of 24.68% from 2023 to 2029, with investments exceeding $100 billion by 2027. Driven by AI, ML, and generative AI adoption, states like Maharashtra and Tamil Nadu lead growth. Major players including Reliance, Google, and Airtel are building AI-ready infrastructure, positioning India as a global digital hub.
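To put the growth rate in perspective, the short sketch below compounds the quoted 24.68% CAGR over the forecast window. It assumes 2023 is the base year, so 2023 to 2029 counts as six compounding periods; the function name is illustrative, and since the summary gives no base market size, only the growth multiple is computed.

```python
def compound_growth_multiple(cagr_percent: float, start_year: int, end_year: int) -> float:
    """Total growth multiple implied by a constant annual growth rate.

    Assumption: start_year is the base year, so the number of
    compounding periods is end_year - start_year.
    """
    periods = end_year - start_year
    return (1 + cagr_percent / 100) ** periods


# Figures taken from the summary above: 24.68% CAGR, 2023 to 2029.
multiple = compound_growth_multiple(24.68, 2023, 2029)
print(f"Implied growth multiple: {multiple:.2f}x")
```

Under these assumptions the market would end the period at a little under four times its 2023 size.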
IIT Madras, AI4Bharat, and Sarvam AI launch IndicVoices: A milestone in Indian speech recognition
IIT Madras, AI4Bharat, and Sarvam AI have launched IndicVoices, a 12,000-hour multilingual speech dataset covering 22 Indian languages and 208 districts. Accompanied by IndicASR, the first ASR model supporting all 22 official Indian languages, the initiative is open-sourced under CC-BY-4.0, offering a global blueprint for multilingual speech data collection and advancing inclusive AI development.
Genesis: Revolutionizing robotics and physical AI with a universal physics engine
Genesis is an open-source universal physics engine, up to 80x faster than NVIDIA Isaac Gym and MuJoCo MJX. Built entirely in Python by 20+ research labs, it simulates diverse physical phenomena, supports generative data creation from natural language prompts, and features photo-realistic rendering. It aims to democratize robotics research and accelerate embodied AI development globally.
Exploring Telecom-Specific Large Action Model TSLAM-4B
TSLAM-4B is the first LLM specifically designed for the telecommunications industry, developed by NetoAI. With 4 billion parameters, 128K token context length, and trained on 427 million telecom-specific tokens, it enables network troubleshooting, infrastructure planning, customer support automation, and regulatory compliance, setting a new benchmark for domain-specific AI in telecom.
The Polymathic AI initiative, led by the University of Cambridge and collaborators, released two open-source scientific training datasets totalling 115 terabytes covering astrophysics, biology, fluid dynamics, and more. Unlike domain-specific AI, these datasets aim to train cross-disciplinary models that transfer knowledge across scientific fields, potentially uncovering patterns and accelerating discoveries no human could achieve alone.
Cosmopedia: Redefining the synthetic data landscape with the largest open dataset
Cosmopedia v0.1, hosted on Hugging Face, is the largest open synthetic dataset with 30 million samples and 25 billion tokens, generated by Mixtral-8x7B-Instruct-v0.1. It includes textbooks, blog posts, stories, and WikiHow articles across eight dataset splits. Designed to democratize AI research, it supports NLP, model training, and scalable AI development with rich metadata and diverse content.
AIRAWAT: A landmark in India’s AI supercomputing journey
India's AI supercomputer AIRAWAT, installed at C-DAC Pune, ranks No. 75 globally on the TOP500 supercomputing list. With a peak performance of 13,170 teraflops and 200 AI petaflops, it is India's largest and fastest AI supercomputing system. Funded by MeitY, AIRAWAT supports AI research across healthcare, agriculture, NLP, defence, and education, driving India's technological self-reliance.
AI model generates realistic satellite images of future flooding
MIT scientists have developed a method combining a generative AI model with a physics-based flood model to create realistic satellite images of future flooding. Tested on Houston using Hurricane Harvey data, the physics-reinforced approach reduces AI hallucinations and accurately depicts flood extents. The tool could help residents and policymakers prepare and make evacuation decisions before storms hit.
AI Insights - MIT's EXPLINGO: AI explaining its predictions in plain language
MIT researchers have developed EXPLINGO, a system that uses LLMs to convert complex AI model explanations (SHAP values) into clear, human-readable narratives. It features two components: NARRATOR, which generates the text, and GRADER, which evaluates its quality. Tested across nine datasets, EXPLINGO aims to improve AI transparency and accessibility for non-technical users across healthcare, finance, and more.
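As a rough illustration of the idea of turning feature contributions into a readable sentence, here is a minimal sketch. It is not EXPLINGO's method: EXPLINGO uses an LLM, whereas this stand-in uses a fixed template, and the function name, feature names, and scores are all hypothetical.

```python
from typing import Dict


def narrate_contributions(contributions: Dict[str, float], top_k: int = 2) -> str:
    """Turn per-feature contribution scores (SHAP-style) into one sentence.

    Template-based stand-in for an LLM narrator; purely illustrative.
    """
    # Drop features with no effect, then rank by absolute contribution.
    nonzero = [(name, value) for name, value in contributions.items() if value != 0]
    if not nonzero:
        return "No feature contributed to the prediction."
    ranked = sorted(nonzero, key=lambda item: abs(item[1]), reverse=True)[:top_k]

    parts = []
    for name, value in ranked:
        direction = "raised" if value > 0 else "lowered"
        parts.append(f"{name} {direction} the prediction by {abs(value):.2f}")
    return "; ".join(parts) + "."


# Hypothetical contribution scores for a single prediction.
example = {"house_size": 0.42, "age": -0.15, "distance_to_city": 0.03}
print(narrate_contributions(example))
```

A fixed template like this is predictable but rigid; an LLM-based narrator trades that predictability for more natural phrasing, which is why a separate quality check such as GRADER becomes useful.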