When AI learns from AI-generated data, it trains on its own mistakes. Sorus gives you something synthetic data can't — real conversations, real emotions, real accents. Collected in India. Cleaned and ready to train on.
We specifically collect real spam call recordings — telecallers, scammers, and unsolicited sales pitches in Indian languages. Upload one from your phone, and once we verify and clean it, the money goes straight to your UPI.
Most voice datasets are scraped, read aloud in studios, or generated by AI. Ours aren't. We have real spam calls: actual telecaller conversations in 22+ Indian languages, with all the natural stutters, code-switching, and emotional tone that synthetic data simply cannot replicate.
Real phone call recordings across 22+ Indian languages. People talking naturally — not reading scripts. Cleaned, speaker-separated, and transcribed. Good for ASR, TTS, and anything that needs to understand how people actually speak.
Video recordings from everyday Indian settings — facial expressions, gestures, natural environments. For vision models that need to work in South Asian contexts.
Native Indic script text — not just transliterations. Translations, annotations, and raw corpora across multiple languages. For LLMs that need to handle Indian languages properly.
Create an account with your email address — no phone number needed. We send a verification link and you're in.
MP3, M4A, WAV, MP4 — whatever you have on your phone. We specifically collect real spam and telecaller recordings, not personal conversations.
We review it, remove personal info, and send the money to your UPI once approved.
Language, volume, domain, annotation style — send us the specifics and we'll see what we have.
No commitment required upfront. Evaluate the quality via API or direct file download before deciding anything.
One-time purchase, recurring supply, or a custom collection run: we'll work with whichever way your team buys data.
Every recording is submitted voluntarily. We never collect passively or without the contributor knowing exactly what they're agreeing to.
Names, phone numbers, bank details — all removed before any file is logged or stored. The cleaned version is what we keep.
Compliant with India's data protection law. If a contributor asks us to delete their data, we do it, no questions asked.
We don't sell to brokers or aggregate marketplaces. Data goes directly to verified AI companies under a usage license.
Tell us what you're building and what you need. We'll send back a sample dataset within a business day — no NDA required upfront, no sales call first.