Your AI still trains on synthetic hallucinations

When AI learns from AI-generated data, it trains on its own mistakes. Sorus gives you something synthetic data can't — real conversations, real emotions, real accents. Collected in India. Cleaned and ready to train on.

22+ Indian languages covered
100% explicitly consented recordings
0% synthetic or staged audio
Real, unsolicited call recordings
For contributors

Share a spam call recording. Get paid.

We specifically collect real spam call recordings: calls from telecallers and scammers, and unsolicited sales pitches, in Indian languages. Upload one from your phone, and once we verify and clean it, we send the payment straight to your UPI ID.

Start contributing
For AI teams

Real call data. No staged scripts. No synthetic noise.

Most voice datasets are scraped, read aloud in studios, or generated by AI. Ours aren't. We have recordings of real spam calls: actual telecaller conversations in 22+ Indian languages, with all the natural stutters, code-switching, and emotional tone that synthetic data simply cannot replicate.

Talk to us
What we have

Our datasets

Live

Conversational Call Audio

Real phone call recordings across 22+ Indian languages. People talking naturally, not reading scripts. Cleaned, speaker-separated, and transcribed. Suited to ASR, TTS, and any model that needs to understand how people actually speak.

22+ languages · Speaker-separated · PII removed · ASR / TTS
Coming Soon

Video Data

Video recordings from everyday Indian settings — facial expressions, gestures, natural environments. For vision models that need to work in South Asian contexts.

Vision AI · Multimodal · Labeled
Coming Soon

Text & Written Language

Text in native Indic scripts, not just transliterations. Translations, annotations, and raw corpora across multiple languages. For LLMs that need to handle Indian languages properly.

LLM fine-tuning · NLP · Annotated
How it works

Straightforward on both ends

For contributors
01
Sign in with your email

Create an account with your email address — no phone number needed. We send a verification link and you're in.

02
Upload a spam call recording

MP3, M4A, WAV, MP4 — whatever you have on your phone. We specifically collect real spam and telecaller recordings, not personal conversations.

03
Get paid via UPI

We review the recording, remove personal info, and send payment to your UPI ID once it's approved.

For AI teams
01
Tell us what you need

Language, volume, domain, annotation style — send us the specifics and we'll see what we have.

02
We send you samples first

No commitment required upfront. Evaluate the quality via API or direct file download before deciding anything.

03
License and access

One-time purchase, recurring supply, or a custom collection run — we'll work with however your team buys data.

People choose to share

Every recording is submitted voluntarily. We never collect passively or without the contributor knowing exactly what they're agreeing to.

Personal info is stripped out

Names, phone numbers, bank details — all removed before any file is logged or stored. The cleaned version is what we keep.

DPDP Act 2023

Compliant with India's data protection law. If a contributor asks us to delete their data, we do it — no questions.

Only licensed partners see the data

We don't sell to data brokers or aggregator marketplaces. Data goes directly to verified AI companies under a usage license.

For AI teams

Want to see the data?

Tell us what you're building and what you need. We'll send back a sample dataset within a business day — no NDA required upfront, no sales call first.

Prefer email? Write to us at hello@sorus.io. For data deletion requests: delete@sorus.io.