All our datasets are available under CC BY-NC-SA 4.0. They may be used freely, but only by educational and research institutions, and only for research and teaching.

Commercial organisations (including their research labs) must purchase the data under a separate commercial license to use it in commercial products or research.

Speech datasets, including recordings, interlinear glossing (morphologically analysed text glossed into English at the morph level) and translations into English. Available for any Indo-Aryan language, including Awadhi, Bhojpuri and Magahi (see the SpeeD-IL website for the languages we are currently working on).

A disaggregated dataset of over 60,000 comments from different social media platforms in four languages: Meitei (Manipuri), Bangla, Hindi and English. The comments are annotated for multiple levels of aggression and for bias (gender, caste, religion, nationality, and ethnicity including race).

Aggression in Speech - Hindi and English

Politeness in Text - Hindi and English

Propaganda Dataset - Hindi