Under-resourced languages
Bringing modern NLP to languages with little digital text — corpora, models, and benchmarks for African and Indigenous languages.
I'm a Google DeepMind Academic Fellow at University College London. Previously I was a postdoctoral researcher at MBZUAI in the UAE, working with Prof. Thamar Solorio. I hold a PhD in Computer Science from Instituto Politécnico Nacional, Mexico, where I was advised by Prof. Alexander Gelbukh and Prof. Olga Kolesnikova.
My research focuses on natural language processing for the world's under-resourced languages — building multilingual language models, evaluation benchmarks, and speech and multimodal systems that serve communities typically left out of mainstream NLP.
Training and evaluating small and large multilingual language models that work across high- and low-resource languages.
Culturally aware, linguistically honest benchmarks, so that models are measured on what they actually need to do, not on convenient proxies.
Speech recognition for African accents and clinical domains; vision-and-language datasets that reflect cultures outside the Western web.
A critique of the "low-resource" label itself — arguing that the term collapses meaningfully different language situations and obscures what actually needs fixing.
A benchmark for news topic classification across 16 African languages, with strong baselines and a public dataset.