Atnafu Lambebo Tonja

About

I'm a Google DeepMind Academic Fellow at University College London. Previously I was a postdoctoral researcher at MBZUAI in the UAE, working with Prof. Thamar Solorio. I hold a PhD in Computer Science from Instituto Politécnico Nacional, Mexico, where I was advised by Prof. Alexander Gelbukh and Prof. Olga Kolesnikova.

My research focuses on natural language processing for the world's under-resourced languages — building multilingual language models, evaluation benchmarks, and speech and multimodal systems that serve communities typically left out of mainstream NLP.

Research interests

01

Under-resourced languages

Bringing modern NLP to languages with little digital text — corpora, models, and benchmarks for African and Indigenous languages.

02

Multilingual LMs

Training and evaluating small and large multilingual language models that work across high- and low-resource languages.

03

Evaluation benchmarks

Culturally-aware, linguistically-honest benchmarks — so models are measured on what they actually need to do, not on convenient proxies.

04

Speech & multimodal

Speech recognition for African accents and clinical domains; vision-and-language datasets that reflect cultures outside the Western web.

Spotlight

★ Social Impact Paper Award ACL 2026

Afri-MCQA: Multimodal Cultural Question Answering for African Languages

Atnafu Lambebo Tonja, Srija Anand, Emilio Villa-Cueva, Israel Abebe Azime, Jesujoba Alabi, Muhidin A Mohamed, Debela Desalegn Yadeta, Negasi Haile Abadi, Abigail Oppong, Nnaemeka Casmir Obiefuna, Idris Abdulmumin, Naome A Etori, Eric Peter Wairagala, Kanda Patrick Tshinu, Imanigirimbabazi Emmanuel, Gabofetswe Malema, Alham Fikri Aji, David Ifeoluwa Adelani, Thamar Solorio

A benchmark for multimodal cultural question answering across 16 African languages, with strong baselines and a public dataset.

★ Outstanding Paper EMNLP 2024

The Zeno's Paradox of "Low-Resource" Languages

Hellina H. Nigatu, Atnafu Lambebo Tonja, Benjamin Rosman, Thamar Solorio, Monojit Choudhury

A critique of the "low-resource" label itself — arguing that the term collapses meaningfully different language situations and obscures what actually needs fixing.

★ Best Paper · Area Chair IJCNLP-AACL 2023

MasakhaNEWS: News Topic Classification for African Languages

David I. Adelani, Marek Masiak, Israel Abebe Azime, …, Atnafu Lambebo Tonja, …, Pontus Stenetorp

A benchmark for news topic classification across 16 African languages, with strong baselines and a public dataset.


News

2026 2 papers accepted at ACL 2026: Afri-MCQA: Multimodal Cultural Question Answering for African Languages and CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data.
2026 Joined UCL as a Google DeepMind Academic Fellow.
2025 Joined MBZUAI as a postdoctoral researcher with Prof. Thamar Solorio.
2025 1 paper accepted at NAACL 2025: ProverbEval — LLM evaluation for low-resource languages.
2024 1 paper at NeurIPS 2024 D&B: CVQA — culturally-diverse multilingual VQA benchmark.
2024 2 papers at EMNLP 2024: Zeno's Paradox of "Low-Resource" Languages ★ Outstanding Paper & Walia-LLM (Amharic).
2024 1 paper at LREC-COLING 2024: EthioLLM.
2023 2 papers at EMNLP 2023; 1 at TACL (AfriSpeech-200); Best Paper at AACL (MasakhaNEWS); 1 at INTERSPEECH (AfriNames).