For my latest publications, please visit my Google Scholar page.

2026

<b> Afri-MCQA: Multimodal Cultural Question Answering for African Language </b> <br>

<b>Atnafu Lambebo Tonja </b>, Srija Anand, Emilio Villa-Cueva, Israel Abebe Azime, Jesujoba Oluwadara Alabi, …, Alham Fikri Aji, David Ifeoluwa Adelani, Thamar Solorio

2025

<b>CAMMT: Benchmarking Culturally Aware Multimodal Machine Translation</b><br>

Emilio Villa-Cueva, Sholpan Bolatzhanova, Diana Turmakhan, Kareem Elzeky, …, Injy Hamed, <b>Atnafu Lambebo Tonja</b>, Thamar Solorio <i>In EMNLP 2025 </i>

<b>A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge’ez Script</b><br> Hellina Hailu Nigatu, <b>Atnafu Lambebo Tonja </b>, Henok Biadglign Ademtew, Hizkel Mitiku Alemayehu, Negasi Haile Abadi, Tadesse Destaw Belay, Seid Muhie Yimam <i>In EMNLP 2025 </i>

<b>ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding</b><br> Israel Abebe Azime, <b>Atnafu Lambebo Tonja </b>, Tadesse Destaw Belay, Yonas Chanie, …, Philipp Slusallek, Thamar Solorio, Dietrich Klakow <i>In NAACL 2025 </i>

<b>The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages</b><br> Jenalea Rajab, Anuoluwapo Aremu, Everlyn Asiko Chimoto, …, <b>Atnafu Lambebo Tonja</b>, Maushami Chetty, …, Vukosi Marivate, Benjamin Rosman <i>In ACL 2025 </i>

2024

<b>The Zeno’s Paradox of ‘Low-Resource’ Languages </b><br> Hellina Hailu Nigatu, <b>Atnafu Lambebo Tonja </b>, Benjamin Rosman, Thamar Solorio, Monojit Choudhury <i>In EMNLP 2024 </i>

<b>InkubaLM: A small language model for low-resource African languages </b><br> <b>Atnafu Lambebo Tonja </b>, Bonaventure F. P. Dossou, Jessica Ojo, Jenalea Rajab, Fadel Thior, Eric Peter Wairagala, Anuoluwapo Aremu, Pelonomi Moiloa, Jade Abbott, Vukosi Marivate, Benjamin Rosman

<b>Walia-LLM: Enhancing Amharic-LLaMA by Integrating Task-Specific and Generative Datasets </b><br> Israel Abebe Azime, <b>Atnafu Lambebo Tonja </b>, Tadesse Destaw Belay, Mitiku Yohannes Fuge, Aman Kassahun Wassie, …, Seid Muhie Yimam <i>In EMNLP 2024 </i>

<b>CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark </b><br>

David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, <b>Atnafu Lambebo Tonja </b> et al.

<b>Gender Bias Evaluation in Machine Translation for Amharic, Tigrigna, and Afaan Oromoo </b><br>

Walelign Tewabe Sewunetie, <b>Atnafu Lambebo Tonja </b>, Tadesse Destaw Belay, Hellina Hailu Nigatu, Gashaw Kidanu, Zewdie Mossie, Hussien Seid, Seid Muhie Yimam

<b>NLP Progress in Indigenous Latin American Languages </b><br>

<b>Atnafu Lambebo Tonja </b>, Fazlourrahman Balouchzahi, Sabur Butt, Olga Kolesnikova, Hector Ceballos, Alexander Gelbukh, Thamar Solorio

<b>EthioMT: Parallel Corpus for Low-resource Ethiopian Languages </b><br>

<b>Atnafu Lambebo Tonja </b>, Olga Kolesnikova, Alexander Gelbukh, Jugal Kalita

<b>EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation</b><br>

<b> Atnafu Lambebo Tonja </b>, Israel Abebe Azime, Tadesse Destaw Belay, Mesay Gemeda Yigezu, et al.

2023

<b>Cross-lingual Open-Retrieval Question Answering for African Languages </b><br>

Odunayo Ogundepo, Tajuddeen Gwadabe, Clara Rivera, Jonathan H Clark, Sebastian Ruder, David Adelani, Bonaventure Dossou, <b>Atnafu Lambebo Tonja </b> et al.

<b>The Less the Merrier? Investigating Language Representation in Multilingual Models </b><br> Hellina Hailu Nigatu, <b>Atnafu Lambebo Tonja </b>, Jugal Kalita. <i>In EMNLP 2023 </i>

<b>AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR</b><br> Tobi Olatunji, Tejumade Afonja, Aditya Yadavalli, Chris Chinenye Emezue, Sahib Singh, Bonaventure F.P. Dossou, Joanne Osuchukwu, Salomey Osei, <b>Atnafu Lambebo Tonja</b>, Naome Etori, Clinton Mbataku. <i>In TACL 2023</i>

<b>MasakhaNEWS: News Topic Classification for African languages</b> <br> David Ifeoluwa Adelani, Marek Masiak, Israel Abebe Azime, Jesujoba O. Alabi, <b>Atnafu Lambebo Tonja</b>, Christine Mwase, Odunayo Ogundepo, Bonaventure F. P. Dossou, Akintunde Oladipo, …, and Pontus Stenetorp. <i>In IJCNLP-AACL 2023 [Best Paper Award … Area Chair Award (Resources and Evaluation)] & AfricaNLP Workshop 2023</i>.

<b>AfriNames: Most ASR models "butcher" African Names</b><br> Tobi Olatunji, Tejumade Afonja, Bonaventure FP Dossou, <b>Atnafu Lambebo Tonja</b>, Chris Chinenye Emezue, Amina Mardiyyah Rufai, Sahib Singh. <i>In INTERSPEECH 2023</i>

<b>Parallel Corpus for Indigenous Language Translation: Spanish-Mazatec and Spanish-Mixtec</b><br> <b>Atnafu Lambebo Tonja</b>, Christian Maldonado-Sifuentes, David Alejandro Mendoza Castillo, Olga Kolesnikova, Noé Castro-Sánchez, Grigori Sidorov, Alexander Gelbukh. <i>In AmericasNLP Workshop at ACL 2023</i>

<b>Enhancing Translation for Indigenous Languages: Experiments with Multilingual Models</b><br> <b>Atnafu Lambebo Tonja</b>, Hellina Hailu Nigatu, Olga Kolesnikova, Grigori Sidorov, Alexander Gelbukh, Jugal Kalita. <i>In AmericasNLP Workshop at ACL 2023</i>

<b>Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages</b><br> Israel Abebe Azime, Sana Al-azzawi, <b>Atnafu Lambebo Tonja</b>, Iyanuoluwa Shode, Jesujoba Alabi, Ayodele Awokoya, Mardiyyah Oduwole, Tosin Adewumi, Samuel Fanijo, Awosan Oyinkansola. <i>In SemEval-2023 Workshop at ACL 2023</i>

<b>Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities</b><br> <b>Atnafu Lambebo Tonja</b>, Tadesse Destaw Belay, Israel Abebe Azime, Abinew Ali Ayele, Moges Ahmed Mehamed, Olga Kolesnikova, Seid Muhie Yimam. <i>In RAIL-2023 Workshop at EACL 2023</i>

<b>Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data</b><br> <b>Atnafu Lambebo Tonja</b>, Olga Kolesnikova, Alexander Gelbukh, Grigori Sidorov. <i>Journal of Applied Sciences</i>

2022

<b>AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages</b><br> Bonaventure F. P. Dossou, <b>Atnafu Lambebo Tonja</b>, Oreen Yousuf, Salomey Osei, Abigail Oppong, Iyanuoluwa Shode, Oluwabusayo Olufunke Awoyomi, Chris Chinenye Emezue. <i> In SustaiNLP Workshop, co-located with EMNLP 2022 </i>

<b>The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation</b><br> Tadesse Destaw Belay, <b>Atnafu Lambebo Tonja</b>, Olga Kolesnikova, Seid Muhie Yimam, Abinew Ali Ayele, Silesh Bogale Haile, Grigori Sidorov, Alexander Gelbukh. <i>In 2022 International Conference on Information and Communication Technology for Development for Africa (ICT4DA) </i>

<b>Improving neural machine translation for low resource languages using mixed training: The case of Ethiopian languages</b> <br> <b>Atnafu Lambebo Tonja</b>, Olga Kolesnikova, Muhammad Arif, Alexander Gelbukh, Grigori Sidorov. <i> In Mexican International Conference on Artificial Intelligence </i>

<b>Early Ginger Disease Detection Using Deep Learning Approach</b><br> Mesay Gemeda Yigezu, Michael Melese Woldeyohannis, <b>Atnafu Lambebo Tonja</b>. <i> In International Conference on Advances of Science and Technology</i>

2021

<b>A parallel corpora for bi-directional neural machine translation for low resourced Ethiopian languages</b><br> <b>Atnafu Lambebo Tonja</b>, Michael Melese Woldeyohannis, Mesay Gemeda Yigezu. <i> In 2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)</i>

<b>Multilingual neural machine translation for low resourced languages: Ometo-English</b><br> Mesay Gemeda Yigezu, Michael Melese Woldeyohannis, <b>Atnafu Lambebo Tonja</b>. <i> In 2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA) </i>