Afri‑MCQA
Multilingual · Vision + Language · Africa‑centric

Afri‑MCQA: Multimodal Cultural Question Answering for African Languages

A benchmark that evaluates cultural and regional knowledge across African contexts using images and multilingual questions/answers.

[Figure: Afri‑MCQA examples. Representative samples showing culturally grounded questions in multiple African languages.]

What is Afri‑MCQA?

Afri‑MCQA is a culturally nuanced, multilingual VQA benchmark centered on African languages and communities. It measures whether Multimodal LLMs understand local artifacts, social practices, and region‑specific entities, beyond generic visual recognition.

  • Languages: e.g., Amharic, Hausa, Igbo, Oromo, Swahili, Yoruba, Zulu, and Maghrebi Arabic
  • Coverage: countries across North, West, East, Central, and Southern Africa
  • Task: single/multi‑choice VQA; short‑answer localization where relevant

Contributions

  • First cross‑continental, culture‑aware MCQA benchmark spanning major African languages
  • Human‑vetted cultural categories: food, clothing, crafts, traditions, public signage
  • Strong baselines with open‑source eval code

Data Statistics

  • 10k+ Q/A pairs
  • 20–40 languages
  • 30+ countries

Exact counts TBD after final curation.

Download

We host the dataset on Hugging Face Datasets (train/dev/test splits) and release loaders in Python.

  • Hugging Face: hf.co/datasets/afri-nlp/afri-mcqa
  • License: CC BY‑SA 4.0 for the website; dataset license TBD
  • Ethics: Dataset statement, consent, and redaction policies

Load with 🤗 Datasets

from datasets import load_dataset

# Pull the validation split from the Hugging Face Hub
ds = load_dataset("afri-nlp/afri-mcqa", split="validation")
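
For example (a sketch, assuming the field names match the schema shown later on this page and that split names are unchanged at release):

from collections import Counter

# Peek at one item: question text, answer options, and gold label
example = ds[0]
print(example["question"], example["choices"], example["answer"])

# How many items per language code (e.g., "sw" for Swahili)?
print(Counter(ds["language"]).most_common())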

Baseline Results

Model                     Vision Encoder   Lang           Accuracy (dev)
Open‑CLIP + LLM (demo)    ViT‑L/14         Multilingual   –
Qwen‑VL 7B (demo)         –                Multilingual   –
GPT‑4o‑mini (demo)        –                Multilingual   –

Accuracy cells are placeholders; reproduce with our evaluation scripts before reporting final numbers.

Test your system!

Submit predictions to our EvalAI leaderboard. We provide a starter notebook and a submission validator.

  1. Download the test images & question JSON
  2. Run inference and produce a predictions.json
  3. Upload to EvalAI challenge Afri‑MCQA

Submission Format

{
  "submission_version": 1,
  "preds": [
    {"question_id": "AMCQA_000001", "answer": "B"},
    {"question_id": "AMCQA_000002", "answer": "D"}
  ]
}

Multiple‑choice labels are A/B/C/D. For multilingual free‑form answers, use the text field.
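
A minimal sketch of writing and sanity‑checking a predictions.json in this format (the checks below are our assumptions; the official submission validator is authoritative):

import json

def write_predictions(preds, path="predictions.json"):
    """preds: iterable of (question_id, answer) pairs."""
    payload = {
        "submission_version": 1,
        "preds": [{"question_id": qid, "answer": ans} for qid, ans in preds],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)

def validate(path="predictions.json"):
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    assert data.get("submission_version") == 1
    for p in data["preds"]:
        assert p["question_id"].startswith("AMCQA_")
        assert p["answer"] in {"A", "B", "C", "D"}  # multiple-choice labels

write_predictions([("AMCQA_000001", "B"), ("AMCQA_000002", "D")])
validate()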

Paper

Preprint coming soon on arXiv.

Code

Dataset builders and evaluation scripts on GitHub: github.com/afri-nlp/afri-mcqa.

Motivation: Why Afri‑MCQA?

Multimodal LLMs increasingly claim global competence, yet most benchmarks under‑represent African cultures, languages, and visual context. Afri‑MCQA aims to close this gap by measuring culturally grounded understanding across regions and languages at scale.

  • <5%* of V&L samples involve African contexts
  • 20–40 languages (phase 1)
  • 5 macro‑regions (North, West, East, Central, and Southern Africa)

*Indicative; exact figures to be updated in the paper.

Core Goals

  • Cultural validity: questions reflect authentic local practices, artifacts, and language usage.
  • Linguistic breadth: multiple African languages, dialectal variants where feasible.
  • Fair evaluation: per‑language and per‑category breakdowns, plus overall scores.

Categories

  • Food & agriculture
  • Clothing & crafts
  • Public spaces & signage
  • Traditions & social practices
  • Local objects & landmarks

[Figure: regional coverage map]

Dataset Construction

  1. Sourcing: community contributors, local photographers, and permissive repositories (e.g., CC‑BY/CC0). Sensitive content excluded by policy.
  2. Authoring: native speakers craft MCQA items (A/B/C/D) and rationales; ambiguity checks added.
  3. Validation: cross‑lingual verification and pilot studies; inter‑annotator agreement reported.
  4. Splits: train/dev/test with careful entity and region de‑duplication to reduce leakage (see the sketch after this list)
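
One way to realize step 4 is a deterministic, group‑aware split: hash a group key so that every item sharing an entity and region lands in the same split. A sketch only; the grouping fields ("country" from the schema plus a hypothetical "entity" tag) and the 80/10/10 ratios are illustrative, not the final recipe:

import hashlib

def assign_split(country: str, entity: str, ratios=(0.8, 0.1, 0.1)):
    """Map every item sharing (country, entity) to the same split."""
    key = f"{country}|{entity}".encode("utf-8")
    bucket = (int(hashlib.sha256(key).hexdigest(), 16) % 1000) / 1000
    if bucket < ratios[0]:
        return "train"
    if bucket < ratios[0] + ratios[1]:
        return "dev"
    return "test"

# All questions about the same entity in the same country share a split,
# so the test set never leaks entities seen in training.
print(assign_split("TZ", "kanga"))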

Schema

{
  "id": "AMCQA_000123",
  "image": "images/region/sample.jpg",
  "language": "sw",
  "question": "...",
  "choices": {"A": "...", "B": "...", "C": "...", "D": "..."},
  "answer": "B",
  "category": "clothing",
  "country": "TZ",
  "rationale": "..."
}
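
Given records in this schema and an {id: predicted_label} map, the per‑language and per‑category breakdowns promised under Core Goals reduce to grouped accuracy. A sketch, assuming predictions are keyed by the record id:

from collections import defaultdict

def breakdown(records, preds, key="language"):
    """Accuracy grouped by a schema field such as 'language' or 'category'."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        group = r[key]
        totals[group] += 1
        hits[group] += int(preds.get(r["id"]) == r["answer"])
    return {g: hits[g] / totals[g] for g in totals}

records = [{"id": "AMCQA_000123", "language": "sw",
            "category": "clothing", "answer": "B"}]
preds = {"AMCQA_000123": "B"}
print(breakdown(records, preds))               # {'sw': 1.0}
print(breakdown(records, preds, "category"))   # {'clothing': 1.0}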

Ethics & Governance

  • Consent & rights: documented consent for identifiable people; removal process via contact form.
  • Representation: balancing across regions, cultures, and urban/rural scenes where feasible.
  • Privacy: PII redaction; default blur for faces of minors.
  • Attribution: credit to contributors; metadata preserves original authorship when licensed.
  • Use policy: non‑harmful research uses; details in dataset card.

See the dataset card for full annotation guidelines and redaction policies.

Community & Contribute

Afri‑MCQA is community‑driven. We welcome contributions across languages, annotation, model baselines, and documentation.

Get Involved

  • Join our Discord/Slack (link)
  • Share feedback via GitHub Discussions
  • Propose a new language or category

Contributor Guide

  • Annotation checklist
  • Quality review protocol
  • Submitting a new baseline

Acknowledgements

  • Community annotators and language leads
  • Partner labs & sponsors
  • Open‑source maintainers

BibTeX

@misc{afri_mcqa2025,
  title={Afri-MCQA: Multimodal Cultural Question Answering for African Languages},
  author={First Author and Second Author and others},
  year={2025},
  eprint={XXXX.XXXXX},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}