A benchmark for evaluating cultural and regional knowledge across African contexts through images paired with multilingual questions and answers.
Representative samples showing culturally grounded questions in multiple African languages.
Afri‑MCQA is a culturally nuanced, multilingual VQA benchmark centered on African languages and communities. It measures whether multimodal LLMs understand local artifacts, social practices, and region‑specific entities, beyond generic visual recognition.
Exact counts TBD after final curation.
We host the dataset on Hugging Face Datasets (train/dev/test splits) and release loaders in Python.
```python
from datasets import load_dataset

# The dev split is named "validation" on the Hugging Face Hub.
ds = load_dataset("afri-nlp/afri-mcqa", split="validation")
print(ds[0])
```
| Model | Vision Encoder | Language | Accuracy (dev) |
|---|---|---|---|
| Open‑CLIP + LLM (demo) | ViT‑L/14 | Multilingual | — |
| Qwen‑VL 7B (demo) | — | Multilingual | — |
| GPT‑4o‑mini (demo) | — | Multilingual | — |
Reproduce with our evaluation scripts; update this table for your camera‑ready.
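For A/B/C/D questions, dev accuracy is exact match against the gold `answer`. Below is a minimal scoring sketch, assuming predictions follow the submission format further down and the gold file is a JSON list of records with `id` and `answer` fields (an assumed layout; the scripts in the repo are authoritative):

```python
import json

def mcq_accuracy(preds_path: str, gold_path: str) -> float:
    """Exact-match accuracy for A/B/C/D predictions, keyed by question id."""
    with open(preds_path) as f:
        preds = {p["question_id"]: p["answer"] for p in json.load(f)["preds"]}
    with open(gold_path) as f:
        gold = {g["id"]: g["answer"] for g in json.load(f)}
    # An unanswered question counts as wrong.
    return sum(preds.get(qid) == ans for qid, ans in gold.items()) / len(gold)
```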
Submit predictions to our EvalAI leaderboard. We provide a starter notebook and a submission validator.
```json
{
  "submission_version": 1,
  "preds": [
    {"question_id": "AMCQA_000001", "answer": "B"},
    {"question_id": "AMCQA_000002", "answer": "D"}
  ]
}
```
Multiple‑choice labels are A/B/C/D. For multilingual free‑form answers, use the text field instead of a letter.
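The validator's checks amount to schema conformance. An illustrative version is sketched below; the official validator shipped with the starter notebook may check more (e.g. question-ID coverage):

```python
import json

def validate_submission(path: str) -> None:
    """Illustrative schema checks for a submission file (not the official validator)."""
    with open(path) as f:
        sub = json.load(f)
    assert sub.get("submission_version") == 1, "unsupported submission_version"
    assert isinstance(sub.get("preds"), list) and sub["preds"], "preds must be a non-empty list"
    for p in sub["preds"]:
        assert isinstance(p.get("question_id"), str) and p["question_id"], "missing question_id"
        ans = p.get("answer")
        # Multiple-choice answers are the letters A-D; free-form answers are arbitrary text.
        assert isinstance(ans, str) and ans, "answer must be a non-empty string"
```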
Preprint coming soon on arXiv.
Dataset builders and evaluation scripts on GitHub: github.com/afri-nlp/afri-mcqa.
Multimodal LLMs increasingly claim global competence, yet most benchmarks under‑represent African cultures, languages, and visual context. Afri‑MCQA aims to close this gap by measuring culturally grounded understanding across regions and languages at scale.
*Indicative; exact figures to be updated in the paper.
```json
{
  "id": "AMCQA_000123",
  "image": "images/region/sample.jpg",
  "language": "sw",
  "question": "...",
  "choices": {"A": "...", "B": "...", "C": "...", "D": "..."},
  "answer": "B",
  "category": "clothing",
  "country": "TZ",
  "rationale": "..."
}
```
See the dataset card for full annotation guidelines and redaction policies.
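For baselines, each record maps naturally onto a multiple-choice prompt. A minimal sketch using the fields above (the prompt wording is ours, not a prescribed template):

```python
def format_prompt(ex: dict) -> str:
    """Render one Afri-MCQA record as a multiple-choice prompt (illustrative wording)."""
    choices = "\n".join(f"{label}. {text}" for label, text in sorted(ex["choices"].items()))
    return f"{ex['question']}\n{choices}\nAnswer with a single letter (A, B, C, or D)."
```

Pair the rendered prompt with the image at `ex["image"]` when querying a multimodal model.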
Afri‑MCQA is community‑driven. We welcome contributions across languages, annotation, model baselines, and documentation.
```bibtex
@misc{afri_mcqa2025,
  title={Afri-MCQA: Multimodal Cultural Question Answering for African Languages},
  author={First Author and Second Author and others},
  year={2025},
  eprint={XXXX.XXXXX},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```