Google adds Kikuyu, Luo languages to AI speech dataset WAXAL
Google has expanded its WAXAL speech dataset to include Kenya’s Kikuyu and Luo languages, a move aimed at improving how AI systems understand African languages spoken by more than 100 million people across Sub-Saharan Africa.

Launched in Nairobi on Tuesday, February 2, 2026, the expanded dataset is designed to help developers and researchers build AI systems that understand African languages, addressing a gap that has long limited access to digital services on the continent.

WAXAL is an open-access collection featuring 1,250 hours of transcribed natural speech and more than 20 hours of studio recordings for building synthetic voices.

It was created through a three-year Google-funded collaboration with African institutions, including Makerere University, University of Ghana, and Digital Umuganda.

The dataset now covers 21 languages, including Hausa, Yoruba, Swahili, Luganda, Acholi, Shona, and the newly added Kikuyu and Dholuo.

The dataset enables developers to build conversational AI, real-time translation tools, and voice assistants tailored to regional accents and code-switching.

WAXAL also prioritises “local sovereignty,” allowing African partners to retain data ownership and preserve cultural nuances.

Kikuyu is spoken by more than six million people in central Kenya, while Luo (Dholuo) is spoken by about 4.2 million people around the Lake Victoria basin.

Their inclusion reflects both Bantu (Kikuyu) and Nilotic (Luo) linguistic diversity and helps close a long-standing gap for the more than 2,000 African languages that lack high-quality speech data.

“This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages, finally reaching over 100 million people,” said Aisha Walcott-Bryant, Head of Google Research Africa.

WAXAL is published under a Creative Commons license, giving developers broad freedom to use the data.

By supporting the creation of technologies that understand local languages, the initiative makes digital tools more inclusive, helping bridge Africa’s tech divide.