Catherine Gitau
4 Papers
Catherine Gitau is an academic researcher. The author has contributed to research in topics: Computer science & Languages of Africa. The author has an h-index of 1 and has co-authored 2 publications.
Papers
MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani,Jade Abbott,Graham Neubig,Daniel D'souza,Julia Kreutzer,Constantine Lignos,Chester Palen-Michel,Happy Buzaaba,Shruti Rijhwani,Sebastian Ruder,Stephen Mayhew,Israel Abebe Azime,Shamsuddeen Hassan Muhammad,Chris Chinenye Emezue,Joyce Nakatumba-Nabende,Perez Ogayo,Anuoluwapo Aremu,Catherine Gitau,Derguene Mbaye,Jesujoba O. Alabi,Seid Muhie Yimam,Tajuddeen R. Gwadabe,Ignatius Ezeani,Rubungo Andre Niyongabo,Jonathan Mukiibi,Verrah Otiende,Iroro Orife,Davis David,Samba Ngom,Tosin P. Adewumi,Paul Rayson,Mofetoluwa Adeyemi,Gerald Muriuki,Emmanuel Anebi,Chiamaka Chukwuneke,Nkiruka Odu,Eric Peter Wairagala,Samuel Oyerinde,Clemencia Siro,Tobius Saul Bateesa,Temilola Oloyede,Yvonne Wambui,Victor Akinode,Deborah Nabagereka,Maurice Katusiime,Ayodele Awokoya,Mouhamadane Mboup,Dibora Gebreyohannes,Henok Tilaye,Kelechi Nwaike,Degaga Wolde,Abdoulaye Faye,Blessing Sibanda,Orevaoghene Ahia,Bonaventure F. P. Dossou,Kelechi Ogueji,Thierno Ibrahima Diop,Abdoulaye Diallo,Adewale Akinfaderin,Tendai Marengereke,Salomey Osei +61 more
TL;DR: In this article, the authors present the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings.
Textual Augmentation Techniques Applied to Low Resource Machine Translation: Case of Swahili
Catherine Gitau,Vukosi Marivate +1 more
- 26 Jan 2023
TL;DR: The authors investigate the impact of applying textual data augmentation tasks to low resource machine translation and compare their performance with baseline neural machine translation for English-Swahili (En-Sw) datasets.
Posted Content
MasakhaNER: Named Entity Recognition for African Languages
David Ifeoluwa Adelani,Jade Abbott,Graham Neubig,Daniel D'souza,Julia Kreutzer,Constantine Lignos,Chester Palen-Michel,Happy Buzaaba,Shruti Rijhwani,Sebastian Ruder,Stephen Mayhew,Israel Abebe Azime,Shamsuddeen Hassan Muhammad,Chris Chinenye Emezue,Joyce Nakatumba-Nabende,Perez Ogayo,Anuoluwapo Aremu,Catherine Gitau,Derguene Mbaye,Jesujoba O. Alabi,Seid Muhie Yimam,Tajuddeen R. Gwadabe,Ignatius Ezeani,Rubungo Andre Niyongabo,Jonathan Mukiibi,Verrah Otiende,Iroro Orife,Davis David,Samba Ngom,Tosin P. Adewumi,Paul Rayson,Mofetoluwa Adeyemi,Gerald Muriuki,Emmanuel Anebi,Chiamaka Chukwuneke,Nkiruka Odu,Eric Peter Wairagala,Samuel Oyerinde,Clemencia Siro,Tobius Saul Bateesa,Temilola Oloyede,Yvonne Wambui,Victor Akinode,Deborah Nabagereka,Maurice Katusiime,Ayodele Awokoya,Mouhamadane Mboup,Dibora Gebreyohannes,Henok Tilaye,Kelechi Nwaike,Degaga Wolde,Abdoulaye Faye,Blessing Sibanda,Orevaoghene Ahia,Bonaventure F. P. Dossou,Kelechi Ogueji,Thierno Ibrahima Diop,Abdoulaye Diallo,Adewale Akinfaderin,Tendai Marengereke,Salomey Osei +60 more
TL;DR: The first publicly available high-quality dataset for named entity recognition (NER) in ten African languages is presented in this paper, with the goal of addressing the underrepresentation of the African continent in NLP research.
MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages
Cheikh M. Bamba Dione,David Ifeoluwa Adelani,Peter Nabende,Jesujoba O. Alabi,Happy Buzaaba,Shamsuddeen Hassan Muhammad,Chris Chinenye Emezue,Perez Ogayo,Anuoluwapo Aremu,Catherine Gitau,Derguene Mbaye,Jonathan Mukiibi,Blessing Sibanda,Bonaventure F. P. Dossou,Andiswa Bukula,Rooweither Mabuya,Allahsera Auguste Tapo,Edwin Munkoh-Buabeng,V. M. Koagne,Fatoumata Kabore,Amelia V. Taylor,Godson Kalipe,Tebogo Macucwa,Vukosi Marivate,Tajuddeen R. Gwadabe,Ikechukwu E. Onyenwe,Gratien Gualbert Atindogbé,O. Samuel,Marie-Rosette Nahimana,Kudzai Gotosa,Apelete Agbolo,Seydou Traore,Chinedu Uchechukwu,Aliyu Ahmad Yusuf,M. Abdullahi,Dietrich Klakow +35 more
- 23 May 2023
TL;DR: In this paper, the authors present the largest part-of-speech (POS) dataset for 20 typologically diverse African languages and apply various cross-lingual transfer models trained with data available in Universal Dependencies (UD).