All Article Properties:
{
"access_control": false,
"status": "publish",
"objectType": "Article",
"id": "927545",
"signature": "Article:927545",
"url": "https://staging.dailymaverick.co.za/article/2021-05-24-ai-could-make-african-languages-more-accessible-with-machine-translation-but-people-need-to-make-it-happen/",
"shorturl": "https://staging.dailymaverick.co.za/article/927545",
"slug": "ai-could-make-african-languages-more-accessible-with-machine-translation-but-people-need-to-make-it-happen",
"contentType": {
"id": "1",
"name": "Article",
"slug": "article"
},
"views": 0,
"comments": 0,
"preview_limit": null,
"excludedFromGoogleSearchEngine": 0,
"title": "AI could make African languages more accessible with machine translation — but people need to make it happen",
"firstPublished": "2021-05-24 14:50:30",
"lastUpdate": "2021-05-24 14:50:30",
"categories": [
{
"id": "3",
"name": "Africa",
"signature": "Category:3",
"slug": "africa",
"typeId": {
"typeId": "1",
"name": "Daily Maverick",
"slug": "",
"includeInIssue": "0",
"shortened_domain": "",
"stylesheetClass": "",
"domain": "staging.dailymaverick.co.za",
"articleUrlPrefix": "",
"access_groups": "[]",
"locale": "",
"preview_limit": null
},
"parentId": null,
"parent": [],
"image": "",
"cover": "",
"logo": "",
"paid": "0",
"objectType": "Category",
"url": "https://staging.dailymaverick.co.za/category/africa/",
"cssCode": "",
"template": "default",
"tagline": "",
"link_param": null,
"description": "",
"metaDescription": "",
"order": "0",
"pageId": null,
"articlesCount": null,
"allowComments": "1",
"accessType": "freecount",
"status": "1",
"children": [],
"cached": true
},
{
"id": "29",
"name": "South Africa",
"signature": "Category:29",
"slug": "south-africa",
"typeId": {
"typeId": "1",
"name": "Daily Maverick",
"slug": "",
"includeInIssue": "0",
"shortened_domain": "",
"stylesheetClass": "",
"domain": "staging.dailymaverick.co.za",
"articleUrlPrefix": "",
"access_groups": "[]",
"locale": "",
"preview_limit": null
},
"parentId": null,
"parent": [],
"image": "",
"cover": "",
"logo": "",
"paid": "0",
"objectType": "Category",
"url": "https://staging.dailymaverick.co.za/category/south-africa/",
"cssCode": "",
"template": "default",
"tagline": "",
"link_param": null,
"description": "Daily Maverick is an independent online news publication and weekly print newspaper in South Africa.\r\n\r\nIt is known for breaking some of the defining stories of South Africa in the past decade, including the Marikana Massacre, in which the South African Police Service killed 34 miners in August 2012.\r\n\r\nIt also investigated the Gupta Leaks, which won the 2019 Global Shining Light Award.\r\n\r\nThat investigation was credited with exposing the Indian-born Gupta family and former President Jacob Zuma for their role in the systemic political corruption referred to as state capture.\r\n\r\nIn 2018, co-founder and editor-in-chief Branislav ‘Branko’ Brkic was awarded the country’s prestigious Nat Nakasa Award, recognised for initiating the investigative collaboration after receiving the hard drive that included the email tranche.\r\n\r\nIn 2021, co-founder and CEO Styli Charalambous also received the award.\r\n\r\nDaily Maverick covers the latest political and news developments in South Africa with breaking news updates, analysis, opinions and more.",
"metaDescription": "",
"order": "0",
"pageId": null,
"articlesCount": null,
"allowComments": "1",
"accessType": "freecount",
"status": "1",
"children": [],
"cached": true
},
{
"id": "38",
"name": "World",
"signature": "Category:38",
"slug": "world",
"typeId": {
"typeId": "1",
"name": "Daily Maverick",
"slug": "",
"includeInIssue": "0",
"shortened_domain": "",
"stylesheetClass": "",
"domain": "staging.dailymaverick.co.za",
"articleUrlPrefix": "",
"access_groups": "[]",
"locale": "",
"preview_limit": null
},
"parentId": null,
"parent": [],
"image": "",
"cover": "",
"logo": "",
"paid": "0",
"objectType": "Category",
"url": "https://staging.dailymaverick.co.za/category/world/",
"cssCode": "",
"template": "default",
"tagline": "",
"link_param": null,
"description": "",
"metaDescription": "",
"order": "0",
"pageId": null,
"articlesCount": null,
"allowComments": "1",
"accessType": "freecount",
"status": "1",
"children": [],
"cached": true
}
],
"content_length": 6333,
"contents": "<span style=\"font-weight: 400;\">If there was a perfect Machine Translation system for African languages it would mean that all the existing knowledge found on the Internet could be translated into someone’s home language. </span>\r\n\r\n<span style=\"font-weight: 400;\">For example, the number of Xitsonga articles on the global encyclopaedia Wikipedia is tiny. “If we had a perfect [Machine Translation] system… you can take the whole Wikipedia and translate that into someone’s language, then you give them direct access to basically all of knowledge. That’s a little bit amazing,” said Dr Herman Kamper of Stellenbosch University’s Department of Electrical and Electronic Engineering.</span>\r\n\r\n<span style=\"font-weight: 400;\">Machine Translation is automated translation of one language into another, performed by a computer. </span>\r\n\r\n<span style=\"font-weight: 400;\">Kamper was one of about 40 scholars and more than 400 participants who have been teaming up since 2019 to solve speech and language problems in Africa. </span>\r\n\r\n<span style=\"font-weight: 400;\">At the end of 2020, the volunteer community was able to set the first benchmarks for 30-plus African languages in Machine Translation, a first for the African languages in Machine Translation.</span>\r\n\r\n<span style=\"font-weight: 400;\">Researchers used Natural Language Processing (NLP), a branch of artificial intelligence that helps computers understand, interpret and manipulate human language. </span>\r\n\r\n<span style=\"font-weight: 400;\">Their research paper, </span><i><span style=\"font-weight: 400;\">Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages,</span></i><span style=\"font-weight: 400;\"> coming out of the collaborative work went on to win the 2020 Wikimedia Foundation Research Award and some members of the community have since gone on to tackle other NLP tasks. 
</span>\r\n\r\n<span style=\"font-weight: 400;\">According to Kamper, the benchmarks are a mere starting point for the languages, since the systems are not yet as good as, for example, the English-to-French system that Google uses.</span>\r\n\r\n<span style=\"font-weight: 400;\">The benchmarks are evaluation sets used to test the Machine Translation systems. </span>\r\n\r\n<span style=\"font-weight: 400;\">If a big tech company like Google wanted to, it could create Machine Translation systems for all the languages, stated Kamper, who focuses particularly on speech recognition. </span>\r\n\r\n<span style=\"font-weight: 400;\">But, if you don’t have the native speakers on the ground, it is difficult to account for the long-tail languages, Kamper said. </span>\r\n\r\n<span style=\"font-weight: 400;\">At the moment, what is needed are people who are native speakers of languages like those that were accounted for in the benchmarks, namely Khoekhoegowab, Igbo, Sepedi and Setswana, and who know that Google won’t easily work on these Machine Translation systems. </span>\r\n\r\n<span style=\"font-weight: 400;\">Most NLP research lacks on-the-ground expertise in low-resourced languages. </span>\r\n\r\n<span style=\"font-weight: 400;\">You need people who will say, “I am going to do it”. </span>\r\n\r\n<span style=\"font-weight: 400;\">That was the aim of the group of thinkers who gathered in 2019 to discuss NLP at a Deep Learning Indaba, held in Kenya. </span>\r\n\r\n<span style=\"font-weight: 400;\">At that 2019 teaching event, it was established that “we want to do this thing, to build MT systems for all the languages that we possibly can”, Kamper said. </span>\r\n\r\n<span style=\"font-weight: 400;\">“In that room, there were already a whole bunch of people from all over Africa speaking different languages. 
That was where it started.” </span>\r\n\r\n<span style=\"font-weight: 400;\">The community of creators, translators, curators, language technologists and evaluators called their initiative the Masakhane (meaning “we build together”) project. </span>\r\n\r\n<span style=\"font-weight: 400;\">A year later, the group, spearheaded by machine learning engineer Jade Abbott, had made some of the first advances in NLP for African languages.</span>\r\n\r\n<span style=\"font-weight: 400;\">In Nigeria, volunteers are translating their own writings, including personal religious stories and undergraduate theses, into Yoruba and Igbo. This is an effort to ensure that accessible and representative data about their culture are used to train models. </span>\r\n\r\n<span style=\"font-weight: 400;\">“But there is still a lot of work to be done,” Kamper pointed out, adding that the community continues to work and meet weekly. </span>\r\n\r\n<span style=\"font-weight: 400;\">“The systems [or benchmarks] are focused on a relatively small domain, meaning that the systems were trained and tested on a specific style of language. They won’t necessarily do well on other types of texts,” said Kamper. </span>\r\n\r\n<span style=\"font-weight: 400;\">More data covering more diverse styles would need to be collected for the systems to work across multiple domains, he said. </span>\r\n\r\n<b>The Left-Behinds</b><span style=\"font-weight: 400;\"> </span>\r\n\r\n<span style=\"font-weight: 400;\">While Machine Translation systems for high-resourced languages like English and German work efficiently, the same systems do not work seamlessly for languages that are considered “low-resourced”.</span>\r\n\r\n<span style=\"font-weight: 400;\">There is a big discussion around what defines a low-resourced language and definitions vary, Kamper said. 
According to him, most African languages are considered “low-resourced” because either it is difficult to procure data or there is not enough labelled speech audio or parallel translation between the different languages.</span>\r\n\r\n<span style=\"font-weight: 400;\">This means that it is difficult to procure datasets (a sentence in one language alongside its equivalent translated into another language, and then thousands of others like these) for the systems. </span>\r\n\r\n<span style=\"font-weight: 400;\">And, of the roughly 7,000 languages spoken in the world, most are considered endangered, with small numbers of speakers, said Kamper. </span>\r\n\r\n<span style=\"font-weight: 400;\">At the same time, there are some languages, like most South African languages, that are spoken by millions of people, but for which it remains difficult to get labelled data.</span>\r\n\r\n<span style=\"font-weight: 400;\">According to their paper</span><i><span style=\"font-weight: 400;\">, </span></i><span style=\"font-weight: 400;\">most of the 2,000-odd living languages in Africa are considered “The Left-Behinds” and some “The Rising Stars” in NLP research.</span>\r\n\r\n<span style=\"font-weight: 400;\">“For me [a language technologist], to build a Machine Translation system for a language that I don’t speak is actually quite hard,” said Kamper, who helped create the system for Afrikaans and contributed a little to the isiXhosa system. </span>\r\n\r\n<span style=\"font-weight: 400;\">“The beauty of this project is that we got people, there on the ground, speaking the language,” Kamper said. “Then, the initiative was to quickly upskill [those working on the Machine Translation systems] to build these first systems.” </span>\r\n\r\n<span style=\"font-weight: 400;\">“We are basically trying to equip people all over Africa to fix the problems in their own communities,” said Kamper. 
</span>\r\n\r\n<span style=\"font-weight: 400;\">Kamper pointed out that in the greater scheme of things his contribution was small — just three days devoted to working on the system. </span>\r\n\r\n<span style=\"font-weight: 400;\">“But the cool thing about it is that 40-something people made a tiny contribution like this, and it turned out to be a big thing,” Kamper said. “If you didn’t have native speakers or languages, and people who sacrificed just a few moments of their time, then that wouldn’t have happened.” </span><b>DM</b>",
"teaser": "AI could make African languages more accessible with machine translation — but people need to make it happen",
"externalUrl": "",
"sponsor": null,
"authors": [
{
"id": "68512",
"name": "Rebecca Pitt",
"image": "",
"url": "https://staging.dailymaverick.co.za/author/rebecca-pitt/",
"editorialName": "rebecca-pitt",
"department": "",
"name_latin": ""
}
],
"description": "",
"keywords": [
{
"type": "Keyword",
"data": {
"keywordId": "123972",
"name": "African languages",
"url": "https://staging.dailymaverick.co.za/keyword/african-languages/",
"slug": "african-languages",
"description": "",
"articlesCount": 0,
"replacedWith": null,
"display_name": "African languages",
"translations": null
}
},
{
"type": "Keyword",
"data": {
"keywordId": "352015",
"name": "Machine Translation",
"url": "https://staging.dailymaverick.co.za/keyword/machine-translation/",
"slug": "machine-translation",
"description": "",
"articlesCount": 0,
"replacedWith": null,
"display_name": "Machine Translation",
"translations": null
}
},
{
"type": "Keyword",
"data": {
"keywordId": "352016",
"name": "Natural Language Processing",
"url": "https://staging.dailymaverick.co.za/keyword/natural-language-processing/",
"slug": "natural-language-processing",
"description": "",
"articlesCount": 0,
"replacedWith": null,
"display_name": "Natural Language Processing",
"translations": null
}
},
{
"type": "Keyword",
"data": {
"keywordId": "352017",
"name": "benchmarks",
"url": "https://staging.dailymaverick.co.za/keyword/benchmarks/",
"slug": "benchmarks",
"description": "",
"articlesCount": 0,
"replacedWith": null,
"display_name": "benchmarks",
"translations": null
}
},
{
"type": "Keyword",
"data": {
"keywordId": "352018",
"name": "Masakhane project",
"url": "https://staging.dailymaverick.co.za/keyword/masakhane-project/",
"slug": "masakhane-project",
"description": "",
"articlesCount": 0,
"replacedWith": null,
"display_name": "Masakhane project",
"translations": null
}
},
{
"type": "Keyword",
"data": {
"keywordId": "352019",
"name": "language technology",
"url": "https://staging.dailymaverick.co.za/keyword/language-technology/",
"slug": "language-technology",
"description": "",
"articlesCount": 0,
"replacedWith": null,
"display_name": "language technology",
"translations": null
}
},
{
"type": "Keyword",
"data": {
"keywordId": "352020",
"name": "Kenya Deep Learning Indaba",
"url": "https://staging.dailymaverick.co.za/keyword/kenya-deep-learning-indaba/",
"slug": "kenya-deep-learning-indaba",
"description": "",
"articlesCount": 0,
"replacedWith": null,
"display_name": "Kenya Deep Learning Indaba",
"translations": null
}
}
],
"short_summary": null,
"source": null,
"related": [],
"options": [],
"attachments": [
{
"id": "115405",
"name": "",
"description": "",
"focal": "50% 50%",
"width": 0,
"height": 0,
"url": "https://dmcdn.whitebeard.net/dailymaverick/wp-content/uploads/2021/05/Rebecca-multilingualStellies.jpg",
"transforms": [
{
"x": "200",
"y": "100",
"url": "https://dmcdn.whitebeard.net/i/JNdUDvGL9zKiBjltTZNUIqAaknI=/200x100/smart/filters:strip_exif()/file/dailymaverick/wp-content/uploads/2021/05/Rebecca-multilingualStellies.jpg"
},
{
"x": "450",
"y": "0",
"url": "https://dmcdn.whitebeard.net/i/hLO3OLx-uLbpU7_4jSkhvWR_Jts=/450x0/smart/file/dailymaverick/wp-content/uploads/2021/05/Rebecca-multilingualStellies.jpg"
},
{
"x": "800",
"y": "0",
"url": "https://dmcdn.whitebeard.net/i/sXstNPkO2Yv1A4vQWYi3jRF3lhg=/800x0/smart/filters:strip_exif()/file/dailymaverick/wp-content/uploads/2021/05/Rebecca-multilingualStellies.jpg"
},
{
"x": "1200",
"y": "0",
"url": "https://dmcdn.whitebeard.net/i/XFljks_cJhDZm8SftYu3BKW1lKE=/1200x0/smart/filters:strip_exif()/file/dailymaverick/wp-content/uploads/2021/05/Rebecca-multilingualStellies.jpg"
},
{
"x": "1600",
"y": "0",
"url": "https://dmcdn.whitebeard.net/i/ayza3u9Bd5Kp3QEFvHRCBn6z7dI=/1600x0/smart/filters:strip_exif()/file/dailymaverick/wp-content/uploads/2021/05/Rebecca-multilingualStellies.jpg"
}
],
"url_thumbnail": "https://dmcdn.whitebeard.net/i/JNdUDvGL9zKiBjltTZNUIqAaknI=/200x100/smart/filters:strip_exif()/file/dailymaverick/wp-content/uploads/2021/05/Rebecca-multilingualStellies.jpg",
"url_medium": "https://dmcdn.whitebeard.net/i/hLO3OLx-uLbpU7_4jSkhvWR_Jts=/450x0/smart/file/dailymaverick/wp-content/uploads/2021/05/Rebecca-multilingualStellies.jpg",
"url_large": "https://dmcdn.whitebeard.net/i/sXstNPkO2Yv1A4vQWYi3jRF3lhg=/800x0/smart/filters:strip_exif()/file/dailymaverick/wp-content/uploads/2021/05/Rebecca-multilingualStellies.jpg",
"url_xl": "https://dmcdn.whitebeard.net/i/XFljks_cJhDZm8SftYu3BKW1lKE=/1200x0/smart/filters:strip_exif()/file/dailymaverick/wp-content/uploads/2021/05/Rebecca-multilingualStellies.jpg",
"url_xxl": "https://dmcdn.whitebeard.net/i/ayza3u9Bd5Kp3QEFvHRCBn6z7dI=/1600x0/smart/filters:strip_exif()/file/dailymaverick/wp-content/uploads/2021/05/Rebecca-multilingualStellies.jpg",
"type": "image"
}
],
"summary": "Machine translation benchmarks were recently set for more than 30 African languages, classified in the Natural Language Processing space as the ‘The Left-Behinds’. The benchmarks are the first advances for some of the 2,000-odd living African languages and present a case for information accessibility through language technology.",
"template_type": null,
"dm_custom_section_label": null,
"elements": [],
"seo": {
"search_title": "AI could make African languages more accessible with machine translation — but people need to make it happen",
"search_description": "<span style=\"font-weight: 400;\">If there was a perfect Machine Translation system for African languages it would mean that all the existing knowledge found on the Internet could be translated into som",
"social_title": "AI could make African languages more accessible with machine translation — but people need to make it happen",
"social_description": "<span style=\"font-weight: 400;\">If there was a perfect Machine Translation system for African languages it would mean that all the existing knowledge found on the Internet could be translated into som",
"social_image": ""
},
"cached": true,
"access_allowed": true
}