Mind Your Language: The Battle For Linguistic Diversity In AI

With his geeky look and TED Talk-style headset, Sundar Pichai looked straight out of a Silicon Valley incubator.

That Monday, February 10, Google’s chief executive took the stage at the Artificial Intelligence Action Summit in Paris. From the Grand Palais podium, he heralded a new golden era of innovation.

“Using AI techniques, we added more than 110 new languages to Google Translate last year, spoken by half a billion people around the world,” said the tech tycoon, his eyes fixed on his notes. “That brings our total to 249 languages, including 60 African languages, with more to come.”

Delivered in a monotone, his statement barely registered among the summit’s attendees: an assembly of world leaders, researchers, NGOs and technology executives.

But for advocates of linguistic diversity in artificial intelligence, Mr. Pichai’s words marked a quiet victory, one achieved after two years of intense behind-the-scenes negotiations in the arcane world of digital diplomacy.

“It shows that the message is getting through and the technology companies are listening,” said Joseph Nkalwo Ngoula, digital policy adviser at the UN mission of the International Organization of La Francophonie in New York.

Linguistic divide

Mr. Pichai’s speech was a far cry from the linguistic missteps of early generative AI, a branch of artificial intelligence capable of creating original content, from text to images, music and animation.

When OpenAI launched ChatGPT in 2022, non-English speakers quickly discovered its limitations.

A query in English would generate a detailed and informative response. The same prompt in French? Two paragraphs, followed by a sheepish apology: “I’m sorry, I have not been trained on that” or “My model is not updated beyond this date.”

The gap stems from the intricate mechanics of AI tools, which rely on so-called large language models (LLMs) such as GPT-4, Meta’s Llama or Google’s Gemini. These models digest vast troves of internet data that help them understand and generate text.

But the internet itself is overwhelmingly Anglophone. While only 20 percent of the world’s population speaks English at home, almost half of the training data for the main AI models is in English.

ChatGPT’s responses in French, Portuguese or Spanish have improved, but even today they remain less informative than their English counterparts.

The UN Global Digital Compact aims to bring together governments and industry to ensure that technology, such as AI, works for all humanity.

A clearer approach

“The volume of information available in English is not only much greater, it is also more up to date,” said Mr. Nkalwo Ngoula. By default, AI models are designed, trained and deployed in English, leaving other languages struggling to catch up.

The divide is not just quantitative. When AI is deprived of solid training in a given language, it begins to “hallucinate”, generating incorrect or absurd responses with unsettling authority, like an overconfident friend bluffing their way through the evening.

A classic AI hallucination is to answer a request for biographical details about a famous person by inventing a Nobel Prize or a bizarre parallel career, as in this example generated by ChatGPT at the request of UN News:

UN News: “Who is Victor Hugo?”

Hallucinating AI: “Victor Hugo, the nineteenth-century French writer, was also a passionate astronaut who contributed to the early design of the International Space Station.” 🚀😆

Black box

“It is a black box that absorbs data,” said Mr. Nkalwo Ngoula. “The results can be formally coherent and logically structured, but factually they can be very inaccurate.”

Beyond factual errors, AI tends to flatten linguistic richness. Chatbots struggle with regional accents and language variants, such as Quebec French or the Creoles spoken in Haiti and the French Caribbean.

AI-generated French is often sanitized, stripped of its stylistic nuances.

“Molière, Léopold Sédar Senghor, Aimé Césaire, Mongo Beti: they would all turn in their graves if they saw how AI writes French today,” joked Mr. Nkalwo Ngoula.

The problem runs deeper in multilingual countries, such as the diplomat’s native Cameroon, where young people commonly speak Camfranglais, a hybrid of French, English, Pidgin and local languages.

“I doubt that young people can ask something in Camfranglais and get a meaningful response,” he said. Expressions such as “Je yamo ce pays” (I love this country) or “Réponds-moi sharp-sharp” (answer me quickly) would probably leave AI models bewildered.

Philemon Yang (at podium and on screens), President of the seventy-ninth session of the United Nations General Assembly, addresses the opening of the Summit of the Future on September 22, 2024.

Francophonie’s shadow campaign

Mr. Nkalwo Ngoula’s organization, La Francophonie, which brings together 93 states and governments around the use of French and represents more than 320 million people worldwide, has made this linguistic gap a centerpiece of its digital strategy.

The group’s efforts culminated last year in the UN Global Digital Compact, a framework for the governance of AI adopted by Member States. From 2023, La Francophonie leveraged its diplomatic network, including the influential group of Francophone ambassadors at the UN, to ensure that linguistic diversity became a central principle in the formulation of AI policies.

Along the way, unexpected allies emerged. Lusophone and Hispanic advocacy groups joined the fight, and even Washington sided with the cause. “The United States defended the inclusion of language in the development of AI,” said Mr. Nkalwo Ngoula.

The push paid off. The final Global Digital Compact explicitly recognizes cultural and linguistic diversity, an issue that had initially been buried under broader accessibility concerns. “Our goal was to bring it to the forefront,” he said.

The movement even reached Silicon Valley. At the UN Summit of the Future in September 2024, where the compact was officially adopted, Google CEO Sundar Pichai surprised many by emphasizing the need for AI to provide access to global knowledge in multiple languages.

“We are working towards 1,000 of the most spoken languages in the world,” he promised, a commitment he reaffirmed in Paris months later.

Global Digital Compact limits

Despite these gains, challenges remain. Chief among them is visibility. “Francophone content is often buried by platform algorithms,” warned Mr. Nkalwo Ngoula.

Streaming giants such as Netflix, YouTube and Spotify prioritize popularity, which means that English content dominates search results.

“If linguistic diversity were truly taken into account, a French-speaking user should see French films at the top of their recommendations,” he argued.

The overwhelming dominance of English in AI training data is another obstacle that the compact leaves unaddressed. It also omits any reference to the UNESCO convention on cultural diversity, an oversight that, according to Mr. Nkalwo Ngoula, should be rectified.

“Linguistic diversity must be the backbone of digital advocacy for La Francophonie,” Mr. Nkalwo Ngoula insisted.

Given the pace of AI development, these changes must happen sharp-sharp.
