Inside the Book “Lost in Automatic Translation” – about AI, language technologies, and English
“Lost in Automatic Translation” is a book about AI language technologies such as ChatGPT and Google Translate. It is written for a general audience. Here you can read what each chapter of the book discusses.
Chapter 1: Can We Have a Word?
Automatic translation tools like Google Translate have improved immensely in recent years. Older translation technology generated multiple candidate word-by-word translations and selected the one that sounded most natural in the target language. In contrast, current tools learn a sentence-level translation function from human translations. Although they are very useful, automatic translation tools don’t work equally well for every pair of languages, genre, and topic. For this reason, automatic translation has not yet made second language acquisition obsolete. Mastering English means being able to think in English rather than translating your thoughts from your native language. The language of our thoughts affects our word choice and grammatical constructions, so going through another language might result in incorrect or unnatural sentences. Choosing the right English words involves obstacles such as mispronunciation, malapropism, and inappropriate contexts.
Keywords: automatic translation, machine translation, Google Translate, second language acquisition, English as a foreign language (EFL), mispronunciation, malapropism, artificial intelligence (AI)
Chapter 2: Call the Grammar Police
After acquiring sufficient vocabulary in a foreign language, learners start understanding parts of conversations in that language. Speaking, in contrast, is a harder task. Forming grammatical sentences requires choosing the right tenses and following syntax rules. Every beginner English as a foreign language (EFL) speaker makes grammar errors – and the types of errors they make can hint at their native language. For instance, Russian speakers tend to omit the article “the” because Russian has no articles. One linguistic phenomenon that is actually easier in English than in many other languages is grammatical gender. English doesn’t assign gender to inanimate nouns such as “table” or “cup.” A few years ago, the differences in grammatical gender between languages helped reveal societal gender bias in automatic translation: translation systems that were shown gender-neutral statements in Turkish about doctors and nurses assumed that the doctor was male while the nurse was female.
Keywords: grammar, syntax, grammar error correction, grammatical gender, gender bias, automatic translation, machine translation, English as a foreign language (EFL), artificial intelligence (AI)
Chapter 3: Reading between the Lines
While what is said can be difficult to understand, what is not said may pose an even bigger challenge. Language is efficient, so what goes without saying is often simply left unsaid. It is left for the reader or listener to interpret underspecified language and resolve ambiguities, a task that we do seamlessly using our personal experience, knowledge about the world, and commonsense reasoning abilities. In many cases, commonsense knowledge helps EFL learners compensate for low language proficiency. However, what is considered “commonsense” is not always universal. Some commonsense knowledge, especially pertaining to social norms, differs between cultures. Can language technologies help bridge this cultural gap? It depends. Chatbots like ChatGPT seem to have broad knowledge about every possible topic in the world. However, ChatGPT learned about the world from reading English text on the web, which comes primarily from the US, and thus it has a North American lens. In addition, despite being “book smart,” it still lacks the basic commonsense reasoning abilities that we employ to understand social interactions and navigate the world around us.
Keywords: implicit meaning, underspecified language, language ambiguity, commonsense reasoning, culture, social norms, chatbot, ChatGPT, large language model (LLM), artificial intelligence (AI)
Chapter 4: A Figure of Speech
Non-compositional phrases such as “by and large” are phrases whose meaning cannot be unlocked by simply translating the words that constitute them. In particular, figurative expressions – such as idioms, similes, and metaphors – are ubiquitous in English. Among other reasons, figurative expressions are acquired late in the language learning journey because they often capture cultural conventions and social norms associated with the people speaking the language. Figurative expressions are especially prevalent in creative writing, acting as the spice that adds flavor to the writing. Artificial intelligence (AI) writing assistants such as ChatGPT are now capable of editing raw drafts into well-written pieces, to the advantage of native and non-native speakers alike. These AI tools, which have gained their writing skills from exposure to vast amounts of online text, are extremely adept at generating text similar to the texts they have been exposed to. Unfortunately, they have demonstrated shortcomings in creative writing that requires deviating from the norm.
Keywords: non-compositional, figurative language, idioms, similes, metaphors, artificial intelligence (AI), ChatGPT, large language model (LLM)
Chapter 5: To Put It Delicately
Euphemisms, a particular type of idiom especially prevalent in American English, are vague or indirect expressions that often substitute for harsh, embarrassing, or unpleasant terms. They are widely used to navigate sensitive topics like death and sex. “Passing away,” for example, has long been an accepted term to describe the act of dying. Once euphemisms have been in use long enough to become lexicalized, they are often replaced with new ones, a phenomenon known as “the euphemism treadmill.” Correctly interpreting and using euphemisms can be difficult for EFL learners, and because these expressions may rely on cultural knowledge, learners can end up misusing them. That is unfortunate, given that euphemisms carry sensitive meanings. Artificial intelligence (AI) writing assistants can now go beyond grammar correction to suggesting edits for more inclusive language, such as replacing “whitelist” with “allow-list” and “landlord” with “property owner.” Such suggestions can help inform EFLs and users from diverse cultures – who carry different cultural baggage – of unintended bias in their writing. At the same time, these assistants also run the risk of erasing individual and cultural differences.
Keywords: euphemism, political correctness, inclusive language, artificial intelligence (AI)
Chapter 6: Grounded in Reality
At what time does the afternoon start, at 1 p.m. or 3 p.m.? Language understanding requires the ability to correctly match statements to their real-world meaning. This mapping process is a function of the context, which includes various factors such as location and time as well as the speaker’s and listeners’ backgrounds. For example, an utterance like, “It is hot today,” would mean different things in Death Valley than in Alaska. Depending on their backgrounds and experiences, people interpret time expressions, color descriptions, geographic expressions, qualities, relative expressions, and more in different ways. This ability to map language to real-world meaning is also required of the language technology tools we use. For example, translating a recipe that contains instructions to “preheat the oven to 180 degrees” requires a translation system to understand the implicit scale (e.g., Celsius versus Fahrenheit) based on the source language and the user’s location. To date, no automatic translation system can do this, and there is little “grounding” in any widely used language technology tool.
Keywords: grounding, time expression, measurement units, machine translation, automatic translation, artificial intelligence (AI), large language model (LLM), ChatGPT
Chapter 7: Internet Speak Is the Best, Don’t @ Me
Although the internet has removed geographical boundaries, transforming the world into a global village, English is still the dominant language online. New forms of online communication such as emoji and memes have become an integral part of internet language. While it’s tempting to think of such visual communication formats as removing cultural barriers – after all, emoji look like a universal alphabet – their interpretation may rely on cultural references.
Keywords: internet language, emoji, internet memes, vision and language models, search engine, Bing, large language model (LLM), ChatGPT, Gemini, artificial intelligence (AI)
Chapter 8: Can You Repeat That, Please?
Even after achieving a high level of English proficiency, our accents – along with involuntary code-switching, pronunciation of English words as they are pronounced in our native tongue, and more – may still give us away as EFLs. Accent is the most immediately noticeable feature of EFL speakers. After moving to North America, I was faced with a conflict: Should I preserve my foreign accent and embrace it as part of my identity or try to pass as an American? While the view that all accents are valid is correct, it is also – to some extent – naïve. It not only ignores the desire to integrate into American culture but also minimizes the impact of implicit biases, which can go as far as labeling people with foreign accents as less competent. Another practical reason to develop a North American accent is to adjust to personal assistants such as Siri and Alexa that often fail to understand foreign accents. Even as the world becomes more progressive and inclusive, language technologies sometimes inadvertently push us a step back.
Keywords: accent, automatic speech recognition (ASR), personal assistant, Siri, Alexa, code-switching, mispronunciation
Chapter 9: The Unspeakable
In contrast to the rest of the book, this chapter discusses not what to say in English or how to say it, but rather what is not socially acceptable to talk about in North American culture: from offensive language and profanity to sensitive topics such as sex and politics. These taboo subjects differ by culture, and EFL speakers who come from cultures that are more direct might find themselves saying something inappropriate – just as chatbots can sometimes generate offensive content. The developers of chatbots like ChatGPT have programmed filters to prevent them from generating offensive text. Those filters are based on the norms of the developers themselves, most of whom are based in North America, and this can make a chatbot’s refusal to answer some questions seem excessively careful through the lens of other cultures.
Keywords: offensive language, profanity, taboo, culture, English as a foreign language (EFL), chatbot, ChatGPT, Bing, large language model (LLM), Gemini
Chapter 10: The Secret Code of Body Language
Apart from the words we speak or write, nonverbal communication – such as tone of voice, facial expressions, eye contact, and gestures – also differs across cultures. For example, travel guides for Italy like to warn against using the 🤌 hand gesture, which commonly signals “wait” in many countries, because Italians interpret this gesture as, “What the hell are you saying?” Tech companies are now dipping their toes into analyzing users’ behavior as expressed in nonverbal communication. For example, Zoom is providing business customers with AI tools that can determine users’ emotions during video calls based on facial expressions and tone of voice. Unless companies carefully consider cultural differences, the result could be more algorithmic bias and discrimination.
Keywords: nonverbal communication, tone of voice, facial expressions, eye contact, gestures, Zoom, emotion detection, artificial intelligence (AI)
Chapter 11: Language and Identity
Language learning is often regarded as beneficial for developing a higher level of empathy and cultural appreciation. When we connect with people from a different linguistic background than ours, we can catch a glimpse of the rich cultural and linguistic mosaic that makes up our world – and incorporate these insights into our perspective of humanity. We also recognize that there are certain compromises that EFL speakers face when they make English their dominant day-to-day means of communication. One is the loss of proficiency in their native language, which can include forgetting words and code-switching to English; the other is a change in identity as they adapt their sense of self to each language they speak. Examining these crises related to language and identity can help us map out a future for how we want to communicate – and for how language learning and language technologies can help us realize our vision.
Keywords: first language attrition, first language, identity, advertising, Facebook, artificial intelligence (AI)
Lost in Automatic Translation is available for pre-order!