Around early 2023, rumours started flying in the comments of various Instagram posts: something was not quite right with the translate feature. Commenting "Oouga Bouga" made the translate button appear, and tapping it produced nonsensical translations. Some were racist, some were sexual innuendos, and some read like the ramblings of a sentient being.

This is a screen recording from that time:

Your mom on Instagram
makeyourmomgreatagain87, February 23, 2023
https://www.instagram.com/reel/CpALN8FLC9B/

None of the user comments show the behaviour anymore. Everyone expected this to be patched quickly, since it had the potential to be a PR disaster for Instagram. However, as of December 2023 it still didn't seem to be fixed.

supercarscreams (@supercarscreams) • Instagram reel
supercarscreams on December 16, 2023: "insta found out & patched. can yall help me reach 150k before new years? 🙏😻"
https://www.instagram.com/reel/C07h1PquFIO/

The comments on this post have some examples that still work in different ways. In March 2024, Matt Rose posted a compilation of all the different variations that users had discovered.

A comment by a Reddit user also points out the connection to Google Translate, where similar behaviour has been observed. You can see that the behaviour still exists:

Google Translate
https://translate.google.com/?hl=en&sl=so&tl=en&text=ooga%20booga&op=translate

At this point, while searching for other bugs between Somali and English, I came across this post on a Google support forum:

somali to english bug . - Google Translate Community
https://web.archive.org/web/20250315051518/https://support.google.com/translate/thread/3687976/somali-to-english-bug?hl=en

The issue has now been traced back at least to 2019, mainly on the same Somali-to-English translation pipeline.

Then I found r/translategate, which has examples from August 2018. If you go through some of the top posts, you will see that the examples exist across multiple languages, typically ones that might not have been extensively trained on. Some of them are still surprising: here is a 2025 screenshot of a Bulgarian-to-English translation that still works.

I was quite puzzled at this point at how this was still happening after the 2023 Cambrian explosion of LLMs. That is when I came across this post by a Reddit user:

Reporting Petscop bugs in Google Translate
https://imgur.com/gallery/reporting-petscop-bugs-google-translate-IIAoo

This user's theory was that user contributions had poisoned the well. However, tracing the behaviour from 2018 to 2025 would mean the poisoning stayed effective for seven years. So I started looking at other sources to see how Google had built its translation pipelines.

Around 2012, fresh out of the "AI winter" that started in 1987, neural nets were starting to look like a better approach to machine learning. This coincided with an explosion in GPU-based parallel processing, which made much larger neural networks practical to train. AlexNet had won the ImageNet challenge with a hard-fought lead of roughly 10 percentage points over the runner-up. This was followed by breakthroughs from DeepMind, specifically AlphaGo and later AlphaZero, which mastered Go and Chess.

In 2014, three Google researchers, Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, wrote a paper demonstrating that a neural network could map one sequence of words to another, and that simply reversing the order of the words in the source sentences made that mapping far easier to learn.

This was a far leaner method of semantic linkage than what was then the bleeding edge of language translation: Statistical Machine Translation (SMT). SMT broke sentences down into words or phrases and statistically chose translations for those pieces from a large probability table learned over a vocabulary.
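The word-by-word lookup at the heart of that statistical approach can be sketched in a few lines of Python. The tiny probability table below is invented for illustration; real SMT systems learned phrase tables with millions of entries from aligned corpora, plus a language model to reorder and score whole sentences.

```python
# Toy sketch of SMT-style word lookup. The probabilities here are
# made up for illustration, not taken from any real system.
PHRASE_TABLE = {
    "the": {"le": 0.6, "la": 0.4},
    "cat": {"chat": 0.9, "minou": 0.1},
    "sleeps": {"dort": 0.8, "sommeille": 0.2},
}

def translate(sentence: str) -> str:
    out = []
    for word in sentence.lower().split():
        # Unknown words pass through untranslated.
        candidates = PHRASE_TABLE.get(word, {word: 1.0})
        # "Statistically chose": pick the most probable translation.
        out.append(max(candidates, key=candidates.get))
    return " ".join(out)

print(translate("The cat sleeps"))  # -> le chat dort
```

Because each word is chosen independently, this sketch has no notion of context, which is exactly the weakness the sequence-to-sequence approach below was designed to address.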

What they proposed as an alternative was a "sequence-to-sequence" approach instead of a word-to-word approach. To do this they used a type of recurrent neural network called the Long Short-Term Memory (LSTM), originally introduced by Hochreiter and Schmidhuber in 1997. To train the model, they used 12 million English-French sentence pairs from the WMT'14 dataset, consisting of 348M French words and 304M English words, with the source (English) sentences reversed during training. This let the model retain much more of the context between the words in a sequence. That context, however, was not very effective for very long sequences, and the model was prone to hallucinations.
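The intuition behind reversing the source sentence can be shown with plain Python. In an encoder-decoder setup, the source and target are processed as one long sequence; reversing the source moves the first source word right next to the first target word, creating many short-range dependencies the LSTM can latch onto. The token sequences below are illustrative, not from the paper:

```python
# Illustration of why reversing the source helps: it shrinks the gap
# between the first source token and the first target token.
# Assumes all tokens in a sentence are distinct (true for this toy data).
def first_token_gap(src, tgt, reverse_source=False):
    source = list(reversed(src)) if reverse_source else list(src)
    seq = source + ["<go>"] + list(tgt)  # encoder input, then decoder output
    return seq.index(tgt[0]) - seq.index(src[0])

src, tgt = ["a", "b", "c"], ["A", "B", "C"]
print(first_token_gap(src, tgt))                       # -> 4
print(first_token_gap(src, tgt, reverse_source=True))  # -> 2
```

The average gap between corresponding words is unchanged, but the first few pairs become much closer, which the paper argued makes it easier for gradient descent to "establish communication" between input and output.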

Google, in the years that followed, started to train a corpus of

Ethnographies of Datasets: Teaching Critical Data Analysis through R Notebooks | Manifold @CUNY
by Lindsay Poirier
https://cuny.manifoldapp.org/read/ethnographies-of-datasets-teaching-critical-data-analysis-through-r-notebooks-8395d7ae-bbb0-4547-a738-89b4be8b9c12/section/f5d6c3ca-cc2e-4e2a-9049-9e8dfd5f2603
  • Critical dataset ethnography