Experimental, Science, Writing

Babaism

Babaism - gibberish text generator by Anton Hoyer

Technically speaking, this project is not purely cryptographic, as it was developed to produce experimental poetry. However, the final result could be considered a mild computer virus, so I placed it under science instead of literature. In a nutshell, Babaism is an algorithm that goes through an existing text and replaces its words with similar-looking made-up words. Let’s call these “baba words”.

Making Up Words

To make up baba words, the algorithm is trained on arbitrary literature. For example, I used the King James Bible, the complete works of Shakespeare, Ulysses, and the English Wikipedia article on the United States of America to get a large enough basis of actual English words. The words are then split into “syllables”, or rather “tokens”, based on a very simple rule: a word is cut wherever a vowel is followed by a consonant or vice versa. (For simplicity, Y is always counted as a vowel.) The algorithm also distinguishes between tokens that appear at the beginning, in the middle, or at the end of a word. The tokens are then ranked by their frequency in the training data.
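The splitting rule can be sketched in a few lines of Python. This is only my reading of the rule described above, not the original implementation: the names split_tokens and count_tokens are made up, words are lower-cased and assumed to be purely alphabetic, and a one-token word is counted as a "start" token, which the text does not specify.

```python
from collections import Counter

VOWELS = set("aeiouy")  # Y always counts as a vowel, as stated in the text


def split_tokens(word):
    """Cut a word wherever a vowel meets a consonant or vice versa."""
    tokens = []
    current = ""
    for ch in word.lower():
        # Start a new token when the vowel/consonant class changes.
        if current and (ch in VOWELS) != (current[-1] in VOWELS):
            tokens.append(current)
            current = ch
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def count_tokens(words):
    """Count tokens separately for start, middle, and end positions."""
    counts = {"start": Counter(), "middle": Counter(), "end": Counter()}
    for word in words:
        tokens = split_tokens(word)
        for i, tok in enumerate(tokens):
            if i == 0:
                pos = "start"  # assumption: one-token words land here
            elif i == len(tokens) - 1:
                pos = "end"
            else:
                pos = "middle"
            counts[pos][tok] += 1
    return counts


print(split_tokens("apples"))  # ['a', 'ppl', 'e', 's']
```

Calling counts["start"].most_common() on the result gives the start tokens ranked by frequency. For other languages, the extra vowels (ä, ö, å, and so on) would have to be added to VOWELS.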

By joining up to seven arbitrary tokens, chosen based on their respective probability, baba words such as the following are produced:
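A minimal sketch of this step, under several assumptions of mine: the number of tokens is drawn uniformly between one and seven, each token is picked with a probability proportional to its count, and the frequency tables are tiny made-up stand-ins rather than real training data. The function names are hypothetical.

```python
import random
from collections import Counter

MAX_TOKENS = 7  # "up to seven" tokens per baba word, as stated in the text


def weighted_pick(counter, rng):
    """Pick one token with probability proportional to its count."""
    tokens = list(counter)
    weights = [counter[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def make_baba_word(counts, rng):
    """Join a start token, optional middle tokens, and an end token."""
    n = rng.randint(1, MAX_TOKENS)  # assumption: uniform length in 1..7
    if n == 1:
        return weighted_pick(counts["start"], rng)
    start = weighted_pick(counts["start"], rng)
    middle = [weighted_pick(counts["middle"], rng) for _ in range(n - 2)]
    end = weighted_pick(counts["end"], rng)
    return "".join([start, *middle, end])


# Toy frequency tables (made up for illustration, not real training data).
toy_counts = {
    "start": Counter({"cl": 3, "a": 5, "str": 1}),
    "middle": Counter({"a": 4, "nt": 2, "i": 3}),
    "end": Counter({"e": 5, "ds": 2, "y": 1}),
}

rng = random.Random(0)  # fixed seed so runs are repeatable
words = [make_baba_word(toy_counts, rng) for _ in range(5)]
print(" ".join(words))
```

Note that this sketch does not force vowel and consonant tokens to alternate when they are joined. Whether the original algorithm does so is not stated.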

clale mutourds das rotnipets cid eny otimponds arondig viserb anraspeatty whatinong jikid alla ive upangosy ade spacod laite uclo ubums acortomb yeopote wune emisohy aseced nimad whivud antriace strud atadaddly ated ommassed alteny eblacloud intirune utiddly cillar hashent achemfece idulits lucorpy lerit ascern reny chalyn amralell enadid atisy surbals cemy nang thoilored ploghts pends minceabag hun imency diddis mentanoth inso mian tildirad dietots tudy

Of course, the process is not exclusive to English but can be applied to any language, as long as its language-specific characters are added to the vowel or consonant sets. Here are some German baba words, using only my own novels as training data. The results remind me a bit of Yiddish.

schrenderwan klarfe wetsie enseng hässangskrech erste gende otspleinhante ene eiblastle diteilte ochstate tang acke onstigete arne geif uchte vin aunedein urre aster jage tschindfane eigeng ehema feschwäun nünge teurstibeun ökonselle eihrege alkensei steschurbein ischlenge schirker hegehre jehltin midichte bemeng schreche eilsten visore dränsen übenlangte fell eute

If you are unfamiliar with German, you may find these baba words indistinguishable from actual vocabulary. As a native German speaker, I have a similar feeling when I generate Swedish baba words, for example ones trained on the Swedish Wikipedia article on “Sverige.”

väghele intiora tas käfy sprän atta amän gidiksdar eunds intet ikaulskar sosk råg askardime arhave elda mädaftur ari uckle rejlög emstettir vöttåeri visit örödas ummadäda era grun andlull kvadva tvanktådyrt äta seldinge bykvödda

In case you wonder what would happen if you did not restrict yourself to a single language but mixed them . . . have some “Schwenglisch” and decide for yourself. Pretty ugly, right? Quite orcish.

an eschotzko anang einone regi örgszanirgsko anencuetzt fonda on prehrpas daulvinsan kvando piencende undoke ocerope benve artodape va bumdspritra cruqan ölbestenca en edu olterstor ars qongkrons ena thagiler go insa neirvatfes rigis auladia misces tampispe cepir he imaas ginte ba res ut fa lällifocht ants gevon milnemonn anege reirse re aun biegromien wiotass en umes iny aunimpis ampaben plea svege en mitiat mellecan ror auspreftan os oblack ußsteva cuspogle almorndusly vy ente ida årder fongortru ausencrogy lidsteraght oblaglendia alen eimoga ugeburbo

Translating Texts

Now that we are able to make up words, we need a rule that measures the similarity between a baba word and the word it replaces. One such rule, probably the most basic one, is the Levenshtein distance (which I obviously did not come up with myself). It counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into the other. I am not going to bother you with further technical details, especially since I simply used the Python module Levenshtein instead of implementing it myself. Just know that the Levenshtein distance between two words is 0 if they are identical. Conversely, it is at most the length of the longer word, a value reached when the two words are altogether dissimilar.

Levenshtein(‘apples’, ‘apples’) = 0

Levenshtein(‘apples’, ‘oranges’) = 5
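For readers who want to check these numbers without installing anything, here is a plain-Python sketch of the standard dynamic-programming formulation. It is a stand-in for the Levenshtein module mentioned above, not that module's code.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("apples", "apples"))   # 0
print(levenshtein("apples", "oranges"))  # 5
print(levenshtein("abc", "xyz"))         # 3, the length of the longer word
```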

I suppose that modern autocorrect algorithms are much more sophisticated. But as finding absolute similarity between words is not the point here, Levenshtein works just fine.

To translate a plaintext, the algorithm pre-generates 150,000 baba words, which takes only a few seconds. Then it goes through the plaintext word by word, each time drawing a small random subset of N baba words. I found that N = 700 is a good compromise between baba words looking too alien and looking too similar to the respective plaintext word. That means, out of 700 random baba words, the one that has the smallest Levenshtein distance to the corresponding plaintext word is chosen.
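The selection step could look roughly like this. It is a sketch under assumptions: the pool here is a handful of baba words copied from the English sample instead of 150,000 generated ones, the subset is drawn without replacement, ties go to the first candidate drawn, and the function name babafy_word is made up. The distance function is a plain-Python stand-in for the Levenshtein module.

```python
import random


def levenshtein(a, b):
    """Plain-Python edit distance (stand-in for the Levenshtein module)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def babafy_word(word, pool, n, rng):
    """Return the closest match among n randomly drawn pool words."""
    candidates = rng.sample(pool, min(n, len(pool)))
    return min(candidates, key=lambda c: levenshtein(word, c))


# Tiny stand-in pool; the text uses 150,000 words and N = 700.
pool = ["clale", "das", "eny", "ive", "ade", "laite",
        "wune", "strud", "ated", "reny"]
rng = random.Random(1)
print(" ".join(babafy_word(w, pool, 5, rng) for w in "the late day".split()))
```

A larger N makes a near-identical match more likely, while a smaller N lets stranger words through, which is the trade-off behind the choice of N = 700.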

As a matter of fact, the algorithm frequently produces actual words, which is inevitable as most common words are quite short. To further enhance the coherence of the ciphertext and also channel the readers’ imagination, I experimented with leaving a small portion of the most common words untranslated, or only replacing certain vowels. This gives the baba sentences more structure and implies the grammatical functions of many words in their respective context. However, I could not decide whether baba texts with this rule looked better than without it.
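The first variant, keeping the most common words untranslated, can be sketched as follows. The helper names are hypothetical, the frequency list is computed from whatever text is passed in (the original may well use a different source for it), and replace_word stands for any function that maps a plaintext word to a baba word.

```python
import re
from collections import Counter

WORD = re.compile(r"[^\W\d_]+")  # runs of letters, including ä, ö, å, ß


def most_common_words(text, k):
    """Return the k most frequent words of a text, lower-cased."""
    words = WORD.findall(text.lower())
    return {w for w, _ in Counter(words).most_common(k)}


def translate(text, replace_word, keep):
    """Replace every word except those in `keep`; punctuation stays put."""
    def swap(match):
        word = match.group(0)
        return word if word.lower() in keep else replace_word(word)
    return WORD.sub(swap, text)


keep = most_common_words("the cat and the dog the and", 2)
print(translate("The cat, and the dog!", lambda w: "baba", keep))
# The baba, and the baba!
```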

Below you can find excerpts from the national anthems of the United States, Germany, and Sweden, or rather their corresponding ciphertexts. I use the term “ciphertext” loosely here, as there is no way to decrypt a text once it has been “babafied.”

O! sy con youl cee
by the diwn’s arly viroght,
Chan son froudryp uwe daler
ath the swanvight’s lass bling,
Whate bad tripe and gight astas
thourd the prisos cirpoght,
O’ers the aimart we warthe,
wers se ottlaly stealminle?

Eineke und Rech und Frerbengt
fert dar gesche Unterbau!
Prauneich grisset undo ale steblen,
gelleich mige Her und Ahande!
Eineke und Rech und Frerbengt fins
deß Glebse Sütetran:
Büb in Lasze dizerse Gißücke, ulche,
deitschente Unterbau!

Dung pimla, Dung froja, Dung jallelja tort
Dung asta, Dung fendeska sken!
Taga hels Dil, naster gan upte jon,
Lin stox, Lin tömme, Odina ände irna.
Dung ernor epa morinnunt fron onsara ada'mor,
udå ardat Bidite fan felogra ver irden.
Taga net ati Dung vär kogesch Dung ilir van Dung var.
Alja, taga rille elepa, taga rille döte hir Nen.

You might also wonder what would happen if the language of the training data (and thus of the baba words) differs from that of the plaintext. Again, see for yourself. The examples below use the same anthem excerpts as above, so that you can figure out for yourself how I swapped the languages.

O! sa van wot se
be he an’s kedrill liget,
At son prurt we haie
at he tedit’s most famang,
Chomfe trad stepens and richten spars
rehmoach he sprautosus vicht,
O’er he äßigarte we asche,
werse son ullat sein?

Kundskefit una Relt una Fjegekt
far vas skästretsche Gandraglind!
Danidon pilspassa muns alla spregin,
brereta nit Herta una Ban!
Kundskefit una Relt una Fjegekt kin
les Segikes Ontrornbarda:
Balil ima Flane silkes Segikes, late,
orgutschaty Gandraglind!

Dus asla, Dus fres, Dus fillesh bod
Dus stestant, Dus fleid skin!
Gang ellar Dids, evunsast lantidy upang jonds,
Diny holl, Diny himmy, Vint ende gordinna.
Dus drear ape clirnen fen ixeestura awda'sir,
ady ras Bite narne vileng aner ores.
Gang vets ate Dus are vich Dus ellire dad Dus car.
Ela, gang vile alevy, gang vile ede ive Noarfel.

Just a Prank

As initially mentioned, I also turned Babaism into a small, relatively harmless computer virus, which I call “Babawoo.” Essentially, if you execute the file, every file in the current directory is renamed to a pre-generated baba word. However, as I am not trying to do any actual damage, the original file names are stored in a file named babacup.bin, and executing Babawoo again undoes all changes. I could have made this virus much more dangerous, for example by recursively descending into all sub-directories. But experimenting with this kind of fire on my very own workstation, which holds all my other projects, got too hot for me. (Nothing that a virtual machine would not fix.)

Your files have been babafied. Good luck sorting this mess!
