Boonish

Boonish - cryptographic project by Anton Hoyer

“Teo! Tėp? Gōa bȧognā bōoo lā’hēo. Gȯo hoȧ haān’bȧan? Mēp-bla”

— William S. Burroughs

Background

This substitution cipher was inspired by the hallucinatory writings found in AI-generated images. Typically, the AI attempts to integrate words from the user prompt into the image, often inventing new words or even letters. To an outsider like myself, those words vaguely resemble a Pacific language, like Tagalog, which is spoken in the Philippines. Some of the made-up words sounded amusing yet plausible, so I decided to fashion them into a nomenclator cipher that mimics a constructed language.

Hallucinatory writings created by DALL·E 2

Since I had primarily generated baboons with DALL·E 2 for my project “William Baboon,” many of the hallucinations were evidently based on the word “baboon,” making the letters B, A, O, and N more likely to appear than others. I collected all the words and divided them into syllables, then omitted some because they contained uncommon letters and, therefore, seemed out of place.

The goal was to utilize the Boonish syllables, which encoded the relevant information, to construct words and complete sentences. Translating English into Boonish was relatively easy, as you will read below. However, I encountered difficulties in separating the Boonish words into the original syllables. The issue was that the algorithm could not distinguish whether individual letters should be attributed to the previous or the following syllable.

Consequently, I chose a simple solution by dividing the letters into two groups, prohibiting one group from starting a syllable and the other from ending it. This way, the algorithm always knew where to split the Boonish sentences. By utilizing the available letters and certain recurring patterns, I created more unique syllables until I had precisely 100, which seemed like a reasonable foundation to begin with.

ba, baa, baaa, baan, baap, ban, bao, baon, bap, be, beo, bla, ble, blo, bna, bnaa, bnan, bnap, bo, boa, boan, bon, boo, boon, booo, boop, bop, ga, gaa, gaaa, gaan, gan, gao, gap, ge, geo, gna, gnan, go, goa, goo, goon, goop, gooo, ha, haa, haan, han, hane, hap, hao, he, heo, ho, hoa, hon, hoo, hop, la, laa, laan, lao, laon, le, len, leo, lep, loa, lo, loe, lon, loo, loon, looo, ma, maa, maan, mao, me, meo, mep, mo, moa, mon, moo, moon, ta, taa, taan, tan, tao, te, ten, teo, tep, to, toa, ton, too, toon
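Judging from the list above, the two letter groups seem to be b, g, h, l, m, t (never at the end of a syllable) and a, e, o, n, p (never at the start). This is an inference from the listed syllables, not something spelled out in the text. Under that assumption, a syllable boundary sits wherever a letter of the second group is followed by a letter of the first, so an undecorated word can be split with one regular expression. The name split_syllables is made up for this sketch.

```python
import re

# Assumed groups, inferred from the syllable list (not stated explicitly):
STARTERS = "bghlmt"  # may begin a syllable, never end one
ENDERS = "aeonp"     # may end a syllable, never begin one

# A syllable is a run of starters followed by a run of enders.
SYLLABLE = re.compile(f"[{STARTERS}]+[{ENDERS}]+")

def split_syllables(word: str) -> list[str]:
    """Split an undecorated, lowercase Boonish word into its syllables."""
    return SYLLABLE.findall(word)

print(split_syllables("boonbaap"))     # ['boon', 'baap']
print(split_syllables("blagnanhane"))  # ['bla', 'gnan', 'hane']
```

Because no syllable in the list contains an "ender" directly followed by a "starter", the split is unambiguous for any concatenation of listed syllables.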

As you can see, only 11 different letters of the Latin alphabet are used. This is even fewer than the Hawaiian alphabet, which has a total of 13 letters, including K, U, W, and the ʻOkina, but excluding B and G.

Letter frequency in Boonish syllables

On a side note, I use the word “syllable” here because it suffices in most cases. However, perhaps “token” would be more appropriate because I want to leave the pronunciation of Boonish up to the reader. If you decide that tokens such as hane or maan should be pronounced as two syllables each, I am quite fine with that.

Encryption

After creating a set of 100 Boonish syllables, I needed to map these to the plaintext to facilitate substitution. Assigning one plaintext character to one Boonish syllable did not seem feasible because the syllables averaged 3.09 characters in length, which would unnecessarily inflate the ciphertext.
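The figures quoted so far (100 syllables, 11 letters, an average length of 3.09 characters) can be checked directly against the syllable list. The snippet below is only a verification sketch; the variable name SYLLABLES is an arbitrary choice.

```python
# The 100 Boonish syllables, copied from the list above.
SYLLABLES = """
ba baa baaa baan baap ban bao baon bap be beo bla ble blo bna bnaa bnan bnap
bo boa boan bon boo boon booo boop bop ga gaa gaaa gaan gan gao gap ge geo
gna gnan go goa goo goon goop gooo ha haa haan han hane hap hao he heo ho hoa
hon hoo hop la laa laan lao laon le len leo lep loa lo loe lon loo loon looo
ma maa maan mao me meo mep mo moa mon moo moon ta taa taan tan tao te ten teo
tep to toa ton too toon
""".split()

total_chars = sum(len(s) for s in SYLLABLES)

print(len(SYLLABLES))                # 100 syllables
print(len(set("".join(SYLLABLES))))  # 11 distinct letters
print(total_chars / len(SYLLABLES))  # 3.09 characters on average
```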

To address this issue, I opted not only to substitute letters (e.g., e, t, h) but also common bigrams (e.g., th, nd, in). In fact, I went a step further by also replacing common trigrams (e.g., the, ing, her). Theoretically, the algorithm could process larger group lengths, so let us simply call them “n-grams.” I chose trigrams as the largest group because I preferred the appearance of the ciphertext using them.
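One plausible way to cut a plaintext into such n-grams is a greedy, longest-match-first scan from left to right. Whether Boonish segments text exactly this way is an assumption; the function name tokenize, the tiny n-gram set, and the pass-through of unknown characters are all illustrative.

```python
def tokenize(text: str, ngrams: set[str], max_n: int = 3) -> list[str]:
    """Greedily split text into the longest known n-grams (assumed strategy)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest group first, then fall back to shorter ones.
        for n in range(max_n, 0, -1):
            chunk = text[i:i + n]
            if len(chunk) == n and chunk in ngrams:
                break
        else:
            # No known n-gram starts here: pass the single character through.
            chunk = text[i]
        tokens.append(chunk)
        i += len(chunk)
    return tokens

# Toy n-gram set: all single characters of "the ing" plus a few groups.
toy_ngrams = set("the ing") | {"th", "in", "the", "ing"}
print(tokenize("the thing", toy_ngrams))  # ['the', ' ', 'th', 'ing']
```

With this strategy, nine plaintext characters become four tokens, which is the saving that bigrams and trigrams are meant to provide.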

To obtain the most common n-grams in the English language, I auto-analyzed none other than the Holy Bible, as made famous by King James VI. At this point, I would also like to thank Project Gutenberg for providing me with free books since 2023. In the future, I plan to auto-analyze the complete works of William Shakespeare to see if there are any differences.

Now, I had 11,726 n-grams sorted by their frequency of occurrence in the Book of Books. However, a new problem arose: my 100 Boonish syllables were insufficient to encode all these n-grams. Therefore, I had to both increase the number of syllables and decrease the number of n-grams.
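A frequency table of this kind can be built with a sliding window over the text. The sketch below assumes overlapping n-grams of length 1 to 3 and counts every character, including spaces; the actual analysis script may have differed, and count_ngrams is a hypothetical name.

```python
from collections import Counter

def count_ngrams(text: str, max_n: int = 3) -> Counter:
    """Count all overlapping n-grams of length 1..max_n in text."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    return counts

counts = count_ngrams("the thing and the other")
print(counts["the"])  # 3 (twice as a word, once inside "other")
print(counts["th"])   # 4
print(counts.most_common(5))
```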

I achieved the first by appending pseudo-punctuation to the Boonish syllables, multiplying their number to 1,100. This included adding spaces, apostrophes, commas, periods, question and exclamation marks, hyphens, and line breaks, significantly enhancing the readability of the Boonish language. For some punctuation, I implemented lower chances of selection (especially line breaks), thus losing some of my newly gained syllables, while maintaining the overall appearance of a real language (though the latter is subject to debate).

Furthermore, I adorned single vowels of the syllables with macrons (e.g., ā instead of a) and diacritical dots (e.g., ȧ instead of a). This not only made Boonish appear more like an indigenous language but also increased the number of syllables to 4,620.
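The numbers above are consistent with a simple reading: each syllable is kept in its plain form and additionally in every version where exactly one vowel carries a macron or a dot. That rule is an assumption, as is the factor of 11 punctuation variants implied by 1,100 = 100 × 11 (which suffixes make up those 11 is not stated). The helper decorate is a made-up name.

```python
SYLLABLES = """
ba baa baaa baan baap ban bao baon bap be beo bla ble blo bna bnaa bnan bnap
bo boa boan bon boo boon booo boop bop ga gaa gaaa gaan gan gao gap ge geo
gna gnan go goa goo goon goop gooo ha haa haan han hane hap hao he heo ho hoa
hon hoo hop la laa laan lao laon le len leo lep loa lo loe lon loo loon looo
ma maa maan mao me meo mep mo moa mon moo moon ta taa taan tan tao te ten teo
tep to toa ton too toon
""".split()

MACRON = {"a": "ā", "e": "ē", "o": "ō"}
DOT = {"a": "ȧ", "e": "ė", "o": "ȯ"}

def decorate(syllable: str) -> list[str]:
    """Plain syllable plus every variant with exactly one decorated vowel."""
    variants = [syllable]
    for i, ch in enumerate(syllable):
        if ch in MACRON:
            for table in (MACRON, DOT):
                variants.append(syllable[:i] + table[ch] + syllable[i + 1:])
    return variants

all_variants = [v for s in SYLLABLES for v in decorate(s)]
print(decorate("ba"))          # ['ba', 'bā', 'bȧ']
print(len(all_variants))       # 420
print(len(all_variants) * 11)  # 4620, assuming 11 punctuation variants
```

The 100 syllables contain 160 vowels in total, so this rule gives 100 + 2 × 160 = 420 decorated forms, and 420 × 11 = 4,620.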

To decrease the number of n-grams, I simply discarded the least common n-grams until their number matched the number of Boonish syllables. This worked because the set still contained some redundancy. Trigrams were preferred since they occupied the least space in the ciphertext. However, to preserve all information during encryption, every single character had to remain in the n-gram set, so that any text from the source could still be encoded one character at a time.
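The trimming step might look roughly like the following: keep every single character unconditionally, then fill the remaining slots with the most frequent longer n-grams. This is a simplified sketch; it does not model any extra preference for trigrams, and select_ngrams and its parameters are hypothetical.

```python
from collections import Counter

def select_ngrams(counts: Counter, capacity: int) -> list[str]:
    """Keep all single characters, then the most common longer n-grams."""
    singles = [g for g in counts if len(g) == 1]
    if len(singles) > capacity:
        raise ValueError("not enough codes to cover all single characters")
    # most_common() sorts by frequency, highest first.
    longer = [g for g, _ in counts.most_common() if len(g) > 1]
    return singles + longer[:capacity - len(singles)]

toy_counts = Counter({"a": 5, "b": 1, "ab": 4, "ba": 3, "aba": 2})
print(select_ngrams(toy_counts, 4))  # ['a', 'b', 'ab', 'ba']
```

In the toy example the rare trigram "aba" is dropped, while the rare single character "b" survives because it is needed to encode arbitrary text.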

To enhance the cipher’s security slightly, I shuffled the Boonish syllables after setting a seed for the random number generator. This seed was specified in the form of a password, whose individual characters were converted to their integer representations and then concatenated. For example, the key Evil#666 yielded the integer 6911810510835545454. Of course, Python’s random module does not provide randomness that can be considered cryptographically secure, but it is better than nothing.
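The key derivation described above can be reproduced in a few lines. The sketch assumes that "integer representation" means the Unicode code point of each character, which matches the Evil#666 example; it uses a separate random.Random instance instead of seeding the global generator, and the function names are invented here.

```python
import random

def key_to_seed(key: str) -> int:
    """Concatenate the code points of the key's characters into one integer."""
    return int("".join(str(ord(ch)) for ch in key))

def shuffled_codes(codes: list[str], key: str) -> list[str]:
    """Return a key-dependent permutation of codes (not cryptographically secure)."""
    rng = random.Random(key_to_seed(key))
    result = list(codes)
    rng.shuffle(result)
    return result

print(key_to_seed("Evil#666"))  # 6911810510835545454
print(shuffled_codes(["ba", "bo", "ga", "go", "ha"], "Evil#666"))
```

The same key always yields the same permutation, which is what allows the recipient to rebuild the substitution table.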

Plaintext
Indubitably authentic, intricate, inconceivable, and unexpurgated, Oatcrusher and Old Evil (aka Officially Untitled) is an omniferous, epic amalgamation of anti-anthroposophical allegories and ushers in an era of unseen obscenity, outspoken abhorrence, unsolicited innovation, and arduous amusement.

Ciphertext
Tōo-hȯo? Mē, gaȧ. Gāagaȧ, gaaa tė lė'hā-tė mōnlė'bēhȯn-te? Gaȧ. Taā lō bāo gaȧa. Goȯ-moȯ'maālaamėo meoloȯo gaan bȧ lōoo hoȧ gooōboȯnhāo, lėo hȧn-goȯ loē hoȧ beȯtė baonhan hāan gēo-hoȧ, laō, gān baȯn-goō bȧ taa lāan moagnȧn'loēgȧa'laōngooȯ toā'haan le tāa gȧabȧp hȧan'hoȧ, gōon-lȧonbāap, bāaa! Baon, tė boōn.

Goȧ hēo'mė hėbȧ hoābaȧnloābȧon mao baān? Hoā bāo blȧbȧp lȧ?

Lȧon gȧaa-gȯognan tan? Ge'gaan'ha, goopmā, bāo looȯ-tė bȧotoō bȧn'baōngnāmaānbȧ bnȧa bān lȯo loētoā'heȯbaāmōo

Decryption

The decrypted text is identical to the plaintext as long as the plaintext contains only symbols that also occur in the Bible. Any other symbol is replaced with a specified placeholder character such as _.

To showcase the high key sensitivity, the ciphertext above is decrypted here with the incorrect yet similar key Evil#667, which differs from the correct key only in the last character. The resulting text is even more gibberish than the ciphertext and shows many characters that could not be decoded at all because the required Boonish syllables had been randomly discarded.
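On the decoding side, the placeholder behavior can be sketched as a dictionary lookup with a default value. The toy codebook, the function name decode, and the assumption that the ciphertext has already been split into codes are all illustrative; how the real implementation treats unknown symbols during encryption is not described here.

```python
def decode(codes: list[str], codebook: dict[str, str], unknown: str = "_") -> str:
    """Map each code back to its n-gram; codes without an entry become `unknown`."""
    inverse = {code: ngram for ngram, code in codebook.items()}
    return "".join(inverse.get(code, unknown) for code in codes)

# Toy codebook mapping n-grams to Boonish codes (made up for this example).
toy_codebook = {"the": "boon", " ": "ba", "th": "tep", "ing": "gao"}

print(decode(["boon", "ba", "tep", "gao"], toy_codebook))         # the thing
print(decode(["boon", "ba", "tep", "gao", "moo"], toy_codebook))  # the thing_
```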

on:elsd yn y: aput Jnts; rrintzs; n sbotr,__n your wet, rea__r’t:l, n h a  do nd osaOR Gints wame__es rtaotg  Gd bntkinfort Aamy __low:e jysa:1 whdayoboodtreeirti bll se unMn ook my __gmepond Air __ul ntove__toD, earn.saso e: aus, ee.ust, ssen o10__terwriefohou?__d:ub14:rinbeht, emsntJe b Losheser:4sa7ordhtood blttndehil

Discussion

“Boonish” is a quirky encryption method that could easily be mistaken for a Pacific language at first glance, lending it a steganographic quality. Its high key sensitivity renders hill climbing attacks ineffective, as the key cannot be gradually adjusted for guessing. The use of bigrams and trigrams makes codebreaking harder, since the codebook is much larger than one built from single characters.

Moreover, the method is relatively fast. For instance, encrypting or decrypting the entire Bible, comprising 790,400 words plus numbers and punctuation, takes approximately 1.5 seconds. Another advantage is that the cipher is case-sensitive, retaining numbers, spaces, and symbols.

However, the most significant drawback is that the substitution cipher is monoalphabetic, rendering it susceptible to traditional frequency analysis. For example, it is easy to deduce that the most common trigram in the English language is “the.” Consequently, one only needs to identify the most common Boonish syllable in the ciphertext to already know 8 % of all words. Fortunately, “the” can also be split into two consecutive trigrams, for example “ th” and “e L,” depending on the position in the ciphertext.
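A first step of such a frequency analysis is to split the ciphertext into codes and count them. The sketch below assumes that a code is one (optionally decorated) syllable plus any trailing non-letter characters and that capitalization is purely cosmetic; both are guesses based on the look of the ciphertext. The sample is one sentence from the appendix.

```python
import re
from collections import Counter

# Assumed code shape: starters, then (possibly decorated) enders,
# then any trailing punctuation or whitespace.
CODE = re.compile(r"[bghlmt]+[aāȧeēėoōȯnp]+[^a-zāȧēėōȯ]*")

def split_codes(ciphertext: str) -> list[str]:
    """Split a Boonish ciphertext into codes (syllable plus punctuation)."""
    return CODE.findall(ciphertext.lower())

SAMPLE = "Baȧp gė tėn lāablo'lō gė lȯ hanėhȧn."
codes = split_codes(SAMPLE)

print(len(codes))                     # 10
print(Counter(codes).most_common(1))  # [('gė ', 2)]
```

On a long ciphertext, the codes at the top of this ranking are the natural candidates for frequent n-grams such as “the.”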

Furthermore, the ciphertext is longer than the plaintext. Even in the ideal case, where the plaintext is the Bible itself (the text used to generate the n-grams), the ciphertext is 1.64 times as long as the plaintext. When encrypting a new plaintext, such as the complete works of William Shakespeare, the ratio rises to 2.03 because the trigrams are used less effectively.

Appendix

In case you have not seen enough Boonish yet, or want to try breaking it yourself, find a longer passage below. The only clue I will give you is that it is a chapter which is particularly famous among death metal fans.

Loė, hoȧ'mo hōp'gnan'hoābaȯngė hȯp lō hoā gė hȯp me?

Bȧ mā gaaā, len-lāalȧ, moōngaān bnȧnhā too loȧ heō, baȯn'bāo'laōn bōa. Toȧ haȧnho'lō moō'bapbao'gnȧn lō gaān maȧn meo hāne bōa tēp toȧ meomȧagnȧn lō gaān maȧn meo teō boȯp tēp loȧ taan'maān gȯon-gooō laangėo blā'goon, laōn'tan gė tėn lāatōngeȯhȯ-bao lāon tāo lėpmēptȧn ga'hēbȯalȧolāo mo meo bān'gōalaānloo tēp loȧ moo-hā hȧan'tōo laȯ. Bȧ bōp toȯn goā'haȯ maȧ maa boōgaȧ, hoā moōn teȯ gāo baomo maa bo'lēomaȧn gōa!

Looȯ blemeo lȧan'laānlāo mo meo hān hē-bȧ loē'loōohaōgaȧ, mė bȧon'gap loė, loȯ'mo hōp'gaaā, moōhoā boōp gȧohȧneheō bōntōnhaa taābophaȧn māa heo, heȯ bnāpmo meo loōn boȯp laōgoōo! Mo tāo teō blė'hȧa'lō bna laa loȧ hanėgooōbaap tenbēomā'heo gė tėn lāaboȯpgoon, hōn'tan gė lȯ hanėhȧn. Bāon-hȧ loȧ lāanlēomaȧn laon boȧ gōa!

Tȯon looo'blā. Baȧp gė tėn lāablo'lō gė lȯ hanėhȧn. Bāon-hȧ loȧ hȯ, bnȧn, bao tao'goloȯn-goō lėpmēptȧn gāp loȧ hȯ, ta boonbōopgoō gaan'baaȧmāa laȯ'blė bnȧpmȯ lȧagoop boȯ-hagāo'mo maa hȧa tāo lōoo'mōonbāo māa bnaa moōn goā'haȯ gȯa'goȯ booo looogȯagė booo heō mo boȯblē loā, bnȧn gȧan-bȧ bnaā'looo'tān'maȧ bla'toȧ tȧn bȯoblemāa blobnā'laȯ-bȯ bāaplō mȧo'taōbop. Lȯe-gap loė, mėo'mo loȧ gān-loȯoboōp booo'gnan lēo'mȧan, gooō laangėo gāatōgōtoȯnbaȯ māa boȯblē loā, boȯn meo hė'tepbȧ bōp toȯn goōp. Mȯon'looo-taā lō gė lalon gnānten'tȧn, bapmēp'gaan? Mē habaān'mo moa tāo lōoo'mōonbāo māa bnaa boon gaȧn mēpbōon'mȧa haȯ maa bao gōopgnȧn lō boon lōa'la. Gōoogė lȧan-bȧ bnaā'looo'tān'maȧ bla'toȧ bnaa lōa'laā goȧ hānmōon. Tȯo gnȧn lō boon ma boop'gnȧn lō hė'gȧatȧo gap loė, bnap'mo goȧ gė bȯon gaaā. Laȯ moȯmȧan loȧ ga haȯ boȯ laȯ hanėhȧn. Geōbnaa lȧa bnaȧgnȧntȯnheō hȧa gaȯ beo, baap, toȧ bȧon maa maȧnbȧaa'toōn tōa. Maān too loȧ bėo baān-la, teo hēomō maa maȯ bopgnāmaȧn toōn maa taāloon, bangoon, tėp'bȧn-toōn'gaȧn baphe bȧ bȯoploobē'boōp gėo, ga gap loė, taȧn'toȧ, lon hoȧ. Hȧnetoȯ baō māa gaȧa? Tēn boān-bāotēo tȧ? Hō, baȧp geo tao! Boān-blō? Loȧ lon hȧan-gȧo toȯ mȧa haȯ maa baȯn, taȧ mȯ, hȯo goōp leō la'lao moȧ gė hȯp hanėtȯagēo'hȧa toȯn maa lōagnan. Goabȧ hȧ loȧ mėo? Haȯ toōn maa bao gōophȧgoon, heȯ'lȧo mōon'tōo hȧo. Bnȧa gaȯ gaan tōo haalėn bėo'māa-geōgnan hoā gė hōamāotō! Lō teō lȯoo hȧ gȧ, ha, loȯlaān, loo haptȯn'lāo mo loȧ tōa, mēpmaȧ too-gō gaȧabāangoon, beo'lȧo bōp hōahaȧ'la. Moōnhaȯ goȧ gė tȯon looo'lātoo loȧ goōn'hȯo hȯ, hȯo he'hane boōp hao lō geo heȯhaȯ maa bȯoplōebȧ hȧ gėo bā mȧo blo. Taȧngė haa bȧon māa hanėhȧn. Tėo, loȧ goōn'hȯo hȯ, bnȧn, bā bōonloōn boȯp laōgoōo! Mo tāo teō blė'bangoon, bon, lȧo bōp meȯ bnaȧ, haȯ looȯbȯon baap tenboōo, tō gė bȯon loȧ laȯ'toȯ māhȧa tėomeȯ mȧahēomō haȧnmōongnangė hōamāolēo'mȧan loȧ tē.

Heōtoōn baāmaȯ-gaȧ'gōoo'mo laā-te? Toȯ gė lalon gnānten'gāan'mȧan loȧ ga haȯ gōomaa bnāa hȯa'hoā gė bōonlaȯgō mao, gageȯlȧalooȯ bnaāgāa laānboon bȧbaō gė hȯp laȧhā too loȧ hȯ, ta lėp mēbooo boon gė lalon gnānten'gāan'mȧan loȧ ga hop'gė bȯon maa bāolė gooōlaȯ'loo mėotāa bōp, māa maa tōo haalȧa geȯlȧabnaāmaa taābopmaamoōn māan-gȯon, bȧ lėo'hōo looȯ'tȧgoon, gōa, lȧo bōp looȯ bnaāgāa laānboon lōoo'bnȧ tōa. Mėbaȧp gė tȯn tāa bōp, toōn maa tōo haabȧo moon gė tȯn tāa bōp, toōn maa tōo haabaa gaomȧboōnmaōbap, lāo mo baon'bō lon haōbooo'toōn'hȯataāhėo'beō tōnbė, teo-gė tȯn tāa bōp, toōn maa tōo haabaa gaomȧmȯgȧo tan, mē gaȧ'mėo'mo loȧ baon'ge, looōmoȧ-maȧnhaȯ lao, goȧ bȧ loē'loōohanė loā boȧ lō lȧan'mė goōp, tōo'bȧ mȧhaā? Bȧo hė-bȧap, bāamoōn maān-tėbȧon maa maangoo'heōlepbaȯ hāp bȧon maa maanbȯ gaōhȧneholoė, toō, tan gė bȯon ble'laȯn laȯboō boōpmȯntȧataȧnbȧaphe teō gė bȯon hōogė loȯn bop, too'gōa loȧ taan'maān too loȧ hȯ, bnȧn, toȧ'gė gnȧnbāap-laā toōn meo hė'heȯ, mē gaȧ'haāngooō'hȧa toȯn mōa. Boōp, hȯteȯ-boōp lōmoon lȯoo haȯ bopbagė'maō'baā baa'bnap maa hap bāap-laā toōn maa tōo haalon'hāp moa toȯn maa hap bāap-laā toōn mėo, lōo bnāpmo meo hap bāap-laā toȯn mėp-haȧ'bo, mōon. Haȧn boȯogooo-laan'loo mo gaaȧ haȧ'lo
