Lexigraph

Lexigraph - advanced nomenclator cipher by Anton Hoyer

This symmetric encryption method is based on a dictionary, technically making it a nomenclator cipher. In fact, any digital dictionary can be used, the larger the better. I work with one that consists of several smaller dictionaries from various sources and contains 466,480 words in total. Consequently, the resulting ciphertexts are a jumble of different English dialects, loanwords, alternative spellings, abbreviations, and internet slang.

Encryption

First, the words from the dictionary are grouped by their length L because, in essence, each plaintext word is exchanged for another dictionary word of the same length L. Hiding the word lengths would be safer, since they can be read directly from the ciphertext, but I chose to preserve them so that plaintext and ciphertext have the same size.
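
The grouping step could look roughly like this. It is only a minimal sketch: the function name group_by_length and the toy word list are made up for illustration, and lowercasing, removing duplicates, and sorting each group are assumptions that keep word positions reproducible.

```python
from collections import defaultdict

def group_by_length(words):
    """Map each word length L to a sorted list of the words of that length."""
    groups = defaultdict(list)
    # Assumption: words are lowercased and de-duplicated before grouping.
    for word in sorted({w.lower() for w in words}):
        groups[len(word)].append(word)
    return dict(groups)

# Hypothetical toy dictionary; a real one would hold hundreds of thousands of words.
toy_dictionary = ["a", "i", "an", "to", "my", "cat", "dog", "love", "Love"]
groups = group_by_length(toy_dictionary)
print(groups[2])  # ['an', 'my', 'to']
```

Both sides need exactly the same ordering within each group, which is why the sketch sorts the words instead of relying on the order of the input files.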

The words chosen for the ciphertext have a complex mathematical relationship with the plaintext words. To understand this, you need to at least know what a polynomial is. In case you do not, I recommend reading my post on “PN Trajectory.”

To encrypt the plaintext, you need a password comprising arbitrary Unicode characters (e.g., L3x1gr@ph.), which is used to create two integers. Converting each character to its integer code point with Python’s ord() and concatenating the results yields one particularly long number, the numeric key. A second, much shorter number, the polynomial degree, is calculated from the mean of all code points.

If the resulting numeric key is too short, I simply concatenate it with itself, possibly several times. It needs at least as many digits as the polynomial degree because it is split into that many groups of approximately equal length, each of which becomes one polynomial coefficient.

  • Password: L3x1gr@ph.
  • Numeric key: 7651120491031146411210463765…463 (560 digits)
  • Polynomial degree: 85
  • Polynomial coefficients: 7651, 1204, 9103, 1146, 4112, 1046, 3765, 1120, …, 1464, 1121, 463
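
The derivation of these numbers can be sketched as follows. This is an illustration under stated assumptions, not my actual implementation: the function names, the short password "abc", rounding the mean down, doubling the key until it is long enough, and the exact group sizes are all choices made for the sketch. It will therefore not necessarily reproduce the figures listed above.

```python
def derive_key(password):
    """Return (numeric key as a digit string, polynomial degree) for a password."""
    codes = [ord(ch) for ch in password]
    numeric_key = "".join(str(code) for code in codes)
    degree = sum(codes) // len(codes)  # assumption: the mean is rounded down
    # Assumption: the key is concatenated with itself until it has
    # at least `degree` digits.
    while len(numeric_key) < degree:
        numeric_key += numeric_key
    return numeric_key, degree

def split_coefficients(numeric_key, degree):
    """Split the digit string into `degree` groups of nearly equal length."""
    base, extra = divmod(len(numeric_key), degree)
    coefficients, pos = [], 0
    for i in range(degree):
        size = base + (1 if i < extra else 0)  # the first `extra` groups get one more digit
        coefficients.append(int(numeric_key[pos:pos + size]))
        pos += size
    return coefficients

numeric_key, degree = derive_key("abc")  # hypothetical password
coefficients = split_coefficients(numeric_key, degree)
print(degree, len(coefficients), coefficients[:3])  # 98 98 [97, 98, 99]
```

Keeping the numeric key as a string of digits makes repeating and slicing it straightforward; it is only converted to integers once the groups are cut.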

All this may seem complicated, but its sole purpose is to create a bunch of short numbers that are easily reproducible if you know the password. Once the coefficients are computed, the actual encryption can begin. For this, we need a number N that is constantly being increased in steps which are known to us, yet unknown to potential attackers. To start, the sum of all coefficients makes a good N. The next bit is best described in Python code:

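# N starts as the sum of all coefficients; L is the length of the current word.
for P, A in enumerate(coefficients):
    N += A * pow(L, P)  # add the polynomial term A * L**P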

Or in English: N is incremented using a polynomial characterized by pre-computed coefficients A and by inserting the length L of the current word.
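
As a worked example with made-up numbers (three tiny coefficients instead of the dozens of multi-digit ones a real password produces), a single update of N for a four-letter word looks like this:

```python
coefficients = [3, 2, 1]  # toy values, chosen only for illustration
N = sum(coefficients)     # starting value: 3 + 2 + 1 = 6
L = 4                     # length of the current plaintext word

for P, A in enumerate(coefficients):
    N += A * pow(L, P)    # adds 3*1 + 2*4 + 1*16 = 27 in total

print(N)  # 33
```

Because the increment depends on L, the step size changes with every word length, so two texts with different word-length patterns quickly end up with very different values of N.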

For the actual encryption, each plaintext word is looked up in the list of all words of length L. The corresponding ciphertext word is the one located exactly N positions ahead. Because N is very large and the list is comparatively short, the running index wraps around to the start whenever it passes the last word. This is easily and efficiently achieved with the modulo operation, which plays an important role in many encryption methods.
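
The wrap-around lookup can be sketched in a few lines. The function name shift_word and the five-word list are hypothetical; a real list of four-letter words would contain thousands of entries.

```python
def shift_word(word, group, N):
    """Return the word N positions ahead of `word` in `group`, wrapping around."""
    index = group.index(word)
    return group[(index + N) % len(group)]

group = ["bird", "fish", "lamb", "lion", "wolf"]  # toy list of four-letter words
print(shift_word("lion", group, 1_000_003))  # fish
```

Here "lion" sits at index 3, and (3 + 1,000,003) modulo 5 equals 1, which is the position of "fish". However large N becomes, the result always stays inside the list.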

If a plaintext word is absent from the dictionary, it is split into individual letters. This might be seen as a weakness because there are only 26 single letters, and several letters in a row might give away information about N. Fortunately, N is very large, and a shifted letter reveals at most the remainder of N after the modulo operation, from which N itself is difficult to recover.
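
A possible shape of this fallback is sketched below. The function name encrypt_token and the toy groups are hypothetical, and the sketch assumes lowercase a to z only and shifts every letter of an unknown word by the same N; whether N changes between the letters is a detail this sketch does not try to settle.

```python
import string

def encrypt_token(word, groups, N):
    """Shift `word` within its length group, or letter by letter if it is unknown."""
    group = groups.get(len(word), [])
    if word in group:
        return group[(group.index(word) + N) % len(group)]
    letters = string.ascii_lowercase
    # Assumption: each letter of an unknown word is shifted by the same N.
    return "".join(letters[(letters.index(ch) + N) % 26] for ch in word)

groups = {3: ["cat", "dog", "owl"]}  # toy dictionary grouped by word length
print(encrypt_token("dog", groups, 4))   # owl
print(encrypt_token("xyz", groups, 27))  # yza
```

Under this assumption, an attacker who guesses one of the plaintext letters learns N modulo 26, but not N itself.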

Plaintext
“Also, had the resumption of our postjuvenal heartbond not quasiforced me to vocalize my smoldering love for her, I wouldn’t have had a great deal to talk about in therapy. Anger unmistakedly beats grief, I now know; so, do pardon my Quasifrench, but she may lick my urogenital cleft lengthways for leading me on so unforgivably!”

Ciphertext
rtac, pak biu thecaspore pr mti unopportune dogbanner ddb assertorial du md orometry ee polymixiid jees dtp dlg, r pitied’z rack rip l laxer sepg ve fear scamp tu lingams. alben phototropism penis akasa, e doy eloy; gu, fo nepote pe bastelhouse, csk rsa ipl capn mn unstartled ramal formulated pli underdo to xb cp arithmetical!

Decryption

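If one has access to the same dictionary and knows the password, the ciphertext can be decrypted by performing all steps in reverse: the same values of N are recomputed, which is possible because the word lengths are preserved, and each ciphertext word is moved N positions backward instead of forward. Since the same password is used for encryption and decryption, this is known as symmetric encryption.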
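
A toy round trip illustrates the idea. Everything here is a sketch under assumptions: the names step and transform, the tiny coefficients and word groups, and the choice to advance N before each word rather than after it are made up for illustration.

```python
def step(N, coefficients, L):
    """Advance N by the polynomial evaluated at the word length L."""
    return N + sum(A * pow(L, P) for P, A in enumerate(coefficients))

def transform(words, groups, coefficients, direction):
    """Shift each word within its length group: +1 encrypts, -1 decrypts."""
    N = sum(coefficients)
    result = []
    for word in words:
        group = groups[len(word)]
        N = step(N, coefficients, len(word))  # assumption: N advances before each word
        result.append(group[(group.index(word) + direction * N) % len(group)])
    return result

groups = {2: ["an", "my", "to"], 3: ["cat", "dog", "owl"]}  # toy dictionary
coefficients = [3, 2, 1]                                    # toy coefficients
plain = ["my", "dog", "to", "cat"]

cipher = transform(plain, groups, coefficients, +1)
print(cipher)                                         # ['an', 'cat', 'an', 'dog']
print(transform(cipher, groups, coefficients, -1))    # ['my', 'dog', 'to', 'cat']
```

Python's % operator returns a non-negative result for a positive modulus, so stepping backward with a negative offset still lands on a valid index.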

To showcase the strength of the cipher, a flawed key like L3x1gr@ph? is used to decrypt the ciphertext. Despite being very close to the correct key, differing only in the last character, it results in only gibberish that is indistinguishable from the ciphertext itself. This highlights the encryption method’s high key sensitivity.

hath, tji ags mealmonger gn sdf unfibrously clubarmed raf fullcolored yo my durneder ct insonorous arnt lit dpe, y chapei’v real vdu k boonk mord rb rowy gamet yl shinshu. crary paleoatavism wrapt denat, m dag wulf; pn, he batboy ws amenability, vga ers ops roti nj fourfooted maddy amygdalate fah plosive aq jg uh selfevidence!

Discussion

“Lexigraph” may not offer the same level of security as established encryption algorithms like AES. For instance, preserving word lengths in the encryption makes “I” and “a” susceptible to easy guesswork from the ciphertext. Furthermore, the encryption is limited to letters, requiring numbers to be spelled out and allowing symbols to be either copied or ignored.

However, it is an enjoyable, polyalphabetic, and still relatively fast encryption method that provides reasonable protection against common codebreaking techniques like brute force or hill climbing, especially when a sufficiently lengthy password is used. These observations are based on assumptions, as unfortunately, I lack the mathematical expertise to prove the actual security or vulnerability of this encryption method.
