The idea for this project originated when I saw an elderly person on the train solving a word search puzzle. Not that I am into this kind of time-killer, but I immediately asked myself how such puzzles could be created automatically with maximum word density. And when I say maximum, I mean: how do you fill an N-by-N grid with words of length N, both horizontally and vertically?

The obvious approach would be to brute-force it: simply take a bunch of N-letter words and iterate over all their combinations until all N columns of the grid coincidentally spell out N-letter words. Eventually, we would find all possible solutions. However, it would take close to forever, which I will demonstrate just for the sake of it.
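To make that concrete, here is a minimal Python sketch of the brute-force idea (function and variable names are made up for illustration; the measurements below were done with a separate C# program):

```python
from itertools import product

def brute_force_grids(words, n=6):
    """Naive brute force: try every combination of n-letter words as rows
    and keep the grids whose columns also spell n-letter words."""
    candidates = [w for w in words if len(w) == n]
    word_set = set(candidates)
    solutions = []
    for rows in product(candidates, repeat=n):  # |candidates| ** n combinations
        columns = ("".join(row[c] for row in rows) for c in range(n))
        if all(col in word_set for col in columns):
            solutions.append(rows)
    return solutions
```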
Proof 1: Let us assume that we have 50,000 words in a dictionary, out of which 5,500 are N=6-letter words, e.g., ACROSS. We also have an Intel i9-13900K CPU at hand, with 32 logical cores running independently, as well as a universe that is approximately 435 quadrillion seconds old. I made a dummy program in C#, a compiled and thereby fast language, to find that my computer can process 18.8 million grids of size N=6 per second. Sparing you the math, finding all possible solutions would take about 33% of the time since the big bang. I cannot wait that long, let alone pay the electricity bill of roughly 13.5 trillion € (not considering what I would save on heating).
Proof 2: Obviously, the dictionary size is the limiting factor, as the total number of permutations grows with it to the power of N=6. To be realistic, let us reduce the number of words until the computation can finish within 24 hours. Sadly, this leaves us with only 109 words of length N=6 in total (ASSESS to SPORTS). Certainly, we know many more than that, e.g., KISSES, ABUSES, and STATUS. Needless to say, not a single solution can be found for these 109 words due to the extreme odds.
Therefore, I did not bother with brute force but instead developed a simple yet effective algorithm. Specifically, it saves the beginnings of all words in separate sets (hashed, of course, to guarantee performance). For example, for the word ACCESS, it stores A, AC, ACC, ACCE, ACCES, and ACCESS. What might seem like a waste of memory proves valuable in the long run because many nonsensical permutations are skipped. In simpler terms, if ACCESS in row 1 and SQUIRE in row 2 already fail the prefix check because no known word starts with CQ in column 2, SQUIRE can simply be skipped along with all possible permutations for rows 3 to 6. The entire thing becomes a search tree, and bad branches are automatically sawed off without even looking at their leaves. Movement along the tree is made possible by a simple row index i that can be incremented or decremented as needed.
Even though this search algorithm improves performance immeasurably, I did not originally intend to explore the entire tree because it could still drive up my electricity bill. Thus, I used a second hack: sorting the words beforehand and putting valuable words closer to the top. But how do you determine the value of a word? At first, I calculated a score for each word that increased the more its individual letters appeared in other words. Not only did this put boring words like CARIES and PARIES on top, it also took me a day until it hit me that there might be better ways. Afterwards, I calculated the score based on how many other words have the same letter in the same position (e.g., the score for PRAISE is increased by the fact that the word WRITER also has R as the second letter). This worked much better, especially since every row had its own sorted word list.
Before long, the algorithm created solutions faster than I could assess them. I decided on N=6 grids early on because N=5 created too many solutions, while N=7 yielded not a single one using a dictionary of 50,000 words in total. N=6 seemed like the ideal middle ground with about 4,200 solutions in total. I believe I explored the entire tree, but since adding only a few more words to the dictionary might create numerous new solutions, I do not think I will ever be done exploring. In case you were wondering, about 94% of the solutions are symmetrical, meaning that the words in the rows are the same as those in the columns. Sometimes only a few letters differ, but in rare cases rows and columns differ a lot.
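The symmetry check itself is trivial: a grid is symmetrical exactly when it equals its own transpose. A tiny Python sketch, with a classic N=4 toy grid as an example:

```python
def is_symmetrical(rows):
    """A grid is symmetrical if reading its columns yields the same words
    as its rows, i.e. the grid equals its own transpose."""
    n = len(rows)
    columns = ["".join(rows[r][c] for r in range(n)) for c in range(n)]
    return columns == list(rows)

# Rows and columns coincide here, so this returns True:
# is_symmetrical(["CARD", "AREA", "REAR", "DART"])
```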

Eventually, I found a total of seven (!) N=7 solutions, and after a brief moment of triumph and jubilation, I upped the dictionary to 100,000 words. While the upside is more possible solutions, the downside is more exotic words. Throughout the project, I used david47k’s collections, which, I suppose, contain the most commonly googled words by English-speaking users. He even features a mixed list of 1,000,000 words that is of course unusable for this project because the majority consists of foreign words, misspellings, cryptic abbreviations, and aaaaaaaa. I wish I had found his work sooner because I needed similar collections for my cipher Lexigraph and for almost all of Love—Violence—Algorithms.

As with most of my projects, I initially used Python and only a single CPU thread. But luckily, I experienced incredibly frustrating hardware issues. I narrowed the problem down to either my CPU or my motherboard still being unable to handle 2×32 GB of RAM at 6000 MHz (or any frequency, for that matter), because whoever jumps over the bleeding edge of hardware often gets stuck between hour-long memory tests, reboots, and blue screens. Now is a good time to say a big fuck you to Intel, GIGABYTE, and the entire industry. (I swear this was the first and the last time I swore on this website.)
Anyway, I am not nerd enough to know why the program ran more smoothly in C#, producing crashes after minutes to hours instead of after seconds as in Python. After I changed the program to run on all 32 logical cores instead of just one, it never produced another crash. AI says the heavier CPU load forced the motherboard to deliver more consistent and stable power to the CPU and RAM, which sort of makes sense. I have new RAM now, and I will not change a thing in my setup until it is ready to retire in early 2030.
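For illustration, one simple way to spread such a search over many cores (not necessarily what my C# program does, but close in spirit) is to give each worker a different first-row word and let it explore that subtree independently. A rough Python sketch:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat

def _grids_for_first_row(first_row, candidates, prefixes, n):
    """Explore only the subtree whose first row is `first_row`,
    with the same prefix-based pruning as in the sketch above."""
    solutions, rows = [], [first_row]

    def viable():
        return all("".join(r[c] for r in rows) in prefixes for c in range(n))

    def descend(i):
        if i == n:
            solutions.append(list(rows))
            return
        for w in candidates:
            rows.append(w)
            if viable():
                descend(i + 1)
            rows.pop()

    if viable():
        descend(1)
    return solutions

def solve_parallel(words, n=6, workers=32):
    """Hand every worker process a different first-row word so that each
    logical core explores a different part of the search tree."""
    candidates = [w for w in words if len(w) == n]
    prefixes = {w[:i] for w in candidates for i in range(1, n + 1)}
    with ProcessPoolExecutor(max_workers=workers) as pool:
        subtrees = pool.map(_grids_for_first_row, candidates,
                            repeat(candidates), repeat(prefixes), repeat(n))
    return [grid for sub in subtrees for grid in sub]
```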
Unfortunately, I was not able to find any N=8 solutions, although I cycled through the entire 100,000-word collection in about four hours. Even if there were any, they would certainly not contain much English. So perhaps it is time to lay this tiny project to rest after providing you with some of my favorite grids. Truth be told, I was too lazy to look through thousands of them, so I wanted to sort them using AI. Waste of time, as the AI was unable to judge how common each of fifty thousand words is. Therefore, I used a different collection of about 120,000 words, sorted by popularity on Google, to pick the 999 best candidates for each grid size (for N=7, there are only 154).
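Ranking the grids by such a popularity list could look like this (a sketch assuming the list is sorted from most to least popular; names are placeholders):

```python
def best_grids(grids, popular_words, keep=999):
    """Rank finished grids by how common their words are and return the
    `keep` best ones; each grid is a list of row words."""
    rank = {w: i for i, w in enumerate(popular_words)}
    unknown = len(popular_words)  # words missing from the list are punished hard

    def grid_score(rows):
        return sum(rank.get(w, unknown) for w in rows)  # lower is better

    return sorted(grids, key=grid_score)[:keep]
```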

Next steps could include turning it into a game where you have to solve grids by putting words in the correct order. By the way, I still have not wrapped my head around how these word grids actually work. But I am sure they make for a fun puzzle game.