Yes, I am that much of a geek. It’s not enough to learn to play a game. I have to learn how the game works, programmatically.
Oh, I am assuming you all know what Wordle is. It’s a word guessing game. There are five spaces. You enter a five-letter word. The game tells you:
Which letters are in the correct spaces
Which letters are correct, but in the wrong spaces
Which letters are incorrect
With that information you make another guess, narrowing it down until you get it right. The catch is that you only have six tries.
What got me going on this was the discovery of the words file on my Linux installation. It’s a simple text file and screams, “Use Me!” The trick then is to convert the game feedback into a method for filtering the words list.
I started out using the Bash shell to parse the words list, but ran into issues with variable handling, so I switched to Perl to build the query and kept Bash to execute it.
My first task was to figure out the best start word. As I wrote the program though, I discovered that the start word isn’t nearly as important as I thought. Anyway, here’s what I did.
I found a list online of how often each letter of the alphabet is used. I cross-referenced that with a list of the most commonly used words. From this I came up with the word RAISE. It works very well as a start word, but any word with at least a couple of the more common letters works just as well.
So we enter the first word and the game gives us feedback. First we search for exact matches. At the same time we identify each of the spaces that aren’t matching.
I start with this pattern “^.....$”.
The “^” indicates the start of the string. The “$” indicates the end of the string. The five “.” indicate each of the five spaces, with the period matching any letter.
So the start pattern, all dots, matches every word in the words file.
For each letter shown to match exactly we substitute the “.” in that position. So if the first letter was an “r”, the pattern would be, “^r....$”.
Since the “^” and “$” are used for every pattern, we don’t need to enter them. I named the program wordle.pl. The command typed in the Bash shell would be:
./wordle.pl r....
This would return a list of all 5 letter words starting with “r”.
The actual query is: cat /usr/share/dict/words | grep '^r....$'
A problem shows up from this query though. It includes capitalized words, which are not valid in the game. So I added this code to the end of every query to exclude words with capital letters.
grep -v '[A-Z]'
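As a minimal sketch of these two steps together, assuming a helper named exact_filter (a made-up name, not part of wordle.pl) that reads words on standard input: it adds the “^” and “$” itself and then drops anything containing a capital letter. It uses [[:upper:]] instead of [A-Z] because the range form can behave differently depending on locale; that is a choice made for this sketch. The sample words stand in for the real words file.

```shell
# exact_filter PATTERN
# Hypothetical helper: keeps words from stdin that match the five-character
# pattern exactly and contain no capital letters.
exact_filter() {
    # Anchor the pattern, then remove any word with an uppercase letter.
    grep -- "^$1\$" | grep -v '[[:upper:]]'
}

# A tiny inline list instead of /usr/share/dict/words, for illustration only.
printf '%s\n' Texas reach relax table | exact_filter '.e...'
```

With the pattern “.e...”, Texas matches the five dots but is removed by the capital-letter filter, and table is removed because its second letter is not “e”.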
Next we have to deal with the letters that are used, but in the wrong space. Since there can be multiple letters that are in the same wrong space across multiple guesses, we can’t use a single pattern. We have to modify the query to exclude words with each incorrect letter per space.
For entry I start with an empty comma-separated list. (,,,,) Then I add any incorrect space letters. If the letter “a” was in the second space, but was flagged as being in the wrong space, I would enter “,a,,,”.
If on the next try the letter “l” showed as incorrect in the second space, the entry would now be “,al,,,”.
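One way to turn that comma-separated entry into filters is sketched below. It assumes a helper called position_patterns (a hypothetical name; the real wordle.pl does this in Perl and may work differently). For each non-empty space it prints a five-character pattern with a bracket list in that space, and each pattern would then be handed to “grep -v”.

```shell
# position_patterns ENTRY
# Hypothetical helper: ",al,,," becomes ".[al]...", one pattern per
# non-empty space in the comma-separated entry.
position_patterns() (
    rest="$1,"      # add a trailing comma so every field ends with one
    pos=1
    while [ "$pos" -le 5 ] && [ -n "$rest" ]; do
        letters=${rest%%,*}     # text before the first comma
        rest=${rest#*,}         # drop that field and its comma
        if [ -n "$letters" ]; then
            pattern=""
            i=1
            while [ "$i" -le 5 ]; do
                if [ "$i" -eq "$pos" ]; then
                    pattern="${pattern}[${letters}]"   # letters banned here
                else
                    pattern="${pattern}."              # any letter elsewhere
                fi
                i=$((i + 1))
            done
            printf '%s\n' "$pattern"
        fi
        pos=$((pos + 1))
    done
)

position_patterns ',al,,,'
```

A single letter comes out as “.[a]...”, which matches the same words as “.a...”.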
The partial-match query on its own will not help much, because it does not say that those letters must appear somewhere in the word. That takes another step, but before we include these letters we have to list what letters to exclude.
Both excluded and included are simply strings of letters.
So the command, using the start word “raise”, would now be:
./wordle.pl r.... ,a,,, ise ra
Creating the query:
cat /usr/share/dict/words | grep '^r....$' | grep -v '.a...' | grep -v '[ise]' | grep r | grep a | grep -v '[A-Z]'
This returns a list of words that:
Start with the letter r
Contain the letter a, but not in the second spot
Do not contain the letters i, s, or e
Contain no capital letters
Contain both of the letters r and a at least once
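The excluded and included letters can be handled with a short loop, sketched here under the assumption of a helper named letter_filter (hypothetical, not the actual wordle.pl code). Excluded letters go into one bracket list for “grep -v”; included letters get one grep each, so every one of them has to appear.

```shell
# letter_filter EXCLUDED INCLUDED
# Hypothetical helper: reads words on stdin, drops words containing any
# excluded letter, and keeps only words containing every included letter.
letter_filter() (
    excluded=$1
    included=$2
    words=$(cat)
    if [ -n "$excluded" ]; then
        words=$(printf '%s\n' "$words" | grep -v -- "[${excluded}]")
    fi
    while [ -n "$included" ]; do
        rest=${included#?}           # everything after the first letter
        letter=${included%"$rest"}   # the first letter on its own
        included=$rest
        words=$(printf '%s\n' "$words" | grep -- "$letter")
    done
    # Print nothing at all when no word survives.
    [ -z "$words" ] || printf '%s\n' "$words"
)

# Sample words only; the real input would be the filtered words file.
printf '%s\n' roach royal rural regal rainy | letter_filter ise ra
```

Here regal and rainy are dropped for containing “e” and “i”, and the other three words stay because each contains both “r” and “a”.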
After a few test runs I noticed a problem: duplicate letters. It wasn’t a problem if the answer contained duplicate letters, but if a guess had duplicate letters and only one of them was in the answer, the query did not handle it.
So I added a list of non-duplicating letters. In the example below I’m excluding duplicates of the letter “r”.
./wordle.pl r.... ,a,,, ise ra r
This is the query that gets created, adding the command “grep -v 'r.*r'”:
cat /usr/share/dict/words | grep '^r....$' | grep -v '.a...' | grep -v '[ise]' | grep r | grep a | grep -v 'r.*r' | grep -v '[A-Z]'
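To see what the duplicate filter does, here is a small sketch assuming a helper named drop_repeats (a hypothetical name). For every letter in its argument it removes words where that letter appears twice, using the same “letter, anything, letter” pattern as the query.

```shell
# drop_repeats LETTERS
# Hypothetical helper: reads words on stdin and removes any word in which
# one of the given letters occurs more than once.
drop_repeats() (
    singles=$1
    words=$(cat)
    while [ -n "$singles" ]; do
        rest=${singles#?}
        letter=${singles%"$rest"}    # first letter of the remaining list
        singles=$rest
        # "r.*r" matches an r, any run of characters, then another r.
        words=$(printf '%s\n' "$words" | grep -v -- "${letter}.*${letter}")
    done
    [ -z "$words" ] || printf '%s\n' "$words"
)

printf '%s\n' roach royal rural | drop_repeats r
```

Only rural is removed, since it is the only sample word with two “r”s. Note that “r*r” without the dot would match any word containing a single “r”, so the dot matters.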
With the query complete I ran a whole bunch of tests. The results were useless. All I was getting was an alphabetical list of words with no context.
In response I took the most common letters list, gave each word a numerical value based on its letters, and sorted the list so the highest value was at the bottom. This helped a bit, but I noticed that there was too much emphasis on words with duplicate common letters.
I changed the program to count each letter just once. That improved things, but the list was still ranking some pretty obscure words highly.
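The “count each letter just once” idea can be sketched with awk. This assumes a helper named score_words (hypothetical) that takes letter weights as “letter=weight” pairs, a format invented for this example. The weights in the sample call are placeholders picked to keep the arithmetic easy, not the frequency list used by the program.

```shell
# score_words WEIGHTS
# Hypothetical helper: reads words on stdin and prints "word score", with
# each distinct letter counted once, sorted so the highest score is last.
score_words() (
    awk -v weights="$1" '
        BEGIN {
            # Parse "e=5 a=4 ..." into a lookup table.
            n = split(weights, pairs, " ")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, "=")
                w[kv[1]] = kv[2]
            }
        }
        {
            score = 0
            split("", seen)             # forget letters from the last word
            for (i = 1; i <= length($0); i++) {
                c = substr($0, i, 1)
                if (!(c in seen)) {     # skip repeats of the same letter
                    seen[c] = 1
                    score += w[c]       # letters without a weight add 0
                }
            }
            print $0, score
        }
    ' | sort -k2,2n
)

# Placeholder weights for illustration only.
printf '%s\n' radar relax reach | score_words 'e=5 a=4 r=3 l=2 x=1'
```

With these placeholder weights radar scores 7 because its second “a” and second “r” add nothing; counting every letter would have given it 14 and pushed it above reach at 12.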
I added lookups to lists of most common words and most common five-letter words. This made words much easier to pick from the list and greatly improved the success rate.
relax 7191 Common
reach 7235 Most Common
From the output I now choose in order of availability:
The highest ranked “Most Common” word
The highest ranked “Common” word
The highest ranked word
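That order of preference could be automated along these lines. The sketch assumes output lines shaped like the two samples above (word, score, optional “Common” or “Most Common” tag) and a helper named pick_guess, which is a made-up name. The “xxxxx 9999” line is an invented untagged entry, included only to show that the tag wins over the score.

```shell
# pick_guess
# Hypothetical helper: reads "word score [tag]" lines on stdin and prints
# the word to try next: best tag first, then highest score within that tag.
pick_guess() {
    awk '
        {
            if ($3 == "Most") tag = 3          # "Most Common"
            else if ($3 == "Common") tag = 2   # "Common"
            else tag = 1                       # no tag
            if (tag > best_tag || (tag == best_tag && $2 + 0 > best_score)) {
                best_tag = tag
                best_score = $2 + 0
                best = $1
            }
        }
        END { if (best != "") print best }
    '
}

printf '%s\n' 'relax 7191 Common' 'reach 7235 Most Common' 'xxxxx 9999' | pick_guess
```

This picks reach, because a “Most Common” word is preferred even when an untagged word has a higher score.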
The program now does better at guessing than I do on my own, but it is not perfect. Three times it has failed because it did not know the word. However, in the case of the word “apnea,” I could not find it in any word list on Windows or Linux. I also doubt that I could have come up with apnea on my own.
I need to improve the interface. I’m used to using the command line, but most people want a GUI. I’m not sure if this will be a desktop app or a website.