ExecuteSQL, Level: Advanced, SQL, Version: FM 18 or later

Exploring Wordlespace with SQL and While

Recently we’ve discussed optimizing SQL queries in FileMaker, and had some fun with various SQL experiments. Today we’re going to explore some ways FileMaker can use ExecuteSQL and the While function to perform letter frequency and text pattern analysis on candidate words for the popular Wordle game.

The list of words comes from https://github.com/tabatkins/wordle-list and purports to include the actual answer words, as well as all allowable guess words. I don’t know how valid this list of words actually is, or, assuming it is currently valid, whether it is carved in stone or will change at some point in the future.

Demo file: SQL-Multi-Table-Experimentation-Wordle.zip

Some Notes

  • Today’s file is functionally identical to the one from last time; if you already have it, there’s no need to download this one.
  • No attempt is made to differentiate between daily Wordle words that have already appeared vs. those that have yet to appear.
  • SQL is case-sensitive in the WHERE clause; all our examples today use lower case letters so we may safely ignore the issue for the duration of this article.
  • For a general-purpose introduction to SQL in FileMaker, see Beverly Voth’s Missing FM 12 ExecuteSQL Reference.

About the Demo

Today we will be looking at the “words” table, which has 12,947 entries, each a five-letter word, comprising all eligible Wordle guess words, with the actual answer words indicated by a 1 in the “wordle” field.

All of today’s examples will involve typing or pasting queries into the Data Viewer, and just to get our feet wet, we can generate a list of all words, and a count of all words, via simple statements like these.
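For example, queries along these lines should do the job (a minimal sketch, assuming the “words” table and “word” field described above):

ExecuteSQL ( "SELECT word FROM words" ; "" ; "" )

ExecuteSQL ( "SELECT COUNT ( * ) FROM words" ; "" ; "" )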

And we can narrow things down to just the Wordle answer words by adding a WHERE clause.
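For instance, filtering on the “wordle” flag field might look like this:

ExecuteSQL ( "SELECT word FROM words WHERE wordle = 1" ; "" ; "" )

ExecuteSQL ( "SELECT COUNT ( * ) FROM words WHERE wordle = 1" ; "" ; "" )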

Example 1: Overall Letter Frequency

If we take 2,309 as our total number of answer words, and multiply by five, that gives us 11,545 letters in “Wordlespace”. To determine how many times each letter appears, we can perform a rudimentary frequency analysis like so…

While ( [
letterList = "a¶b¶c¶d¶e¶f¶g¶h¶i¶j¶k¶l¶m¶n¶o¶p¶q¶r¶s¶t¶u¶v¶w¶x¶y¶z" ;
wordleList = ExecuteSQL ( "
             SELECT word
             FROM words 
             WHERE wordle = 1
             " ; "" ; "" ) ;
vc = ValueCount ( letterList ) ;
counter = 1 ;
result = ""
] ;

counter <= vc ;

[
letter = GetValue ( letterList ; counter ) ;
frequency = PatternCount ( wordleList ; letter ) ;
result = List ( result ; letter & " • " & frequency ) ;
counter = counter + 1
] ;

result

)

…producing an alphabetically sorted list, one line per letter, showing how many times that letter appears across all of the answer words.

Explanation:

  1. Generate a list of the 26 letters in the English alphabet
  2. Generate a list of all words with 1 in the wordle field
  3. For each letter calculate how many times it appears in the word list

Note: if you aren’t familiar with the While function, or could use a refresher, see Claris’s online help for While.
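As a quick reminder, the general form is:

While ( [ initialVariables ] ; condition ; [ logic ] ; result )

The initial variables are set once, the logic block repeats for as long as the condition remains true, and the result is returned when the loop ends.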

Can we sort the results by frequency, rather than alphabetically? Yes, by wrapping the result from the above example in SortValues and using the “-2” (numeric, descending) argument.
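For example, one way to do that is to replace the final result expression in the calculation above with:

SortValues ( result ; -2 )

Since the frequency count is the only numeric portion of each line, the numeric sort effectively orders the lines by count, highest first.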

Viewed this way, one gets the idea that “orate” might be a good starting word. We’ll return to frequency analysis in our final example.

Example 2: String Matching  

Suppose you’re curious about how many Wordle words contain a certain substring; you don’t care where in the word the string appears, but you do want the characters to appear sequentially… for example, the three-character substring “tor”.

ExecuteSQL ( "

SELECT word
FROM words
WHERE wordle = 1 
AND word LIKE ?

" ; "" ; "" ; "%tor%" )

Explanation: The “%” wildcard indicates zero or more characters. By placing the % on either side of the substring, we are indicating that the substring may occur anywhere within the word.
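If you only want the substring anchored at one end of the word, drop the wildcard on the other side. For example, a query along these lines should return only the Wordle words ending in “tor”:

ExecuteSQL ( "

SELECT word
FROM words
WHERE wordle = 1
AND word LIKE ?

" ; "" ; "" ; "%tor" )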

Example 3: Filling In Blanks 

Let’s say you’ve guessed B, A and E, as, respectively, the first, third and fifth letters, but don’t know the second and fourth ones. You can use the LIKE operator and the single-character wildcard “_” to display matching candidates.

ExecuteSQL ( " 

SELECT word 
FROM words 
WHERE wordle = 1 
AND word LIKE ? 

" ; "" ; "" ; "b_a_e" )

Explanation: The “_” wildcard represents a single character.

Example 4: Words Containing Four Vowels (A, E, I, O or U)

Five-letter words containing four vowels are rare in English, so let’s initially run this query without restricting the results to Wordle words.

While ( [
wordlist = ExecuteSQL ( "SELECT word FROM words" ; "" ; "" ) ; 
result = "" ; 
vc = ValueCount ( wordlist ) ; 
counter = 1 
] ;

counter <= vc ;

[
item = GetValue ( wordlist ; counter ) ; 
vowels = Filter ( item ; "aeiou" ) ; 
match = If ( Length ( vowels ) >= 4 ; item ; "" ) ; 
result = List ( result ; match ) ;
counter = counter + 1
] ;

result

)

Explanation:

  1. Generate a list of all words
  2. Iterate through the word list
  3. Populate “match” if a given word contains 4 or more vowels
  4. Build up the result list with non-empty match words

And if we want to restrict results to just Wordle words…
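…we can add the same WHERE clause to the inner query, for example:

wordlist = ExecuteSQL ( "SELECT word FROM words WHERE wordle = 1" ; "" ; "" ) ;

…leaving the rest of the While calculation unchanged.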

Example 5: Words Without A, E, I, O or U

What if we want to turn things around, and list Wordle words that don’t contain an A, E, I, O or U?

While ( [
theList = ExecuteSQL ( "
          SELECT word
          FROM words
          WHERE wordle = 1
          " ; "" ; "" ) ;
vc = ValueCount ( theList ) ;
counter = 1 ;
result = ""
] ;

counter <= vc ;

[ 
itemRaw = GetValue ( theList ; counter ) ;
item = If ( IsEmpty ( Filter ( itemRaw ; "aeiou" ) ) ; itemRaw ) ;
result = List ( result ; item ) ;
counter = counter + 1
] ;

result

)

Explanation:

  1. Generate a list of all Wordle words
  2. Iterate through the word list
  3. Populate “item” if a given word contains no standard vowels
  4. Build up the result list with non-empty item words

Example 6: Position-Specific Letter Frequency

Returning to the topic of frequency analysis, in example 1 we looked at overall letter frequency. This time we’d like to generate a position-specific list, and once again, a combination of While and ExecuteSQL can get the job done.

While ( [
letterList = "a¶b¶c¶d¶e¶f¶g¶h¶i¶j¶k¶l¶m¶n¶o¶p¶q¶r¶s¶t¶u¶v¶w¶x¶y¶z" ;
vc = ValueCount ( letterList ) ; 
counter = 1 ; 
result = "" 
] ;

counter <= vc ;

[ 
letter = GetValue ( letterList ; counter ) ; 
f1 = ExecuteSQL ( "SELECT COUNT ( * ) FROM words 
WHERE SUBSTRING ( word, 1, 1 ) = ? AND wordle = 1
" ; "" ; "" ; letter ) ; 
f2 = ExecuteSQL ( "SELECT COUNT ( * ) FROM words 
WHERE SUBSTRING ( word, 2, 1 ) = ? AND wordle = 1
" ; "" ; "" ; letter ) ; 
f3 = ExecuteSQL ( "SELECT COUNT ( * ) FROM words 
WHERE SUBSTRING ( word, 3, 1 ) = ? AND wordle = 1
" ; "" ; "" ; letter ) ; 
f4 = ExecuteSQL ( "SELECT COUNT ( * ) FROM words 
WHERE SUBSTRING ( word, 4, 1 ) = ? AND wordle = 1
" ; "" ; "" ; letter ) ; 
f5 = ExecuteSQL ( "SELECT COUNT ( * ) FROM words 
WHERE SUBSTRING ( word, 5, 1 ) = ? AND wordle = 1
" ; "" ; "" ; letter ) ; 
result = List ( result ; letter & " • " & f1 & " " & f2 
& " " & f3 & " " & f4 & " " & f5 ) ; 
counter = counter + 1
] ;

result

)

Explanation:

  1. Generate a list of the 26 letters in the English alphabet
  2. Iterate through the letter list
  3. For each letter issue five SQL statements to calculate the frequency of that letter for each position
  4. Build up the result list accordingly

Some interesting takeaways…

  • “s” is the most common initial letter
  • “e” and then “y” are the most common ending letters
  • “a” is the most common letter at positions 2 and 3
  • “e” is the most common letter at position 4
  • no words begin with “x”
  • no words end in “j”, “q” or “v”
  • no word has “q” in the fourth position
  • only one word ends in “u”

Closing Remarks

A good resource for all things Wordle is the Wordle community on Reddit.
