SQL Multi-Table & Miscellaneous Experimentation

INTRO

Today we’re going to pick up where we left off last month, and today’s article will assume the reader is familiar with the material we covered last time (in SQL Multi-Table Query Optimization).

This time we’re going to dig a little deeper into multi-table SQL queries, conduct some SQL experiments, and look at a way the While function can help speed up and/or extend the capabilities of an ExecuteSQL query.

Demo file: SQL-Multi-Table-Experimentation.zip

Last month we looked at some queries involving three tables.

This time around, we’re going to add a fourth table, Types, to the mix. And one advantage of adding this fourth table is that we can really see the difference that query optimization can make.

Note: as I advised last month, all the examples in today’s article are intended to be run in the data viewer. At the risk of stating the obvious, you should keep the “Automatically evaluate” box unchecked in the data viewer unless you know that the query you are about to run is safe (i.e., is well constructed, and that it won’t cause FileMaker to lock up for an inordinate amount of time).

FOUR TABLE QUERY

Okay, let’s roll our sleeves up and write our first query. We’ve been asked to produce a list of sales for Q4 2021 by state and type, sorted by sales amount from highest to lowest. We need to display the full state name, not just the two-letter abbreviation, so this query will involve all four of the above tables, and we’re going to write the query with all the “join” logic in the WHERE clause, as follows:

ExecuteSQL ( "

SELECT states.state, types.type, SUM ( sales.amt_sale ) AS c
FROM sales, types, customers, states
WHERE sales.date_sale >= ?
   AND sales.date_sale <= ?
   AND sales.id_customer = customers.id
   AND states.st = customers.st
   AND sales.id_type = types.id
GROUP BY states.state, types.type
ORDER BY c DESC

" ; " • " ; "" ; Date(10;1;2021) ; Date(12;31;2021) )

Note that I have “aliased” the sum of sales to “c” in the SELECT clause, and then can re-use that alias in the ORDER BY clause. And the result takes about two seconds to display on my computer. Not too shabby when you consider there are over 100K records in the Sales table.

If we only want to see the top ten, we can limit the results by adding a FETCH clause like so:

ExecuteSQL ( "

SELECT states.state, types.type, SUM ( sales.amt_sale ) AS c
FROM sales, types, customers, states
WHERE sales.date_sale >= ?
  AND sales.date_sale <= ?
  AND sales.id_customer = customers.id
  AND states.st = customers.st
  AND sales.id_type = types.id
GROUP BY states.state, types.type
ORDER BY c DESC
FETCH FIRST 10 ROWS ONLY

" ; " • " ; "" ; Date(10;1;2021) ; Date(12;31;2021) )

A MISTER YUK (POISON) QUERY

As I mentioned last month, the order of the tables in the FROM clause can make a huge difference in performance. If you’d like to freeze up your copy of FileMaker for 30 minutes or more, most likely to the point that you will lose patience and decide to force quit, make one small change to the above query and re-order the tables in the FROM clause like so. Actually it’s a really bad idea and I suggest that you NOT do this.

Note: I speculated on why the table order matters in the FROM clause in last month’s article.

CUSTOMERS WITH SHORTEST NAMES

Okay, let’s move on to our next example. We’ve been asked to produce a list of the top 20 customers with the shortest names in ascending order of length. There are various ways one might accomplish this, but, what the heck, let’s use SQL.

ExecuteSQL ( "

SELECT customer, LENGTH ( customer ) as len
FROM customers
ORDER BY len ASC
FETCH FIRST 20 ROWS ONLY

" ; " • " ; "" )

And sure enough, there are 20 rows in the result, but… what if there are additional customers with five-character names? Wouldn’t it be nice if there were a way to grab the top 20 plus any additional “fives”? Well, good news, there is. We can accomplish this in the FETCH clause by changing the ONLY to WITH TIES.

ExecuteSQL ( "

SELECT customer, LENGTH ( customer ) as len
FROM customers
ORDER BY len ASC
FETCH FIRST 20 ROWS WITH TIES

" ; " • " ; "" )

And our result now contains 22 rows.

BASEBALL TEAM SCHEDULING

Let’s change gears… have you ever been responsible for scheduling team (e.g., baseball) games? Did you know the SQL can help? I didn’t either until I found an example in a book called SQL Hacks by Andrew Cumming and Gordon Russell. We don’t have a “Teams” table in our demo file, but we can fake it by pretending our Types are baseball teams.

And what we want to do is have each “team” play one game at home against each opposing team, and one game away against each opposing team, producing a list that looks like this:

And this is accomplished like so:

"Home       Away¶----------------------¶" &

ExecuteSQL ( "

SELECT home.type, away.type
FROM types home
CROSS JOIN types away
WHERE home.type <> away.type
ORDER BY home.type

" ; " | " ; "" )

Explanation:

Create two aliases for Types, one called “home” and one called “away”
Link them via the CROSS JOIN operator, which is the SQL equivalent of a cartesian join operator in FileMaker
In the WHERE clause, prevent a team from being scheduled against itself
Sort the result by the “home” type

ROLLING DICE – VIEWING ALL POSSIBLE OUTCOMES

We can leverage this trick to solve a somewhat similar problem involving dice rolls. Say you want to generate a list of all the possible outcomes when you roll three dice. You know there will be 216 rows in that list (since there are 6 * 6 * 6 possible outcomes), and you could tediously construct that list manually, but why bother when you have aliasing + CROSS JOIN in your bag of SQL tricks?

Let’s jump over to a table we haven’t looked at yet, UUIDs.

For the purpose of this example we don’t care about the uuids themselves, but we can borrow the id numbers from the first six rows and use them to represent the dice values 1 through 6.

ExecuteSQL ( "

SELECT col1.id, col2.id, col3.id
FROM UUIDs col1
CROSS JOIN UUIDs col2
CROSS JOIN UUIDs col3
WHERE col1.id < 7
   AND col2.id < 7
   AND col3.id < 7
ORDER BY col1.id, col2.id, col3.id

" ; " | " ; ¶ )

…and note that the ORDER BY clause is not actually necessary because the IDs are already in sequential order.

ROLLING DICE – ALL POSSIBLE OUTCOMES WITH TOTALS

What if we want to see totals for each roll? Here’s one way we might accomplish this.

ExecuteSQL ( "

SELECT col1.id, '+', col2.id, '+', col3.id, 
   '=', col1.id+col2.id+col3.id
FROM UUIDs col1 
CROSS JOIN UUIDs col2 
CROSS JOIN UUIDs col3
WHERE col1.id < 7
   AND col2.id < 7
   AND col3.id < 7
ORDER BY col1.id, col2.id, col3.id

" ; " " ; ¶ )

In this example, we have hard coded the “+” and “=” signs into the SELECT clause. An alternative would be to use concatenation operators, but they’re intended to work with text strings, not with integers. ExecuteSQL makes this determination by looking at the field type, and since id is defined as a number field, we’ll need to work around this minor obstacle by using CAST AS VARCHAR, like so:

ExecuteSQL ( "

SELECT CAST ( col1.id AS VARCHAR )||' + '||
   CAST ( col2.id AS VARCHAR )||' + '||
   CAST ( col3.id AS VARCHAR )||' = '||
   CAST ( col1.id+col2.id+col3.id AS VARCHAR )
FROM UUIDs col1
CROSS JOIN UUIDs col2
CROSS JOIN UUIDs col3
WHERE col1.id < 7
   AND col2.id < 7
   AND col3.id < 7
ORDER BY col1.id, col2.id, col3.id

" ; "" ; ¶ )

Either way, this will be our result.

IDENTIFYING DUPLICATES

The whole point of using uuids instead of good old fashioned serial numbers is to avoid the possibility of duplication, but in this case I’ve taken the liberty of including a small number of duplicate uuids in the UUID table for demonstration purposes.

You might think that identifying duplicates would be a natural fit for SQL, but unfortunately ExecuteSQL is a bit, um, anemic in this regard. The following code works, but takes several minutes to execute across 10K records, so is, effectively, unusable.

ExecuteSQL ( "

SELECT uuid, COUNT ( uuid ) as thecount
FROM uuids
GROUP BY uuid
HAVING COUNT ( uuid ) > 1
ORDER BY thecount DESC

" ; " • " ; "" )

The result after waiting several minutes is…

The performance culprit turns out to be the HAVING clause, and I haven’t been able to find a pure SQL work around. (It would be nice if the HAVING clause could work with aliases — same goes for GROUP BY.) Fortunately we can use the While function in conjunction with ExecuteSQL to achieve our objective in a reasonably performant manner on 10K records. This produces the above result in less than two seconds on my machine.

While (

[ fullList =
   ExecuteSQL ( "
   SELECT uuid, COUNT ( uuid ) as x
   FROM uuids
   GROUP BY uuid
   ORDER BY x DESC
   " ; " • " ; "" ) ;
counter = 1 ;
row = GetValue ( fullList ; counter ) ;
result = "" 
] ;

GetAsNumber ( RightWords ( row ; 1 ) ) >= 2 ;

[ result = List ( result ; row ) ;
counter = counter + 1 ;
row = GetValue ( fullList ; counter )
] ;

result

)

And I think that’s enough for today. Next time we’ll look at some ways ExecuteSQL + While can help you perform letter frequency and text pattern analysis on a certain five-letter word game that is all the rage these days.

SQL Multi-Table & Miscellaneous Experimentation

INTRO

FOUR TABLE QUERY

A MISTER YUK (POISON) QUERY

CUSTOMERS WITH SHORTEST NAMES

BASEBALL TEAM SCHEDULING

ROLLING DICE – VIEWING ALL POSSIBLE OUTCOMES

ROLLING DICE – ALL POSSIBLE OUTCOMES WITH TOTALS

IDENTIFYING DUPLICATES

Published by Kevin Frank

Leave a ReplyCancel reply

INTRO

FOUR TABLE QUERY

A MISTER YUK (POISON) QUERY

CUSTOMERS WITH SHORTEST NAMES

BASEBALL TEAM SCHEDULING

ROLLING DICE – VIEWING ALL POSSIBLE OUTCOMES

ROLLING DICE – ALL POSSIBLE OUTCOMES WITH TOTALS

IDENTIFYING DUPLICATES

Published by Kevin Frank

Leave a ReplyCancel reply

Discover more from FileMakerHacks