Fast Summaries Revisited

If you do complex data analysis, then from time to time you probably need to group, summarize, and parse data into variables and/or fields. There are various ways to accomplish this, including the Fast Summary technique, which regular readers of this blog may be familiar with, as it has made a number of appearances here over the years.

Well today we’re going to look at a couple alternatives to Fast Summaries, with the help of some demo files, which you can use to do your own benchmarking.

2016-03-29_223920

Or, if you’d rather not do your own benchmarking, scroll down to see results from my tests. Bottom line: each of these three techniques can be fastest, depending on circumstances.

(Also, if you’ve been putting off exploring Perform Script On Server, a.k.a. PSOS, host the demo files on FileMaker Server 13 or later. The reporting routine optionally uses PSOS, so you can dive in painlessly and see what you’ve been missing.)

Ad-Hoc vs Structured Queries

Before going any further let’s make a clear distinction between “ad-hoc” vs. “structured” queries. By “ad-hoc” I mean the found set to be processed is generated via unscripted or unstructured activity, such as manually locating a set of records via some combination of find/extend/constrain/omit/etc… in other words, the query criteria can not be anticipated in advance.

The reason I want to draw this distinction is that if you primarily process ad-hoc queries, then it appears that, across the board, the Fast Summary technique is faster than the alternatives we’re about to examine, and you may want to stop reading now. (To be clear, the other methods can be adapted to work with arbitrary found sets, but performance cost will be unacceptably high.)

On the other hand, if most of the parsing and/or reporting and/or charting you do is based on data sets that fall into the “structured” category, e.g., purchase orders by vendor and/or month and year, sales by region and/or salesperson, web visits by week and year, customers by type (walk-in, phone or web), etc., then I invite you to continue reading and explore the demo files in this archive:

Fast-Summaries-Revisited.zip (10Mb, FMP 13 or later required)

2016-03-30_164248.png

The Basic Idea

Each demo showcases multiple techniques to accomplish the same goal: group and summarize the records in the order table by vendor, and parse that data first into variables, and ultimately into a “summary report in a text field” that looks like this:

2016-03-29_232144

To test, open a demo and click one of these three buttons:

2016-03-29_233217

To avoid possibly skewed results due to caching, I recommend closing and reopening the demo between each test. Also, if you’ve opted to host the files on FileMaker Server 13 or later, then you will be offered an additional option:

2016-03-29_232937

Which will be reflected in the results:

2016-03-29_2342

The Methods

A. Fast Summary (FS)

B. Multi-Find (MF)

  • Grab a list of vendor ids (sorted in vendor name order)
  • Loop through this list, do a find on each id, and push summarized values into variables
  • Aggregate the variables into a global text field

C. Global Relationship (GR)

  • Grab a list of vendor ids (sorted in vendor name order)
  • Loop through the list, push each id into a global field which is related to a second occurrence of the order table, count or sum values across this relationship and push into variables
  • Aggregate the variables into a global text field

Note: you can examine the code for all three methods in the “sub: generate report” script.

Test Results

All tests were conducted using FMP 14 and (where noted) FMS 14. In case it isn’t clear, “server side” means with PSOS, and “client side” means without.

2016-03-30_121623

Note: you can easily add as many order records as you wish in any of the demos by running the “create order entries” script, which will prompt you for the number of new order records as well as starting and ending years those orders should fall within.

Earlier Test Results

(No PSOS or Global Relationship, but some WAN benchmarking, as well as a comparison of FMP 13 vs FMP 14 performance.)

2016-03-30_182315

What about SQL?

2016-03-30_004438

Unfortunately SQL appears to not be a good fit for this particular challenge. Test and see for yourself.

Conclusions

At the risk of stating the obvious: these demos are intentionally bare-bones, and will not reflect the complexity of most real-world scenarios. Nonetheless I believe some conclusions can be drawn that will apply out in the wider world, at least under certain sets of circumstances.

1. While the Fast Summary technique can traverse found sets very quickly, it must first sort the records to be summarized, which imposes a performance penalty directly in proportion to the size of the found set. (But see my comments above re: ad-hoc vs. structured queries.)

2. For moderate record counts, between 5K and 40K records processed locally, and up to 80K records client-side on a LAN (i.e., without the benefit of PSOS), the Global Relationship technique is fastest.

3. For larger record counts, or when processing server-side (except for the tiniest found sets), Multi-Finds are a clear winner.

5 thoughts on “Fast Summaries Revisited

  1. Vincent

    Wow it seems you demonstrated that server side (at least PSOS), is much slower as Local.
    That’s a shocker. I thought speed would be mostly the same. That’s bad news.
    Could you try without PSOS, as a true server side script (a “scheduled” one)

    Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s