Today we’re going to do some experiments with the FilterValues function, which takes two return delimited lists for input…
FilterValues ( textToFilter ; filterValues )
…and produces the intersection of the two lists as output.
With the above in mind, when the function was first introduced back in 2004, I assumed that these two statements were equivalent:
FilterValues ( listA ; listB ) FilterValues ( listB ; listA )
But, as I was to realize in due course, they are not equivalent… for two reasons:
#1. As I would have known had I read the online help entry, the order of items in the first list determines the order of the items in the result, e.g.,
vs.
#2. When there is a significant disparity between the length of the two lists, deciding which list should come first will make a huge difference in how long it takes the function to return its result… and this happens to be the subject of today’s demo, which is based on a file Paul Jansen sent me several months ago and which I have modified in various ways and am sharing with his kind permission.
Instructions
1. Download and open the demo file: filtering-values-v1.zip
2. Click one of these two buttons
3. Enter some criteria, for example
4. Observe the results, e.g.,
Note: To eliminate the possibility of caching skewing the timings, data is generated fresh each time you click one of the “filtervalues” buttons.
Some Takeaways
- Unsurprisingly, the more work you ask FilterValues to do, the longer it will take to return a result.
- However, for a given pair of lists, Long/Short is significantly faster than Short/Long.
- For Short/Long, the amount of “overlap” between the two lists has an inverse effect on rendering time — the more overlap, the less time to render.
- Short/Long can take a painfully long time to return a result for larger list sizes, in which case the demo will attempt to prevent you from experiencing this pain by displaying a warning dialog and giving you an opportunity to bail out.
Note: Like all FileMaker functions ending in the word “Values” (plural), except for JSONListValues, FilterValues “helpfully” appends a trailing return to positive results. This has no significance for today’s demo, but can be disconcerting if you aren’t aware of the behavior.
Miscellaneous
Finally, there are a few other things going on in the demo that may merit a bit of exploration:
1. The use of NumToJText to ensure numbers appearing in the results dialog (and in #2 below) have commas as thousands separators.
2. The (mis-)use(?) of single segment button bars to produce dynamic labels.
3. Auto-enter calcs on the three global fields in the card window are an attempt to prevent clever users from engaging in Shift-key and/or cut-and-paste mischief…
- only items from the appropriate value list can be entered
- entry of multiple items is disallowed
- if field is cleared, the first item in the value list will be auto-entered
4. The use of the venerable and wonderful RandomNum_SFR custom function from Six Fried Rice (which they wrote about here and I wrote about here) to derive overlap values from the long list for the short list.
Hi Very interesting ! Now there’s left to create a FilterValuesOptimizedUnsorted (ListA;ListB) which will automatically swap ListA and ListB making sure A is always the longest
Yes, assuming you don’t care about the order of items in the result list.
…as implied by the name of your hypothetical CF — of course the items still would be sorted by whichever list happened to be longer, but I think your point is that you would use the CF only when the order is of the result items is irrelevant.
I always wonder why I never come up with the idea to test speed with these simple things. Good stuff this post!
this is a bit different but you can use this calculation to compare two lists say ALL and USED to get a unique set. –
Let (
[
all = demo::long_list ;
used = demo::short_list ;
// COMBINE BOTH LISTS, WITH USED VALUES FIRST
combined = UniqueValues ( List ( used ; all ) )
] ;
RightValues ( combined ; ValueCount ( combined ) – ValueCount ( used ) )
)
For any of the …Values functions that as you put it, FileMaker “helpfully” appends a trailing return, we use an equivalent custom function that strips that trailing return. That way you don’t have to think about it, if you are looping over the results or whatever. Pretty simple. Matt Larson is also clever with the names. He calls this “FilterValuesFixed”:
/*
Purpose:
Same as FilterValues() but without the trailing carriage return.
History:
2019-04-17 – Matt Larson (LuminFire) – Custom function creation.
*/
Let (
[
¢Values = FilterValues ( theTextToFilter ; theFilterValues )
] ;
Left ( ¢Values ; Length ( ¢Values ) – 1 )
)
A coworker just asked why not use a generic CF like “TrimTrailingReturns” and wrap that around the built-in “FilterValues” function.
My answer:
Well, you’d have to think about that when you use it, and remember that this is what you really want. With FilterValuesFixed, it will automatically come up as a choice in the auto-complete when you try to use FilterValues, reminding you of its existence, and letting you know that it is better. ;^)
Thanks for doing some legwork here. Very useful.
Initially I wrote a “clever” While() Calc to reverse the lists and perform the filter values in chunks to see if that would improve the times when the second list is longer. I was going to post it when I read the comments so far and was reminded of the desire to get the result in the correct order. This is what I came up with:
if The first list is longer:
finalResult = FilterValues ( listOne ; listTwo )
if the second list is longer first reverse the lists
result = FilterValues ( listTwo ; listOne )
Then run again with the result above as the second list
finalResult = FilterValues ( listOne ; result )
My calculation for this as a custom function is quite simple (which is always satisfying)
If( ValueCount ( listOne) > ValueCount ( listTwo ) ;
FilterValues ( listOne ; listTwo ) ;
FilterValues ( listOne ; FilterValues ( listTwo ; listOne ))
)
Even with the lists containing 50,000 and 1,000 items the times appear to be similar regardless which list is longer.
To get rid of the trailing returns, I generally use
LeftWords ( listOfValues ; 10^10 )
That’s fine as long as list items consist of only letters and numbers. Not fine if, for example, the final item in the list is “Account #”.
Ouch! I guess using # as the number symbol is more of a US thing, so it’s never come up for me in the UK.
Sigh! I rather liked that only one function was required. I guess I have some updating to do…
Left ( listOfValues ; Length ( listOfValues ) – 1 )
Very nice. Thank you Paul!
I have to admit I was surprised that using FilterValues twice did not have a significant impact on the time.
On the second invocation, your lists are of equal length, you have 100% overlap, and the initial result is in memory,
On reflection, it should not have been a surprise, I guess I couldn’t believe the solution could be so simple…
Well it’s been staring a lot of us in the face, but you were the one who figured it out. Well done old chap.