Successfully Find Duplicate Values Within A Set Of Records

The need to isolate duplicates within an existing found set has often been a source of frustration for developers. A couple weeks ago we examined a technique by Ralph Learmont demonstrating that, contrary to popular belief and experience, the ! (find duplicates) operator can, under certain conditions, be coerced into isolating duplicate values within a found set.

4-24-2015 3-24-21 PM

Here’s Ralph’s original statement of the problem:

There’s a quirk in the way Filemaker deals with duplicate records. This makes it difficult to find duplicate records WITHIN a found set of records. If you try to constrain the Find to the current set of records, you might discover extra spurious records appearing. These records have “partner-duplicates” outside of the found set. These unwanted spurious records will appear as single occurrences when you inspect a sorted column. Technically they are duplicates. It’s just that their partners lie “outside”, hidden in the omitted slab of records.

And why it’s a cause for concern:

I think it’s important to explain why finding duplicates in a found set is something that often appears to work… and it won’t be apparent there actually IS AN ISSUE unless one goes to the trouble of actually sorting results and checking for those “single-occurrences”.

While the technique was blazingly fast, it seemed to me there was something a bit voodoo-like about Ralph’s previous demo, as well as my attempt at simplification. Apparently Ralph wasn’t satisfied with it either, because he has provided a new demo (Find Duplicates in Found Set) showcasing a more straight-forward technique, and kindly agreed to share some of his thoughts about it as well.

5-9-2015 5-32-38 PM

[Ralph’s commentary]

I’ve been held up with various things, one being an compulsive obsession to get back and play with that temporary window trick which lets one use the constrain feature to properly find duplicates within a found set. It appears to trick (coerce) Filemaker into properly detecting duplicates WITHIN the found set while ignoring the “phoney” ones outside.

I revisited some demo files and pondered for some time about the mechanics of how this actually worked. I needed a more certain insight. As much as I enjoy getting something to work, it’s so much better if one can explain why…

So by trial and error and playing with mockups a different explanation was to become evident…

Just to go back to where you left off, you offered a simplification with this definition:

FoundSetAware =

If ( Get(WindowName) = $$windowname ; FirstName )

I agree. That works. I know you know, but just to remind myself — it has the benefit of not having to rename your window (temporarily), so that’s definitely an improvement.

I still pondered over the window-name dependency. I wasn’t convinced of my own “theory” that a window’s found-set properties go beyond “knowing” what records it contains. It’s common knowledge that a window “knows” the internal IDs of the found set of record it contains and Filemaker is able to export it in the form of a snapshot link, not as a list but in the more efficient range format for adjacent IDs (such as 252…374). But when we enter Find mode and then perform a constrain, why does constrain only work with “tangible” values but not for the “!” operator?

Just to digress, I’m an engineer with an aviation background and a love of model aircraft. In the early days we had what is known as “free flight models” and “chuck gliders”. There was a real art in getting these to fly for extended durations, and one of the procedures in preparing one’s model for competition was known as “trimming”.

To explain, these light-weight gliders were extremely dependent on having all the various build factors correct and held to tight tolerances. Stability was a “function” of many inputs, such as centre of gravity, the angle between the wing and the horizontal tail, whether or not there was a hint of twist (or curl) in the slim balsa trailing edges. One could carefully “adjust” any of these and go throw the chuck glider up into the sky and watch carefully how its behaviour had changed.

The lesson we learnt was to be patient and only adjust ONE thing at a time! That’s fairly obvious, but what’s not so obvious is that a combination of factors, say, “combination A” can yield the same outcome as a different “combination B”. For example, “more” nose-weight combined with a slight bit of “up” trim will appear to give the same flight characteristics as “less” nose-weight with no “up” trim.

In the FileMaker case, I achieved a good (wanted) result due to what I originally perceived as being a dependency on window name.

But that’s not the real reason!

The real answer is because the calculation is UNSTORED.

Although I’m still no closer to an explanation, I have tested the calc with this simplified definition, and it continues to work: SelectedField =

FirstName

(just make sure it’s unstored)

It’s so simple!  Just use an unstored calc that is equal to the field you want to inspect.

To avoid having many new calc fields (for all possible search cases) a generic approach/indirection appears to work almost as fast. Incidentally I’ve found too that the speed of sorting is only slightly slower when sorting unstored data (compared to actual data). Also the time taken to perform a find (or a constrain) is little affected.

In my subsequent work I am using Get(ActiveFieldName) to be the driver of “which field” is to be the subject.

In a script, I set variable $QualifiedFieldName from the active table and active field, like so:

Get(ActiveFieldTableName) & "::" & Get(ActiveFieldName)

And I now define SelectedField as:

GetField ( $QualifiedFieldName )

I then wondered if Filemaker’s Constrain found set combined with the ! operator was a function of whether or not a field is indexed. Initial testing appears to say it does. The fact that I am using an unstored calc field is secondary to the fact that the field is not indexed.

If that’s so, it doesn’t change the solution to the problem really. Yes it would also appear to work on a normal text/number/date etc. field provided that field is not indexed.

PastedGraphic-30

Because indexing is an area that people mostly don’t take much notice of, and because the default is to Filemaker automatically switch it on should it be needed, the solution still remains in providing a field that’s unindexed. That field can be a “normal” field or an unstored calc field.

5 thoughts on “Successfully Find Duplicate Values Within A Set Of Records

  1. Kevin Frank Post author

    Ralph, I really like the two “Live Demo” routines in your demo file. Nice work!

    Reply
  2. Myself

    Superb, many thanks for sharing your important discovery.
    What’s left needed is a way to flag / delete the extra occurrences, so only one remains. Any thought about that ? I used to do it with a relation, but obviously it can’t be generalized

    Reply
    1. Kevin Frank Post author

      It’s a big subject, and beyond the scope of this article. I suggest you ask your question in one of the numerous FileMaker forums that exist for this purpose.

      Reply
  3. thomann061

    Can you do a post on the new script step “Refresh Portal” and compare it to “Refresh Window” and maybe setting the primary key back to itself. I have no doubt that this is tuned towards filtering a portal…hence the name.

    Thanks!

    Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s