In part one, we explored various “anti-deduping” techniques. As you may recall, the challenge was to retain duplicates and omit unique entries from within an existing found set, as opposed to starting from all records… otherwise we could have just searched on ! (find all duplicates), but the ! operator does not play nicely with constrain, and cannot help us with this particular challenge.
Update 9 May 2015: it turns out that there is a way to reliably use contrain with the ! operator from within a found set — see Ralph Learmont’s technique + great demo + explanation here — Successfully Find Duplicate Values Within A Set Of Records
Also, last time, we were seeking to omit unique zip (postal) codes, which, in the example file, were always five digits long, i.e., of fixed length. Today our goal is to omit unique (a.k.a. “orphaned”) first names, and it turns out there are some additional challenges when the field in question is of variable length, which we will explore in today’s demo file: Anti-deduping, part 2.
Question: if we say we want to locate “all the Marias”, what do we really mean? Here are some possibilities… Continue reading “FM 13: Anti-deduping, part 2”