You might be wondering, why did she do that?
It starts off with taking responsibility for making a mistake.
evaluate: how did dupes get created?
A few months ago, I did a data migration of about 35k contacts. Many of them overlapped with contacts already in the database, so I used a few different methods to match them up in Excel. When I was ready to import about 5k new records, I checked in with my team: does this seem in the ballpark for the number of new contacts we expected? The answer was yes, so I uploaded them. A few months later, my team let me know that they had been having more problems with duplicates ever since the upload. Oops.
I quickly identified my mistake, and it’s an embarrassing one to write about! I used a VLOOKUP to match contacts… but I didn’t realize that the lookup table had relative cell references. So as the formula was copied down the column, the lookup range kept sliding and effectively got smaller, and fewer matches were found than should have been found. Since my data were sorted alphabetically, this also meant that there were more matches at the beginning of the alphabet than at the end.
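For anyone who wants to see the mechanics, here’s a rough sketch of what happens when the lookup range is left relative versus locked with absolute references. The column letters and row counts are made up for illustration; the sliding behavior is just standard Excel fill-down:

```
What I effectively had: the lookup range slides down as the formula fills down,
so later rows see fewer and fewer of the real lookup rows.
  Row 2:  =VLOOKUP(A2, F2:G35000, 2, FALSE)
  Row 3:  =VLOOKUP(A3, F3:G35001, 2, FALSE)
  Row 4:  =VLOOKUP(A4, F4:G35002, 2, FALSE)

What it should have been: the $ signs lock the lookup range in place for every row.
  Row 2:  =VLOOKUP(A2, $F$2:$G$35000, 2, FALSE)
  Row 3:  =VLOOKUP(A3, $F$2:$G$35000, 2, FALSE)
```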
I’m literally eating a pie from the famous Philadelphia bakery, 2nd Daughter, right now and thinking “this is my humble pie!” I make mistakes, we all do, and it’s liberating to write about what to do next. Thanks for accompanying me on my journey!

analyze: how many dupes are there and what are the criteria for finding them?
Salesforce has certain features for identifying and merging duplicates, as this is an evergreen problem that we all face. I recommend reading up on these features and turning them on, if you haven’t already. This is a multistep process and can impact external forms, integrations, data migration, page layouts, and probably other things too, so don’t be afraid to take it slow and try everything out in a Sandbox if you can.
Here’s a link to official documentation from SF on the standard criteria for evaluating duplicates. This is really worth checking out and maybe even reading aloud to yourself to understand everything happening under the hood.
I did some digging to see if these tools could help me with my 2k duplicates problem and the sad answer was… no. Here’s why:
- Licensing constraint: There is a feature for “mass” evaluating duplicates, but it is only available in “Performance” and “Unlimited” editions of Salesforce, which we do not have.
- Technical constraint: Even though duplicates showed up on the Lightning record page (via the standard Lightning component), they did not show up in a report of Duplicate Record Sets unless I edited a field that the duplicate rules evaluate. I did not want to update 25k records again for a variety of reasons, so I concluded that I could not use Salesforce to find and merge my duplicates the “easy” way.
- Budget: There are many third-party platforms with duplicate management functionality, but my budget for this was approximately $0 (full disclosure: not counting my time).
- Mass delete: In the past, I’ve sometimes taken the path of deleting duplicate records that were meaningless. They were not attached to relevant data like donations, program enrollments, grants, activities, or anything else. I could use reports and cross-filters to find these records and just wipe the slate clean. I did not want to do that in this case, because the records were created a few months ago and I did not have a reliable way to tell whether the data on the newest (dupe) record had been updated and should therefore be kept. Also, there were data in new fields that we wanted to keep, so merging was a better option than deleting. However, this is an option I strongly encourage readers to consider in the future! In some cases, it saves soooo much energy that can be better put toward solving other issues … or doing your hobbies!
In the absence of on-platform tools, I decided to use my trusty spreadsheets and some elbow grease. Here are the steps I took to evaluate dupes:
- First, I ran an org-wide backup in case I screwed up even more than I already had and wanted to revert. You can only run these once per week, so either export a fresh backup right before you start, or plan your deduping for soon after your weekly export runs so that you’re working from the most up-to-date copy.
- Then, I ran a report with just the Contact data that I needed for merge analysis. These are the fields that I wanted: CaseSafeId, First Name, Last Name, Organization Name (Account Name), Organization ID (Account Id), Email. I exported this to Excel.
- In my spreadsheet version, I sorted alphabetically by Last Name and then First Name. This made it easier to visually detect duplicates, since they were likely to appear next to each other (unless there were name typos, which does happen, but that is a different problem for a different day). A simple flag formula can help here too (see the sketch after this list).
- I used spreadsheet formulas to build a hyperlink that would load each record I needed in Salesforce (also sketched after this list). This saved me a lot of time copying and pasting. Then, from each record in Salesforce, I used the standard Lightning component to do the merge.
- I used the handy Command+W keystroke combination to quickly close tabs when I was done with them. This saved me some time with repositioning my cursor to hit the tiny X in each browser tab and also saved my computer some processing power.
- I did this about 2k times, usually in sets of 200.
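For reference, here’s a rough sketch of the two helper formulas mentioned in the list above. The column layout (CaseSafeId in A, First Name in B, Last Name in C) and the “yourorg” My Domain are placeholders; adjust them to match your own export and Salesforce instance:

```
Flag likely dupes once the sheet is sorted by Last Name, then First Name
(filled down from row 2):
  =IF(AND(C2=C3, B2=B3), "possible dupe", "")

Build a clickable link straight to the Contact record from its 18-character Id:
  =HYPERLINK("https://yourorg.lightning.force.com/lightning/r/Contact/" & A2 & "/view", "open in SF")
```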
what did i learn?
It might be shocking to hear, but I’m glad I did this exercise. I have often found meaning in minutiae and this was no different.
By spending time with the data, I noticed some other discrepancies that merited being sorted out. For example, when my predecessor uploaded the original Contact data, there seems to have been a problem with their source data: all of the zeros were missing. This is not the same problem as missing leading zeros in zip codes, because zeros were also missing from phone numbers and street addresses. It’s SUPER weird and I’ve truly never seen this issue before. I brought it up to my partners at the org, and they had never noticed! Some of the fields with data issues were not visible on the page layout, but they would still wreak havoc in a report or ::shudder:: a mail merge attempt where you actually want to mail things to people.
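If you want a quick way to surface records that might have this kind of problem, one rough approach (assuming US-style 10-digit phone numbers sitting in column E) is a simple length check after stripping common punctuation:

```
=IF(LEN(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(E2,"-",""),"(",""),")","")," ","")) <> 10, "check this phone", "")
```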
I also noticed some interesting discrepancies with “region.” For some contacts, their address was clearly in one country (let’s say, China) but their “region” was assigned as Germany. This may be because they work remotely and their company’s HQ is in Germany. But is this a safe assumption in all cases? Again, I never would have noticed this if I hadn’t been going through the data with a fine-tooth comb.
Finally, this exercise set me up well for the next data cleanup project we needed to do: creating records to represent all of the locations of law firms that have multiple offices. I was much more adept at this because I had been looking at the name/location/context of Contact data. It went faster, and I was more confident that I hadn’t made mistakes. This context-awareness is REALLY important. Maybe even a topic for a future blog post!
I think there are ways to accomplish this faster, but I’m not convinced yet that they are better. Maybe I learned everything that I had to learn from the first 1200 (???) and the remainder were repetition. Sure, I would have preferred to do my hobbies with my leisure time but looking back, I feel a sense of completion and satisfaction with doing this work. What made it unnecessary was the fact that I made a silly mistake, not that the work itself was repetitive. I still believe that stewarding data is an act of care, and one that I don’t take lightly.
Now it’s your turn – would you have tackled this differently? Let me know in the comments and I’ll be sure to write back and keep my learning hat on!
Love this and agree: “stewarding data is an act of care.” Thank you for sharing!! Easy mistake to make and they are lucky to have you!
We’re almost done with deduping our data (the lingering effects of operating out of 2 salesforces for many years). The trickiest part for me/us has been: 1) our client records are person accounts, so I have to keep reminding myself the seemingly 4 duplicate records I see in the search results are actually just 2 duplicates and 2) Salesforce really doesn’t make it easy to merge records of custom objects.
What I initially started to do was export all the person account records and tag the identified duplicates with the same string to make review easier. I could then build a spreadsheet to import into Salesforce and bulk-create records of the “Duplicate Record Items” object, one record per client (doing it this way would’ve let us pull in clients we know are duplicates but weren’t getting flagged by our matching rules). Once we had those, staff would have been able to take the “Duplicate Record Set” record assigned to them and just work through selecting the duplicates and merging as needed. Though I do love your method of using the record ID to add each record’s URL to the spreadsheet. I’m going to keep that in mind.
What we’re actually going with is having a project consultant of ours try to handle this with SOQL stuff (I have very limited knowledge and experience with that stuff, but it looks promising). Thanks so much for this blog, I really enjoy reading.