Part 1 in a mini-series exploring the great, big world of “de-duping” – or, making sure that you don’t repeat the same information twice in a spreadsheet or database.
- Part 1: read on below!
- Part 2: Re-member-ships
- Part 3: Dupes / dupes / dupes (Excel)
- Part 4: google google sheets sheets
As an identical twin doing double duty as a database manager, you can imagine that I consider de-duping data a delicate matter, and a serious one, too. After all, I was the product of accidental duplication! (My mom didn’t know she was carrying twins until a week before we were born!). If you need help with duplicate data, I’m your girl. If you need help with duplicate babies, well, you might be better off getting advice from my mom. *Not a family photo*
Raise your hand if you have ever needed to make a list of people. Ok, put your hand down now so that you can keep scrolling! I bet most of us started out making lists on an envelope or in Microsoft Word/Google Docs but eventually outgrew those formats in favor of a spreadsheet so that we could do things like sort, filter, format, alphabetize and more. (Juicy future TDAA topics!)
That’s all good and fine until you have the same person appear in your list twice. Cue the scary music! If you’re only keeping track of people you invited to your Family Reunion, it’s probably not a big deal. No one will object to being invited twice – and you can pretty easily spot Cousin Susie if she appears multiple times. The problem is when this happens on a larger scale – like if you phonebank 1,000 people to attend your action or gala, and some people get double or triple calls. You would be wasting time AND you might annoy your base – not a good look for a changemaker!
To make matters even more complicated – sometimes duplicates happen on purpose, like when you’re culling through donation data and anyone who donated more than once will keep appearing – really common if you have monthly donors (and I hope you do!). Believe it or not, de-duping gets even MORE tricky when there are false duplicates – like two different people who truly have the same name – think Jesus Garcia, Ahmed Ali. How do you know who’s real and who’s a dupe?
You guessed it, de-dupin’ is often an “iterative” process, meaning you’ll loop through the same data and de-dupe it a little at a time. There are lots of tools out there to help you with de-duping (to be honest, de-duping is a big business these days!). If you are working in Excel, there is well-documented de-duping functionality. Google Sheets can handle it too. These tools are good if your data have already been exported from your database (hashtag: single source of truth!) – but you might need to fix duplicates there, too – by merging duplicate records, setting up dupe-management rules, or using a third-party de-duplicating tool.
If you’ve made it this far, you might be wishing you had a stunt double, but I’m here to say, you don’t need a duplicate to handle duplicates! In the coming weeks, we’ll break down some of these tools into step-by-step tutorials so that you’ll be well on your way – and dupe free!
Maybe you want to get down and dirty with de-duping right away – I got some really helpful feedback that while a mini-series can be great, sometimes you want to be able to navigate to resources right away (and I think that’s a fair request!).
Below are some great reference points on de-duping. Each link has some pros and cons (I’ll let you decide!) and I’ll be covering most things in more detail in the coming weeks, but don’t let me hold you back! Just remember – deduping is often permanent, so be careful!
- De-duping in Microsoft Excel (product documentation from Microsoft Office)
- More on de-duping in Excel (independent blogger, with some creative ideas!)
- More on de-duping in Excel/Salesforce (trusted independent blogger)
- De-duping in Google Sheets (independent blogger)
- Salesforce Trailhead module on Duplicate Management (free training!)
- De-duping in Salesforce (product documentation)
- De-duping in NationBuilder (product documentation)
- Any database will have their own product documentation how to deal with dupes – and I would guess that these resources are all pretty good