Part 3 in a mini-series exploring the great, big world of “de-duping” – or, how to handle repeated info in a spreadsheet or database. In this installment: using Microsoft Excel to highlight duplicates and summarize data. To catch up on this series, check out Duped Again! and Re-member-ships.
You may have noticed that it’s been a little quiet over here at TDAA HQ. That’s mostly because I’ve been working up a sweat in the gym and scrutinizing every single two-bedroom apartment west of 43rd street and east of the Schuylkill River. Yes, I am (finally) moving in with my partner/GoogleSheets muse (as featured here and here)which leads to conversations like…
A: It’s time to start looking for apartments!
Me: Woah, woah woah, we need a spreadsheet!
A: *face palm* <3
Dupes / Dupes / Dupes
Repeat after me: “The only perfect database is the one without users.” Dupes happen. They have happened before and they will happen again. Most databases have a “merge” command that let’s you do manual de-duping, and some even warn you about possible dupes … but Excel and Googlesheets won’t do that for you! You’re on your own – frankly, with MORE tools at your disposal. It’s up to you to deploy them!
My go-to “de-duper” (yes, it’s a funny term! try saying it to yourself, or your kid!) is the “Remove Duplicates” feature in Excel’s Data toolbar. Honestly, before I talked with my mentor Michael, it was the only spreadsheet de-duper that I’d used. Remove Duplicates is quick and easy, however, it deletes your duplicates instead of temporarily hiding them for you. Usually I am already certain that I want to delete them (like if I am de-duping a list of email addresses), but other times, you might want to check first. That’s where these other features come into play.
The Great Reveal
Sometimes you don’t want to merge, hide, delete or summarize your duplicates. All you want to do is see which ones are which. Well guess what? There’s an app for that! Just kidding… but there is a feature you should know about and it doesn’t even require a download. You can use conditional formatting to reveal duplicates and turn those cells a different color. The steps are pretty simple. Here’s a (thorough) tutorial that I recommend for those of you who want to learn the skill.
Out of site, but not out of mind
You can take the “reveal duplicates” feature one step further to filter out duplicates. This will temporarily hide them from your list, but it won’t delete them forever! You have two options:
- Use Advanced Filtering (you can choose to hide/filter duplicates OR copy just the unique cells to a new tab or new column in your spreadsheet) (See video link, starting at 1:30)
- Filter by color (if you have already done The Great Reveal step above, your duplicates will turn red (or whatever color you choose) and you can subsequently filter them out
Sometimes you don’t want to hide/remove duplicates at all! Instead, you want duplicates to be grouped and added together (Excel calls this summarizing). For example, if you are looking at donation data, and my name appears twice – you want to add my two donations together, rather than remove one just because I was on your list twice.
The best way to summarize data is using a Pivot Table. Pivot Tables can be a world unto themselves, but it’s a good tool to tuck away in the back of your mind for when the time is right. Here’s a video introduction, featuring Excel for Mac (for a change)
Dupes are everywhere! I hope you’ll keep these tools up your sleeves for the next time you are faced with a dupe-ridden spreadsheet of your own. And if you get stuck, don’t forget, I’m here to help!