This week, I faced an old nemesis – that of combining lists of contacts and email addresses to create a unified, reliable, easily updated, shared resource of contact information. If I’m being honest, it’s a nemesis that fills me with a bit of nostalgia as this was one of the first skills that I shared with my community on this here blog. I feel a softness for this task because it reminds me that going back to basics is always an impactful choice AND I celebrate that over the intervening years, I’ve learned new and better techniques. I guess you CAN teach an old activist new tricks?
In this situation, I had spreadsheets from three sources:
- Google Contacts Database
- Online payment processor
- Mailchimp newsletter subscribers
In the short term, our goal is to consolidate all of them (with no duplicates), update the data, and then upload the data into Salesforce where we can use the brilliant Gmail Connector to help team members log their most meaningful interactions and keep the contact database up to date. First thing’s first, we need to wrangle these data sources into one common spreadsheet location.
In this tutorial, I will show you how to consolidate the lists and display how many times each person was crosslisted. Then, I’m going to share a few examples when this method might not be appropriate … and why you should try it when it DOES make sense!
First, I downloaded the full contact export from all three data sources into a spreadsheet or CSV format (which is also, effectively, a spreadsheet). I combined all of them into one “multi-tab” spreadsheet that looked like this:
Then, I did a little bit of setup which we will need for the next steps. In each tab, next to the “email” column, I made a new column and typed the word “Yes.” This signifies that the email is “in” that list, and we are going to want that later. Also, I used the “Named Range” feature to give each spreadsheet a “name” so that I could use that name to refer to it in a later formula.
Finally, I copy+pasted the email column from each tab into one, long column in the Contact Summary tab. At this stage, it doesn’t matter if you are copying duplicates, it just matters that we get all of the emails in the same tab and the same column.
Next, I used the standard spreadsheet functionality called “deduplicate” or “remove duplicates.” In this example, I used Microsoft Excel but the same feature is available in Google Sheets if that is your spreadsheet platform of choice! This reduced the whole list so that each email address appeared only one time. In my example, we removed about 300 duplicates. Hooray!
Intrigued? I wrote a whole blog series on deduping, summarized here.
Now I have one column of “unique” email addresses, so I’ve technically accomplished my goal, but I wanted to push the envelope and see what we could learn from the distribution of contacts across these lists. I think it might reveal some interesting insights about how many mailing list subscribers are also donors, or how many personal contacts made it onto the mailing list, etc. So basically, I needed a tally of the number of times each email existed in each list.
I decided to do that with a three-part formula sandwich.
- VLOOKUP searches for the email in each list, and tells me “yes” if the email is there. If the email isn’t there, it shows me #N/A (which looks like an error, but it’s really just the spreadsheet’s way of telling me that it didn’t find anything)
- IFNA looks at the VLOOKUP doesn’t find the email in the list, it will tell me “no”
- COUNTIF will add up the number of “yes’s”
I am really happy with the results here! They are relatively easy to interpret and they are useful because they represent the breadth and depth of each contact source. Also, we can use them to draw some more precise summaries (like how many contacts were on each list to begin with?). If the donors are not on the other lists, then how did they become donors? My gears are turning and I hope yours are, too!
Jazzed to learn more about VLOOKUP? Check out its even cooler cousin, XLOOKUP, here.
Now that we have a single list of contacts, we can start updating the data based on the categories that are important to us. Perhaps I’ll write another post on how to determine those categories, but for now, I’ll let your imagination run wild.
I think this process is pretty snazzy. It was quick, reliable, and accessible. I could run through the steps again if one of the lists gets a big influx of contacts, like, tomorrow. However, it’s not the right method for every scenario. For example:
- If your data has a lot of typos (for example, if it was hand-typed representation of sign in sheets), you are likely to have a LOT of bad email addresses that are off by one or two characters. The deduplicating algorithm will treat all of those email addresses as unique, which can give you skewed results. This approach works really well if you have a lot of trust in the integrity (accurate-ness) of your emails on file.
- If you tend to have contacts in your system that have more than one email address, this might not be the right approach for you! Instead, you can consider deduplicating based on a different attribute – but be careful of names, because there are likely to be two different people who have the same first and last name! It happens more than you might think!
- There’s a similar problem that can occur if you have multiple people who share the same email (this can happen with an elderly population – like two grandparents who share an email, or a household that has a “primary email” assigned to all contacts, even if the contacts are unique people with their own personal email.
- Extremely large dataset – at a certain point, your spreadsheet will poop out if you have too many rows (in the hundreds of thousands or millions). There are advanced tools that are designed for these types of situations.
- Large number of data sources – at a certain point, it’s just not efficient to do this for a dozen or dozens of data sources (think – voter databases, door databases, etc). Once again, this might be a good opportunity to try out some advanced tools!
Speaking of advanced tools, are there people in my network that use OpenRefine or Talend? I’m excited to learn more about them and I always love learning from my friends and colleagues. Get in touch!
Why this matters
As long as Google Forms are a thing, we are going to have surveys, event RSVPs, and all manner of contact lists being created exponentially. It’s important to have the skills to zip them back together, so that we can reduce work (why update the same info twice?), build our base, and learn from our community of supporters (how are they engaging with us?).
This is harm reduction! It’s just not realistic to expect that all data will be perfectly in sync all the time. Some data cleanup will ALWAYS be required, so let’s skill up and make it streamlined, empowering, and shame free.
Deduping manually is a royal pain in the tush!!! This is an area where a little bit of capacity building leads to a manifold impact: more time, less frustration, more connection, less procrastination, better data, etc etc etc! I want that for our movements and I want that for YOU! So, next time you have a few spreadsheets that you need to knit together, I encourage you to try out this technique, reach out for help, and at the very least, report back so that I can cheer you on! You got this and I got you 🙂