Site icon The Data are Alright

hay in a needlestack

A few years ago, I wrote about a “lady and the tramp” metaphor for solving spreadsheet formula data discrepancies. Like eating both ends of the noodle until you smooch your boo, sometimes the only way to solve the problem is to fix the data and the formula iteratively until they meet in the middle. Well, this week I’ve been DEEP, DEEP in this metaphor. Except without noodles. Harrumph.

Sometimes fixing data discrepancies can be more like finding a hay in a needlestack (It’s always a good day on TDAA when I get to borrow from the Spongebob archive). A pokey, painful, hyperbolic, inverted, not-at-all-fun-or-adorable situation. Ouch!

e-MERGE-ing discrepancies

For example, this week we found an error in my beloved Budget Merge spreadsheet opus.

Why does the tool show that we have more awarded grant payments than we actually do? When I checked the formula, nothing looked out of place. Plus, the same formula has been chugging along for years without complaint. (Gasp – the nagging implication that this could have been wrong the whole time. Ok, squishing that thought away for now). The formula basically looks at the Grant Award Date, Grant Payment Date, Grant Strategy, and if all of those attributes match the formula criteria, then add up the payment amount. Nothing seemed out of place there.

Next, I looked at the data records themselves. It turns out that a few grants had been prematurely populated with “award dates” before the decision had transpired. It’s an honest mistake – the staff member in question was trying to be proactive because the grant is expected to be awarded on a certain day. These few grants had a “status” of “pre-approved” but an award date that was pre-populated! This simple problem was enough to wreak havoc on our formulas – and it was not so easy to find that this was the problem, since Award Date is automatically entered when a grant is Awarded, it’s not a field that I typically check for accuracy! I had to create lists of payments and review them “by hand” (by eyeball?) until I found something that looked wrong.

Photo by Negative Space on Pexels.com

Unfortunately, like solving a Rubix Cube, changing one row to get us closer to a solution can sometimes make some other formulas go out of whack. Let’s look at an example where this happened in another project…

Conflicting Criteria

In one project I’ve been working on, we have a data coverpage with overall summaries, like #People Served, #Adults, #Cases Closed, #Average Case Timespan, etc. Then, on other tabs, we calculate some of those measures in more detail, for example, broken out by age or gender. At the end of the day, the detail tabs and the summary tabs are supposed to agree. However, the instructions use different definitions! This created a problem where if we changed one of the equations, it became out of sync with the other ones. If we changed the rows themselves to meet the criteria better, then some other formula summaries went out the window. What to do????

The answer, in this case, was to slow down and go deep. Instead of looking at the spreadsheets and trying to make the rows make sense, we went back to the database where we could see more notes and details about the people and situations that the data represent. In some cases, we found some anomalies and we could directly ask the staff what’s going on. In other cases, we realized that there is a process issue that needs to be addressed. Maybe data entry is happening too soon or too late in order to get the most accurate impressions. Maybe staff need a refresher on what certain drop down options mean. Maybe there is a typo which means that our “find and replace” operations aren’t catching everything.

Aw who am I kidding – all of these issues came up in this project! I kinda felt like I did 6 loads of laundry, came out with socks I didn’t even know I had, and then figured out how to match them up. Maybe this time I’ll finally start rolling my socks into pairs BEFORE they go in the hamper so that they come out together. Can this 28 year old dog learn new tricks?

Valid Volunteers?

In another event of Spreadsheet Olympics, the challenger (me) faced a spreadsheet of volunteer records. The goal? Figure out how many individual volunteers showed up or no-showed. So, what happens if someone is BOTH AT THE SAME TIME? Houston, we have a problem.

Now, OF COURSE sometimes volunteers come and sometimes they do not. That’s the way of the world, and that’s why being a volunteer manager at a nonprofit is such a complex, compelling, highly-skilled job. (Shoutouts! You know who you are!) So for this formula, we went for whether or not a volunteer had attended the majority of their scheduled volunteer shifts within a particular commitment. It should be a pretty cut-and-dry metric, right? For each volunteer, we can use logic to get a clear answer.

The problem, of course, was that some volunteers were in the list twice due to some data entry glitches. I could not deduplicate by Contact Name since sometimes there really are two people with the same name! So instead, I deduplicated by Volunteer Commitment. Each Volunteer Commitment is connected to a roster of Volunteer Shifts. So each Volunteer Commitment can accurately tell us the Volunteer’s Status. Over time, the Volunteer might have many Volunteer Commitments (like, one for each semester for our really loyal volunteers) – but in general, the system is set up so that they should only have one active commitment at a time. The organization doesn’t offer multiple programs, so there wouldn’t be a situation where someone has one Commitment for trail clean up and a different Commitment for “tots in nature.”

Since one of the Volunteer Commitments was essentially empty (none of the Volunteer Shifts were attended, since it was a duplicate or a mistake), but the other one was active, the same volunteer could essentially be counted as Showed Up and No Showed at the same time!

Oy vey, what a headache!

What to do when the data aren’t alright

Here are a couple of strategies I’ve used recently to help get on track:

Once again, I want to emphasize that occasional data discrepancies are normal and okay! They will happen in any human system, even more so if the system is complicated, even more more so when we are tired and resource-constrained. I see these experiences as a learning opportunity – learning how to refine our logic, our process, our data integrity procedures, and most importantly, how to take care of our systems and our people. I’m not going to say, “don’t freak out!” because heck, we all freak out sometimes! But I will say that I have confidence in you, us, and our ability to get through this. On the other side? Pride, relief, actionable insights/conclusions, and a good night of sleep.

Exit mobile version