I’m back at it – where “it” is transforming data into meaning for my synagogue. You last heard about this type of work here. It’s fun to write about these types of projects because they are exclusively spreadsheet based – my favorite tool in my toolbox! No CRM, no problem… right? Kidding!
I started with a data dump of all payments from our system, about 10,000 rows in a spreadsheet. From there, my job was to display the data in a meaningful way and develop a trend analysis. I started off with various pivot tables to group data by household, year, and fund. This was interesting, but what we really wanted to know was how behavior changed over time and for different groups of people, such as new members, returning members, or members who increased or decreased their giving year over year.
Fortunately, Google Sheets automatically offers a formula, =AverageIF() and =AverageIFS() (the latter lets you include multiple criteria).
It was easy to utilize these formulas for conditional averages, such as the average membership contribution from new members compared to all members or returning members. This saved me from having to create a zillion tabs/tables with different filters, only to use the =Average() formula. If I went the “zillion tables” route, I would have had to revise them all if I had a new idea for a criterion or if I made a mistake in my calculations. Using =AverageIF() let me do all of my calculations from the main dataset. Of course, the downside here is that your criteria are more hidden. So either way you go, jotting down some documentation is essential for a project like this.
Eenie, Median, Minee, Mode… If
The problem with the “Average” calculation is also it’s strength – it smooths outliers! However, for data like this where we are trying to find patterns in choices… so it’s useful to remove the skew that $0s and the highest contributions introduce into the calculation… average does not tell us the full story. I wanted to pair the Average values with Median and Mode. But Google Sheets does not have functions for =MedianIF(), =MedianIFS(), =ModeIF(), or =ModeIFS().
Even though there are other ways to solve this problem, I decided that this is a job for the =Filter() formula. =Filter() will create a new table (it can bring Rows AND Columns with it) with a subset of data from the source table, based on criteria you set. *Tip* if you are testing =Filter(), make sure you have enough blank cells below and adjacent to the cell where your formula is entered, because =Filter() will not return data on top of any cells that are already populated.
Once I had my =Filter() formula configured with all of the logic, all I had to do was “wrap” it in a formula to calculate the Mode (or Median) of the data returned in the =Filter() formula. For example, see below:
=mode(filter(D3:D515, I3:I515=0, D3:D515<>"" , D3:D515<>0))
In this way, I was able to quickly take the Median or Mode of a subset of data!
why is this important?
- avoid replicating datasets and proliferation of tables
- avoid tricky =ARRAY style formulas
- be able to ask better questions AND answer them
- be able to do this type of analysis -> better insights!
- learn about our community/constituents/base
Here are some scenarios where this type of formula could be applied:
- You have a list of electoral donations and you want the average by County or Zip code
- You have a list of swag requests and you want the average for each product type
- You have a list of event attendees and you want the median by week
- You have a list of volunteer shifts and you want the mode by engagement score
- You have a list of public school students and classroom teachers and you want the median class size for geographies with certain demographics
- You have a list of action attendees and you want to see if the average action size was bigger or smaller year over year
If we could apply this type of thinking to common social movement problems, we could learn about our patterns and make better predictions about the future. We could target outreach to people or segments who might be slipping between the cracks… smooshed into the all encompassing “Average()”.
This actually has quite a political spin to it. If our political praxis teaches us to look to the margins and the marginalized, then we need the appropriate data tools/spreadsheet formulas to ACTUALLY be able to do this. I encourage you to check out this way of thinking and apply it to some of your own data. If you do, please drop me a line. If you’d like to, but you need some tech support to get there, drop me a line as well! I LOVE to hear from you!