prepping synagogue data – and – introducing Jenks natural breaks!

I wear quite a few hats; those of you who know me well would find this unsurprising! One hat that I don’t have many opportunities to write about is my volunteer position as the treasurer of my beloved synagogue. As this community revved up for one of our Big Meetings, I got to put on a different hat, or what my mom calls “Queen of Data.” We had a list of households and their membership contributions (projected for the full fiscal year), and I wanted to group those contributions into meaningful categories so that I could help educate the community about how our radical, sliding scale dues system is working out.

Treasurer, Data Queen, Salesforce Admin… unfortunately none of these include much experience with statistics, which would be the right toolkit for this question.

There were a number of questions that bubbled to the surface:

What is the distribution of congregant sliding-scale dues contributions?
How many categories do we want to use to represent the distribution of contributions?
What is the right tool to calculate # of households per category?

In this blog post, I’ll share what I came up with and what may have been a better way to do it for the future!

The dilemma

We need to visually represent how much $$ congregants are contributing to my synagogue.

How to determine groups/bands/breakpoints

This is my first time attempting digital drawing!

In the doodles above, I showed several models for dividing the data into groups, and beginning to display those groups as charts.

All of them are imperfect (at best) and misleading (at worst) or just unhelpful. For example, in the pie chart and the bar chart, the criteria for each “band” is not clearly defined. In reality, I used the middle option (img 1). In the histogram (chart 3) and the dot chart (chart 4), we don’t have much sense of clustering or pattern at all.

ALL of these charts answer question 1 above: What is the distribution of congregant sliding-scale dues contributions? but none of them answered it adequately. I was especially perturbed by the question of breakpoints. If someone gave $500 and someone else gave $501, they would be in different “bands.” However, someone who contributed $501 is much closer to $500 than, say, someone who contributed $999. These groupings are thoughtful, but in this context, they seem arbitrary, and that obscures what we are trying to learn from the data.

How I did it

I decided early on that I did not want to let “perfect” get in the way of “good enough.” So, I decided to rely on the breakpoints that are publicly listed on the synagogue website. Next, I needed to group my data according to which category the household belonged in.

I was hoping to do this with Pivot Tables (couldn’t I group by logic or “buckets?”) but I couldn’t find a path to do so! When I grouped by amount, I got everyone who paid $500, followed by $501, $502 etc. Similar to chart number 3 above.

Instead, I used “Count If” logic like so:

=countifs('Working Data'!F:F, ">=501",'Working Data'!F:F, "<1000")

This formula (repeated 5 times for each “band”) required me to “hard code” the breakpoints between each band. Once the breakpoints were included, the formula would count every household who’s contributions fell within the range. I tripped up at the low end and top end of the ranges the first time (ie “>500”) when it should have been (“>=500”), so I missed everyone who gave exactly $500. However, I was able to identify my error by cross checking the amounts per band and subsequently realized that I did not have the total number of households included.

As a result, I was able to produce both Pie Charts and Bar Charts that modeled real data into categories with transparent (but somewhat arbitrary) breakpoints.

“actual image” of me shlepping data around (img from: https://visme.co/blog/bar-graph/)

This process worked well enough to produce charts for the Big Meeting, but I still wasn’t satisfied by the grouping logic. I had this itching feeling that there must be a better way – some type of formula, code, or logic that could interpret the data into meaningful ‘clusters’ and tell us how many fell within each cluster. This might be different than the breakpoints that we imposed on the model and would likely challenge our assumptions about our community’s giving. The problem? I didn’t know what this kind of math was called or how to search for it.

enter Jenks Natural Breaks method

Through texting my question to a couple of quantitative-oriented scholarly friends (thanks, Chloe!), I was able to get better search terms to find resources online. The calculation that we needed to do was called Jenks Natural Breaks. There are other cluster algorithms, one of which I had some experience with in college, but it seemed from my cursory research that Jenks was the way to go. (If you know differently, please leave me a comment/message, as I’m actively curious about this!).

Would I need some sort of fancy statistics software to be able to employ Jenks Natural Breaks? Not according to this fascinating blogpost! I can actually do it all in Excel (with the addition of a free Add-in). I have read said post 5 times (at least) top to bottom and I am still learning. The excel, statistics, and general logic approach are fascinating to me, and I encourage to you read it, too!

Lo and behold, I downloaded the add-in and took it for a spin. In just a few minutes, the algorithm was able to calculate more appropriate break points and how many households fell within each one. I’m choosing not to display the data in this public forum, but the similarities and differences between the breakpoints the algorithm came up with and the ones that we had established… well, it’s really really interesting. I felt such a GLOW of satisfaction! The data organized this way tell such an interesting story about how groups of congregants choose to contribute the amount that they can toward the sustainability of the synagogue. The numbers aren’t round and they aren’t predictable (by the human brain) but they strike me as telling the truth, a truth that I couldn’t get to without help from a computer. Sure, I could sort the numbers from big to small and look at them with my eyeballs, but I would miss the opportunity to understand the bigger pattern of how people choose to contribute.

And if we had many more members, eventually it would become impossible to make meaning with the eyeball method.

The feeling that led me to this inquiry was that (from a fundraising perspective) getting more people in the higher bands would be useless if it meant that people increased their giving from $599 to $601. Not that that would actually happen, but without good data modelling, it would be hard to tell IF that happened. By applying natural breaks, we can access information about alignment between the contribution levels that we articulated (intellectually) and the reality of people’s giving choices. Then, we can ask people to give in a way that moves each band up a bit, or moves people between bands in a more deliberate manner.

I want to be clear that there’s nothing inherently wrong with any of the models for grouping and visualizing data in this blog post. In fact, I have been perfectly satisfied with using all of them at various points! In this case, my natural curiosity and passion for this community fueled my desire to come up with something even more precise and illustrative, followed by my ever present desire to push “what is possible” in a spreadsheet (limited by technologically or my own acuity).

So tell me,

Have you faced a problem like this? Where you didn’t know the name for the spreadsheet thing you needed help with? If yes, you’re in good company. I’d love to hear about it in the comments!
Do you use giving tiers for managing fundraising? If yes, how do you determine those tiers? Do you think it would be interesting to apply Jenks Natural Breaks to your data? Hit me up in the comments! I can run your data through my excel extension!
Are you a statistician (LOL)? But actually, if yes, I would love to hear from you. I don’t have the capability to evaluate whether this spreadsheet add-in is doing everything it should be doing, nor if Jenks Natural Breaks is precisely the right tool (though I’m fairly confident about both). You can help me expand my learning, which would be so wonderful!

prepping synagogue data – and – introducing Jenks natural breaks!

The dilemma

How to determine groups/bands/breakpoints

How I did it

enter Jenks Natural Breaks method

Like this:

Leave a ReplyCancel reply

prepping synagogue data – and – introducing Jenks natural breaks!

The dilemma

How to determine groups/bands/breakpoints

How I did it

enter Jenks Natural Breaks method

Share this:

Like this:

Leave a ReplyCancel reply

Discover more from The Data are Alright