Imagine a graph. The X-axis is hoppiness and the Y-axis is maltiness. Plotting beer styles on this graph would be useful: you could see that American Barleywine and English Barleywine are both malty, but that American Barleywine is also more hoppy. You could also see that some styles cluster together: an American Amber Ale and an American Porter would likely not be far apart.
You can see how mapping these out and finding neighbors could be interesting.
But what if you have more than hoppiness and maltiness? Sure, you could add the Z axis for sour, or color, or whatever else. But you can’t easily visualize more than 3 dimensions.
That’s the problem I was having with the What to Brew data. For each addition, I have a number from 0 to 1 of how well it pairs with a given style. In other words, I have data in 96 dimensions. How can I visualize similar additions and see what additions might complement each other, or just explore the data?
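To make "similar additions" a bit more concrete, here is a minimal sketch that treats each addition as a vector of pairing scores and compares vectors with cosine similarity. The additions, the four styles, and every score are made up for illustration (the real data has 96 scores per addition), and cosine similarity is only one reasonable measure, not necessarily the one What to Brew uses.

```python
import math

# Hypothetical pairing scores (0 to 1), one per style.
# Four styles keep the example readable; the real data has 96.
STYLES = ["stout", "porter", "wheat beer", "saison"]
SCORES = {
    "coffee":    [0.9, 0.8, 0.1, 0.1],
    "cacao":     [0.8, 0.9, 0.1, 0.2],
    "orange":    [0.2, 0.1, 0.9, 0.7],
    "coriander": [0.1, 0.1, 0.8, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(name, scores, k=2):
    """Return the k additions whose score vectors are most similar to `name`."""
    target = scores[name]
    others = [
        (other, cosine_similarity(target, vec))
        for other, vec in scores.items()
        if other != name
    ]
    others.sort(key=lambda pair: pair[1], reverse=True)
    return others[:k]


if __name__ == "__main__":
    for neighbor, similarity in nearest_neighbors("coffee", SCORES):
        print(f"{neighbor}: {similarity:.3f}")
```

The same function works unchanged on 96-dimensional vectors; the hard part is only that you can no longer draw the result, which is where dimensionality reduction comes in.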
Luckily, a few tools are useful here, and Google’s TensorFlow has made this easy.
Open up one of these links, and live data (updated nightly) from What to Brew’s database will load. The initial view will look like this:
This uses principal component analysis (PCA) to reduce the number of dimensions while keeping as much of the data’s variance as possible. You’ll want to click the “A” icon to enable 3D labels mode. Play around with this view, but I’ve found that PCA doesn’t do a great job of clustering this dataset.
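For a sense of what PCA does under the hood, here is a minimal pure-Python sketch that finds only the first principal component using power iteration, then reports how much of the total variance that single axis captures. This is an illustration under simplifying assumptions, not how the projector computes its view, and the scores are made up.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so the data is centered on the origin."""
    n, dims = len(rows), len(rows[0])
    means = [sum(row[d] for row in rows) / n for d in range(dims)]
    return [[row[d] - means[d] for d in range(dims)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def first_component(cov, iterations=500, seed=0):
    """Power iteration: repeatedly apply the covariance matrix and renormalize.

    Converges to the direction of greatest variance, assuming the data has
    some variance and the random start is not orthogonal to that direction.
    """
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in cov]
    for _ in range(iterations):
        nxt = mat_vec(cov, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(cov, vec)))
    return vec, eigenvalue


def pca_1d(rows):
    """Project rows onto the first principal component.

    Returns (coordinates, fraction of total variance explained).
    """
    centered = center(rows)
    cov = covariance(centered)
    component, eigenvalue = first_component(cov)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    coords = [sum(x * c for x, c in zip(row, component)) for row in centered]
    return coords, eigenvalue / total_variance


if __name__ == "__main__":
    # Hypothetical pairing scores: four additions, four styles.
    scores = [
        [0.9, 0.8, 0.1, 0.1],
        [0.8, 0.9, 0.1, 0.2],
        [0.2, 0.1, 0.9, 0.7],
        [0.1, 0.1, 0.8, 0.9],
    ]
    coords, explained = pca_1d(scores)
    print("1D coordinates:", [round(c, 3) for c in coords])
    print("variance explained:", round(explained, 3))
```

When the first few components explain only a small share of the variance, a PCA view will tend to look like an undifferentiated cloud, which may be part of why it struggles on a dataset like this one.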
Instead, switch over to t-SNE (t-distributed stochastic neighbor embedding) and let it run for a while. This is a nonlinear technique that tries to keep points that are close together in the original high-dimensional space close together in the reduced view. I found that raising the perplexity initially helped to get better results.
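Perplexity can be read as roughly "how many neighbors each point pays attention to." The sketch below illustrates that idea: it turns distances from one point to its neighbors into Gaussian-weighted probabilities, then computes perplexity as 2 raised to the entropy of those probabilities. The distances and bandwidths are made up, and this is only the perplexity calculation, not a t-SNE implementation.

```python
import math


def neighbor_probabilities(distances, sigma):
    """Gaussian-weighted chance of picking each neighbor, given its distance.

    A small sigma concentrates the probability on the closest neighbors;
    a large sigma spreads it out more evenly.
    """
    weights = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probabilities):
    """2 ** (Shannon entropy in bits): the effective number of neighbors."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy


if __name__ == "__main__":
    # Hypothetical distances from one addition to six others.
    distances = [0.1, 0.2, 0.3, 1.0, 1.5, 2.0]
    for sigma in (0.1, 0.5, 5.0):
        probs = neighbor_probabilities(distances, sigma)
        print(f"sigma={sigma}: perplexity={perplexity(probs):.2f}")
```

t-SNE works in the other direction: you choose the perplexity, and it searches for a bandwidth per point that matches it. A higher setting therefore makes each point consider a wider neighborhood, which tends to emphasize broader structure over very local clumps.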
While this runs, you can try playing around with the “Color by” function, to highlight different style and addition groupings.
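If you want to try the same thing with your own data, the projector can, as far as I know, load a tab-separated file of vectors plus a tab-separated metadata file, and the metadata columns are what show up under “Color by.” Here is a minimal sketch of producing both; the `name` and `group` columns and all of the values are hypothetical, not What to Brew's actual export.

```python
import csv
import io

# Hypothetical records: a label, a grouping to color by, and pairing scores.
ADDITIONS = [
    {"name": "coffee", "group": "roasted", "scores": [0.9, 0.8, 0.1]},
    {"name": "cacao", "group": "roasted", "scores": [0.8, 0.9, 0.1]},
    {"name": "orange", "group": "citrus", "scores": [0.2, 0.1, 0.9]},
]


def vectors_tsv(rows):
    """One line per item, tab-separated numbers, no header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(row["scores"])
    return out.getvalue()


def metadata_tsv(rows, columns=("name", "group")):
    """Header row plus one line per item, in the same order as the vectors."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[column] for column in columns])
    return out.getvalue()


if __name__ == "__main__":
    print(vectors_tsv(ADDITIONS))
    print(metadata_tsv(ADDITIONS))
```

The important detail is that both files list items in the same order, since each metadata row is matched to the vector on the same line.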
I’ve found that style groupings (top-fermented, wheat-beer-family, british-isles) don’t tend to cluster together. In other words, style groupings don’t appear to predict addition compatibility. This makes sense, as a grouping like “pale-ale-family” includes a fair amount of variety.
Addition groupings are more predictive of style compatibility: fruity additions tend to go with similar styles, as do herbal additions. This holds even for sub-clusters like berry, tree-fruit, or citrus.
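One rough way to put a number on "predictive" is to compare how similar items are within a grouping versus across groupings. The sketch below does that with cosine similarity on made-up vectors and made-up group labels; it is an assumption about how you might check the visual impression, not something the projector does for you.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_cohesion(vectors, groups):
    """Return (mean similarity within groups, mean similarity across groups).

    Assumes at least one same-group pair and one cross-group pair exist.
    """
    within, across = [], []
    for a, b in combinations(vectors, 2):
        similarity = cosine_similarity(vectors[a], vectors[b])
        if groups[a] == groups[b]:
            within.append(similarity)
        else:
            across.append(similarity)
    return sum(within) / len(within), sum(across) / len(across)


if __name__ == "__main__":
    # Hypothetical pairing scores and groupings.
    vectors = {
        "raspberry": [0.9, 0.7, 0.1],
        "blackberry": [0.8, 0.8, 0.2],
        "rosemary": [0.1, 0.3, 0.9],
        "sage": [0.2, 0.2, 0.8],
    }
    groups = {
        "raspberry": "berry",
        "blackberry": "berry",
        "rosemary": "herbal",
        "sage": "herbal",
    }
    within, across = group_cohesion(vectors, groups)
    print(f"within groups: {within:.3f}, across groups: {across:.3f}")
```

A grouping that predicts compatibility well should show a clear gap between the two numbers; a grouping with lots of internal variety should show little or none.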
Take some time to play around with these. It’s a fun way to visualize a large amount of data.