Measuring Must-See TV

So Hot Right Now


Lately, I keep seeing the exact same picture. It's a picture showing IMDB ratings for Game of Thrones. The episodes are laid out in a grid with one column for each season and one block for each episode. Episodes are color-coded based on average IMDB ratings to create a heatmap that looks something like this:

[Figure: Game of Thrones, Average IMDB Rating by Season. Heatmap with one column per season and one cell per episode; color scale runs from 4 stars to 10 stars.]
This heatmap does a good job of showing key features of the data. You can see the stark decline in ratings as cells shift from blue-green to orange-red and, similarly, you can pick out the highest rated episodes by finding the darkest blues in the sea of greens.
But, while heatmaps are good at outing outliers, they have some drawbacks. For one, it is difficult to compare differences in colors. It's hard to tell whether ratings are gradually increasing or decreasing throughout the early seasons. And even when the colors are clearly different (e.g., during the season 8 decline) it's hard to judge the magnitude of that difference. Just how much worse is maroon than orange than pale yellow?
After seeing one too many of these heatmaps, I thought it would be worthwhile to explore other ways of visualizing these data. Like the heatmap, they all have their strengths and weaknesses, but it can be helpful to see them first-hand.
Let's take a look.

Back to Basics


Behind all of these heatmaps is a fairly straightforward question: "How do TV show ratings change over time?" The simplest way to visualize change over time is a line chart. Here's one for Game of Thrones. Each line segment represents a full season; alternate seasons are shaded light and dark to help differentiate them from one another.

[Figure: Game of Thrones, Average IMDB Rating by Season. Line chart with one line segment per season; y-axis runs from 4 stars to 9 stars.]
Although the data in this chart are identical to those in the heatmap above, the story looks a little different. For example, if you look at any one season, you can see that user ratings tend to climb from the first episodes of the season to the last. Starting in season 5 you start to see more variation in the ratings for episodes. And, looking at the final season, you can see just how drastically ratings have fallen. The best rated episode of season 8 has a lower rating than the worst rated episode from seasons 1 through 7.

More is Better


If we want to know how TV show ratings change over time, it also helps to look at more than one show. So, with that in mind, I pulled the IMDB ratings from 20 popular shows for comparison.
To get a sense of just how much fans disliked the final episode of Game of Thrones, consider this: not one of the more than 2,500 episodes I pulled had a worse rating than "The Iron Throne".
Here is a series of line charts, one for each show. All y-axes are scaled from a minimum of 4 stars (the lowest rating in the data) to a maximum of 10 stars. Because every show has a different number of episodes, the x-axes are scaled so that the first episode of each show appears at the far left and the last (or most recent) episode appears at the far right.
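The axis scaling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the charts: the function names `episode_positions` and `rating_position` are hypothetical, and the sketch assumes each show's episodes are simply spread evenly between 0 (first episode) and 1 (last episode).

```python
# Shared y-axis limits, taken from the text: 4 stars to 10 stars.
Y_MIN, Y_MAX = 4.0, 10.0


def episode_positions(n_episodes):
    """Return an x-position in [0, 1] for each episode of a show.

    The first episode lands at 0.0 (far left) and the last at 1.0
    (far right), no matter how many episodes the show has.
    """
    if n_episodes < 1:
        raise ValueError("a show needs at least one episode")
    if n_episodes == 1:
        return [0.0]
    return [i / (n_episodes - 1) for i in range(n_episodes)]


def rating_position(rating):
    """Map a rating onto the shared 4-to-10 axis as a fraction in [0, 1]."""
    return (rating - Y_MIN) / (Y_MAX - Y_MIN)


print(episode_positions(5))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(rating_position(7.0))  # 0.5
```

Because every show is squeezed onto the same 0-to-1 range, a 24-episode show and a 200-episode show take up the same width, which makes the shapes comparable even though the episode counts are not.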

A few things stand out here. First, all of these shows are remarkably well rated. The median IMDB rating across all episodes is an 8.3 out of 10. The highest rated show is Breaking Bad (avg. episode rating of 9.0) and the lowest rated show is The Big Bang Theory (avg. episode rating of 7.9). 30 Rock is the most consistently well-rated show. Over the course of seven seasons and 138 episodes the show averaged an 8.0 rating with a standard deviation of just 0.3 points. This is remarkable when compared to shows like Buffy, which ran for about the same amount of time (144 episodes) and earned about the same average episode rating (8.1), but had more than double the variation (standard deviation of 0.8 points). Look to Joss Whedon for high highs and low lows; turn to Tina Fey when you need a guaranteed laugh.
We can also see a handful of shows that suffer the same fate as Game of Thrones. The ratings for How I Met Your Mother are fairly consistent until a turbulent finale and the ratings for Dexter collapse in the final season. In contrast, shows like Sopranos, The Big Bang Theory, and The Office (US) end on a high note, with the final episodes receiving the best ratings of the series.
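For anyone who wants to run the same kind of comparison, here is a rough sketch of the per-show summary using Python's standard `statistics` module. The two rating lists are made-up placeholders, not real IMDB data, and the helper name `summarize` is hypothetical. The sketch also assumes a population standard deviation (`pstdev`); a sample standard deviation (`stdev`) would give slightly larger values.

```python
from statistics import mean, pstdev


def summarize(ratings):
    """Return the average rating and its spread for one show."""
    return {
        "mean": round(mean(ratings), 2),
        "sd": round(pstdev(ratings), 2),  # assumption: population SD
    }


# Placeholder ratings, invented for illustration only.
steady_show = [8.0, 8.1, 7.9, 8.0, 8.0]
swingy_show = [8.0, 9.0, 7.0, 8.5, 7.5]

print(summarize(steady_show))
print(summarize(swingy_show))
```

Both placeholder shows have the same average, but the second has a much larger standard deviation. That is the same pattern as the 30 Rock and Buffy comparison: similar means, very different consistency.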

Popularity Contest


So far we've only looked at user ratings, but what about the other definition of TV ratings? How many people actually watched these shows? Here are the Nielsen ratings for Game of Thrones courtesy of Wikipedia.

[Figure: Game of Thrones, Average Number of US Viewers by Season. Line chart; y-axis tops out at 14 million.]
As user ratings fell, the audience for Game of Thrones soared. The final season had nearly fourteen times more viewers than the first season and the final episodes were the most-watched in the series.
For comparison, let's look at the other shows in our dataset. Most of the 20 shows have audience data, though I couldn't find complete Nielsen ratings for Buffy, It's Always Sunny in Philadelphia, Sopranos, or The Wire. Similarly, there are no viewer numbers for the final seasons of Arrested Development and Community because those shows moved from TV broadcast to online release.
Finally, because ratings vary wildly from show to show (e.g., Friends averaged over 25 million viewers per episode while Mad Men averaged just 2 million), the y-axes are scaled to each show's largest audience.
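Scaling each chart to the show's own peak amounts to dividing every audience figure by that show's maximum. Here is a minimal sketch of that step. The function name `scale_to_peak` is hypothetical, and representing missing viewer numbers as `None` is an assumption made for this example.

```python
def scale_to_peak(viewers):
    """Divide each audience figure by the show's largest audience.

    `viewers` holds one number per episode (in millions, say); episodes
    with no audience data are None and stay None in the output.
    """
    known = [v for v in viewers if v is not None]
    if not known or max(known) <= 0:
        return [None] * len(viewers)
    peak = max(known)
    return [None if v is None else v / peak for v in viewers]


# Placeholder numbers, not real Nielsen figures.
print(scale_to_peak([2.0, 4.0, None, 8.0]))  # [0.25, 0.5, None, 1.0]
```

The trade-off is that the scaled charts show the shape of each show's audience over time but hide how large the audiences are relative to one another.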

There are so many crazy trends here. You can see the growing hype for shows like Game of Thrones and Breaking Bad as the number of viewers climbs higher and higher each season. You can also see the outsized impact that the Super Bowl has on ratings by looking at the massive spikes for episodes of Brooklyn Nine-Nine, Friends, and The Office (US) that aired after the big game. Mad Men really knows how to market a season premiere. It's Always Sunny audiences peaked several seasons ago. And no finale comes anywhere near the Seinfeld finale (except for M*A*S*H and Cheers and The Fugitive).
The Seinfeld finale is such an outlier that it's worth putting all of the shows together on a single chart for comparison. In the chart below each show is represented by a single line that extends from the first episode to the last. The longer the line, the more episodes in the show.

[Figure: Average Number of US Viewers by Season, all shows on a single chart. One line per show; y-axis tops out at 80 million.]
Seinfeld clearly stands out from the crowd. The only show even close is Friends, whose early seasons actually brought in a larger audience than the early seasons of Seinfeld. But the audiences for more recent network comedies are considerably smaller (e.g., The Big Bang Theory averaged 14.2 million; How I Met Your Mother averaged 9 million). And non-network shows fare even worse (e.g., Breaking Bad and Mad Men averaged fewer than 3 million viewers per episode).

Bubbling Up


Up to this point, we've only been looking at one measure of "ratings" over time. But it would be nice to see both user scores and audience size plotted together.
Here is a bubble chart inspired by this redesign of Hans Rosling's Health and Wealth charts. The y-axis shows average IMDB ratings. The x-axis shows the number of viewers. Each episode is plotted as a bubble. The size and color of each bubble tell us when an episode aired. The first episode of a show appears as a tiny pale gray dot. Each subsequent episode is slightly larger and slightly darker, until the final episode is shown as the largest bubble on the screen, colored dark purple.
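One way to produce that kind of encoding is to interpolate both radius and color along the episode order. The sketch below is an assumption-heavy illustration, not the chart's actual code: the function name `bubble_style`, the radius range, and the two RGB values standing in for "pale gray" and "dark purple" are all invented for this example.

```python
# Hypothetical endpoint colors; the real chart's palette is not known.
PALE_GRAY = (220, 220, 220)
DARK_PURPLE = (60, 20, 90)


def bubble_style(index, n_episodes, r_min=2.0, r_max=12.0):
    """Return (radius, hex_color) for the episode at position `index`.

    index 0 is the first episode (smallest, palest bubble) and
    index n_episodes - 1 is the last (largest, darkest bubble).
    """
    t = 0.0 if n_episodes == 1 else index / (n_episodes - 1)
    radius = r_min + t * (r_max - r_min)
    rgb = tuple(round(a + t * (b - a)) for a, b in zip(PALE_GRAY, DARK_PURPLE))
    return radius, "#{:02x}{:02x}{:02x}".format(*rgb)


print(bubble_style(0, 10))  # (2.0, '#dcdcdc')
print(bubble_style(9, 10))  # (12.0, '#3c145a')
```

Using size and color for the same variable is redundant on purpose: either cue alone is hard to read when bubbles overlap, but together they make the direction of travel easier to follow.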

[Figure: Average IMDB Rating by Average Viewers per Episode. Bubble chart; x-axis runs from fewer viewers to more viewers, y-axis from 4 stars to 9 stars; bubble size and shade run from first episode to last episode.]
Although a bit strange to decode, this bubble chart brings out patterns that are otherwise hard to find. Each corner of the chart represents something different. Ideally, as the bubbles grow in size they should shift towards the top right of the chart (i.e., high ratings, large audiences). A shift towards the bottom right corner is a bit more tragic (i.e., growing audiences, terrible reviews). And a shift towards the bottom left corner suggests the end is near (i.e., bad reviews, shrinking audiences).
Looking at Seinfeld, we see the bubbles shift from left to right (i.e., audiences kept growing) but the bubbles don't really move up or down (i.e., user ratings stayed the same). On the flip side, 30 Rock shows the opposite pattern. The bubbles don't move up or down but they do move from right to left, indicating that user ratings stayed consistent but audiences dropped from episode to episode.
The bubble chart for The Big Bang Theory is one of the most interesting. As the bubbles grow, they travel in a sweeping boomerang shape. The small bubbles (i.e., the early episodes) move from left to right, indicating that more people were tuning in over time. But as the bubbles grow even larger, they swing back to the left and gradually fall downwards, suggesting that later episodes were losing viewers and also dropping in quality from week to week.
That said, I'm still not convinced that the bubble chart is any better than two line charts plotted together.

[Figure: Average IMDB Rating and Average Viewers per Episode. Two line charts plotted together, one for Average IMDB Rating (axis from 4 stars to 9 stars) and one for Average Viewers per Episode (axis in millions of viewers).]

Full Circle


There are undoubtedly other interesting ways to think about these data. I'll leave that to you. For now, I simply ask that everyone stop making Game of Thrones heatmaps. Please. I beg you.
If you want to see more articles like this one, check out my site AgainWeWander. And, if you still want more heatmaps, well, who am I to judge? Have at it.

[Figure: Average IMDB Rating by Season. Heatmap; color scale runs from 4 stars to 10 stars.]

Notes

All data for average user ratings courtesy of IMDB. All data for average audience size courtesy of Wikipedia. Data were collected in early June, 2020.

To build the final dataset, I started by creating episode lists for each show from Wikipedia. Wikipedia has a general template "List of [Show] episodes" that they use to catalog episode titles, sequence, air dates, and viewer numbers. I then cross-referenced this list against IMDB pages for each season.

Occasionally the two sites disagreed on episode counts, usually due to special 2-part episodes. In cases where Wikipedia included two separate episodes but IMDB did not, I collapsed the data into a single entry (e.g., the first few episodes of season 4 of The Office). An exception to this rule was made for the 30 Rock finale. Wikipedia lists two separate episodes for the finale, while IMDB collapses them into one. I included both records because Wikipedia lists a different title and different writers for each episode.

Finally, I renumbered episodes/seasons sequentially within the dataset to prevent gaps in charting. As a result, my episode numbers may slightly differ from other datasets.
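The renumbering step can be sketched as follows. This is a guess at the general approach rather than the code actually used: the function name `renumber`, the tuple layout `(season, episode, title)`, and the sample rows are all hypothetical.

```python
def renumber(episodes):
    """Renumber seasons and episodes sequentially so there are no gaps.

    `episodes` is a list of (season, episode, title) tuples that may skip
    numbers, for instance after two-part episodes were collapsed.
    """
    ordered = sorted(episodes, key=lambda e: (e[0], e[1]))
    season_map = {}   # original season number -> new sequential number
    per_season = {}   # new season number -> episodes seen so far
    result = []
    for overall, (season, _episode, title) in enumerate(ordered, start=1):
        new_season = season_map.setdefault(season, len(season_map) + 1)
        per_season[new_season] = per_season.get(new_season, 0) + 1
        result.append({
            "season": new_season,
            "episode": per_season[new_season],
            "overall": overall,
            "title": title,
        })
    return result


# Placeholder rows with gaps in both season and episode numbers.
rows = [(1, 1, "A"), (1, 3, "B"), (3, 2, "C")]
for row in renumber(rows):
    print(row)
```

After renumbering, the placeholder rows become season 1 episodes 1 and 2, followed by season 2 episode 1, with a running `overall` count that can serve directly as the x-axis position.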