/ Insights

Aggregating Pokémon Data with Python and Pandas

Most of the time, high-level decision-makers require aggregated data. For example, to understand sales trends, business analysts need to aggregate individual sales transactions by month, quarter, or fiscal year. Data aggregation is a key skill that can drive value for many organizations.


Supermarket sales aggregated by category

Pokémon is a video game where creatures (known as Pokémon) of different types battle each other for glory. To commemorate the release of the newest Pokémon games for the Nintendo Switch (Let’s Go Pikachu and Let’s Go Eevee), we will aggregate and analyze Pokémon data in order to answer the following questions:

  1. How many Pokémon of each type are there?
  2. Which Pokémon type is the most powerful?

First, we will complete our analysis using spreadsheets because spreadsheets are the most widely used tool for data analysis. However, Python programming provides more flexible and more scalable analysis options than spreadsheets, so we will complete the analysis using Python and the Pandas library.

Starting with spreadsheets

Let's start with a dataset of all Pokémon from Pokémon Database. To follow along, download the data here (right click and select "Save As...").

We will use Google Sheets, a free spreadsheet application, for our analysis. Regardless of the specific spreadsheet tool you use, the underlying concepts will be the same. Below is a preview of the data.


Each row represents a single Pokémon

Each Pokémon belongs to one or two types, and certain types are strong against other types. For example, Charizard is a flying, fire-breathing lizard, so it is both flying and fire types, and it is weak against water types.

Pokémon that belong to two types occupy two rows in our spreadsheet. In addition, every Pokémon has multiple stats to determine how it performs in battle. A description of each stat can be found in the Pokémon Database. For example, Blastoise has a higher defense stat than Charizard, so it will better withstand physical attacks. For our analysis, we will look at the type with the highest number for each stat.

A pivot table is a tool designed specifically to aggregate data, and it will be the easiest way to aggregate Pokémon by type in our spreadsheet. To create a pivot table, select your data, and select the Pivot Table option. Since we are aggregating by Pokémon type, we will add “Type” to rows in the pivot table options.


The resulting pivot table has a row for each unique type

Right now, our pivot table is blank, and we need to add values to it. Since we want to count the number of unique Pokémon in each type, we would add it to values in our pivot table options.


We are counting the number of unique Pokémon names for each type

Since there is no type that is definitively the “strongest”, we will look at the strongest type for each stat. This would be useful for Pokémon players who are building balanced teams with both offensive;y- and defensively-inclined Pokémon. To see which type has the highest median values for each stat, we will add additional options to values in the pivot table options.


Calculating the median of each stat for each type

At this point, we can see our results in the pivot table:


The highest values in each column are highlighted in green, and lowest values are highlighted in yellow

Here are some interesting observations from this initial analysis:

  • There are a lot of water-type Pokémon and very few ice-type Pokémon. Clearly, most Pokémon aren't living in winter conditions! 🌴
  • Dragon-type Pokémon seem to be the strongest while bug-type Pokémon are the weakest. 🐉 > 🐞
  • Fairy-type Pokémon have low attack. Guess we don’t have to worry about being attacked by the tooth fairy. 🧚
  • Steel-type Pokémon have the strongest defense. Does the aluminum industry have something to say about that? 🤔
  • Rocks are slow. Even the ground and grass are faster...🗿

Now to Python

Spreadsheets are great, and we were able to glean some fun insights from our pivot table analysis. However, using Python with the Pandas library is far superior to spreadsheet analysis.

Writing code with Pandas is significantly quicker than interacting with a spreadsheet’s GUI interface (did you see all of those screenshots above?).

To aggregate data with Pandas, you will need to complete the following steps:

  1. Import the Pandas library
  2. Upload your data to a Pandas DataFrame
  3. Complete the aggregation

For our Pokémon analysis, our commented code and output are below:


Unlike the spreadsheet analysis, there are no intermediate steps when aggregating data with Python and Pandas.

Modifying our analysis

The dataset we used contained all Pokémon. However, only a subset of Pokémon are available in each "Let's Go" game. Download the data with only the subset here (right click and select "Save As...").

To aggregate this new data with spreadsheets, we would have to repeat all of the manual steps involved in making a pivot table. However, with Python, we only need to modify a single line of code:


We only needed to change one line of code to modify our analysis

When only Pokémon available in "Let’s Go Pikachu" and "Let’s Go Eevee" are included, Dragon-type Pokémon still have the highest overall stats. However, they do not dominate as much as they did before in each of the stats, and overall, the stats are distributed more evenly across types.

What’s next

High-level decision makers often require analysts to make minor adjustments to view data in a slightly different format. In these cases, Python will save significantly more time when compared to traditional spreadsheet analysis.

I’d encourage you to try analyzing the data yourself. Below are other modifications you can apply to our Pokémon analysis:

  • Include only final evolved Pokémon
  • Exclude legendary Pokémon that are ineligible for Pokémon competitions

To learn more about data wrangling with Python and Pandas, take a look at Codecademy’s Data Analysis with Pandas course.

Get more practice, more projects, and more guidance.