If you use spreadsheet applications like Excel or Google Sheets at work, these problems might be familiar to you:
- It's easy to make accidental changes to data
- It's hard to replicate an old analysis on new data (so many steps to remember!)
- It's slow with large sets of data
- It's cumbersome to share giant spreadsheets over email
Spreadsheet applications are approachable, ubiquitous, and flexible. They can merge cells, add comments, pivot on data, format tables, and make graphs. But with flexibility comes risk. They can lead to more work, more waiting, more versioning headaches, and lower accuracy. Luckily, SQL can be faster, easier, and safer than Excel. Read on to learn what SQL is, and to see why it's a superior way to manipulate and analyze data.
Disadvantages of Excel
Let's start with a practical application. Take a look at this table of Pokémon and their characteristics:
Suppose that we want to use Excel to filter out everything but the name and type of every Grass-type Pokémon in the table. In Excel, we have to use step-by-step instructions to get the data:
- open the Pokémon table
- add a filter to the "type" column
- filter for type = "grass"
- remove all columns except "name" and "type"
By this point we've edited the data (potential errors), we don't have a copy of the steps saved anywhere (hard to replicate), and we'll need to email the whole file to our fellow Pokémon trainers (painful version control).
As the Pokémon table grows, Excel—or Google Sheets—slows down. If the number of rows is in the thousands, it could really hurt our workflow. If you've ever been stuck staring at a "Pinwheel of Death" or an "Excel quit unexpectedly" message, you know the pain.
What is SQL?
SQL (Structured Query Language) is a language for storing, retrieving, and analyzing data in relational databases. When we say "use SQL," this is what we mean:
- Your data is stored in a relational database, which is made of tables. Those tables usually look like one sheet in Excel, with rows and columns.
- You retrieve data and perform analysis with queries, which are sets of instructions written in SQL. You can save them the same way you save a text file.
- Instead of describing how to get the data—like in Excel or Sheets—your queries describe what data you want. You can run these queries with a SQL interpreter, which does the necessary retrieval and analysis steps for you.
Excel to SQL: Why make the switch?
SQL is often much faster than Excel. A task that takes nearly an hour in Excel can take minutes in SQL. Excel can technically handle about one million rows (1,048,576 per worksheet), but that's before the pivot tables, multiple tabs, and functions you're probably using.
SQL also separates analysis from data. When using SQL, your data is stored separately from your analysis. Instead of emailing a massive Excel file, you can send tiny plain text files containing the instructions for your analysis. Teammates each have access to the same data, so they can run your analysis on their own. They don't have to manage file versions or risk corrupting the data, and they can re-run it on any other data.
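To make the "analysis as a tiny text file" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module, which is just one convenient way to run SQL without installing a database. The file name, the helper function, and the sample rows are all made up for illustration. The same saved query is run against two different datasets:

```python
import sqlite3
import tempfile
from pathlib import Path


def make_db(rows):
    """Build a throwaway in-memory database holding a pokemon table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pokemon (name TEXT, type TEXT)")
    conn.executemany("INSERT INTO pokemon VALUES (?, ?)", rows)
    return conn


# The whole analysis is one line of text, small enough to email or commit.
query_text = "SELECT name FROM pokemon WHERE type = 'grass' ORDER BY name;"

with tempfile.TemporaryDirectory() as folder:
    query_file = Path(folder) / "grass_names.sql"  # hypothetical file name
    query_file.write_text(query_text)
    saved_query = query_file.read_text()

# Two separate datasets with the same columns (illustrative rows only).
old_data = make_db([("Bulbasaur", "grass"), ("Charmander", "fire")])
new_data = make_db(
    [("Oddish", "grass"), ("Squirtle", "water"), ("Bellsprout", "grass")]
)

# The saved query runs unchanged on both datasets.
old_result = [row[0] for row in old_data.execute(saved_query)]
new_result = [row[0] for row in new_data.execute(saved_query)]
print(old_result)
print(new_result)
```

Neither dataset is modified by the query, and a teammate only needs the text file to repeat the analysis.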
All of this contributes to the serious demand from employers for SQL skills.
How do queries work?
Remember the Pokémon table?
With SQL, we can write queries to declare exactly what we want from the data. For example:
This query says "I want the name and type of every Pokémon in the pokemon table that is a 'grass' type."
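The query described above can be sketched as follows. This is a minimal example rather than the original listing: it uses Python's built-in sqlite3 module to create a small in-memory pokemon table, and the weight column and the sample rows are assumptions added for illustration.

```python
import sqlite3

# Build a small in-memory table so the query has something to run against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon (name TEXT, type TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?)",
    [
        # Illustrative sample rows, not the article's full table.
        ("Bulbasaur", "grass", 6.9),
        ("Charmander", "fire", 8.5),
        ("Squirtle", "water", 9.0),
        ("Oddish", "grass", 5.4),
    ],
)

# Declare what we want: two columns, one table, one condition.
query = """
SELECT name, type
FROM pokemon
WHERE type = 'grass';
"""

grass_rows = conn.execute(query).fetchall()
for name, pokemon_type in grass_rows:
    print(name, pokemon_type)
```

The table itself is untouched by the query. Only the rows and columns we asked for come back.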
Our resulting table would look like this:
Let's break this query down:
- SELECT is the keyword that tells SQL to start a query
- name and type are columns in the pokemon table
- FROM is the keyword that tells the query to look at the pokemon table
- WHERE filters our data on specified conditions
Compare this to the equivalent work in Excel. Rather than taking a series of steps that are prone to error, slow to execute, hard to replicate, and cumbersome to share, SQL queries can be faster, easier, and safer.
Can I still use formulas in SQL?
If you're an Excel power-user, you might be hesitant to give up familiar formulas like this:
SQL offers the same functionality, with greater readability, thanks to its structured and English-like syntax.
SUM is used to add multiple values, and CASE is used to handle conditional logic. For example:
The first statement returns the sum of all weights in the pokemon table. The second statement returns two columns: the name of each Pokémon and a size label ("small" if under 5, "medium" if under 15, etc.). Notice that SQL uses specific column names instead of abstract cell references, and words like THEN instead of parentheses and commas.
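The two statements described above might look like the sketch below, again run through Python's built-in sqlite3 module. The column name weight, the sample rows, and the 'large' label for everything at 15 or above are assumptions, since the text only spells out the first two size labels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon (name TEXT, type TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?)",
    [
        # Illustrative sample rows chosen to hit each size label.
        ("Diglett", "ground", 0.8),
        ("Bulbasaur", "grass", 6.9),
        ("Snorlax", "normal", 460.0),
    ],
)

# First statement: add up every value in the weight column.
total_weight = conn.execute("SELECT SUM(weight) FROM pokemon;").fetchone()[0]

# Second statement: label each row with conditional logic.
# The ELSE 'large' branch is an assumed label for the "etc." in the text.
size_query = """
SELECT name,
       CASE
           WHEN weight < 5 THEN 'small'
           WHEN weight < 15 THEN 'medium'
           ELSE 'large'
       END AS size
FROM pokemon;
"""
sizes = dict(conn.execute(size_query).fetchall())

print(total_weight)
print(sizes)
```

CASE checks its WHEN conditions in order and stops at the first match, which is why a weight under 5 is labeled 'small' even though it is also under 15.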
You might use pivot tables as well. What takes you seven clicks in Excel would take you seven keystrokes in SQL: just type GROUP BY.
This query will return the total weight of all of the Pokémon that fall under each type:
This SQL statement is more concise and concrete than the list of instructions you would need to describe the same pivot in Excel, and the statement can be run on multiple datasets with little effort. In other words, the SQL statement is easy to review and easily transferable.
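The pivot-style query described above could be sketched like this. As before, it runs through Python's built-in sqlite3 module, and the column name weight, the total_weight alias, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon (name TEXT, type TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?)",
    [
        # Illustrative sample rows, not the article's full table.
        ("Bulbasaur", "grass", 6.9),
        ("Oddish", "grass", 5.4),
        ("Charmander", "fire", 8.5),
        ("Squirtle", "water", 9.0),
    ],
)

# One row per type, with the weights in each group added together.
pivot_query = """
SELECT type, SUM(weight) AS total_weight
FROM pokemon
GROUP BY type
ORDER BY type;
"""
totals = dict(conn.execute(pivot_query).fetchall())

for pokemon_type, total in totals.items():
    print(pokemon_type, round(total, 1))
```

Pointing the same query at a different table with the same columns requires no other changes, which is what makes it easy to reuse.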
Spreadsheet applications like Sheets and Excel are great for presentations, graphing, and small analyses, but they can be inadequate for tasks that require heavier lifting. SQL can complete most of those tasks faster, and it's more user-friendly from start to finish.
If you'd like to experiment with SQL yourself or you don't think Pokémon make a convincing enough argument, try our free "Why learn SQL?" course, written by Codecademy's data curriculum designer and Head of Data Science, Hillary Green-Lerman.
If you're ready to learn the ins and outs of the language and practice it with real data, you can learn SQL from scratch in Codecademy's latest Pro Intensive. As you learn and improve with SQL, you'll be able to shift more of your workload from Excel to SQL, reserving Excel as a last stage to make attractive tables and graphs.
Sonny Li and Dan Layfield contributed to this article.