"I see data science as the intersection between statistics and computer science," says Sophie, a Curriculum Developer here at Codecademy. "People have been using data and trying to learn from data for a really long time." As computers have become more powerful and people have turned to algorithms to help them understand data, the field of data science has emerged.
In her job here at Codecademy, Sophie is focused on our Data Science curriculum. "I'm trying to find out what skills different industries require so that people can come to Codecademy and learn the skills that they need."
She notes that the role of a data scientist can look quite different from one company to another. Some data scientists focus primarily on understanding and describing consumer behavior using statistical summaries and data visualizations. Others spend most of their time building and fine-tuning predictive models to solve all sorts of interesting problems. Many data scientists also collaborate with other departments in their company to help colleagues do their jobs more effectively.
Data Scientist Skills
If you're interested in a career in data science, you're probably wondering what skills you need to excel. For starters, you'll want a good understanding of statistics, probability, and programming. But how does that translate into your day-to-day work?
As Sophie puts it, "There are a lot of different skills because there are a lot of different steps to working with data." In this article, we'll dive deeper into some of the specific skills you'll use in your work as a data scientist.
1. Data collection
As a data scientist, you'll spend a big part of your time collecting data; after all, you can't use data if you don't have any! Data scientists get data in a variety of ways: they might scrape websites, query a database with SQL, or pull data from an API, among other methods.
When you're working with large data sets, it can be time-consuming to manually find and extract the data that you're interested in. That's why data scientists rely on tools like SQL to get the data that they need into a format they can use.
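For a concrete taste of the SQL step, here is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database. The purchases table, its columns, and its rows are hypothetical, invented for illustration; in practice you would connect to an existing database rather than build one. The point is that a single query filters and summarizes the data before it ever reaches your analysis code.

```python
import sqlite3

# Build a throwaway in-memory database with a hypothetical "purchases" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("ana", "books", 12.50),
        ("ben", "games", 59.99),
        ("ana", "games", 19.99),
        ("cai", "books", 8.75),
    ],
)

# Let the database do the work: group, count, and total the orders per category,
# sorted so the highest-revenue category comes first.
query = """
    SELECT category, COUNT(*) AS n_orders, ROUND(SUM(amount), 2) AS revenue
    FROM purchases
    GROUP BY category
    ORDER BY revenue DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for category, n_orders, revenue in rows:
    print(category, n_orders, revenue)
```

Running it prints one line per category, with games ahead of books because the query sorts by revenue.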
2. Data exploration
Once the data is collected, it's time to begin exploring. A data scientist might perform exploratory analyses to spot outliers, test the validity of their hypotheses, and visually inspect the data using charts and graphs. This can be done using a programming language like R or Python.
To effectively explore your data, says Sophie, "you need to know about different kinds of numerical and visual summaries and understand which ones are most appropriate for the data that you have."
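Here is a small sketch of what numerical summaries and a simple outlier check might look like in Python, using only the standard library's statistics module (statistics.quantiles needs Python 3.8 or later). The session-length data is hypothetical, and the 1.5 * IQR fence is one common rule of thumb rather than the only way to flag outliers.

```python
import statistics

# Hypothetical session lengths (in minutes) for eight users.
sessions = [12, 15, 14, 10, 13, 16, 11, 95]

# Two numerical summaries of the "typical" value.
mean = statistics.mean(sessions)
median = statistics.median(sessions)

# Flag outliers with the 1.5 * IQR rule of thumb.
# statistics.quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
q1, _, q3 = statistics.quantiles(sessions, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in sessions if x < lower or x > upper]

print(f"mean={mean}, median={median}")  # mean=23.25, median=13.5
print(f"fences=({lower}, {upper})")     # fences=(4.5, 22.5)
print(f"outliers={outliers}")           # outliers=[95]
```

The single 95-minute session pulls the mean well above the median, which is exactly the kind of judgment Sophie describes: for skewed data like this, the median is often the more representative summary. Other quartile conventions (statistics.quantiles also accepts method="inclusive") would shift the fences slightly.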
3. Data cleaning and prep
"A lot of the job is usually data cleaning and preparing your data to do something with it," says Sophie. This includes finding missing data and dealing with outliers, both of which can lead to inaccurate results if left unaddressed.
Data preparation is often time-consuming, since data scientists need to find and fix errors, choose and transform the features that are relevant to the problem at hand, and reshape or merge multiple data sources. So if you're considering becoming a data scientist, it's important to understand that this is a big part of the job.
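The cleaning steps described here can be sketched with plain Python data structures. Everything in this example is hypothetical: the two data sources, the rule that a negative age counts as an error, and the choice to fill missing ages with the median are assumptions made for illustration, and each is a judgment call you would revisit on real data. In practice, a library such as pandas is the more common tool for this work.

```python
import statistics

# Two hypothetical sources keyed by user id: survey answers and account records.
survey = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": None},  # missing value
    {"user_id": 3, "age": 29},
    {"user_id": 4, "age": -5},    # impossible value, assumed to be an entry error
]
accounts = {1: "premium", 2: "free", 3: "free"}  # user 4 has no account record

# 1. Address errors: treat impossible (negative) ages as missing.
for row in survey:
    if row["age"] is not None and row["age"] < 0:
        row["age"] = None

# 2. Fill missing ages with the median of the valid ones
#    (one simple strategy among many; dropping the rows is another).
valid_ages = [r["age"] for r in survey if r["age"] is not None]
median_age = statistics.median(valid_ages)
for row in survey:
    if row["age"] is None:
        row["age"] = median_age

# 3. Merge the sources, keeping only users present in both (an inner join).
merged = [
    {**row, "plan": accounts[row["user_id"]]}
    for row in survey
    if row["user_id"] in accounts
]

for row in merged:
    print(row)
```

The result keeps users 1 through 3, with user 2's missing age filled in as 31.5 (the median of 34 and 29), while user 4 is dropped because there is no matching account record.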
4. Asking the right questions
Sophie says, "A data scientist needs to be thinking about what kinds of questions they can answer with data and what types of data they need to collect in order to answer those questions." This is why curiosity is such an important trait in data science.
Data scientists need to define their objectives so they can see the big picture before starting a study. This usually means identifying a problem statement and then working to answer it. The tricky part is figuring out which questions are the right ones to ask.
5. Analyzing data and building models
Once a data scientist has identified their question or goal and collected and cleaned their data, the next step after data exploration is often model building and/or hypothesis testing.
This process varies a lot depending on the data scientist's ultimate question or goal. For example, if a data scientist is trying to build a model to predict whether an email is spam or not, the model building process might include fitting different types of models, experimenting with tuning parameters, or even collecting and transforming data once again!
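The spam example can be sketched in a deliberately tiny form. The keyword list, the scoring rule, the threshold tuning parameter, and the labeled messages below are all hypothetical, invented for illustration; a real spam filter would learn from far more data, typically with a machine learning library such as scikit-learn. What the sketch does show is the loop described here: try several values of a tuning parameter, keep the value that scores best on training examples, and then check it on held-out examples that played no part in the choice.

```python
# Hypothetical "spammy" words; a real model would learn these from data.
SPAM_WORDS = {"free", "winner", "prize", "click", "offer"}


def spam_score(text):
    """Count how many words in the message appear in SPAM_WORDS."""
    return sum(word in SPAM_WORDS for word in text.lower().split())


def predict(text, threshold):
    """Label a message as spam (True) when its score reaches the threshold."""
    return spam_score(text) >= threshold


def accuracy(examples, threshold):
    """Fraction of (message, is_spam) pairs that the rule labels correctly."""
    correct = sum(predict(text, threshold) == is_spam for text, is_spam in examples)
    return correct / len(examples)


# Tiny hand-made labeled examples: (message, is_spam).
train = [
    ("click here to claim your free prize", True),
    ("winner winner you won a free offer", True),
    ("free pizza in the break room today", False),
    ("meeting moved to 3pm", False),
    ("special offer click now", True),
    ("can you review my draft", False),
]
held_out = [
    ("claim your prize now click", True),
    ("lunch is free on friday", False),
    ("are we still on for tomorrow", False),
]

# Try each candidate value of the tuning parameter; keep the best on the training data.
candidates = [1, 2, 3]
best_threshold = max(candidates, key=lambda t: accuracy(train, t))

print("best threshold:", best_threshold)
print("held-out accuracy:", accuracy(held_out, best_threshold))
```

With this toy data, a threshold of 2 classifies every training message correctly, while a threshold of 1 wrongly flags the free-pizza message and a threshold of 3 misses the "special offer" spam. Swapping in a different type of model, adding features, or collecting more examples are the kinds of iterations a real project would go through next.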
6. Communicating findings
Last but not least, communication is key in the field of data science. "I think it's really important for data scientists to be good communicators," says Sophie. "Data scientists need to know how to distill their findings down to actionable insights."
In a business setting, data scientists often share their findings with decision makers, executives, and management teams to provide insights that inform business decisions. Because these stakeholders often don't come from a data background, the findings must be distilled and presented in an easy-to-understand way.
Build your skills
If scraping, analyzing, and modeling data sounds like your idea of a dream job, a career in data science may be for you! Our Data Science Career Path is designed to give you all the skills you need to get started.