
Protect Your Personal Data By Thinking Like a Data Scientist

Data is a currency, and many of us spend it more often than we realize. When we use a “free” search engine that tracks our searches, or a social media platform that maps and analyzes our relationships, or even a grocery store rewards card that offers discounts in exchange for a record of our shopping habits, we exchange information about ourselves to get something we want. But how can we tell if these are fair trades?

Just as financial literacy helps us make sound choices with our money, understanding how our information gets collected and used equips us to evaluate the decisions we make every day about our personal data.

Recognizing When You're Trading Your Data

Sometimes, the data-for-goods-and-services exchange is made explicit, like at the Shiru Cafe, where students trade their data for coffee.

Often, though, what we are giving up is less obvious. We might expect our posts or search terms to be analyzed, but do we think about what sites can determine about us by recording our mouse movements? Or do we realize that companies might still be tracking our locations, even after we try to disable this feature? Or are we aware that the “free” new app our favorite restaurant put out that tells us their daily specials may be collecting more information than it displays?

In the age of Big Data, it may be safest to assume that the services we use gather all the information about us that they feasibly can. Our burden as consumers, then, is to be conscious of what data we put out there. Trying to think like a data scientist can help with this.

[Figure: Your Data Pie Chart. Caption: A handy pie chart]

Say we are big fans of a certain news website. If this site doesn't have a paywall, then we might expect that it is funded by advertisements. Because targeted ads are more profitable, we may assume that we are giving up data about what kinds of articles we click on in exchange for being able to read them. Is this the only trade we make on this site, though?

Pretend we work as data scientists for the site. What other kinds of data could we extract about users?

Perhaps we could record how long readers spend looking at each article and how far down they scroll, or even how long they spend looking at the headlines in each section of the homepage. But if that's being recorded, then aren't users exchanging their data not only for the privilege of reading articles, but also for the ability to see what articles are even available?
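To make this concrete, here is a minimal sketch of how such tracking data might be rolled up. Everything in it is an assumption for illustration: the PageEvent record, its field names, and the sample events are hypothetical, and a real site's tracking could look quite different.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PageEvent:
    """One hypothetical tracking event sent by a reader's browser."""
    reader_id: str
    page_id: str
    seconds_on_page: float   # time elapsed since the previous event
    scroll_fraction: float   # 0.0 = top of the page, 1.0 = bottom


def summarize_engagement(events):
    """Total dwell time and deepest scroll for each (reader, page) pair."""
    summary = defaultdict(lambda: {"dwell_seconds": 0.0, "max_scroll": 0.0})
    for event in events:
        row = summary[(event.reader_id, event.page_id)]
        row["dwell_seconds"] += event.seconds_on_page
        row["max_scroll"] = max(row["max_scroll"], event.scroll_fraction)
    return dict(summary)


# Made-up events: the reader is tracked on the homepage, before opening any article.
events = [
    PageEvent("reader-1", "homepage", 12.0, 0.3),
    PageEvent("reader-1", "homepage", 8.0, 0.6),
    PageEvent("reader-1", "article-42", 95.0, 1.0),
]
engagement = summarize_engagement(events)
```

Note that the homepage rows exist even if the reader never clicks through to an article, which is exactly the less obvious trade described above.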

Recognizing these sorts of trades is a crucial first step toward making informed choices about giving up our data. Until we understand what our data can be used for, though, we are paying with a foreign currency without knowing the exchange rate.

How Your Data Can Be Used

Our data can be used for good causes, from advancing healthcare to reducing energy usage. It can also be used in ways we find kind of creepy or outright disturbing. Indeed, the same piece of our personal information might be used to help and hurt us by the same organization.

So when should we feel comfortable giving up our data for access to something we want? There are no easy answers, but we will be better able to assess each situation if we get a sense of what insights organizations can extract from our data and practice stepping into a data scientist's shoes.


Let's revisit that hypothetical news site and walk through how thinking like a data scientist can help us better understand the choices we have to make.

First off, what types of questions might that site's data scientists look into? What kinds of articles attract readers, maybe? Or, more specifically, what kinds of articles attract readers who click on advertisements? Or, possibly, what are the traits of popular articles that could be adapted to make sponsored content more appealing to readers?

The next step is to consider whether the site could realistically piece together the data we provide to answer these questions and, if so, how it might do this.

In our example, looking at trends in readers' page clicks and view times, or perhaps conducting sentiment analysis of comments readers leave, might allow the analysts to group articles based on how appealing readers find them. Then, combining this with data about the individual articles in each group, they might find attributes that seem to make readers like some articles more than others (perhaps using certain keywords or referencing particular companies, trends, or individuals).
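A rough sketch of this two-step analysis might look like the following. It is a toy under stated assumptions, not how any real site works: the article records, the avg_view_seconds field, the median cutoff, and the simple "share of articles" comparison are all made up for illustration, and real analysts would likely use more data and more careful statistics.

```python
from collections import Counter
from statistics import median

# Made-up article records; field names and values are hypothetical.
articles = [
    {"id": "a1", "keywords": {"election", "scandal"}, "avg_view_seconds": 150},
    {"id": "a2", "keywords": {"election", "economy"}, "avg_view_seconds": 120},
    {"id": "a3", "keywords": {"economy", "sports"}, "avg_view_seconds": 45},
    {"id": "a4", "keywords": {"weather"}, "avg_view_seconds": 30},
]


def split_by_engagement(articles):
    """Step 1: group articles into high and low appeal using the median view time."""
    cutoff = median(a["avg_view_seconds"] for a in articles)
    high = [a for a in articles if a["avg_view_seconds"] > cutoff]
    low = [a for a in articles if a["avg_view_seconds"] <= cutoff]
    return high, low


def keyword_lift(high, low):
    """Step 2: share of high-appeal articles using a keyword minus the share of low-appeal ones."""
    if not high or not low:
        raise ValueError("both groups need at least one article")
    high_counts = Counter(k for a in high for k in a["keywords"])
    low_counts = Counter(k for a in low for k in a["keywords"])
    all_keywords = set(high_counts) | set(low_counts)
    return {
        k: high_counts[k] / len(high) - low_counts[k] / len(low)
        for k in all_keywords
    }


high, low = split_by_engagement(articles)
lift = keyword_lift(high, low)
```

A keyword with a large positive value shows up mostly in the articles readers linger on, so it is the kind of attribute that could be borrowed to make sponsored content more appealing.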

By going through this thought process, we can refine our questions about how, or whether, to use this news site. Now, we might ask, “Does the information or enjoyment we get from this site outweigh the risk that the site will use our habits to make sponsored messages we don't necessarily support more effective?” Or, possibly, “Are we comfortable clicking on so many opinion-based articles if that might be seen as an indication that the site should shift focus away from impartial reporting?”

Now, we are starting to ask the right sorts of questions to make informed choices about the data we give up. However, there is more we should consider. Even seemingly useless data we give up now can be stored until technologies and analysis techniques improve enough to make sense of it, meaning that we not only need to consider what can be done with our information today, but also what might be done with it tomorrow.

How Your Data Might Be Used in the Future

Today, the idea of using our data to generate and display individualized content that changes our opinions on important issues may seem like science fiction. In a few years, it might not.

With many billions of dollars being invested in machine learning and artificial intelligence each year, even my limited predictive abilities are sufficient to see that organizations will extract progressively more insight from our data. How fast these developments will occur is harder to say.


There is a wealth of reporting and commentary about the advancements in machine learning and AI, but without at least a baseline understanding of these fields, it is tricky to sort out the truth from the hype. The more educated we are about these areas, the better we will be able to predict the risk of our current data being used in troubling ways in the near future.

In Conclusion

If we use our data as currency, we need to know how much we are really spending to keep ourselves out of trouble. The apps, websites, and other products and services that harvest our data might well be worth the cost, but it is hard to be sure without making a conscious effort to recognize the full range of our data that is collected and thinking through how this data could be used.

Naturally, educating ourselves about data science and related fields should improve our ability to put ourselves in the shoes of different organizations' analysts.
