Big Data May Not Know Your Name. But It Knows Everything Else


Companies like Acxiom, LexisNexis, and others argue that there is nothing to worry about in collecting and sharing Americans’ sensitive data, as long as names and a few other identifiers are not attached. After all, their reasoning goes, this “anonymized” data cannot be linked to individuals and is therefore harmless.

But as I testified to the Senate last week, you can basically re-identify anything. “Anonymity” is an abstraction. Even if a company doesn’t have your name (though it may well have it), it can still acquire your address, internet search history, smartphone GPS logs, and other data to pin you down. Yet this flawed and dangerous narrative persists and continues to persuade lawmakers, to the detriment of strong privacy regulation.

Data on the race, gender, ethnicity, religion, sexual orientation, political beliefs, internet searches, drug prescriptions, and GPS location history (to name a few) of hundreds of millions of Americans is for sale on the open market, and far too many advertisers, insurance companies, predatory loan companies, US law enforcement agencies, scammers, and abusive individuals at home and abroad (to name a few) are willing to pay for it. There is almost no regulation of the data brokerage circus.

Many brokers claim that they do not need regulation because the data they buy and sell is “not personal,” simply because there is no “name” column in, say, their spreadsheet detailing the mental illnesses of millions of Americans. Experian, a consumer credit reporting company, for example, says the data it shares widely with third parties includes information that is “non-personal, de-identified or anonymous.” Yodlee, the largest financial data broker in the United States, has claimed that all the data it sells about Americans is “anonymous.” But the claim that such “anonymity” protects individuals from harm is plainly false.

Of course, there is some difference between data with your name (or Social Security number, or some other clear identifier) attached and data without it. However, the difference is small, and it keeps shrinking as data sets get larger. Think of a fun fact about yourself: if you told a 1,000-person auditorium that your favorite food is spaghetti carbonara, it is quite possible that someone else in the room could say the same. The same goes for your favorite color, travel destination, or candidate in the next election. But if you had to list 50 fun facts about yourself, the odds of all of them applying to someone else would drop dramatically. Someone handed that list of 50 facts could eventually trace the mini profile back to you.
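The auditorium intuition can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each fact is shared by 10 percent of people and that the facts are independent of one another. Neither assumption comes from the testimony, and real attributes are correlated, so this is a rough illustration rather than a measurement.

```python
def expected_matches(room_size: int, share_rate: float, num_facts: int) -> float:
    """Expected number of OTHER people in the room who share all of your facts.

    Assumes every fact is independently shared by `share_rate` of people.
    Both the rate and the independence are illustrative assumptions.
    """
    others = room_size - 1
    return others * share_rate ** num_facts


if __name__ == "__main__":
    # Hypothetical numbers: a 1,000-person room, each fact shared by 10% of people.
    for num_facts in (1, 2, 3, 5, 10, 50):
        matches = expected_matches(1000, 0.10, num_facts)
        print(f"{num_facts:>2} facts: {matches:.3g} other people expected to match")
```

Under these assumptions, a single fact is shared by roughly 100 other people in the room, while three facts are already enough to bring the expected number of matches down to about one. By 50 facts the expected number is effectively zero.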

This also applies to companies with huge data sets. For example, some large data brokers like Acxiom advertise thousands or tens of thousands of individual data points on a given person. At that breadth (from sexual orientation and income level to shopping receipts and physical movements across a mall, city, or country), the collective profile of each individual looks unique. At that depth (from internet searches to 24/7 smartphone GPS logs to drug prescription doses), many single data points within each person’s profile may also be unique. It is all too easy for these organizations, and for anyone who buys, licenses, or steals the data, to link all of it back to a specific person. Data brokers and other companies also create their own identifiers, other than a name, to do exactly that, such as the mobile advertising identifiers used to track people across sites and devices.

Re-identification has become very easy. In 2006, when AOL released a collection of 20 million web searches by 650,000 users, it replaced names with random numbers. The New York Times very quickly linked the searches to specific people. (“It did not take much,” the reporters wrote.) Two years later, researchers at the University of Texas at Austin matched 500,000 Netflix users’ “anonymized” movie ratings against IMDb and identified the users as well as “their apparent political preferences and other potentially sensitive information.” When researchers examined a New York City government taxi data set (again without names), they were not only able to backtrack from the badly generated hash codes to identify more than 91% of the taxis, they could also classify drivers’ incomes.
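To illustrate why a badly generated hash code offers little protection, here is a minimal sketch. It assumes, hypothetically, that IDs follow a short fixed pattern (one digit, one letter, two digits) and are hashed with unsalted MD5; the text above does not specify the ID format or the hash function, so both are stand-ins. Because the space of possible IDs is tiny, every candidate can be hashed in advance and the “anonymous” codes simply looked up.

```python
import hashlib
import itertools
import string
from typing import Dict, Optional


def md5_hex(value: str) -> str:
    """Unsalted MD5 digest, standing in for the 'badly generated' hash."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def build_lookup() -> Dict[str, str]:
    """Map hash -> original ID for every ID of the hypothetical form digit-letter-digit-digit."""
    digits, letters = string.digits, string.ascii_uppercase
    lookup = {}
    for d1, letter, d2, d3 in itertools.product(digits, letters, digits, digits):
        candidate = f"{d1}{letter}{d2}{d3}"
        lookup[md5_hex(candidate)] = candidate
    return lookup


# Only 10 * 26 * 10 * 10 = 26,000 candidates, so the full table builds almost instantly.
LOOKUP = build_lookup()


def reidentify(hashed_id: str) -> Optional[str]:
    """Return the original ID behind a published hash, or None if it is not in the table."""
    return LOOKUP.get(hashed_id)


if __name__ == "__main__":
    published = md5_hex("7B42")  # what a "de-identified" release would contain
    print(reidentify(published))
```

Replacing IDs with random tokens that are not derived from the ID itself would defeat this particular lookup, though it would do nothing about the linkage attacks described in the AOL and Netflix cases.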

The irony of data brokers claiming that their “anonymized” data is risk-free is absurd: their entire business model and marketing pitch rest on the premise that they can intimately and highly selectively track, understand, and microtarget individual people.


