De-identification is a process used to prevent a person’s identity from being connected with information. Organizations de-identify data for a range of reasons. Companies may have promised “anonymity” to individuals before collecting their personal information; data protection laws may restrict the sharing of personal data; and, perhaps most importantly, de-identification mitigates privacy threats from improper internal access or from an external data breach. This Essay attempts to frame the conversation around de-identification.
There are only a handful of reasons to study someone very closely. If you spot a tennis rival filming your practice, you can be reasonably sure that she is studying up on your style of play. Miss too many backhands and guess what you will encounter come match time. But not all careful scrutiny is about taking advantage. Doctors study patients to treat them. Good teachers watch their students closely to see whether they are learning. Social scientists study behavior in order to understand and improve the quality of human life.
“Big data” can be defined as a problem-solving philosophy that leverages massive datasets and algorithmic analysis to extract “hidden information and surprising correlations.” Not only does big data pose a threat to traditional notions of privacy, but it also compromises socially shared information. This point remains underappreciated because our so-called public disclosures are not nearly as public as courts and policymakers have argued, or at least not yet. That is subject to change once big data becomes user-friendly.