November 18, 2008
"Identity thieves strike every 3.5 seconds and it costs consumers more than $56 billion ($6,383 per victim) annually to recover" (1). When you sell insurance, it surely pays to frighten people while making sure the US Congress does nothing about it. Let us hope AMEX actuaries at least are better at tracking the data bubble than those at AIG, who drove their company to ruin by underestimating the credit bubble.
Today however I plan to play Mr Zuckerberg's advocate. According to Saul Hansell, "Zuckerberg's Law" forecasts information sharing to double every year or two (*). But is what is good for Facebook good for us? I suggest it all comes down to reports, votes, profiles and connections.
By voluntarily reporting to a central authority, users can create a sensor array whose low cost and large scale are unmatchable. As Matthew L. Wald has it (**), "cars could become scouts for snowplows, pothole repair crews, police officers, ambulance drivers and even medevac helicopter pilots". Equip each car for instance with an anti-skid device which sends its position each time it is activated. With enough data to keep false alarms below some acceptable level, you can pinpoint slippery road conditions to warn and to spray.
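To make the anti-skid example concrete, here is a minimal sketch in Python of how a central authority might turn raw activations into warnings. The grid size, the threshold and every name in it are my own assumptions for illustration, not anything described by Matthew L. Wald. The idea is simply to snap each position to a coarse cell and to raise an alarm only when enough distinct cars agree.

```python
# Hypothetical parameters, chosen only for illustration.
CELL_DEG = 0.001   # grid step in degrees (roughly 100 m in latitude)
MIN_CARS = 3       # distinct cars required before a cell is flagged


def cell_of(lat, lon):
    """Snap a position to a coarse grid cell."""
    return (round(lat / CELL_DEG), round(lon / CELL_DEG))


def slippery_cells(reports, min_cars=MIN_CARS):
    """Return the grid cells reported by at least `min_cars` distinct cars.

    `reports` is an iterable of (car_id, lat, lon) anti-skid activations.
    Counting distinct cars rather than raw activations means a single
    faulty or dishonest car cannot trigger an alarm on its own.
    """
    cars_per_cell = {}
    for car_id, lat, lon in reports:
        cars_per_cell.setdefault(cell_of(lat, lon), set()).add(car_id)
    return {cell for cell, cars in cars_per_cell.items()
            if len(cars) >= min_cars}
```

Raising the threshold trades fewer false alarms for slower detection, so where to set it is a policy choice as much as a technical one.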
The same idea is behind the new Web tool tested by Google.org "to detect regional outbreaks of the flu a week to 10 days before they are reported by the Centers for Disease Control and Prevention", as written by Miguel Helft (***). A sudden, localized peak in interest for Internet documents on flu symptoms can be tracked quicker than visits to physicians. Even false alarms might in that case uncover useful information. If parents are worried sick about the flu for no apparent cause, the school district superintendent might want to know.
Besides the issue of accuracy which attends any pattern recognition task, user reports raise privacy concerns. Neither Matthew L. Wald nor Miguel Helft fails to mention them. But as Mr Zuckerberg's advocate, I will stress that the value added by such reports does not depend in any way on violating user privacy. In the absence of any conflict of interest, privacy can and should be designed into the system.
Rather than privacy, the priority should be on identity and responsibility, the last two terms of my motto. Criminals have many reasons to fool noise filters and game reporting results for profit, e.g. a pharmacist who undershot his flu vaccine stockpile or a road service contractor who wants to cash his worth in salt. Lest a bad case of listeria ensue, individual contributions should be open to tracing after the fact whenever foul play is suspected.
The Internet is also the ideal medium to find truth by popularity. It would be misleading to consider a vote as just another report. The intent may well be to tally information from users in the field so as to reach a decision, but this process is carried out in a totally different spirit. When filing a report, the user acts as a fallible witness. When casting a ballot, the user expresses a personal right.
Public votes have no need for privacy but, whether secret or public, all votes have a special need for accuracy. The issue here is to prevent users from voting more than once, a matter of identity checking. Notice that the system needs only to verify whether "one is someone who has already voted", rather than to know "who one is". This meaning of the word "is" enables the voting system administrator to rely on any third party with a good user database. Sites geared to doctors often request the physician license ID. At most consumer sites, providing an email address is good enough to discourage egregious cheating.
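The distinction between "has already voted" and "who one is" can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class name, the use of an email address as credential and the choice of a keyed hash (HMAC-SHA256) are all hypothetical. The ballot box remembers an opaque token per voter, never the credential itself.

```python
import hashlib
import hmac


class BallotBox:
    """Hypothetical ballot box which stores tokens, not identities."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key   # assumed to be kept by the administrator
        self._seen = set()       # tokens of those who have already voted
        self.tally = {}          # choice -> number of votes

    def _token(self, credential: str) -> str:
        # Normalize so that trivial variants of one address collide.
        normalized = credential.strip().lower().encode("utf-8")
        return hmac.new(self._key, normalized, hashlib.sha256).hexdigest()

    def cast(self, credential: str, choice: str) -> bool:
        """Record the vote and return True, or return False on a repeat."""
        token = self._token(credential)
        if token in self._seen:
            return False
        self._seen.add(token)
        self.tally[choice] = self.tally.get(choice, 0) + 1
        return True
```

Note that the stored token is not linked to the choice, but whoever holds the key could still test whether a given address has voted. A keyed hash discourages casual lookups; it is not a guarantee of anonymity.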
Votes are a matter of opinion. While including opinions, profiles mostly cover objective facts, from personal attributes to behavior history. In view of our laziness and self-ignorance, the record of our activities may actually prove more reliable than any opinion we profess.
Profile data is often compiled into statistics for specific characteristics over the whole user population. People may be more sensitive to privacy when it comes to profile information, but the labeling is deceptive. This is collective reporting under another name. A car which gives the location of a pothole is also telling where the driver is. The challenge remains to limit how much personal information is revealed as a side effect. Remember that, despite its legal standing (2), there is no such thing as intrinsically "non identifiable personal information". Aggregate enough of it and you identify the individual concerned. Plot for instance one's car locations over time to reveal one's commuting between home and office.
Confidential data mining solves this problem by taking advantage of the error smoothing inherent in statistics. Add enough noise to mask each elementary report, but in such a way that the noise cancels itself over the whole population. One can also limit the total number of reports per user. Whether pothole locations or Internet searches, a couple of data points will rarely give one's identity away. For practitioners, Winnie Cheng, who does research on confidentiality at MIT, has a more complete list of privacy conservation measures from which to pick (3).
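Both measures, masking noise and a cap on reports per user, can be illustrated with a short Python sketch. It is one simple instance of the idea, under my own assumptions, and not the method of the MUPPET paper: each report receives independent zero-mean Gaussian noise, so that a single masked value says little while the average over many reports stays close to the truth. The function names and parameters are hypothetical.

```python
import random


def mask(value, scale, rng):
    """Return `value` plus zero-mean Gaussian noise of deviation `scale`.

    Because the noise averages to zero, it tends to cancel out when
    many masked reports are averaged together.
    """
    return value + rng.gauss(0.0, scale)


def capped(reports, max_per_user):
    """Keep at most `max_per_user` reports from each user, in order.

    `reports` is a list of (user_id, value) pairs.
    """
    kept, count = [], {}
    for user, value in reports:
        if count.get(user, 0) < max_per_user:
            count[user] = count.get(user, 0) + 1
            kept.append((user, value))
    return kept


def masked_mean(values, scale, rng):
    """Average of the masked values, as seen by the central authority."""
    masked = [mask(v, scale, rng) for v in values]
    return sum(masked) / len(masked)
```

The error on the average shrinks roughly as the noise deviation divided by the square root of the number of reports, which is why the scheme only works with a large population.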
Individuals may of course freely choose to surrender their right to privacy. In her article about those who volunteered to have their genome made public (****), Amy Harmon quotes James Watson, of DNA fame, as saying "I put mine out there, but I'm 80. Randomly putting up young people's genomes could cause individual harm."
Besides being a way for users to contribute to society, profiles are an attempt to get society to accommodate users on a personal basis. David Carr reviews how the campaign of the US President-Elect reaped the benefits of social networking (*****) as "Obama supporters [...] traded their personal information for a ticket to a rally or an e-mail alert about the vice-presidential choice, or opted in on Facebook or MyBarackObama."
Again privacy is not an issue when users are motivated. The danger lies when advertisers reverse the process and seek to motivate users. When it's all about me, the richer my profile, the more on target my interaction opportunities. Yet, as users bear privacy risks and advertisers reap most of the benefits, risks and rewards are no longer aligned, a serious problem past fillips have endeavored to rectify. For the majority of users, personalization could be had in total privacy. To power market research, profiles should be bought from an appropriate sample, as Nielsen does from its panelists.
But beyond profile-driven personalized interactions between individuals and society, the promise of the Internet arises from interactions between individuals as society. It's all about connections.
Were on-board car sensors to report on traffic rather than potholes, and drivers to visualize it in real time, networked users would interact among themselves on a global scale, hopefully to optimize everybody's travel. This is what Elizabeth Thomson describes as the goal behind the CarTel project (******), piloted by "Professor Hari Balakrishnan and Associate Professor Samuel Madden of MIT". Read also Susan Chaityn Lebovits on the Smart Biking Project, another MIT project led by Christine Outram with "a Facebook application called 'I crossed Your Path'" (*******).
Forget cars and bikes to deal directly with people. As a social network tabulates and beams back human connections in quasi real time, the traffic over existing ones spawns new ones along the way. Meanwhile collaborative editing, blog writing and searching let individual opinions connect into self-modifying knowledge societies no longer limited by the past need to make do with small areas or long delays.
Social animals, are we not defined by the sum of our connections, in need of the reassurance of repeated contact? In the end the medium needs no message. The power lies with the ability to map and tap this web of connections, as the Beacon initiative tried to do.
However, connected people do not like the social network to treat them as an entomologist observes an ant colony. I have already warned market operators of the need for more rigorous privacy. Given the fate of Beacon, Mr Zuckerberg should take an advocate to defend his future steps.
Much worse may come to pass. As Miguel Helft quickly picked up (********), Google Flu Trends triggered a fit of rational paranoia. I fear lest every reward of our shared information society become tainted by the industry track record of putting profits before privacy.
- (*) ................... Zuckerberg's Law Of Data Sharing, by Saul Hansell (New-York Times) - November 7, 2008
- (**) ................ When the Roads Talk, Your Car Can Listen, by Matthew L. Wald (New-York Times) - October 30, 2008
- (***) .............. Aches, A Sneeze, A Google Search, by Miguel Helft (New-York Times) - November 12, 2008
- (****) ............ Project Lets Anyone Take a Peek At the Experts' Genetic Secrets, by Amy Harmon (New-York Times) - October 20, 2008
- (*****) .......... Obama's Personal LinkedIn, by David Carr (New-York Times) - November 10, 2008
- (******) ....... CarTel personalizes commutes by using Wifi to network cars, by Elizabeth Thomson (MIT Tech Talk) - October 8, 2008
- (*******) ..... A quantum leap in bike mechanics, by Susan Chaityn Lebovits (Boston Globe) - November 16, 2008
- (********) ... Flu Report Raises Privacy Concerns, by Miguel Helft (New-York Times) - November 17, 2008
- (1) Javelin Strategy & Research 2008 Identity Fraud Survey Report,
quoted by Michelle Koehler, Director, Cardmember Insurance, AMEX Assurance Company (November 2008 direct mail campaign)
- (2) for example look at HIPAA, in the documentation for my lecture on the Handling of Medical Records.
- (3) for more details, see MUPPET: Mobile Ubiquitous Privacy Protection for Electronic Transactions,
by Winnie Cheng, Jun Li, Keith Moore, Alan H. Karp (HP Laboratories), March 22, 2007