This one’s going to make international headlines. Around 2.30am, I was repairing my son Joe’s Windows XP install when Zoli pinged this story. He says:
AOL, in blatant violation of its users' privacy, just released the log of 3 months' worth of searches by 650,000 users. Not to the DOJ, but for open download by anyone. The claim:
“This collection is distributed for non-commercial research use only. Any application of this collection for commercial purposes is STRICTLY PROHIBITED”
…
AOL, you betrayed your users. If they are any smart, they will boycott your services.
Yuk – that’s really, really bad. Zoli and I discussed this over Skype IM. By 4.35am (I was still fixing Joe’s machine!) the link to the page hosting the file led to a blank page. I won’t link there. I’ve not downloaded the file, which is 2GB unzipped.
TechCrunch thinks this could lead to evidence of criminal activity and refers to AOL’s ‘utter stupidity.’ Paradigm Shift says:
The big affiliate marketers will make millions off this, I’m already busy processing the data, and after taking a quick peek at the data it’s an absolute gold mine for PPC and SEO.
So much for the explicit prohibition on commercial use.
Among other things, Zoli and I speculated that:
- Spammers will have gotten hold of the data and have a field day
- It is possible to reverse engineer the searches to discover a LOT of personal details about people.
Questions:
- Zoli estimates maybe 1,500-2,000 downloads by the time AOL woke up to what they’d done. What’s the real number?
- How long was the file in the wild?
- Could illicit copies end up on eBay?
- Could market data derived from the file end up on eBay or as part of a market intelligence offering? Almost certainly the second if not the first.
- What will be the impact on AOL’s stock price?
- Might short sellers speculate on the impact?
- What about a class action lawsuit? For once I think there are decent grounds for one of the ambulance chasers to send out its hit squad – they may even get what they need from the file
- Will AOL be able to track who got the file?
- What is the potential for wholesale identity theft among those 650,000 AOL users?
- Who takes responsibility for this at AOL and how many heads roll as a consequence?
I’m sure there are plenty of other questions. These were what sprang to mind over a 30 minute IM.
BTW – this has nothing to do with security per se but everything to do with stupidity and ethics. It’s up there with Gerald Ratner as a gaffe of monumental proportions.
UPDATE: Jason Stamper has penned a well-crafted and detailed analysis about some of the information that can be deduced from the file. Amazing stuff.





Get into the conversation
Hi Dennis
You beat me to the punch, but I have downloaded the data and blogged about what I found, in case your readers are interested in more on this story: http://www.businessreviewonline.com/blog/
All the best
Jason
With regard to question 1: the file has already been mirrored on at least 8 sites that are active as I type this. AOL cannot contain its spread now.
I am also interested to know if keeping this data is legal since it was released to the public. Can AOL force web sites to remove it? I'm not sure, but I don't think they can.
It's all ok now. It was entirely innocent apparently, but naturally AOL executives are 'angry and upset'.
"We're angry and upset about it," AOL spokesman Andrew Weinstein said. There. Matter sorted.
Good question Justin. I think AOL would struggle to force websites to remove it after they made it public themselves.
We were also wondering how long it would be before the class actions set sail against AOL, but on the other hand, what would you sue them for?
Potentially compromising your privacy? Or would you argue that you are identifiable from the data and so they definitely compromised it? Either way would you want to put your name to the case and thereby confirm that your search history is among those released?
As an aside, I've just written a follow-up blog pointing out that in most cases it is actually rather hard to be sure you have found someone's identity from the data, because you cannot assume that one search history belongs to one person – people in a household share one AOL account (and may even let a neighbour use their Net access), so many of the search histories are actually compilations of several users' searches. That makes the argument that each search history creates a picture of who the searcher is – and thereby compromises their security – less watertight.
AOL has apologised but I wonder about the damage to their already battered reputation: http://news.com.com/2100-1030_3-6102793.html?tag=…
Lots of damage. Try explaining to your granny that AOL published search histories but don't worry – they probably weren't hers and they didn't put her name on them. She will probably still worry.
I would expect to see search engine companies like Google start to put a promise on their home page that they won't publish your search history (except if ordered to by the courts).
Jason, don't know if you've already seen this, but the first person has been identified: http://www.nytimes.com/2006/08/09/technology/09ao…
Perhaps rather carelessly, the NY Times published her ID number in the article, so we can all now look up the data to see what Thelma Arnold from Lilburn, Ga. searches for.
Thanks for the comment on my blog Justin… yes I did see that story. I think they didn't mind giving her search history because they knew there was nothing scandalous in it. I suspect they could find other people with more embarrassing search histories but deliberately chose someone who didn't have one, so that they themselves could not be in the firing line of a possible defamation case.
I'd also argue that although it is clear you can find people, proving that it was they that did all of the searches in their search history would be another matter if they can show the computer is shared….
Incidentally did you see that someone has put a web interface on the data? http://www.aolsearchdatabase.com/
Capitol Hill's got it, David Berlind is talking about class action, and there are numerous mirror sites. Nice to see MSM has woken up. This sucker won't lie down.
A site where you can search the data is here:
http://www.datablunder.com/logitems/query/
Crikes! That's scary what you can do there.
A *quick* site where you can browse the data is here:
http://www.frogspy.com