Update (11/4/11): This article reflects our policies as of Fall 2010. Since then, we’ve continued to update our policies in line with industry best practices, including those developed by the DMA, NAI, and IAB. As our technology and products evolve, we remain committed to certain fundamental privacy principles: users should have control over their data, data collection and use should be as transparent as possible, and online behavioral tracking data should never be merged with a person’s real-life identity. For our most current privacy practices in our online advertising data business (a division within Rapleaf called LiveRamp), refer to LiveRamp’s current privacy standards.
If you ever need to drop data in a browser cookie, you generally have two options: dropping the data directly, or dropping a unique ID (or UUID, for universally unique identifier). In the latter case you’d have to store a mapping from UUIDs to data on your server, and whenever you see a cookie you’d query this map to acquire the data you want.
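A minimal Python sketch of the two options, assuming JSON serialization for the data-bearing cookie and an in-memory dict (`SERVER_STORE`, a hypothetical name) standing in for the server-side UUID map. All function names are illustrative, not taken from any real ad system.

```python
import base64
import json
import uuid

# Hypothetical in-memory stand-in for the server-side UUID -> data map.
SERVER_STORE: dict[str, dict] = {}


def cookie_with_data(data: dict) -> str:
    """Option 1: serialize the data itself into the cookie value."""
    raw = json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Raw JSON contains quotes and commas, which cookie values may not contain;
    # URL-safe base64 output uses only letters, digits, '-', '_' and '='.
    return base64.urlsafe_b64encode(raw).decode("ascii")


def read_cookie_with_data(value: str) -> dict:
    """Decode a data-bearing cookie; no server lookup is needed."""
    return json.loads(base64.urlsafe_b64decode(value))


def cookie_with_uuid(data: dict) -> str:
    """Option 2: keep the data on the server and put only a random ID in the cookie."""
    key = uuid.uuid4().hex  # a version 4 UUID: 16 bytes, written as 32 hex characters
    SERVER_STORE[key] = data
    return key


def read_cookie_with_uuid(value: str) -> dict:
    """Every read of a UUID cookie requires a lookup in the server-side map."""
    return SERVER_STORE[value]
```

The trade-off is visible in the readers: the first decodes locally, while the second depends entirely on the server-side map.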
The UUID approach is nice from a technical perspective because it limits the size of the cookies you drop: a UUID is only 16 bytes (see footnote 1). Cookies are sent with every matching browser request, so the same cookie may be uploaded many times during a browsing session. A large enough cookie can dominate the size of each request and noticeably hurt the user’s browsing experience. This issue is mitigated somewhat by the fact that browsers cap a single cookie at about 4 KB, but that cap also limits how much data a cookie can contain, and the UUID approach becomes attractive once again.
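To make the size argument concrete, the sketch below measures the common text encodings of a 16-byte UUID and compares the per-request cost of an ID cookie with a hypothetical data-bearing cookie built from 500 made-up segment codes. The 4096-byte figure approximates the cap browsers commonly enforce, and `header_bytes` ignores the separators between multiple cookies.

```python
import base64
import uuid

COOKIE_LIMIT = 4096  # approximate per-cookie size cap (bytes) in common browsers

u = uuid.uuid4()
raw = u.bytes                                                   # 16 bytes
hex_form = u.hex                                                # 32 characters
b64_form = base64.urlsafe_b64encode(raw).rstrip(b"=").decode()  # 22 characters
canonical = str(u)                                              # 36 characters (with hyphens)


def header_bytes(name: str, value: str) -> int:
    """Bytes one 'name=value' pair adds to each Cookie request header it is sent in."""
    return len(name) + 1 + len(value)


# A hypothetical data-bearing cookie: 500 made-up segment codes joined by '.'.
data_value = ".".join(f"seg{i}" for i in range(500))

id_cost = header_bytes("id", hex_form)     # 35 bytes per request
data_cost = header_bytes("d", data_value)  # 3391 bytes per request, close to the cap
```

Because that cost is paid on every matching request, the difference compounds over a browsing session.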
UUIDs are also convenient because all the data lives on the server, simplifying the task of updating that data. If the data lives in the cookie, then we cannot update it until we have an opportunity to drop another cookie on the user.
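The update argument can be shown with a toy example (a hypothetical ID `a1b2c3` and an in-memory `profiles` map): a data-bearing cookie is a snapshot taken when it was dropped, while a UUID cookie always resolves to whatever the server currently holds.

```python
import json

# Hypothetical server-side map for the UUID approach (ID -> profile).
profiles = {"a1b2c3": {"segments": ["autos"]}}

# Both cookies are dropped at the same moment.
uuid_cookie = "a1b2c3"
data_cookie = json.dumps(profiles["a1b2c3"])  # frozen snapshot (real code would also encode it)

# Later, the server learns something new about this user.
profiles["a1b2c3"]["segments"].append("travel")


def read_uuid_cookie(value: str) -> dict:
    """Always reflects the latest server-side data."""
    return profiles[value]


def read_data_cookie(value: str) -> dict:
    """Reflects only what was known when the cookie was dropped."""
    return json.loads(value)
```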
Because of these features, UUIDs are used by almost every ad network and advertising technology company today. Attractive as they are, however, we’ve prohibited the use of UUIDs here at Rapleaf because of privacy concerns. UUIDs are, by design, uniquely identifying: using them means keeping a mapping from each UUID to data on your servers.
Unique Identifiers Are Often Personally Identifiable
Here’s a simple example of how a UUID system might work. Let’s say we have a database with one row per user, keyed by UUID, that holds each user’s email address, gender, and possibly other columns.
Now imagine we want to drop a cookie for the user with the email address jsmith@example.com. Rather than putting the actual data in the cookie (e.g., gender = male and whatever other information there might be in other columns), we could simply drop the UUID 0800200c9a67. If we see this cookie later, all we need to do is take the UUID, find its row in the database, and grab the data associated with that user.
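A minimal sketch of this lookup. The `USERS` dict is a hypothetical reconstruction of the example table using only the fields the post mentions (UUID, email address, gender); `cookie_for_email` and `data_for_cookie` are illustrative names.

```python
# Hypothetical reconstruction of the example table: only the UUID, email,
# and gender are named in the post; any other columns are omitted here.
USERS = {
    "0800200c9a67": {"email": "jsmith@example.com", "gender": "male"},
}


def cookie_for_email(email: str) -> str:
    """Return the UUID to drop in the cookie instead of the data itself."""
    for uid, row in USERS.items():
        if row["email"] == email:
            return uid
    raise KeyError(email)


def data_for_cookie(uid: str) -> dict:
    """On a later request, look the UUID up and return the stored row."""
    return USERS[uid]
```

Note that the row returned for the cookie includes the email address itself.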
If that data contains any personally identifiable information (like a user’s name or email address), mapping from a browser cookie to a person’s identity is trivial. In fact, many companies are doing this today. They claim not to include personally identifiable information in cookies, but they store UUIDs that map directly to email addresses or hashed email addresses, which makes it easy to work out who is behind the browser.
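Hashing does not break the link, because a hash is deterministic: anyone with a list of plain email addresses can hash them the same way and match the results. The sketch assumes SHA-256 over a trimmed, lowercased address; the specific hash function is an assumption, and MD5 or SHA-1 would behave the same way for this purpose. The tables and addresses are hypothetical.

```python
import hashlib
from typing import Optional


def hash_email(email: str) -> str:
    """Normalize and hash an email address (SHA-256 chosen here as an assumption)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical server-side table: cookie UUID -> hashed email; no plain email stored.
uuid_to_hash = {"0800200c9a67": hash_email("jsmith@example.com")}

# Anyone holding a list of plain addresses can hash them the same way.
known_emails = ["alice@example.com", "JSmith@example.com "]
hash_to_email = {hash_email(e): e.strip().lower() for e in known_emails}


def identify(uid: str) -> Optional[str]:
    """Map a cookie UUID back to a plain email address, if that address is on the list."""
    hashed = uuid_to_hash.get(uid)
    return hash_to_email.get(hashed) if hashed is not None else None
```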
For example, from the UUID 0800200c9a67 it is trivial to derive that the user is actually jsmith@example.com, so the UUID itself is personally identifiable. The danger of this system is that the ad network can merge data about which sites you visit back into a database tied to your email address, name, and postal address, building a permanent record of the sites you’ve visited.
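The merge described here amounts to a simple join. In this sketch the request log of `(uuid, site)` pairs and the site names are hypothetical; the only mapping taken from the example is UUID 0800200c9a67 to jsmith@example.com.

```python
from collections import defaultdict

# Hypothetical data: the UUID -> email mapping from the example, plus a log of
# (uuid, site) pairs an ad server could record each time it sees the cookie.
uuid_to_email = {"0800200c9a67": "jsmith@example.com"}
request_log = [
    ("0800200c9a67", "news.example"),
    ("0800200c9a67", "health.example"),
    ("ffffffffffff", "shopping.example"),
]


def history_by_email(log, mapping):
    """Join the request log to the identity table, keyed by email address."""
    history = defaultdict(list)
    for uid, site in log:
        email = mapping.get(uid)
        if email is not None:
            history[email].append(site)
    return dict(history)
```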
And even if you can’t map a UUID to personally identifiable information, there are still privacy issues. A UUID acts as a unique identifier for a particular browser, so you can learn a user’s browsing history even if you don’t explicitly know who the user is. By combining enough of this information, you can often figure out the user’s identity, making it possible for a rogue company (or government) to link browsing behavior to specific individuals.
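Even with no identity table at all, grouping a request log by UUID yields one browsing history per browser, and sites visited by very few browsers help narrow down who a browser belongs to. The log entries, UUIDs, and the `max_visitors` threshold below are all hypothetical.

```python
from collections import defaultdict

# Hypothetical log of (cookie_uuid, site) pairs; no names or emails anywhere.
log = [
    ("u1", "news.example"), ("u1", "smalltown-gazette.example"),
    ("u2", "news.example"), ("u2", "sports.example"),
    ("u3", "news.example"),
]


def histories(entries):
    """Group requests by UUID: each UUID yields one browser's history."""
    by_uuid = defaultdict(set)
    for uid, site in entries:
        by_uuid[uid].add(site)
    return dict(by_uuid)


def rare_sites(entries, max_visitors=1):
    """Sites seen from at most `max_visitors` browsers; such visits narrow down identity."""
    visitors = defaultdict(set)
    for uid, site in entries:
        visitors[site].add(uid)
    return {site for site, uids in visitors.items() if len(uids) <= max_visitors}
```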
At Rapleaf, we actively avoid collecting data on browsing history: we don’t want to know it, it’s not our business to know it, and we want to limit the amount of information we know about users to ensure that they remain anonymous online. Full stop.
Privacy-Centric Alternatives
That’s why we store data in the cookie itself. We don’t put any personally identifiable information in our cookie, so there’s no straightforward way to know whom a browser belongs to. Likewise, we don’t put a UUID in it, so there’s no straightforward way to reconstruct a browsing history.
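A minimal sketch of a data-bearing cookie with no identifier, assuming coarse attributes encoded as a URL-style query string; the attribute names are hypothetical and this is not Rapleaf’s actual cookie format. The point is that two browsers with the same attributes receive byte-identical cookies, so the value carries no key that could join their requests together.

```python
from urllib.parse import parse_qsl, urlencode


def make_cookie_value(attributes: dict[str, str]) -> str:
    """Encode coarse attributes directly; no email, name, or per-browser ID."""
    # Sorting makes the value depend only on the attributes, not on insertion order.
    return urlencode(sorted(attributes.items()))


def read_cookie_value(value: str) -> dict[str, str]:
    """Decode the attributes on a later request; no server-side lookup needed."""
    return dict(parse_qsl(value))
```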
Now, we recognize that this system isn’t perfect. Given enough data on a user, it is often possible to de-anonymize that data back to a particular person. If you read our post about Anonymouse a few weeks ago, you’ll know that we’re devoting significant resources to solving this problem. Once a cookie has been anonymized, it should provide a strong guarantee of the user’s privacy.
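The de-anonymization risk can be checked by counting how many browsers share each cookie value: a value held by fewer than k browsers singles those browsers out even though it contains no ID. This is a generic check, not the Anonymouse algorithm, and the example values are made up.

```python
from collections import Counter


def risky_values(cookie_values: list[str], k: int) -> set[str]:
    """Return cookie values shared by fewer than k browsers."""
    counts = Counter(cookie_values)
    return {value for value, n in counts.items() if n < k}


# Hypothetical cookie values, one per browser.
values = ["age=25-34&gender=male"] * 3 + ["age=65-74&gender=female&zip=02134"]
```

Here the single browser with the rarer, more detailed combination is the one at risk.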
There’s one last alternative to UUIDs that combines the best of both worlds: the privacy advantages of putting data directly in the cookie and the technical advantages of using UUIDs. After a set of cookies has been anonymized, each cookie will belong to an equivalence class with several others. For example, if we perform 10,000-anonymization on the data set, then each cookie will look identical to at least 9,999 other potential cookies.
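The sketch below illustrates equivalence classes with a textbook generalization approach: coarsen hypothetical `(age, zip)` records step by step until every class of identical records has at least k members. This is a generic technique chosen for illustration, not the Anonymouse algorithm, and the two-step hierarchy in `generalize` is an assumption.

```python
from collections import Counter


def generalize(record: tuple[int, str], level: int) -> tuple[str, str]:
    """Coarsen one (age, zip) record; a hypothetical three-level hierarchy."""
    age, zip_code = record
    if level == 0:
        return (str(age), zip_code)
    if level == 1:
        decade = age // 10 * 10
        return (f"{decade}-{decade + 9}", zip_code[:3] + "**")
    return ("*", "*****")  # fully suppressed


def k_anonymize(records, k):
    """Return the least-coarsened level at which every equivalence class has >= k members."""
    for level in range(3):
        coarse = [generalize(r, level) for r in records]
        smallest_class = min(Counter(coarse).values())
        if smallest_class >= k:
            return level, coarse
    raise ValueError(f"fewer than {k} records; cannot reach {k}-anonymity")


records = [(31, "02134"), (35, "02139"), (52, "94110"), (57, "94117")]
```

With k = 2, level 1 already puts these four records into two classes of two, so no further coarsening is needed.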
Now, instead of storing all the data in the cookie, what if we simply stored an equivalence class ID? This gains us all the technical advantages of dropping a UUID, since we’re only dropping a single key in the cookie. But from a privacy standpoint, it is fundamentally different from a UUID. An equivalence class tells us nothing about an individual user; if we have 10,000-anonymized the data set, then by design the user could be any one of at least 10,000 people. It is impossible to gather a browsing history, since multiple browsers can and will have the same equivalence class ID. Of course, this relies on a strong degree of confidence in the anonymization algorithm, and this is a change we have not yet implemented, but we think it’s a promising idea.
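A sketch of the equivalence-class-ID idea, assuming the anonymized values already exist (the strings below are hypothetical). Each distinct value gets one small integer ID; the cookie carries only that ID, and browsers in the same class receive the same one. The post notes this scheme was not yet implemented, so this is an illustration only.

```python
def assign_class_ids(anonymized_values: list[str]):
    """Give each distinct anonymized value (equivalence class) one integer ID."""
    value_to_id: dict[str, int] = {}
    for value in anonymized_values:
        # len() is evaluated before insertion, so IDs run 0, 1, 2, ...
        value_to_id.setdefault(value, len(value_to_id))
    # Server-side table used to expand a class ID back into its shared data.
    id_to_value = {cid: value for value, cid in value_to_id.items()}
    return value_to_id, id_to_value


# Hypothetical anonymized values, one per browser.
browsers = ["30-39|021**", "30-39|021**", "50-59|941**", "50-59|941**"]
value_to_id, id_to_value = assign_class_ids(browsers)
cookies = [value_to_id[v] for v in browsers]  # what each browser would carry
```

Unlike a UUID, the resulting cookie values repeat across browsers by construction, which is what prevents them from keying a per-browser history.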
1 There’s nothing special about 16 bytes. All that’s necessary is that the ID is large enough to be uniquely identifying within the domain of the ad network. I used 16 bytes because that’s the size specified in the UUID standard.
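How large is "large enough"? A common rule of thumb is the birthday bound: with n random b-bit IDs, the chance of any collision is roughly 1 - exp(-n(n-1) / 2^(b+1)). The sketch below evaluates it; note that a random (version 4) UUID has 122 random bits out of its 128, since the standard fixes a few bits for version and variant.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Birthday-bound approximation: chance that any two of n random IDs collide."""
    exponent = n_ids * (n_ids - 1) / (2 * 2 ** bits)
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny.
    return -math.expm1(-exponent)
```

For example, a billion random version 4 UUIDs collide with probability below 1e-18, whereas 65,536 random 32-bit IDs already collide about 39% of the time.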
2 Comments
Very interesting post. It’s great that you share your methodology.
Curious about how you handle timestamps and expiration dates for the cookies that you write, insofar as those are essentially unique identifiers as well.
Great question! This is something we’ve thought a lot about and are actively working on.
We don’t get a cookie’s creation or expiration date from a request—we can only read the key/value pairs inside the cookie. Even so, we do take the extra step of randomizing the expiration date on the cookie to sometime within 90 days of dropping the cookie. And we don’t log this expiration date, of course.
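A sketch of randomizing a cookie’s expiration within 90 days using Python’s standard `http.cookies` module. The cookie name and value are hypothetical, and passing in the random generator is a design choice for testability; in production a `random.SystemRandom()` instance would avoid a predictable seed. The chosen date appears only in the header and is not returned or logged.

```python
import random
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie

MAX_LIFETIME = timedelta(days=90)


def random_expiry(now: datetime, rng: random.Random) -> datetime:
    """Pick an expiry uniformly at random in [now, now + 90 days)."""
    return now + rng.random() * MAX_LIFETIME


def set_cookie_header(name: str, value: str, now: datetime, rng: random.Random) -> str:
    """Build a Set-Cookie header whose randomized expiry is never stored elsewhere."""
    cookie = SimpleCookie()
    cookie[name] = value
    # format_datetime(..., usegmt=True) requires an aware UTC datetime.
    cookie[name]["expires"] = format_datetime(random_expiry(now, rng), usegmt=True)
    return cookie.output(header="Set-Cookie:")
```

A typical call would be `set_cookie_header("rl", "autos.travel", datetime.now(timezone.utc), random.SystemRandom())`.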
Additionally, here are a couple general principles we’re working toward:
- We only store information at the level of precision needed (e.g., if we’re ever logging timestamps for debugging purposes, there’s no need to store it at the level of milliseconds)
- We delete logs as soon as possible (i.e. once they have served their debugging and billing purposes)
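The two principles above can be sketched as a pair of helpers: one truncates timestamps to the hour, and one drops entries past a retention window. The hour granularity and the 30-day `RETENTION` value are hypothetical placeholders; the reply ties deletion to when logs have served their debugging and billing purposes rather than to a fixed window.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # hypothetical retention window


def coarsen(ts: datetime) -> datetime:
    """Keep only the precision debugging needs (here, assumed to be the hour)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def purge(entries: list[dict], now: datetime) -> list[dict]:
    """Drop log entries older than the retention window."""
    return [e for e in entries if now - e["ts"] <= RETENTION]
```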
Hopefully this answered your question! It would be great to chat further if you have additional ideas about this; we’ll reach out to you separately to get your thoughts.
7 Trackbacks
[...] Read the full blog post here Share this post: [...]
[...] IDs also means people may no longer be anonymous. A more privacy-centric solution is to store all the segments of a person directly on a cookie. The data can be encrypted and secured so that only the cookie-placer can access [...]