Clean Rooms and Identity Graphs


In this episode, PubWise founder and CTO Stephen Johnston explores the world of data clean rooms, including the importance of privacy and what they mean for publishers of all sizes. Also answered are questions like: What is a data clean room? How can publishers gain the trust of consumers? What solutions are there for publishers to leverage their own first-party data? Listen in as Stephen provides off-the-cuff insight in a thought-provoking conversational format.


Tony Winders: Hello, and welcome to another edition of On the SPOT. My name is Tony Winders, and today we’re putting PubWise Founder and CTO Stephen Johnston on the spot about data clean rooms, recent Google news and more. Stephen, I always look forward to our conversations, and it’s great to be with you again.

Stephen Johnston: Thanks for having me; I enjoy the conversation too.

Tony Winders: I may be the last person on the planet to care about data clean rooms, but yesterday I read an article in AdExchanger about Amazon’s unBoxed conference in New York. Some of the coverage had to do with the Amazon Marketing Cloud, which seems like their version of a data clean room. That got me going down the rabbit hole of data clean rooms and getting up to speed on them, and they seem to have a lot of relevance to the first-party data conversations that we’ve been having. So, I just wanted to pick your brain about data clean rooms: how they show up in your orbit, what you think about them and where they belong in the future of a privacy-enabled web.

Stephen Johnston: First, we have to identify what a clean room is. There’s a bit of “old is new” here, and I think when people hear the description, they may think, “Hey, this has been done before.” Because what’s underpinning the data clean room concept is something of a philosophical change, which I think still requires a lot of consumer trust and a lot of ecosystem trust, that may be a higher hurdle than people realize. But, a data clean room is a place to store data. And instead of it being a data management platform, what would make it a clean room is this idea that it is clean access in and out. That means that it is privacy clean or clean from the standpoint of someone can’t just pick through it. There’s access control and so the access is clean. It’s at least what they’re trying to imply. If you put all this data in there, then you can ask questions about it. And instead of getting specific answers about a user, you get answers on a population level or other techniques to anonymize it, but it’s really beyond anonymization. It’s really privacy centric, and you can run campaigns against it, so I think it is an interesting topic.
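
The population-level idea Stephen describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular vendor's clean room works: the `CleanRoom` class, the `Record` fields and the `MIN_COHORT_SIZE` threshold are all hypothetical names invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

MIN_COHORT_SIZE = 3  # hypothetical threshold; a real product would choose its own


@dataclass(frozen=True)
class Record:
    user_id: str
    interest: str
    region: str


class CleanRoom:
    """Toy clean room: records go in, only aggregate counts come out."""

    def __init__(self, min_cohort_size: int = MIN_COHORT_SIZE) -> None:
        self._records: list[Record] = []
        self._min = min_cohort_size

    def ingest(self, records: list[Record]) -> None:
        self._records.extend(records)

    def count_users(self, interest: str, region: str) -> Optional[int]:
        """Return the number of distinct users in a group, or None if the
        group is too small to report without pointing at individuals."""
        users = {
            r.user_id
            for r in self._records
            if r.interest == interest and r.region == region
        }
        return len(users) if len(users) >= self._min else None
```

A caller can ask "how many bonsai enthusiasts are in this region?" but there is no method that returns a single user's row, and groups below the threshold return nothing at all. The trust question Stephen raises still applies: the operator of the room holds the raw records.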

Tony Winders: Maybe they’re back in the news because of the sunsetting of cookies in Chrome and the possibility that data clean rooms could be used as a proxy for privacy enabled targeting. That’s kind of how I thought about it, but they’ve been around for years, so what were they doing four or five years ago? 

Stephen Johnston: But were they really being used in ad tech in this way? I would say ‘no.’ You have data rooms that have been used for sharing data around acquisitions or IPOs. The concept of a clean room is conceptually this sort of hermetically sealed, access-controlled room with something special in it. They probably have been around for a while, primarily existing at the boundaries of two different organizations that wanted to share data and wanted to do it in an access-controlled kind of way. The idea that we would broadly deploy these in the industry in order to do advertising targeting, I think, is the new concept. And it is absolutely in response to all this stuff around privacy. Part of this is this sort of three-sided thing. The conversation prior was buyer and seller (buyer and seller of the ads), and the viewer of that ad or the consumer of that ad was somewhat out of the equation. And so, third-party cookies basically took data, created targeting profiles, built up that profile around the user, and the user’s sentiment about that was ignored. The clean room is at least part of this trying to get back into the triangle – ad consumer, ad seller, ad buyer – and making the case that we can bring you targeting; that’s the benefit to the ad consumer, the visitor to the site or the user of a property. Theoretically, the benefit is we can target you with things that are meaningful. The downside, historically, has been that people are going to have access to this data, and then can it be used in ways you haven’t approved of? That access control is at least conceptually being beige. It’s going to be used specifically for this campaign. It’s not going to identify me specifically. It’s accurate in terms of targeting me, so that if I really am a mother of two, who is trying to find the best toothpaste, then maybe I will find toothpaste ads.
That’s a valid use of targeting. If that then gets used outside to build a broader profile so you can pitch formulas that I’m not interested in or any number of things, that’s where people get a little bit more privacy concerned. What else do you know about me that’s not immediately in the purview?

Tony Winders: Do you have a sense of technically how a data clean room could be used for ad targeting without cookies? Technically, how could you serve an ad to an individual based on what you know about them? What’s the technical mechanism, if not a cookie?

Stephen Johnston: The primary piece is to add the data in and, at its boundaries, something is performed on it so that when you get it back out, or other people get it back out, they can get the targeting without necessarily exposing all the specifics of what went in. So it’s almost like a math function. Putting it in, removing some information, anonymizing it, and then applying access controls to get it out doesn’t provide that information back, but provides it in a statistically valid or targeting-valid kind of way. All of the particulars that you’re asking about are the full range, the gamut of traditional means. Then you get pools where we’re going to tell you they’re in the pool; we’re not going to tell you everything about the pool. To some degree, a clean room is like, can this be the place that you trust? Because right now you don’t really know where it is. And because to some degree at the boundaries, all that information is available, you’re trusting the holder of the clean room to throw it away; or you’re trusting them enough to keep it in some way and re-augment it later.
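
One way to picture the "math function" at the boundary is the sketch below. It is an assumption-laden toy, not a description of any real clean room: the `PoolStore` class, its method names and the `interests` profile key are hypothetical, and a keyed hash is pseudonymization rather than full anonymization. Only the `hmac` and `hashlib` calls come from Python's standard library.

```python
from __future__ import annotations

import hashlib
import hmac


class PoolStore:
    """Toy boundary function: raw IDs and profiles go in,
    only yes/no pool membership comes back out."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key  # assumed to be held only by the room's operator
        self._pools: dict[str, set[str]] = {}

    def _pseudonymize(self, raw_id: str) -> str:
        # Keyed hash, so outsiders cannot recompute tokens from IDs they know.
        normalized = raw_id.strip().lower().encode("utf-8")
        return hmac.new(self._key, normalized, hashlib.sha256).hexdigest()

    def ingest(self, raw_id: str, profile: dict) -> None:
        token = self._pseudonymize(raw_id)
        for interest in profile.get("interests", []):
            self._pools.setdefault(interest, set()).add(token)
        # Every other field in `profile` is dropped here and never stored.

    def in_pool(self, raw_id: str, pool: str) -> bool:
        # Answers "are they in the pool?" without describing the pool.
        return self._pseudonymize(raw_id) in self._pools.get(pool, set())
```

The sketch also shows where the trust problem sits: `ingest` sees the whole profile before discarding most of it, so the caller is relying on the operator to actually throw the rest away.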

Tony Winders: You answered my question well. To also kind of answer my own question, cookies could still be used to fire ads based on who the individual is. It’s just not a third party cookie at that point, it’s a first party cookie, and you’re inside the clean environment. If Amazon wants to serve an ad across Amazon’s ecosystem, and they know who the person is, then that’s fair game, right?

Stephen Johnston: I think, to some degree, people have overlaid the idea of identity with the idea of a cookie. Cookies are a transport mechanism. A cookie is like a truck and the identity is what’s in the truck. For some reason, because people started to become concerned about their identity information being out around the web, they started maligning the truck. To use another metaphor: don’t shoot the messenger. We’re definitely shooting the messenger here; the cookie is just transporting the identity. In the case of a clean room, yes, you can package it back up and transport that identity of whatever says they’re part of this through some other mechanism. But, they’re at least trying to create ways to say, “I have this lower-resolution identity, can it at least tell me reliably that they’re interested in this thing my campaign’s about without exposing the other 70 things you know about the person?” Then, through this sort of leaky, non-secure, non-clean method, you can build up a profile about a person on your own. The idea that we know things about you isn’t necessarily what scares people. It’s the, ‘do you know all the things about me?’ Contextualize the web as social interactions. You go to a thing for coin collecting, or I do bonsai. I don’t just walk in the room and tell everybody, “Oh, by the way, I’m taking medication for XYZ, and I’ve got a rash on my elbow and my kid is throwing up at home.” You talk about bonsai. Then you go to a different social context, and you may not talk about bonsai, and now you’re talking about this thing. Those things are relevant to make that social interaction happen. I think it’s good to think of these things in that context. What they’re trying to create online is a channel by which you can communicate relevant context about you, without exposing it all or providing some connectivity. That, theoretically, is the clean room. Now, the clean room has all of that information and at some point, it saw it all.
But the people that have a clean room are saying, “Yeah, we’re throwing away all that unique stuff and now we’re just creating these pools of things.” It’s an interesting technical challenge. It’s more of a trust challenge. The technical aspect is about access controls and that sort of thing. But fundamentally, access controls only get you so far when the data comes in at the boundary and has to be properly handled, to make sure that it can never come back out.

Tony Winders: It seems like the data clean rooms, because of their complexity and the need for scale, are only going to empower the oligopoly even further. What are the small publishers to do in the context of data clean rooms? If there were a trusted source that could be the data clean room for every small publisher out there, then that would be interesting. But right now, I only hear it being the walled garden conversation; where else might it be?

Stephen Johnston: Well, I think that’s where Amazon is trying to say, “Look, we’re going to have a clean room solution, and we’re offering it more broadly and play in that space and you can trust us.” If everybody spins up their own clean room, who are you sharing the data with, and how valid is it really? How do you expand that out? You really do want to expand your targeting capability out into the wider web; you just want to do it in a way that’s privacy centric that says, “Hey, I do know a thing about you. But knowing one thing about you doesn’t open you up to me knowing everything about you.” Not everybody feels that way, and I think at least part of this conversation has me being pragmatic. We need to know something about you. Otherwise, we’ll just roll the dice on an ad campaign, and I’ll be getting ads for women’s shoes all day, and I don’t wear women’s shoes. The theoretical, non-pragmatic, privacy-privacy-privacy people just say that “targeting is hogwash, you shouldn’t use it at all.” The middle ground is some ability to get the information in a clean and safe way. It’s difficult. The classic response to this from challengers or small companies is a consortium. If you go to your business book and look at the list of things, how do you respond to needing group leverage when your group isn’t very big? (The answer is) Consortiums, buying groups, and those sorts of things. So, the consortium has to trust someone, like Amazon trying to step in, and understand what Amazon is saying about how they’re going to use the data. And I would imagine what they’re going to say is that they’re going to target it. So I would ask questions like, “Are you going to compensate providers of the data? If the information they brought to you is used in a broader campaign, is the site going to get credit for it?” That’s what I’d like to see as we go forward.
We’ve got this clean room; 12 people participated by putting data in, so that you can find those audiences more broadly on the web or on these different properties. How are you compensating them for adding data to the pool that made another campaign more effective? And with third-party cookies, that definitely didn’t happen. Once they had the identity, it got reused over and over and over and over again, and then it got synced. It got synced with all their IDs. Now everything you knew, I know, theoretically. That’s really just plugging around the identity. This is where you get into some weird conversation about it. Hardly anyone really knows what all those identities were because people have this image, like it pulls up a piece of paper that says “Stephen, blah, blah, blah, blah.” It’s not really how it works. The solution is going to have to be a trusted group. I would say generally, the industry needs to find something other than Amazon or Google to do that.
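
The compensation question can be made concrete with a small sketch. To be clear, this is one hypothetical crediting rule invented for illustration, not something Stephen or any clean room provider proposes: the `split_credit` function, the publisher names and the rule of splitting a user's credit equally among the publishers who contributed that user are all assumptions.

```python
from __future__ import annotations


def split_credit(
    contributions: dict[str, set[str]],
    reached: set[str],
    payout: float,
) -> dict[str, float]:
    """Split a campaign payout among publishers in proportion to how many
    of the reached users each one contributed to the pool.

    A user contributed by several publishers counts as one unit of credit,
    divided equally among them (an assumed rule, chosen for simplicity).
    """
    credit = {publisher: 0.0 for publisher in contributions}
    for user in reached:
        owners = [p for p, users in contributions.items() if user in users]
        for publisher in owners:
            credit[publisher] += 1.0 / len(owners)

    total = sum(credit.values())
    if total == 0:
        return credit  # nobody contributed a reached user; nothing to pay out
    return {p: payout * c / total for p, c in credit.items()}
```

With third-party cookies there was no equivalent bookkeeping step: once an identity was synced, nothing recorded which site had supplied it.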

Tony Winders: Well, the IAB is going to release its first draft of clean room standards in December. Not that they want to be that trusted third party, but if any organization was in a spot to do so, it would be them.

Stephen Johnston: What you’re really asking from a clean room perspective, is that a company maintains a technical implementation of morality. And that morality says you’re going to treat the data properly. That’s difficult. The IAB’s had great success with technical specifications that describe how things communicate. But, if we could essentially legislate morality, we’d solve all the world’s problems.

Tony Winders: Indeed. What if there were a trusted organization, like PricewaterhouseCoopers, in association with a trade body like the IAB and a technical entity like PubWise, or Amazon, or whomever? Maybe there is a world in which that would be a consortium of sorts. Before we have to go, Stephen, in related news this week, Google pushed back its Universal Analytics sunset until July of 2024. And the requirement that everyone start using GA4, which coincides with the support for cookies in Chrome, is delayed until the end of 2024. I’m wondering if those two things are related, but also how it relates to this conversation about data clean rooms.

Stephen Johnston: Yeah, I think every time I hear Google say, “We’re kicking cookie deprecation and then kicking GA4,” what I hear is that the market does not yet have a replacement for third-party cookies. One of the largest, most informed organizations out there in the world doesn’t feel like the market is ready for those things to actually go away. Look, as an entrepreneur, as a person building a product, that just means opportunity to me. I understand that for a publisher, that’s stress, it’s like kicking this again. You should be hearing that there isn’t a solution yet. From our perspective, how do you measure potential solutions? How do you monitor potential solutions? How do you plug in the ones that claim to be the solutions? Clean rooms are a component of that. We don’t want to be the clean room provider; that’s not really in our product roadmap right now. Connecting publishers to clean rooms is (in our product roadmap). How they interact with them, how they provide the right data, how they can prove attribution and all of that absolutely is (in our product roadmap). Whether they’re plugging into Amazon’s or Google’s or PricewaterhouseCoopers’ third-party consortium, so be it. We’ll make all of that happen.

Tony Winders: I appreciate all the insight. I never quite know where our conversations are going to lead. I just throw a topic at you, and we weave around, and I always learn something. I hope that our audience is getting a lot of value out of your technical expertise; I know that I am. We’ll leave it there for today. This has been On the SPOT with Stephen Johnston, Founder and CTO of PubWise. For more of Stephen’s insight, I hope you’ll listen to other episodes and share our podcast and our blog with your followers and your networks. I look forward to seeing you next time. Thanks, Stephen.
