Thanks for inviting me here to speak with you today.
The purpose of my talk is to share a new possibility for the future regarding users’ personal data that most have not yet explored. It sits between the two extremes of a familiar spectrum.
On one end is “Do Not Track”: using technology and a legal mandate to prevent any data collection.
On the other end is “Business as usual”: leaving the door open for ever more “innovative,” pervasive, and intrusive data collection and cross-referencing.
There is a third possibility that aligns with people’s privacy needs and offers enormous business opportunities.
A nascent but growing industry of personal data storage services is emerging. These services strive to allow individuals to collect their own personal data, manage it, and then give permissioned access to their digital footprint to the businesses and services they choose: businesses they trust to provide better customization, more relevant search results, and real value for the user from their data.
With other leading industry thinkers, I have come to believe that there is more money to be made in an ecosystem that allows users to determine which businesses have access to what data, and under what terms and conditions, than there is under the present, more diffuse, scattershot, and unethical collection systems. Today I will articulate the broad outlines of this emerging “personal data ecosystem” and talk about developments in the industry.
Those of you who know me will find it unusual for me to have such a keen focus on making money on user data and emerging business models.
I am, after all, known as the “Identity Woman – Saving the World with User-Centric Identity”. Since first learning about issues around identity technologies online in 2003, I have been an end user advocate and industry catalyst.
In this role, I have convened the Internet Identity Workshop every six months for the last six years. This collaborative industry event brings together independent developers, major web companies, companies who sell identity solutions to businesses, and a range of others interested in our work.
User-centric digital identity technologies are focused on empowering individuals, end users, and citizens with their own identity online – one that is persistent, autonomous, and under their control.
Many of you doubtless have recognized in our logo the allusion to the famous New Yorker cartoon, “On the Internet, nobody knows you’re a dog.”
To me our logo symbolizes three human rights which we believe are worth fighting for:
- The Freedom to be who you want to be online, i.e., the right to anonymity.
- The Right to curate the information about yourself that can be found online.
- The ability to express verified claims about yourself and share detailed information when you want with people and organizations.
Today the personal data ecosystem in which almost everyone unknowingly participates pays no heed to these rights. Each of us emits information about ourselves, our activities and intentions, in various digital forms. This information is collected by a wide range of institutions and businesses with which people interact directly; then it is assembled by data brokers and sold to data users.
This chain of activity happens with almost no participation or awareness on the part of the data subject – the individual.
The Wall Street Journal in its “What They Know” series of articles has outlined several examples. Life insurers are beginning to explore how risk can be assessed with social networking data to screen people with unhealthy lifestyles.
RapLeaf gathers transactional and social networking data linked by common e-mail addresses to build comprehensive profiles of people. Some of its clients, such as political campaigns that solicit e-mail addresses, then use the RapLeaf service to get the name of the person and personal details behind the e-mail address.
Just this week, the Washington Post reported that banks are now inserting advertisements and coupons when people are banking online to make them offers based on their purchasing behavior.
Clearly there is money to be made in collecting and selling personal data.
The community of technologists focused on user-centric digital identity has labored since 2004 on technologies like OpenID and Information Cards that have had limited success. What I have come to realize is that unless businesses see a return over and above the cost of adopting new technologies that are aligned with end-user interests, nothing will change. Being “good” for the end user is not sufficient motivation to spur an industry transformation. The business value we were missing with these early generations of user-centric technologies must be built into the emerging Personal Data Ecosystem.
Here I’d like to highlight a key distinction: There is a fundamental difference between being watched and being seen.
Being SEEN is an act of mutual social recognition – I see you, you see me, we see each other seeing each other – we are seen.
Being WATCHED is unidirectional. It is done without the subject’s – I might say “the victim’s” – knowledge.
Being STALKED is what happens to someone when the watching activity is aggregated, that is when someone is followed through time and space without their awareness.
The behavioral marketing, advertising and search industries are stalking people all over the web – collecting information about them and their activities without their awareness. In reaction to these widespread industry practices, many consumer advocacy groups have proposed the development and mandating of “Do Not Track” technologies and systems.
Governments both here and in Europe are increasingly looking at formally regulating these industries, and because of this, self-regulation is emerging. Some web companies are moving to limit data retention. For example, Google now anonymizes data after 9 months; previously, it did so after 18-24 months.
My contacts in the data broker industry concur that the writing is on the wall. They see their current business practice of selling massive amounts of aggregated data about people being regulated into uselessness. Soon they will be allowed to hold only severely limited amounts of information about people for a short amount of time.
In short, the present model of massive personal data retention and cross-correlation is not sustainable. Not only are governments moving to regulate these activities, but more to my point today, it is not even the best way to get useful information about people – information that can point to services they want.
The community I am leading is approaching end-user empowerment and control by simply asking this question:
What if individuals were given the tools to collect and manage their data – their digital footprint?
By a “digital footprint” I refer not only to explicit information such as named preferences – who they call or are linked to and what they buy – but also to implicit information – where they click, how long they spend on pages, and the order in which they move through the web.
What if there were tools and services for collecting and integrating different kinds of digital data streams – data sets that only the individual could integrate?
[This list of Personal Data Types is from the World Economic Forum “Rethinking Personal Data” Pre-Read Document by Marc Davis, published in June 2010.]
Identity and Relationships
- Identity (IDs, User Names, Email Addresses, Phone Numbers, Nicknames, Passwords, Personas)
- Demographic Data (Age, Sex, Addresses, Education, Work History, Resume)
- Interests (Declared Interests, Likes, Favorites, Tags, Preferences, Settings)
- Personal Devices (Device IDs, IP Addresses, Bluetooth IDs, SSIDs, SIMs, IMEIs, etc.)
- Relationships (Address Book Contacts, Communications Contacts, Social Network Relationships, Family Relationships and Genealogy, Group Memberships, Call Logs, Messaging Logs)
- Location (Current Location, Past Locations, Planned Future Locations)
- People (Copresent and Interacted with People in the World and on the Web)
- Objects (Copresent and Interacted with Real World Objects)
- Events (Calendar Data, Event Data from Web Services)
- Browser Activity (Clicks, Keystrokes, Sites Visited, Queries, Bookmarks)
- Client Applications and OS Activity (Clicks, Keystrokes, Applications, OS Functions)
- Real World Activity (Eating, Drinking, Driving, Shopping, Sleeping, etc.)
- Text (SMS, IM, Email, Attachments, Direct Messages, Status Text, Shared Bookmarks, Shared Links, Comments, Blog Posts, Documents)
- Speech (Voice Calls, Voice Mail)
- Social Media (Photos, Videos, Streamed Video, Podcasts, Produced Music, Software)
- Presence (Communication Availability and Channels)
- Private Documents (Word Processing Documents, Spreadsheets, Project Plans, Presentations, etc.)
- Consumed Media (Books, Photos, Videos, Music, Podcasts, Audiobooks, Games, Software)
- Financial Data (Income, Expenses, Transactions, Accounts, Assets, Liabilities, Insurance, Corporations, Taxes, Credit Rating)
- Digital Records of Physical Goods (Real Estate, Vehicles, Personal Effects)
- Virtual Goods (Objects, Gifts, Currencies)
- Health Care Data (Prescriptions, Medical Records, Genetic Code, Medical Device Data Logs)
- Health Insurance Data (Claims, Payments, Coverage)
Other Institutional Data
- Governmental Data (Legal Names, Records of Birth, Marriage, Divorce, Death, Law Enforcement Records, Military Service)
- Academic Data (Exams, Student Projects, Transcripts, Degrees)
- Employer Data (Reviews, Actions, Promotions)
What if individuals were given the power to SEE themselves – to collect and aggregate information that no one else could ethically integrate?
What if the individual could choose to retain all the information they wanted for as long as they wanted?
This is a graph that I first saw Marc Davis (who is now a partner architect at Microsoft) share to explain today’s data environment and a future where people are in control.
This red dot shows us what’s happening today: some data aggregators are self-regulating out of necessity by limiting the amount of time they keep data, and governments are limiting data retention and anonymization practices.
The green dot shows us what WOULD happen if people were given the capacity to store and manage their own data – if they could keep as much data as they wanted for as long as they wanted. Digital footprints of a lifetime could be shared with future generations.
In a user-centric model where the individual can aggregate information about themselves, new classes of services – more specific to the individual and based on data accessed with user permission – can emerge.
The foundation of this ecosystem is personal data storage services that are totally under the control of the individual. We can already see early examples of these in the marketplace.
Statz is a startup that supports you pulling in your information from different service providers:
- Mobile phone records
- Energy and utility records
- Health and fitness
- Shopping and payments
Statz gives you instructions on how to go into your mobile carrier or electric company and export your statements. Often this involves a dozen steps and is very labor intensive – not something that is easy or that everyone will do.
Greplin does personal cloud search. When people set up their accounts, they give the service access to a range of accounts – LinkedIn, Gmail, Basecamp, Flickr, etc. Then they use its engine to search across them.
Personal.com has raised 7 million in venture funding, and although it does not yet have any services, its website articulates clearly how personal data under the control of the user is valuable.
There are two open source projects currently active in this space – The Higgins Project and Project Danube – both of which have code for personal data store services.
Mydex is a Community Interest Company based in the UK that has begun a community prototype that connects individuals’ personal data store accounts to local government agencies.
At the World Economic Forum in Davos next week they are having a workshop on “Rethinking Personal Data” and Harvard & MIT are showing early prototypes of personal data collection from mobile phones into personal data stores and then enabling different types of sharing.
Kynetx is developing a new language that looks at data from personal data stores and public datasets and can do real time matching based on rule sets created by the individual to surface relevant content.
With a rules engine working on your behalf with your data, you could imagine a GPS navigation system that accesses your calendar and a gift recommender/reminder that looks at your friends’ wish lists and birthdays – it could link these all together and make helpful recommendations of where to stop to pick up a present for a friend with a birthday next week.
This example shows how this nascent but emerging industry of personal data stores presents opportunities for the media and advertising industries. Once people have their own personal data store and services, they can collect data over years about their behaviors, likes, interests, purchasing habits, etc. These large stores of personal data are a gold mine of information about potential commercial intent.
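To make the gift-reminder scenario above a little more concrete, here is a minimal sketch in Python of one such rule running over an individual's own data. It is not Kynetx's language, and every name in it (`Friend`, `upcoming_birthdays`, `gift_stops`, the sample stores) is a hypothetical chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Friend:
    name: str
    birthday: date          # only month and day are used
    wish_list: list[str]


def upcoming_birthdays(friends, today, days_ahead=7):
    """Return friends whose birthday falls within the next `days_ahead` days."""
    window = set()
    for offset in range(days_ahead + 1):
        day = today + timedelta(days=offset)
        window.add((day.month, day.day))
    return [f for f in friends if (f.birthday.month, f.birthday.day) in window]


def gift_stops(friends, stores_on_route, today, days_ahead=7):
    """Rule: suggest a stop when a store on today's route stocks an item
    from the wish list of a friend whose birthday is coming up."""
    suggestions = []
    for friend in upcoming_birthdays(friends, today, days_ahead):
        for store, stock in stores_on_route.items():
            for item in friend.wish_list:
                if item in stock:
                    suggestions.append((store, friend.name, item))
    return suggestions
```

In this sketch the calendar, the wish lists, and the route all come from the individual's own store, so the matching happens on their behalf rather than inside a third party's database.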
The Personal Data Ecosystem presents the opportunity to do targeting through access to personal data that is more accurate, more detailed, more comprehensive, data that was not accessible before, or that could not be combined with other data before. Giving individuals choice about where they store their personal data and who has access to it, and under what terms and conditions, grows trust. This trust is hugely valuable, because over time more and better services that combine and utilize valuable personal data can be offered.
It supports new forms of advertising and marketing by enabling trusted relationships between customers and vendors. These relationships make possible “relationship marketing” and opt-in, user-controlled sharing of data, permissioned communications and offers, group buying, recommendations, social and viral marketing, and more efficient commercial exchanges.
Individuals could, through their personal data store, choose to connect to media, product and services companies directly – choosing to proactively share intent information and personal information about themselves over a direct link. Just as vendors manage their relationships with customers using CRM tools, the vision is that individuals would manage their relationships with vendors using VRM tools – meeting in the middle and creating new value for both.
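The idea of permissioned sharing described above, where the individual decides which business sees which data and under what terms, can be sketched in a few lines of Python. This is an illustrative assumption only, not the design of any product mentioned in this talk; the `PersonalDataStore` and `Grant` classes and their methods are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Grant:
    categories: set[str]    # kinds of data the business may read
    expires: date           # last day the grant is valid
    terms: str              # terms and conditions the business accepted


@dataclass
class PersonalDataStore:
    data: dict[str, object] = field(default_factory=dict)
    grants: dict[str, Grant] = field(default_factory=dict)

    def grant(self, business, categories, expires, terms):
        """The individual gives a business access to chosen categories."""
        self.grants[business] = Grant(set(categories), expires, terms)

    def revoke(self, business):
        """The individual withdraws access at any time."""
        self.grants.pop(business, None)

    def read(self, business, category, today):
        """A business reads data only if a valid grant covers the category."""
        g = self.grants.get(business)
        if g is None or today > g.expires or category not in g.categories:
            raise PermissionError(f"{business} may not read {category}")
        return self.data[category]
```

The point of the sketch is that access is denied by default: a business sees nothing unless the individual has granted it, and the grant carries its own scope, expiry, and terms.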
On the one hand, we have industry voices calling for the current personal data ecosystem to be left as it is. Technological innovation continues among the proponents of maintaining the regulatory status quo: people are being stalked more and more effectively, and more ads are being served to them while they view media.
On the other side, advocacy groups are pushing for legislation that would prohibit tracking users who ask not to be tracked. However, making the choice to go in this direction means that all the value in the personal data is lost because it is never collected.
The Personal Data Ecosystem offers a middle path and a better future where everyone wins, and where users’ interests are balanced with those of industry.
Industry is governed by regulations that permit data collection only when users are notified and give their permission. This ecosystem gives industry a way to get very good data with permission.
Users get tools to allow them to collect personal data themselves and then share what they want.
Innovators build for this ecosystem and basically make it worth users’ while to participate by enticing them with exceptional value for sharing their personal data with trusted partners.
I and others in industry are forming a Collaborative Consortium for the Personal Data Ecosystem to support the very different industries and stakeholder groups as they collaborate and innovate. It will be a center of cooperation in what will be a very competitive marketplace for services.
- We have begun hosting conversations amongst Personal Data Store Providers about interoperability (contact me if you would like to know more & get involved).
- We have an aggregate blog of leading thinkers and companies active in this space. I am co-hosting a podcast covering this emerging market with Aldo Castañeda.
- We are collaboratively documenting the emerging field in a structured wiki, including people, projects, companies, standards, publications, videos and events.
- We are currently raising funds for our first major project: a Value Network Map and Analysis of both the current data ecosystem and the emerging ecosystem market model where individuals collect, aggregate, and control data in their own data stores.
We are gathering at a range of events including:
- Identity Collaboration Day in San Francisco on February 14th
- STL Partners New Digital Economics Conference in April, with a deep dive on Igniting the Personal Data Ecosystem on April 7, day three of the conference.
- The 12th Internet Identity Workshop on May 3-5 in Mountain View.
For more information about how you can get involved, visit our website: personaldataecosystem.org.