Data is currently in the spotlight, be it open data or big data. It is important to distinguish the two, even though they share a common object: data.
Big data focuses primarily on the possibilities offered by exploiting an exponentially growing volume of data. With open data, by contrast, value creation depends on the ability to share data and make it available to third parties, rather than on sheer volume. Open data meets a set of technical, economic and legal criteria: it must be freely available online, in a format that allows re-use.
The term open data first appeared in 1995, in a document from an American scientific agency. It dealt with the disclosure of geophysical and environmental data. To quote the authors of the report: “Our atmosphere, oceans and biosphere form an integrated whole that transcends borders.” They advocated a full and open exchange of scientific information between countries, a prerequisite for analyzing and understanding these global phenomena.
The idea of knowledge as a common good had been theorized well before the invention of the Internet. Robert King Merton was one of the fathers of the sociology of science, and the theory that bears his name shows the benefits of open scientific data. As early as 1942, Merton explained why research results should be freely accessible to all: each researcher must contribute to the “common pot” and give up intellectual property rights so that knowledge can move forward.
Information technologies have also breathed new life into this philosophy of the commons. In her research, Elinor Ostrom, winner of the 2009 Nobel Prize in Economics, highlighted what is specific about information commons. They are very similar to public goods, because their use by one person does not prevent their use by others. However, they are public goods of a new kind: not only does their use not deplete the common stock, it enriches it.
Long before becoming a technical object or a political movement, open data was rooted in the practice of the scientific community. Researchers were the first to perceive the benefits of openness and data sharing.
But it is the encounter between this scientific idea and the ideals of free software and open source that shaped open data as we know it today.
In December 2007, thirty Internet thinkers and activists met in Sebastopol, north of San Francisco. Their aim was to define the concept of open public data and get it adopted by the US presidential candidates.
Among them were two well-known figures: Tim O’Reilly and Lawrence Lessig. The former is familiar to techies: this American author and publisher has been behind many pioneering computer and Internet movements, and he helped define and popularize expressions such as open source and Web 2.0. Lawrence Lessig, then Professor of Law at Stanford University (California), is a founder of the Creative Commons licenses, based on the idea of copyleft and the free dissemination of knowledge.
Participants in the Sebastopol meeting came mostly from the free software and free culture movements. These movements have been at the heart of many innovations in computing and the Internet over the last fifteen years. Some of these innovations are now familiar: think of the collaborative encyclopedia Wikipedia. Other open source creations are less known to the general public despite playing a fundamental role in online services: for instance, the Apache web server software is used to host most websites.
Some activists and entrepreneurs who were already using public data also attended the Sebastopol meeting: Adrian Holovaty (founder of EveryBlock, a local information service) and the Briton Tom Steinberg (initiator of the FixMyStreet site). One of the youngest in the group was none other than the late Aaron Swartz, co-author of the RSS 1.0 specification and free knowledge activist. Together, they drew up the principles that allow us today to define and evaluate open public data.
These principles encompass both a basic idea and the means to achieve it. The basic idea is that public data is common property, in the same way as scientific ideas. The means of achieving it concern primarily the sharing and use of this common good. They are directly inspired by the approach and practice of open source, built on three concepts: openness, participation and collaboration. Free software development was the first field to experiment with this culture. Each programmer who wishes to contribute is invited to do so through public platforms that share source code. Programmers can learn from the work of others but, in exchange, must republish their own work: this is how collective expertise is built. Collaboration among developers follows a peer-to-peer model, grounded in competence and reputation, rather than formal hierarchical rules.
In Sebastopol, Tim O’Reilly’s contribution on open government shed new light on the relationship between the open source movement and the emerging principles of open data: as he put it, we must apply the principles and working methods of open source to public affairs.
In 2007, it sounded like a dream. But the outcome far exceeded their expectations. A little over a year later, President Barack Obama took office in the White House and signed three presidential memoranda. Two of them concern open government, of which open data is one of the pillars. These memoranda explicitly place the culture of open source at the heart of public action by invoking its founding principles: transparency, participation and collaboration.
Openness is a concept common to open source, open government and open data. It is both a philosophy of action and a profession of faith, a practice and a goal. But applying it to public data is not straightforward.
Today, public data is defined by law. It concerns both public and private players acting within the framework of a public service mission. Governments and local authorities are not the only ones involved: a transport company operating subway lines on behalf of a local authority is equally concerned.
Laws such as the US Freedom of Information Act (FOIA) govern the rights to access and reuse these data. In France, for example, the 1978 law that created the CADA (Commission d’accès aux documents administratifs, or Commission on Access to Administrative Documents) guarantees access to data and sets out the conditions for exercising these rights.
The open data movement attempts a complete reversal of this logic: by default, public data and information must be published online, before third parties even request them.
This shift to open by default is in itself a major cultural paradigm shift for most public and private entities. “Letting go” is one of the main traits of open innovation. It is not a natural reflex in many organizations, which are concerned about the use, or rather misuse, that may be made of open data. Is there not a risk that interpretation will distort the data? Some data are difficult to grasp without knowing the context of their primary use: is it not risky to open them?
Open data is also a product of its time, marked by increasingly urgent demands for transparency and accountability. Interestingly, the players most affected by the current crisis of confidence are also those most sensitive to open data: politicians and public bodies, and companies in energy, the environment, transport, banking... Transparency is perceived as a response to a period of suspicion, even distrust, towards institutions and their representatives.
Openness is a demand addressed to data holders, both public and private. On the stage of the 2009 TED conference, Sir Tim Berners-Lee, the inventor of the World Wide Web, issued what was a rallying cry for supporters of the Web of data and a war cry for data holders: “We want raw data, now!” (video: Tim Berners-Lee, TED, 2009)
Open data addresses various political and economic issues. Data openness is expected to bring democratic gains (greater transparency of public action, citizen participation, a response to the crisis of confidence towards politicians and institutions), but also economic value through the development of new activities based on open data.
As we can see, the concept of open data is as heterogeneous in its origins as in its goals. This diversity of aims can be a problem, for example when choosing which data to release to re-users.
A file of bus timetables, for example, has high reuse potential and enables many useful (if not immediately profitable) services. However, the democratic benefit of opening such data is not easy to demonstrate. Conversely, many governments and local authorities publish open data on their budgets. This is certainly a welcome transparency effort, but such material is harder to reuse in practice: public finances are a complex subject to handle.
After the early days of discovery and pioneering initiatives, open data now faces a twofold challenge, on both the supply and the demand side.
On the supply side, data availability is still a work in progress: most data holders have prioritized the datasets that were easiest to open (technically, legally and politically). Data perceived as sensitive, or with a greater social or societal impact, remain largely outside the scope of open data.
In response, disputes over closed data are multiplying in many areas, and sector-specific initiatives in favor of openness are emerging.
For instance, in France, the “Transparence Santé” (Health Transparency) collective brings together a patchwork of players: patients and consumers, researchers and academics, but also corporate players... They are asking the primary health insurance fund to open all health data, anonymized of course. This initiative marks a pioneering new step: open data is no longer an end in itself but a means of serving a broader cause.
Beyond data availability, there is also the issue of interoperability. Even when the same kind of data is available in several territories, it is rarely comparable from one city to another, which hampers the development of large-scale services. Work on common reference standards is already underway, notably in mobility and transport.
On the demand side, the situation is also quite heterogeneous. The first open data re-users were mobile application developers, encouraged by many dedicated competitions (such as NYC Big Apps in New York) and by the wide coverage open data initiatives received in tech-savvy geek communities.
Wider appropriation by the general public quickly runs into the difficult question of data literacy. A wide range of skills is required: identifying data sources, processing and handling them, taking a critical view of the conditions under which they were produced and opened, mastering basic statistical concepts... Traditional mediators, including the press and specialized associations, are beginning to see the benefits of open data.
Less than six years after the Sebastopol meeting, the idea of open data has clearly made its way into society, and the speed at which it spreads continues to surprise us. However, the ambition of open data has also changed: the goal is no longer to change the world through data but, more modestly, to modernize public action.
Open data is no longer the transparency tool its promoters had imagined. As early as 2009, Lawrence Lessig distanced himself from the idea of radical transparency. Other participants in the Sebastopol meeting expressed support for WikiLeaks, whose practices differ markedly from those of open data! Researcher Beth Noveck, who helped design the open data policies of the first Obama administration, has also expressed doubts about whether open data by itself can improve the governance of public affairs. Open data alone is not enough to open up organizations and foster new, more transparent and open governance practices.
Nevertheless, this change of ambition does not mean that public players have lost interest in data and its potential; quite the opposite. In a sign of the times, the mayor of New York appointed the city’s first Chief Analytics Officer, a role rarely seen before outside Internet start-ups.