Gathering, transforming and publishing open data

Open data is a huge resource which can make life easier for many people. When data is opened up, the owners of the dataset may no longer need to answer questions about the data because journalists and members of the public can find the answers to their queries themselves. Businesses and entrepreneurs can also use the data to develop commercial applications while the public can gain access to a whole host of information.

Open government data also significantly increases the transparency and accountability of the public sector organisations who generate the data.

However, publishing open data in the first place involves a process with several steps:

  • Data needs to be collected together, possibly from a variety of sources
  • Standards for publication need to be defined
  • The data needs to be transformed to comply with those standards and into a format that can be easily used
  • The data needs to be published in a central resource where people know to look for it
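The steps above can be sketched as a small pipeline. This is only an illustrative sketch: the field names, the column aliases and the two sample sources are hypothetical, and a real publication standard would define its own schema and checks.

```python
import csv
import io

# Hypothetical standard: the field names every published row must use.
STANDARD_FIELDS = ["area_code", "year", "population"]

# Hypothetical mapping from source-specific column names to the standard ones.
COLUMN_ALIASES = {
    "Area": "area_code",
    "area_code": "area_code",
    "Year": "year",
    "yr": "year",
    "Pop": "population",
    "population": "population",
}


def collect(sources):
    """Read every CSV source and return all rows as dictionaries."""
    rows = []
    for text in sources:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def transform(rows):
    """Rename columns to the standard names and normalise value types."""
    cleaned = []
    for row in rows:
        renamed = {COLUMN_ALIASES[key]: value.strip() for key, value in row.items()}
        renamed["year"] = int(renamed["year"])
        renamed["population"] = int(renamed["population"])
        cleaned.append(renamed)
    return cleaned


def publish(rows):
    """Serialise the standardised rows as CSV text ready to upload."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=STANDARD_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Two made-up sources that describe the same kind of data with different headers.
source_a = "Area,Year,Pop\nE01,2020,1200\n"
source_b = "area_code,yr,population\nE02,2020, 950\n"

published = publish(transform(collect([source_a, source_b])))
print(published)
```

Here the final "publish" step only produces the CSV text; uploading it to a central portal is left out because that depends on the portal being used.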

Publishing open data can be as simple as putting a PDF of a dataset online so that people can download it and look at the data. However, this does not allow anyone else to reuse the data or develop tools which show it in different ways, at least not without a huge effort to recreate the dataset in a reusable format.

In an ideal world, all open data would be published as linked open data, meaning that similar datasets from different sources could be merged and compared.
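To illustrate why linked data makes merging easier, here is a minimal sketch that models each dataset as a set of (subject, predicate, object) statements. All of the URIs and figures are hypothetical placeholders, and a real project would normally use an RDF library rather than plain Python sets.

```python
# Each dataset is a set of (subject, predicate, object) statements.
# All URIs and values below are hypothetical placeholders.
AREA = "http://example.org/area/E01"

population_data = {
    (AREA, "http://example.org/def/population", 1200),
}
school_data = {
    (AREA, "http://example.org/def/schools", 3),
}

# Because both publishers use the same identifier for the area,
# merging the two datasets is simply a set union.
merged = population_data | school_data


def describe(triples, subject):
    """Collect every property known about one subject across the merged data."""
    return {predicate: obj for s, predicate, obj in triples if s == subject}


facts = describe(merged, AREA)
print(facts)
```

The shared identifier is what makes the merge work: if each publisher had used its own private code for the area, the two datasets would first have to be matched by hand.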

In between these two options are a number of ‘halfway’ stages which represent improvements over the basic PDF approach, but do not provide fully linked data. These stages can be identified in the accepted five-star model. Each of the star stages is described in the table below.