The White House announced to great fanfare that it would be releasing data feeds of public information collected by hundreds of agencies within the US government. It has now launched the first iteration of this service.
My interest in these feeds is in building mash-ups, visualizations, and web applications on top of the data. Many applications already use existing data feeds from Amazon, Google Maps, Microsoft Live, and others in interesting ways. Amazon in particular has had an XML-based REST API available for years (in fact, Amazon sells a book on how to use it). For example, Frucall provides a mobile comparison application that uses a variety of publicly available APIs to compare products across retailers and present the results on your mobile device.
So with hundreds of data feeds, imagine the visualization, comparison, calculation, and search applications that could be built on US government data. Imagine, too, delivering these applications to mobile devices, the Xbox 360, iPhones, BlackBerries, and so on, in formats that suit each platform.
Unfortunately, based on what has been launched, this vision is not really feasible. Here is what I can see from what is currently available:
- There are only about 50 data feeds. Given the number of US agencies, this is a very small sample of the data available.
- The site provides data in a variety of formats such as XML, CSV, KML (Google Earth), and ESRI (GIS data). The format that would be useful for apps is XML, as the site itself explains (“Better suited for consumption by automated applications capable of handling raw XML files”). Only 8 feeds are available in XML format.
- In most cases the data is not a feed at all; it is a set of files. If you click on the links, you get either a zip file containing some XML data or a site with some links to XML files.
- In some cases, the XML files are RSS feeds. While this might be great if you want content, the “data” is really just unstructured content.
- There are no APIs and no REST endpoints, and each feed has a completely different format. There is zero consistency even for things like unique identifiers, delivery format, or XML structure. There are also generally no XML schemas available: you just get raw XML.
- It's not clear whether data is published in a timely fashion. In most cases, you get a directory of files, but there is no information on how often they are updated.
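To make the cost of these gaps concrete, here is a minimal Python sketch of what a client is left doing when a "feed" is really a zip file of schema-less XML: unpack the archive, parse each XML member, and guess which elements are the records. The file name and the element names (`sites`, `site`, `id`, `name`) are invented for illustration and do not come from the actual site, and the "most repeated child tag" rule is only a heuristic I am assuming, not anything the service documents.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def xml_members(zip_bytes):
    """Yield (file name, parsed root element) for every .xml file in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                yield name, ET.fromstring(archive.read(name))


def guess_records(root):
    """Guess the records: treat the most frequently repeated child tag as a record.

    With no schema published, this is a heuristic and will be wrong for some files.
    """
    counts = Counter(child.tag for child in root)
    if not counts:
        return []
    tag, _ = counts.most_common(1)[0]
    return [
        {field.tag: (field.text or "").strip() for field in record}
        for record in root.findall(tag)
    ]


def build_sample_zip():
    """Build an in-memory archive with made-up contents, standing in for a download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "sites.xml",
            "<sites>"
            "<site><id>1</id><name>Alpha</name></site>"
            "<site><id>2</id><name>Beta</name></site>"
            "</sites>",
        )
        archive.writestr("readme.txt", "not XML")
    return buffer.getvalue()


if __name__ == "__main__":
    for name, root in xml_members(build_sample_zip()):
        print(name, guess_records(root))
```

Every file published in a different structure would need its own variation of `guess_records`, which is exactly the per-feed glue code a consistent API with published schemas would make unnecessary.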
So while it's great to see the information being made public, the US government has a long way to go before it has an API that is useful for application development. Given that Google, Expedia, Amazon, and others have had such APIs for many years, this service debut is a lot of fanfare with not a lot delivered. Dumping a bunch of files into a directory was a victory in 1999, not in 2009.