Data.gov: 6.4 billion triples

Data geeks will be keen to learn that running a sample of data sets drawn from data.gov, a still-unfinished opus, through a tool developed by Rensselaer Polytechnic Institute, resulted in 6.4 billion triples. That would put data.gov fairly high up in the mid-pack of major triples databases, or triplestores.

(The most triples in a single season ever hit by a major league baseball player is 36, according to baseball-reference.com. That was by Pittsburgh Pirates right fielder John Owen “Chief” Wilson in 1912.)

Triples are linked pieces of data in what semantic web experts call subject-predicate-object format, searchable by SQL and other means, notably the SPARQL query language, a piece of semantic web technology developed by the World Wide Web Consortium. This link at W3 gives a fairly understandable explanation of what this all is, and why it is important.
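To make the subject-predicate-object idea concrete, here is a minimal sketch in Python. It is not how data.gov or the Rensselaer tool stores anything; the dataset and agency identifiers, the predicate names, and the `match` helper are all invented for illustration. It only shows that a triple is a three-part statement and that a query is a pattern with some parts left blank.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples, invented for illustration; not actual data.gov content.
TRIPLES = [
    ("dataset:1001", "publishedBy", "agency:A"),
    ("dataset:1001", "hasFormat", "CSV"),
    ("dataset:1002", "publishedBy", "agency:A"),
    ("dataset:1003", "publishedBy", "agency:B"),
]


def match(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the parts of the pattern that are given.

    A part left as None acts as a wildcard, much like a variable in a query.
    """
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


# "Which datasets were published by agency A?"
datasets_from_a = [s for s, _, _ in match(TRIPLES, predicate="publishedBy", obj="agency:A")]
print(datasets_from_a)
```

In SPARQL, the same question would be written roughly as `SELECT ?d WHERE { ?d :publishedBy :agencyA }`, with `?d` playing the role of the wildcard. A real triplestore indexes its triples so that it does not have to scan all of them, which the sketch above does.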

Last week, on its first anniversary, data.gov received a major upgrade. The site is a signature initiative of federal CIO Vivek Kundra, but the work has been done by a team at the General Services Administration. The program director is James Rolfes. Marion Royal, the deputy program director, is the full-time leader on the project. At the Management of Change Conference earlier this week in Philadelphia, Royal accepted an Intergovernmental Solutions Award from ACT-IAC for the team. I don’t think I’m speaking out of school to say Royal, having worked with his team nights and weekends to make the anniversary update deadline, used conference attendance as a part of a short R&R break over the weekend.

For users of data.gov, the existence of the triples has two practical benefits: First, certain data searches will return results without the user having to download entire datasets. Second, it will ease the development of third-party applications using data.gov datasets.

Royal told me the presence of the triples puts data.gov into what data mavens and semantic web gurus consider Web 3.0 territory. “We’re not 100 percent there, but we’ve laid the foundation to build on,” Royal said. Here’s what data.gov says about itself.

Lots of other enhancements leaven the site, including:

  • Statistical information on which posted data sets were requested by a citizen (as opposed to by the posting agency). Analysis showed that some 200 of the 272,677 datasets there were citizen-requested.
  • A visualization tool from ESRI for previewing geospatial datasets.
  • More metrics, such as download statistics.

GSA will expand the hosting options for agencies that want to make their datasets available for downloading or searching from servers other than their own. It is negotiating now for a blanket purchase agreement, but there is no contract award just yet.

Communities are starting to develop around data.gov, both in and out of the federal government. Data points of contact are more formalized for 253 donor agencies, Royal said. They are starting to communicate among themselves.

It looks as if data.gov will fulfill its obvious mission of providing data sets to the public, although in that function it is unlikely to be a barnburner on the scale of YouTube or Facebook. But it also appears to be developing into a laboratory for new data technology and usage, where people in the know can try things.

For many years, vertical slices of industry, such as federal IT contractors themselves, have been served by experts who have in-depth knowledge of a sliver of government information. Those experts take raw government data, which has always been available, at least in theory. With expertise and their own passion, they’ve added value to turn it into information useful to a specialized constituency. But those activities have always been stovepiped, the resulting information available for pay to vertical organizations with a need to know.

Data.gov is supposed to change all that, and, as Kundra has put it, “democratize” data. I was skeptical at first, but I think now they’re on to something.
