· , AI Professor & Advisor to the Data.gov project

I was puzzling over how I wanted to respond until I saw the blog in the Guardian - http://www.guardian.co.uk/news/datablog/2011/apr/05/data-gov-crisis-obama - which also reflects this flat line as a failure, and poses, by contrast, the number of hits the Guardian.com website gets. This is such a massive apples vs. oranges error that I figure I should start there.

So, primarily, let's think about what visits to a web page are about -- for the Guardian, they are lots of people coming to read the different articles each day. However, for data.gov, there isn't a lot of repeat traffic - the data feeds are updated on a relatively slow basis, and once you've downloaded some, you don't have to go back for weeks or months until the next update. Further, for some of the rapidly changing data, like the earthquake data, there are RSS feeds, so once set up, one doesn't return to the site. So my question is, are we looking at the right number?

In fact, the answer is no -- if you want to see the real use of data.gov, take a look at the chart at http://www.data.gov/metric/visitorstats/monthlyredirecttrend -- the total number of dataset downloads since 2009 is well over 1,000,000, and in February of this year (the most recent data available) there were over 100,000 downloads -- so the 10k number appears to be tracking the wrong thing - the data is being downloaded and that implies it is being used!!

Could we do better? Yes, very much so. Here are things I'm interested in seeing (and working with the data.gov team to make available):

1 - Searching for data on the site is tough -- keyword search is not a good way to look for data (for lots of reasons) and thus we need better ways - doing this really well is a research task I've got some PhD students working on, but even doing better than what is there requires better metadata and a better approach. There is already work afoot at data.gov (assuming funding continues) to improve this significantly.

2 - Tools for using the data, and particularly for mashing it up, need to be more easily used and more widely available. My group makes a lot of info and tools available at http://logd.tw.rpi.edu - but a lot more is needed. This is where the developer community could really help.

3 - Tools to support community efforts (see the comment by Danielle Gould to this effect) are crucial - she says it better than I can so go read that.

4- there are efforts by data.gov to create communities - these are hard to get going, but could be a great value in the long run. I suggest people look to these at the data.gov communities site, and think about how they could be improved to bring more use - I know the data.gov leadership team would love to get some good comments about that.

5 - We need to find ways to turn the data release into a "conversation" between government and users. I have discussed this with Vivek Kundra numerous times and he is a strong proponent (and we have thought about writing a paper on the subject if time ever allows). The British data.gov.uk site has some interesting ideas along this line, based on OpenStreetMap and similar projects, but I think one could do better. This is the real opportunity for "government 2.0" - a chance for citizens to comment not just on legislation, but to help make sure the data that informs the policy decisions is the best it can be.

So, to summarize, there are things we can do to improve things, many of which are getting done. However, the numbers in the graph above are misleading, and don't really reflect the true usage of data.gov per se, let alone the other sites and sites like the LOGD site I mention above which are powered by data.gov.

about 1 year ago

· , Founder of Food + Tech Connect

I would make data.gov and the community of practice forums more portable to allow citizens to access the database from other, personally relevant sites through widgets or iFrame codes. I would also enlist media companies (tech in particular), startup incubators, and venture funds to help assess how many of the startups they cover / fund are using data sets made available on data.gov. I have been fortunate to work with some wonderful open data evangelists at the USDA on helping them identify and connect with startups using the Farmers Market data set released last year. Through media and events, I've helped to curate opportunities for them to learn about how people are using their data and what additional data sets people would like to see made available. Lastly, I think there needs to be a paradigm shift focusing on the internal value the government gains from opening up data sets.

about 1 year ago


Each app needs a profile page more like kickstarter where people can follow the progress and make suggestions -- not just rate. It should also list the "ideas" that people have which would include that dataset to get people to vote-up (like uservoice). (Similar to @jake #2, ex: "show farmers markets with local honey").

Most of the data is boring, and presented in a boring way. I'm not surprised it only gets used once.

From a purely SEO perspective, each dataset should list the apps that use it (cross reference).
See: http://www.data.gov/raw/4383 / http://www.data.gov/tools/4034
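For what it's worth, the cross-reference is cheap to build once each app declares which datasets it uses. A minimal sketch, assuming such a declaration exists; the app names, dataset IDs and the `apps_by_dataset` helper below are all hypothetical and not anything data.gov actually exposes:

```python
from collections import defaultdict

# Hypothetical catalog: each app lists the dataset IDs it draws on.
APPS = {
    "farmers-market-finder": ["farmers-markets"],
    "local-honey-map": ["farmers-markets", "county-boundaries"],
    "spending-explorer": ["federal-spending"],
}


def apps_by_dataset(apps):
    """Invert the app -> datasets mapping into dataset -> sorted app names."""
    index = defaultdict(list)
    for app_name, dataset_ids in apps.items():
        for dataset_id in dataset_ids:
            index[dataset_id].append(app_name)
    return {dataset_id: sorted(names) for dataset_id, names in index.items()}


if __name__ == "__main__":
    # Print the "apps that use this dataset" list for every dataset page.
    for dataset_id, names in sorted(apps_by_dataset(APPS).items()):
        print(dataset_id, "->", ", ".join(names))
```

Each dataset page could then render its own entry from that index, and each app page could link back the other way.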

Such a system would allow the agency, if properly funded, to run competitions to build the features people are requesting and let the market / end user decide which is the most useful. It would also help prioritize which new datasets to collect.

Side note: compete.com isn't a great way to validate the usefulness / usage of a data provider (see infochimps.com as comparable), especially since most access will be direct to developer (data dump) or direct to end App (API call). See: http://www.data.gov/metric/visitorstats/monthvisitorstats

@jake #1 http://www.data.gov/suggestdataset

about 1 year ago

· , data.gov.uk

So on data.gov.uk we do something like this (e.g. see http://data.gov.uk/apps/where-does-my-money-go which links through to http://data.gov.uk/dataset/coins the dataset on which it's based).

I like the idea of embedding or linking apps' description pages through to something like challenge.gov with bounties both for full systems/tools but also for additional features/etc. Of course the code/etc. would need to be on github or wherever for the new feature to come, and deploying would then be the bottleneck. It's also a bit awkward to see where people's thoughts for new apps would go (on data.gov.uk we have "ideas" pages for new apps, but we'd want to bring the new-app and newly-extended-app community functionality together if we were to do this).

Also, at least for data.gov.uk (I won't speak for the US Administration ;-)) we're mostly about other people making great apps and tools, rather than us doing it ourselves. Because of this, getting directly involved might be a bit too much like us muscling into the community.

On data.gov.uk we currently have a general comments thread, but it isn't explicitly focussed on improvements/changes to the app (though it's sometimes used for that). I guess we could split the comments into two types, one of which gets voting up/down (which is certainly a good way of gauging a quick community opinion), but I'm not sure that smells quite right as a user experience...

There are also plenty of other great (and similarly underused) sources of data.

http://www.quora.com/Data/Where-can-I-get-large-datasets-open-to-the-public
· , PhD candidate at U-Md & WV

I think the initial question is somewhat flawed. It's based on a couple of assumptions. The first assumption is that you can accurately measure a site like data.gov by site traffic, and that by that measurement data.gov is doing poorly. The second assumption is that data.gov traffic is currently flatlining.

Assumption #1
A graph like this needs some context. What site traffic numbers were people expecting? In context with other government sites, data.gov traffic doesn't seem that bad. Even the most popular US gov sites rarely top 2 million unique visitors (see usa.gov); most gov sites average far less traffic than that, and many average far less than a site like data.gov. Even compared to other websites that serve as a data marketplace or a data commons (e.g. infochimps), data.gov traffic fares well. Finally, I'm not even sure that site traffic is an effective or appropriate way of measuring a site like data.gov (see other comments).

Assumption #2
The site traffic graph of data.gov attached only displays the most recent year's worth of data. If you were to extend the graph back farther in time you would see a much different picture. You would see that data.gov has averaged more or less the same amount of traffic for the last couple of years, with the exception of a sudden spike in early 2010 (see http://bit.ly/iiIuSw ). The very tail end of this spike is being displayed at the beginning of the graph that is provided. If you look at the longer picture you will see that traffic didn't suddenly drop off or "flatline". Site traffic has remained fairly constant minus that sudden spike.

Ignoring those flaws, I still believe that the question being asked is a valid one. I think the real problem is that we as a community have yet to have a robust conversation about who the actual or potential users of open government data are. Who are the expected users for data.gov? Citizens? Journalists? Researchers? Educators? Transparency advocates? Commercial entities? Entrepreneurs? Government? Why are those specific groups of users participating or not participating? And are open government data sites like data.gov meeting their actual needs? As a community, when talking about open government data we tend to lump all users into a single category. Different user groups will have different interests in open government data and will have different barriers to using it. So I think the next step is for us to "unpack" the word user and have a serious conversation about who the intended users for open government data are and what their needs/issues might be.

We also have to have realistic expectations for the users of open government data. Open government data sites like data.gov have several barriers to entry. In addition to time and interest, using raw data requires tools and skills (data literacy, statistical literacy, access to and understanding of software to process this raw data, etc). Most of these skills are well out of the reach of the average citizen. Without changing or improving these conditions I think that open government data sites like data.gov are not directly suitable for the average citizen.

Of course we can talk about changing the interface, adding different data, improving existing metadata, providing different formats, etc but without understanding who the users are and what their needs might be I think we can't really hope to successfully increase interest in data.gov.

We also need better metrics....

about 1 year ago

· , Open Source Strategy Lead at Microsoft

Today much of Gov 2.0 and social media initiatives like data.gov are a 1-way effort. The real change will come when the governments adopt an open gov platform at the *core* of the government business, and have processes to not only communicate at and share data at citizens, but a way to re-capture data & knowledge, and re-integrate it to complete the cycle.

The reason why data.gov is having budget challenges is that it's not being seen as mission-critical, not at the core of government's business. The first step of dumping data into a dataset, or even exposing it as an API, is only going to take us so far; what will make a real difference is re-engineering gov IT infrastructure to support "open data catalogue" data feeds and workflows/processes to capture feedback, meta-data & other data supplied back by citizens.

I blogged about it today here: http://www.port25.ca/2011/04/04/beyond-transparency/

about 1 year ago

· , Gojee CEO

Site traffic shouldn't determine funding or non-funding. Applications built with the data should be a more significant driver. The NYCBigApps (http://nycbigapps.com/ ) competition is an excellent example of an ecosystem spawning around open government data. We should run hackathons in every major tech hub in the U.S. based on this data as a starting point.

about 1 year ago


This response is spot on.

Let's start by hacking the creation process and relational database structure of IDentity in order to restructure citizenship as a self-possessed asset class rather than a social liability with an ever increasing cost factor. Let's then hack state corporation commissions and align default Human IDentity definition with corporate ID definition practices to transform every IDentity into a new corporate asset class on universal Terms. Once that is done we can hack the tax code to optimize efficiencies of individual corporate tax payers. And finally we can hack the Constitution to guarantee ownership of IDentity as a pre-citizenship event and base construct that precedes the establishment of any other socio-economic infrastructure in the Universe for all time.

On that foundation, open data matters...

I would eliminate the idea of 'open data' altogether, and change its structure to make citizenship status meaningful as an asset. As currently defined, citizen liabilities do not care any more for themselves or the data they should be leveraging in pursuit of valuable outcomes than a slave cares about the output of their plantation. Why should they when they are structured as IDentity-slaves?

Open Data as presently conceived is a trojan horse and a fool's errand.

Stakeholder ownership must be defined first. IDentity ownership is more important than open data. Reconstituting the structure of our government and administrative bureaucracy to create asset value from our Individual citizens matters more than open access to the data that these social liabilities are part of creating.

about 1 year ago

· , Founder, Realist Idealist Labs

1) Allow for developers/journalists/advocacy groups to specifically request a data set they want released and organize around it. I.e. make it possible to create a much more public FOIA request.

2) Related to the aforementioned: List even the data sets which aren't yet available, each with a permalink and a "vote" or "request this dataset" button, so that the public can help make the datasets that are of more value to them a priority.

3) Moratorium on shape files.

about 1 year ago

· , Founder, Realist Idealist Labs

...another way to say this is, make Data.gov more community-oriented - and I don't mean by trying to have discussion boards on the site, which aren't very vibrant by any stretch.

Right now there is no reason for me to share or point people to anything that's in Data.gov. If I could request a dataset and get 200 of my friends to say that they want it too (and simultaneously describe why we want it or what we would do with it), that would create the social feedback loop necessary to help OMB and the agencies prioritize what to focus on - as well as bring real value to people who want to use gov data.

I think they could've made the data seem more like a cohesive set. For instance, a total stats leader board of all datasets. Also then perhaps a large directed graph that showed the complexity and size and relevancy of all the data. I think it used to be you couldn't quite grasp a large amount of data, because what could you do except make a big web site with general to specific breadcrumb like browsing.

However, there are products like those coming out of the SIMILE project, and ways to visualize large graphs, but they didn't have this. With faceted lists, you really can find that needle in a haystack (pun perhaps intended with the MIT project Haystack).

I think what they needed were data scientists but all they could find were web designers, which is maybe fair given that the field as a whole isn't a commodity just yet.

about 1 year ago



Yes, Data can be translated into the Information that Citizens care about.

But the current approach resembles the proverbial "hammer looking for a nail", in that it started with Data and THEN tries to convince (sell) Citizens about its value to them.

The leaders in the White House's "Open Government" program should have shown how to act in the "new way", i.e., by being the Better Listeners who ask Citizens (and not guess) what types of government information are the most valuable to THEM!

And then, working backwards, find out what needed to happen (re: data) to provide that information. Instead, the approach was to release data and THEN hope that someone would make information that Citizens would find useful (similar to the "build it and they will come" wishful thinking).

After all, who is the best judge of a "High-Value data-set": (1) a federal agency, (2) OMB, or (3) the Citizens themselves?

Oh, wait a minute. This is Washington, D.C. They're in charge so they think they know better than we do. Even when it comes to implementing a program (OpenGov) that is supposed to promote open-mindedness and humility!

We'll know there's a new wind blowing in OpenGov when more of its so-called leaders tone down their "happy-talk" and show us that they are engaging with, by really listening to, the End-users (who, BTW, are not the data-crunchers).

about 1 year ago


I know very little of the politics of Data.gov, but having turned to them for two lightweight data projects now, I can say the following:

1) Data needs to have *much* better metadata. What we get right now are just dumps which frequently need to be cleaned, have odd column headers, and usually require sifting through several before getting to what is actually needed. The description pages/titles themselves are confusing and full of technical parlance.
2) Data should be live. For freedom of press/speech reasons, there should always be a method to just dump the data, but to be truly useful, we need a way to keep the live data in a centralized repository – data.gov – that can be polled via API.
3) Start simple, then go complex. While it's nice that I can find the EPA toxic reports from 1999 in a 25MB csv file, I think most of us want less esoteric data. For instance – a list of zip codes by county by state, or registered businesses by city, or counties by the number of Federal dollars they get. If you start with the data most people care about, and present it first, you'll likely see much more use.
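To give a feel for the clean-up in point 1, here is a minimal sketch of the kind of header normalization a raw dump tends to need before it is usable. The sample rows and column names are made up for illustration and are not taken from any actual data.gov file, and `normalize_header` / `load_clean_rows` are hypothetical helper names:

```python
import csv
import io
import re

# Made-up sample standing in for a raw dump with awkward headers.
RAW_DUMP = """ZIP_CD ,Cnty Nm,ST-ABBR
12180,Rensselaer,NY
20500,District of Columbia,DC
"""


def normalize_header(name):
    """Lowercase a header and collapse runs of non-alphanumerics to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def load_clean_rows(text):
    """Parse CSV text into dicts keyed by normalized column names."""
    reader = csv.reader(io.StringIO(text))
    headers = [normalize_header(h) for h in next(reader)]
    return [
        dict(zip(headers, (cell.strip() for cell in row)))
        for row in reader
        if row  # skip completely empty lines
    ]


if __name__ == "__main__":
    for row in load_clean_rows(RAW_DUMP):
        print(row)
```

Better metadata on the publishing side would make a step like this unnecessary for every single consumer of the file.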

about 1 year ago


Government cultures are usually attempting to increase value, improve careers, and expand services. Sounds reasonable enough until executed over long periods, during which time organizations tend to experience all kinds of challenges, from apathy to fiscal ruin and lack of innovation. This project was a breath of fresh air for many of us, but for the seasoned vets in gov and those who observe closely, it had problems from inception.

As my old friends in gov remind me often, the agencies control the budgets, and the WH CIO and CTO are (in my words) more like leaders on the playground than the school board, or even teachers. Without budget authority in the federal government, one can do very little, and seasoned managers don't pay much attention frankly. In knowledge systems, which is where semantics really came from-- culturally in research and to some extent funding, we discovered the hard way that in the U.S., cross agency efforts must first be on the WH agenda. We did manage to get a rational KS on the WH agenda for the first time post Katrina, but nothing happened. With the new admin we then saw the MUCH SMALLER data.gov effort that fails to address organizational issues, without which we don't even begin to address the underlying challenge. This is a far more complex issue than just data.gov.

If it were possible to separate data.gov from the fed gov't structural challenges, and it's not, then I would recommend a couple of things, but first some context in casual format.

This project like many others was sold as a government efficiency program. Typically, however, government was not reduced due to efficiencies gained, but rather expanded, and therefore lost credibility with those they serve-- the generic tax payer who collectively owns the data, so immediately we are back to the core fundamental issue of the role of government, and who is in control-- who works for whom.

The data created and produced by federal agencies, with the exception of course of any that cannot be made public due to legitimate safety and/or security concerns, is fundamental to the very mission of government-- its reason for being. Most forget that a government job is not an inalienable right, but is rather a service dependent upon the whim of the people in a democracy, their representatives, and most important of all-- the economic ability that empowers choices. Only in a very misguided culture of power and turf battles that result in crisis-creating silos can it be viewed that government employees have power over citizens relating to public data, but that is the culture in many of the 400+ entities I have engaged with. Most of us have met true servants in government attempting to get things done and transform -- heroes in my book, even if often martyrs.

Traffic, or popularity, has absolutely nothing to do with the value or business case, unless one is selling advertising.

The solution is obvious-- make the data available, preferably as the fundamental mission of the agencies within existing budgets, and allow the private sector to create value from it. In our existing regulatory structure, that leadership must come from the WH.

about 1 year ago



Sixteen High School Seniors Suggest Improvements to Data.gov
http://docs.google.com/document/d/1fuwUslfDyzlL9-r_CgXifIpBQURrsgC0SK5D6JK8Y44/edit?hl=en

about 1 year ago


Increased community building and outreach - It has started, but they need more community building, and it needs to be sustainable. There are lots of stakeholders to reach (from journalists to developers to businesses to internal agencies to citizens) and lots of types of outreach needed (ongoing engagement, app development, design contests, etc).

Like most things, I don't think it's a technology problem but an outreach/marketing/engagement problem (James/Danielle hit on a lot of this as well).

about 1 year ago


Simply improve discoverability of benefits via more practical use case demonstrations that loosely couple information (visualization oriented reports) from actual data sources. This pattern is exemplified by these example links:

1. http://semanticommunity.info/Build_Recovery.gov_in_the_Cloud
2. http://www.delicious.com/kidehen/meshup+logd_demo -- various Linked Data demos that leverage Data.Gov and other Govt sources.

about 1 year ago


I'd eliminate the distinction between online data and offline data. If the government creates data that it considers "public," it should be published online in a structured format—full stop.

about 1 year ago


1) Provide context: Present datasets as possible answers to big problems and let people discuss findings on the site itself (or, at the very least, curate/aggregate conversation around the issue).
2) Develop incentives for developers to use the data. Incentives like a leaderboard are nice, money is better.


about 1 year ago


Put it all in CouchDB.

about 1 year ago


Simplify it. Make it a minimally designed page featuring well-considered and descriptive tags which don't require much navigation effort for the people interested in the material.

about 1 year ago