Love Your Data Week: Data Rescue

This week is “Love Your Data Week”. The event organizers hope it will raise awareness of the need to better curate research data in order to encourage more collaboration, transparency, and reproducibility.

However, in the US, “Love Your Data Week” comes during a major political crisis that threatens all of our data. Already, the Trump administration has altered (redacted) educational and scientific information related to climate change.

Motivated by this threat, a grassroots “Data Rescue” movement has quickly organized researchers, librarians, software developers, and the interested public. This movement is in a race against time to find, retrieve and archive threatened Federal information before it gets corrupted or destroyed.

Many of the Data Rescue efforts have understandably focused on climate change and other environmental data. However, these represent only the tip of the iceberg in terms of need. The Federal government also creates research and educational information relevant to many other social, cultural and historical topics.

 

Data Rescue for Culture, History, and Social Sciences

For the past several weeks, our team here at Open Context has run Web crawlers and other software to archive some of the “long tail” of Federal information. For example, we’ve focused much of our effort on Web resources created by the National Park Service (NPS). It is through the national parks that many Americans (and international visitors) learn about America’s rich and diverse natural and cultural history. The NPS provides vital educational information describing and documenting that history, including information about the experiences of historically underrepresented communities.

We worked under the guidance of Jolene Smith and Kate Ellenberger, both experts in public archaeology and history. They prioritized and documented lists of Web resources likely to be threatened by the new Administration. We used these lists to seed a “quick and dirty” Web crawler (not a type of software we have much experience with). The crawler downloaded Web pages, submitted them to the Internet Archive’s Wayback Machine (the world’s leading repository of Websites), and then repeated the process with new links discovered in the archived pages. Kate and Jolene also manually downloaded hundreds of resources that the Wayback Machine could not reach.
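For readers curious what such a crawl loop might look like, here is a minimal Python sketch. It is not the crawler we actually ran; the function names, the `allowed_hosts` restriction, and the delay between requests are all assumptions made for illustration. The only external detail it relies on is the Wayback Machine's public "Save Page Now" address, `https://web.archive.org/save/` followed by the page URL.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen

# Public "Save Page Now" address of the Wayback Machine.
WAYBACK_SAVE = "https://web.archive.org/save/"


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, without #fragments."""
    parser = LinkParser()
    parser.feed(html)
    absolute = (urldefrag(urljoin(base_url, href)).url for href in parser.links)
    return [u for u in absolute if urlparse(u).scheme in ("http", "https")]


def fetch_page(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def submit_to_wayback(url):
    """Ask the Wayback Machine to capture url (hypothetical helper)."""
    with urlopen(WAYBACK_SAVE + url, timeout=120) as resp:
        return resp.status


def crawl(seeds, allowed_hosts, max_pages=100, delay=1.0,
          fetch=fetch_page, archive=submit_to_wayback):
    """Breadth-first crawl: download, submit for archiving, follow new links."""
    queue = deque(seeds)
    seen = set(seeds)
    archived = []
    while queue and len(archived) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
            archive(url)
        except OSError:
            continue  # unreachable page or failed submission: skip it
        archived.append(url)
        for link in extract_links(url, html):
            # Assumption: stay on the agencies' own hosts.
            if link not in seen and urlparse(link).hostname in allowed_hosts:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)  # be polite to both servers
    return archived
```

A call such as `crawl(["https://www.nps.gov/"], {"www.nps.gov"}, max_pages=50)` would start from one seed and stop after 50 archived pages. Passing `fetch` and `archive` as parameters is a design choice that lets the loop be exercised without touching the network.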

We’ve been running multiple machines day and night and have successfully archived thousands of Web resources from the National Park Service and other Federal Agencies. Here are just two examples that we saved:

 

Civil Society and Protecting Knowledge

This weekend, we started to scale up and more broadly coordinate our Data Rescue efforts by participating in the Data Rescue - San Francisco Bay Area event hosted at UC Berkeley. Here’s a picture of a room full of software developers who coded crawlers to archive a broad array of data from the Department of Energy, NASA, and other agencies.

Software developers at UC Berkeley (School of Information, South Hall) coding to crawl and archive federal databases.

The picture illustrates a tremendous groundswell of coding talent, with people volunteering their time and software expertise to save our nation’s scientific knowledge. A committed and engaged public truly “loves their data” and recognizes how data plays a key role in understanding.

A Call for Support: We Need Human Expertise

Our experience with Data Rescue highlights the key role that civil society must play in order to ensure the long-term survival of knowledge in a digital world. These volunteer coders would not have accomplished much if nonprofit institutions like university libraries and the Internet Archive did not exist. Similarly, our work with Open Context and other Alexandria Archive Institute projects has helped prepare us with the skills, professional networks, and capacity needed to quickly respond to this crisis.

Our job now is to expand our capacity, and especially our capacity to prioritize and document the content that needs to be saved. Jolene Smith and Kate Ellenberger demonstrated the clear need for human expertise to more effectively direct our Data Rescue efforts. Learning from this, we need to hire human experts, particularly graduate students and other researchers and educators with deep domain knowledge about the US government’s role in educating the public about US history, archaeology and culture.

 

Update Note:

We updated this post after receiving permission from Kate Ellenberger to publicly acknowledge and recognize her for her tremendous efforts and guidance.
