The Office of Science and Technology Policy (OSTP) recently issued a Request for Information (RFI) welcoming comments and recommendations for ensuring long-term stewardship of, and broad public access to, digital data resulting from federally funded research.
Our main recommendations are below. We also provided answers to the specific questions listed in the RFI. The full document is on Google Docs here. We welcome the community’s feedback and participation on this document. Feedback is due January 12, so please feel free to chime in by then.
- Cultivate a distributed information ecosystem: Integration, synthesis, analysis, and visualization of scientific data can foster tremendous opportunities across the commercial, not-for-profit, and academic sectors. Innovative approaches to information retrieval, search, aggregation, and other applications of scientific data should be widely encouraged among many players. Agencies should foster an “open playing field” that encourages innovation in scientific data management and fresh ideas to advance new workflows, organizational forms, and technologies. To cultivate an open playing field, agencies need to promote the free flow of scientific data across multiple platforms and applications through widely adopted, open, non-proprietary standards and formats.
- Cultivate a robust preservation infrastructure: Qualified digital libraries and digital archives are needed to maintain the integrity and longevity of scientific data. But not every participant in scientific data sharing needs to be a repository. To encourage innovation and experimentation, “sustainability” should not be required of every dissemination, visualization, analysis, or aggregation platform. Rather, sustainability efforts should focus on digital libraries and archives. Since our understanding of how best to preserve digital data continually evolves, policymakers need to encourage innovation and collaboration across a broad spectrum of public interest organizations, particularly libraries and museums dedicated to playing stewardship roles. Multiple models, approaches, and organizations should play a role in scientific data stewardship to encourage continual learning and innovation in data longevity practices.
- Encourage data professionalism: Federally funded science both creates and reuses data. Scientific integrity requires proper publication (including documentation) of data, and proper attribution and sourcing of reused and reanalyzed datasets. Data publication (including various models of peer review and disciplinary archiving) and citation practices need to be mandated for federally funded research.
- Require non-proprietary data: The purpose of public support of science is to expand human understanding, not to subsidize particular commercial publishing models. In general, primary scientific data should be as free from intellectual property and proprietary encumbrances as possible. Such encumbrances create legal risk and complexity that inhibit innovation around scientific data. Datasets should be in the public domain or under an open copyright license (such as the Creative Commons Attribution License) to encourage a wide range of innovative approaches to data preservation and reuse.
- Data ethics: At the same time, the general need to minimize legal encumbrances should be balanced against data privacy and sensitivity issues. Human subjects research ethics, environmental and public health, and cultural property and indigenous rights all require consideration. Defining ethical practices for data preservation, dissemination, and reuse will require broad-based, multi-stakeholder negotiations for different types of data in different scientific domains.