DINAA Poster Symposium Sneak Peek

Here’s the first of several posters about the DINAA project that will be presented at the SAAs this week in Austin.

About the Poster: Yes, this poster is printed on fabric. With a tip from a colleague on Twitter, we discovered Spoonflower, a company that prints on fabric. What a result!! The fabric poster is on wrinkle-free material, the colors are accurate, and the printing is as sharp as if it were on paper. The poster folds up to the size of a wallet, so you can literally pack this thing inside a shoe in your luggage. And the best part about it is that it only cost $25 and arrived a week earlier than scheduled. Wow! Here is the blog post with simple instructions on how to make your own.

About the Research: The poster’s content is equally exciting. The DINAA project publishes the most comprehensive record of settlement in North America, spanning the Pleistocene through the recent historical past. Site definitions and descriptions from project partner SHPOs are used as open government data to form a robust base layer of information. As of spring 2014, our team has successfully integrated and published records created by state government officials documenting over 270,000 archaeological sites from eight states east of the Mississippi. The data include rich chronological, legal, and environmental metadata used by government officials and the research community alike. The poster discusses the challenges of integrating and visualizing data at vastly different scales, from the scale of continents to the scale of individual object records at a given site. It also presents how the project is dealing with visualizing both space and time, with time as a type of metadata that presents special complications in navigating and visualizing archaeological data.

Attending the SAA meeting? Come see more at the DINAA Poster Symposium [session 81]- Thursday, 24 April, 2-4 pm (Ballroom F)

Workshop Recap: State Site Files, DINAA, and Archaeology

We recently concluded a workshop for the DINAA project, held at the University of Tennessee (UT), Knoxville Office of Research on March 19th and 20th. The workshop brought together more than 30 participants, including managers and researchers from universities and state and federal agencies across Eastern North America, as well as graduate students from UT and Indiana University.

Participants of the DINAA workshop on archaeological site file data management and sharing

The DINAA project aims to provide a foundation for distributed Linked Open Data initiatives in North American archaeology using securely-shared public data. The result will be a free and open framework of millions of pieces of never-before-compiled information documenting human settlement on the continent, a record that can be used to address questions of critical importance to our society’s future.

In time for the workshop, the DINAA team successfully integrated and published archaeological data from 8 states east of the Mississippi. DINAA has made publicly available over 270,000 of the anticipated half a million site records to be published by the conclusion of the grant. The data include rich chronological, legal, and environmental metadata used by government officials and the research community alike.

Here are some links to browse and visualize data published through DINAA:

  • Map based browsing of data through Open Context
  • Testing demo of an alternative “Heat Map” visualization of DINAA data. (Please note: this is just a proof of concept demonstrating how Open Context’s APIs can support alternative visualizations; it is not a fully functional interface.)

Sharing Good Practices

Representatives from 9 states, museums, libraries and federal agencies attended the workshop and spent two days discussing the challenges of working with archaeological site files, and tools they’d like to develop to make their work more streamlined. They share common goals in working to make their content reach wider communities. For most participants, this was the first time they had met their peer site file managers from other states and had the opportunity to discuss data management. The participants were thrilled to have the chance to learn how their peers work and to get ideas for ways to improve their data documentation, presentation, and management. One of the key outcomes of the DINAA workshop was this cross-pollination of site management expertise across state lines.

In addition, workshop participants explored a variety of methods and tools popular in open science and digital humanities applications. These included linked data methods, ontologies, web mapping and GeoJSON, and, most popular of all, OpenRefine’s data clean-up tools.

Sustaining a Commitment to the Public Good

Public funds (an NSF grant and partner university grants) support DINAA, and all project outcomes are freely and publicly accessible without restrictions, but this is a short-term funding source. Workshop participants discussed longer-term sustainability concerns. One theme of the discussion was the need for independent financing to maintain DINAA’s orientation toward serving the public good. “Open data” are by no means anti-commercial. DINAA-published data can be freely used not only by allied nonprofit efforts such as tDAR, but also by commercial entities and efforts aligned to particular industries (such as the GAPP initiative). Nevertheless, Josh Wells and Eric Kansa talked about how foundational information resources such as DINAA should not be dependent on commercial financing, which would necessarily privilege particular commercial agendas. Thus, alternative forms of governance and financing sustained by broader communities are needed.

Learn More

We also want to thank Ethan Watrall and Andrew White for their remote participation and presentations. We’ll soon share links to their slides and the resources they discussed. Additional results of the DINAA workshop will be presented at the upcoming SAA meeting in Austin, TX. A link to a pdf of the poster will be added to this post shortly.

Publish or Perish 2014, Resurrected Online

On February 13-14, the IFHA Project at UC Davis hosted the first Innovating Communication in Scholarship (ICIS) conference. The theme this year was Publish or Perish – the Future of Academic Publishing and Careers. Open Context’s Eric Kansa attended the conference as a panelist in the session “Beyond Journals & New Forms of Digital Publishing.” For those of us who couldn’t attend, the conference organizers have done a fantastic job of broadcasting the entire conference in a variety of ways. If you can find a few spare hours, here are some ways you can experience Publish or Perish 2014…

    Session Videos: On the main conference page you will find the agenda and links to over ten hours of videos from the six main panel sessions: The Changing Nature of the Journal; Beyond Journals & New Forms of Digital Publishing; Innovations in Peer Review; Changing the Value Proposition of Publishing; Altmetrics: Do they Measure Anything Useful?; and Assessment.
    Tweets: You can also experience a snapshot of the two days of the conference through the sequence of tweets organized with Storify (Day 1 and Day 2). This was a highly-tweety crowd and the series of comments really shows the value of the 140-character tweet to highlight golden nuggets from events.

The ICIS project is made possible by an award from the Interdisciplinary Frontiers in the Humanities and Arts (IFHA) program at UC Davis.

Celebrating a Year of Open Data

2013 has been a really big year for open data. In February, the White House Office of Science and Technology Policy announced a new mandate for open access to peer-reviewed outcomes of federally-funded research, including publications and data. The various agencies have been exploring how they will enact this new policy, and have welcomed input from the public.

Beyond these developments on the federal level, many institutions have shifted gears to promote the free exchange of data. New developments in archaeology include the adoption of a data management policy by the Shelby White and Leon Levy Program for Archaeological Publications, and special panel discussions relating to open access and publishing at the upcoming AIA and SAA meetings. On a broader scale, the Nature Publishing Group recently announced Scientific Data, a new open access publication for descriptions of datasets (they also provide an excellent video about data publishing). The tragic loss of open access advocate Aaron Swartz in January may well have galvanized a move toward more openness over the course of the year. His case cast a spotlight on the misalignment of scholarship and the exchange of ideas with the laws governing copyright and computer networks. His loss underscored some of the ethical stakes associated with access to knowledge.

We at Open Context have been vocal advocates for open data publishing for some time now. In short, we believe that open data publishing not only makes research more effective, but also better aligns archaeology with the public spirit. We’ve been promoting these perspectives through publications and presentations (see some examples here and here). Our most recent call for open access to research content appeared in The SAA Archaeological Record this fall (the article is available Open Access from SAA). This year also saw a White House honor for Open Context’s Program Director Eric Kansa as a Champion of Change for his contributions to Open Science (see the NEH announcement).

We’re also striving to practice what we preach. Open Context published 18 projects this year. Fourteen of these are already cited in conventional publications. A few examples:

As the ecosystem of open data grows, the various participants are finding innovative ways of leveraging the power of the Web. For instance, online publications like Internet Archaeology and the Journal of Open Archaeology Data are establishing extensive networks of partners to archive data that links to their publications. Open Context is listed by both services as a recommended system to host datasets related to their publications. The aim is to make the linking two-way: a link from the dataset to the paper, and a link from the paper back to the dataset.

We are delighted to see data publishing catching on and look forward to what 2014 will bring!

Florida and Georgia Site Files Launch DINAA Project

The continent of North America has a long and rich history of human occupation spanning more than 13,000 years. The Digital Index of North American Archaeology (DINAA) is a multi-institutional project to help make the history of settlement in the Americas accessible to everyone. The two-year project, which is funded by the National Science Foundation, began in September 2012. DINAA’s central task is to publish records of archaeological and historic sites compiled by State Historic Preservation Offices (SHPOs) while keeping secure sensitive locational and ownership information about these properties. SHPOs do a tremendous public service by safeguarding our national heritage. They protect everything from historic battlefields to sites with evidence of how the first Americans explored and settled the continent over 13,000 years ago. DINAA provides greater access to cultural resource inventory data so that researchers, students, and the public can learn from our rich history.

DINAA uses advanced data management techniques to integrate site records so that researchers and the public can get the “big picture” and explore where people lived across North America from the earliest times to the present. This is done in a way that accommodates large-scale research, while still protecting primary information. We are beginning public “beta testing” of the first two SHPO databases in the DINAA project: Florida and Georgia. These two states alone represent nearly 100,000 sites spanning some 10,000 years. Publication of these datasets involves significant user interface and visualization challenges. We welcome community feedback to improve how the data are presented and navigated.

Here are some starting points:

  • Faceted search of periods. On the left, different periods are listed as search options. The map shows sites in Georgia and Florida, with counts of sites visualized in geospatial tiles. The geospatial tiles are similar to those used by a variety of Web mapping systems to enable efficient indexing and retrieval of geospatial information at a variety of scales. We are using a QuadTree approach to define geospatial tiles and assign each archaeological site to a tile using the methods defined here. To reduce risks associated with sensitive location data, we limit the precision of the geospatial tiles. DINAA limits geospatial resolution to only 15 – 20 km squares.
  • Faceted search of Archaic sub-periods. The DINAA project annotates all data with a controlled vocabulary to facilitate data integration. The controlled vocabulary defines common concepts applied to each SHPO dataset and enables searching, browsing, and visualization across these datasets. Currently, the DINAA controlled vocabulary (“ontology” if you want to sound fancy) only covers major time periods / archaeological cultures. As the DINAA project continues, we will expand the controlled vocabulary to cover other important aspects of SHPO data.
  • Morrow Mountain Projectile Point / Knife. This displays a map of a diagnostic artifact type in Georgia.
  • Keyword search for the phrase “Chattahoochee Brushed”. This displays a map of another diagnostic artifact type in Florida and Georgia. The map is similar to this resource at the University of Georgia developed by Mark Williams, Victor Thompson and Lloyd E. Schroder.
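
As a rough illustration of the tiling idea in the first bullet, the sketch below assigns points to quadtree tiles and counts them per tile. It assumes the standard Web Mercator tile scheme and the "quadkey" numbering used by many Web maps; DINAA's own method is the one referenced in that bullet and may differ. The function names, the zoom level of 11, and the sample coordinates are all made up for this example.

```python
import math
from collections import Counter

EARTH_CIRCUMFERENCE_KM = 40075.017  # equatorial circumference
MAX_MERCATOR_LAT = 85.05112878      # latitude limit of Web Mercator


def tile_xy(lat, lon, zoom):
    """Column and row of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    lat = max(min(lat, MAX_MERCATOR_LAT), -MAX_MERCATOR_LAT)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still fall inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def quadkey(x, y, zoom):
    """Interleave the bits of x and y into one digit (0-3) per zoom level."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


def site_tile(lat, lon, zoom=11):
    """Quadtree key for a site; a shorter key means a coarser location."""
    x, y = tile_xy(lat, lon, zoom)
    return quadkey(x, y, zoom)


def tile_width_km(lat, zoom):
    """Approximate east-west width of a tile at a given latitude."""
    return EARTH_CIRCUMFERENCE_KM * math.cos(math.radians(lat)) / 2 ** zoom


# Made-up coordinates for illustration only, not real site locations.
sites = [(33.749, -84.388), (33.750, -84.390), (30.438, -84.281)]
counts = Counter(site_tile(lat, lon) for lat, lon in sites)
```

Because each extra digit splits a tile into four, truncating a key gives the enclosing coarser tile, so capping the key length also caps the precision that can be recovered. Under these assumptions a zoom 11 tile is roughly 17 km wide at 30° N, which happens to fall in the 15 to 20 km range mentioned above; that is a property of this sketch, not a statement about DINAA's settings.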

Recommendations on Ethics, Sustainability, and Open Access in Archaeology

In the article On Ethics, Sustainability, and Open Access in Archaeology available in the September 2013 issue of the SAA Archaeological Record, co-authors Eric Kansa, Sarah Whitcher Kansa, and Lynne Goldstein provide recommendations to the SAA for improving access to research results in archaeology.

The authors welcome comments on the following five recommendations:

  • Gain experience with Open Access.
    The SAA needs to better understand the opportunities and costs associated with Open Access. It needs to experiment and learn exactly how to run a sustainable peer-reviewed Open Access publishing service. This experience will give the SAA the needed understanding to better articulate policy recommendations to our financial backers. The SAA need not do this alone. It can partner with other societies, university library groups developing scholarly communications infrastructure, or other commercial or nonprofit Open Access publishers.
  • Refrain from lobbying against or weakening Open Access.
    Both the AAA and the AIA joined with monopolistic publishers like Elsevier in lobbying against Open Access (Kansa 2012). These actions debase these scholarly societies and put them into the camp of commercial giants that promote oppressive intellectual property laws that further commoditize knowledge; harm research, teaching, and free-expression; and endanger their own memberships.
  • Seek legal protections for researchers, students, and the public.
    The SAA also can make a public statement calling for a more equitable and just balance in computer-security and copyright law and in the interpretation of such laws with regard to scholarly works. Legal frameworks governing publication need to better reflect our values and protect researchers, instructors, students, and other members of the public in accessing and using published research.
  • Encourage quality and prestige in Open Access archaeology.
    Even if the SAA does not launch its own Open Access titles, the SAA leadership should encourage greater professionalism and professional recognition for Open Access. The SAA should encourage senior scholars to join editorial boards of Open Access journals and should provide peer-review and other services to Open Access titles to increase their prestige, acceptance, and quality.
  • Publicly endorse Open Access as a goal to work toward.
    The SAA can issue a public statement that Open Access represents a goal for the organization, even if it is currently not financially feasible. The SAA needs to investigate funding and organizational requirements to sustain quality Open Access publishing and make it a goal to build the public support and financial resources needed to adopt publication models that better promote the common good of public knowledge. In other words, if we cannot finance Open Access with currently available funding, the SAA needs to make sustainable Open Access to peer-reviewed publications the goal of future fundraising and public policy campaigns.

Kansa, Eric. 2012. Openness and Archaeology’s Information Ecosystem. World Archaeology 44(4):498–520. DOI: (Open Access preprint:

DINAA Project at Digital Humanities 2013: Open Context and North American Site Files


While Sarah and Eric Kansa are busy finishing up field work and “data wrangling” at Poggio Civitate, our colleagues and collaborators will discuss the Digital Index of North American Archaeology (DINAA) project at the 2013 Digital Humanities Conference.

Josh Wells (PI) will take part in the session Current Research & Practice in Digital Archaeology (organized by Ethan Watrall) to give an overview of DINAA and our progress thus far. The DINAA presentation is titled: An Introduction to the Practices and Initial Findings of the Digital Index of North American Archaeology (DINAA).

We’ve made some good progress in enhancing Open Context to support map-based browsing of data. This will be an important feature for navigating and visualizing data compiled by the DINAA project. We’re testing (WARNING! NOT READY FOR PRIME-TIME) some of these map-based browsing features here:

  • Example 1: Counts of items classified as cattle (Bos taurus)
  • Example 2: Percentage of cattle (Bos taurus) at sites in the Near East compared to all items with a biological classification

Again, we’re still only in early-stage testing now, and there are lots of interaction bugs and issues to solve to make this feature more useful and less frustrating. This feature uses Leaflet (an open-source Web mapping library) and GeoJSON (a very popular and open geospatial data format).

We should also note that GeoJSON has some of its roots in developments Sean Gillies made for the Pleiades gazetteer of ancient world places. GeoJSON is just one of the great outcomes of Pleiades, and a major contribution of the digital humanities toward Web technologies. With DINAA, we’re very excited to follow in these footsteps!
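
For readers who have not seen GeoJSON, here is a minimal sketch of what a map tile with a count might look like in that format. The `tile_feature` helper, the `count` property name, and the coordinates are invented for illustration; this is not the output of Open Context's API.

```python
import json


def tile_feature(west, south, east, north, count):
    """Build a GeoJSON Feature for a rectangular tile carrying an item count."""
    # GeoJSON positions are [longitude, latitude]; a polygon ring must close
    # on its starting point, and exterior rings run counterclockwise.
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"count": count},  # hypothetical property name
    }


# One made-up tile; a real response would carry many features.
collection = {
    "type": "FeatureCollection",
    "features": [tile_feature(35.0, 31.0, 36.0, 32.0, 42)],
}
geojson_text = json.dumps(collection, indent=2)
```

A Web mapping library such as Leaflet can draw a collection like this directly through its GeoJSON layer and style each polygon from its properties.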

Open Context Honored by White House as a Contribution to Open Science

We’re proud to announce that today the White House is recognizing Eric Kansa as a “Champion of Change” in Open Science. We are honored and gratified that the White House has chosen to recognize the research community in the humanities and social sciences, including archaeology, the discipline where we focus most of our efforts. We are also honored that the White House chose to recognize an “#AltAc” (alternative academic), a member of a growing global community of scholars working outside the traditional academic career path.

“Openness” is still struggling to take root in many areas of the humanities and social sciences. Many of the models, like PLoS, that have proved so effective in other areas of the sciences still need to be adapted to fit the funding and professional context of the humanities and social sciences. We hope this kind of recognition helps to further galvanize efforts to improve accessibility and equity in the humanities and social sciences, and we hope this helps to build bridges with other fields of research.

Over the past 10 years, we at the AAI have promoted data sharing in archaeology, and we have developed Open Context to that end. However, our efforts are not limited to Open Context. We have also worked to improve communication in archaeology and other areas of the humanities and social sciences, and have joined in a much larger community, rich with talent, humor, and energy to make research more accessible and ethical.

Eric joins 12 others at the White House today. The event can be viewed live here: A YouTube video will be available after the event. The White House press release is available here:

Comments on OSTP Open Data Policy

Today, Open Context’s Eric Kansa spoke (via phone) at the meeting on Public Access to Federally-Supported Research and Development Data and Publications: Data, hosted by the National Research Council of the National Academies. The meeting, taking place May 16-17, is hearing invited and public comments on the White House OSTP memo on expanding access to data resulting from federally-funded research, with the aim of informing agencies as they develop policies in response to the memo.

The NRC has posted a video of the meeting online. In addition, you can read the AAI’s comments in a document that includes responses from various individuals/organizations, some of whom also spoke at the meeting. In the meantime, here are Eric’s comments:

My name is Eric Kansa, and I manage and direct Open Context, an open access, open licensed data publication venue for archaeology and related fields. I’ve also participated in text-mining in the digital humanities. Text-mining really shows that the boundaries between text and data are increasingly blurred, and that texts (publications) increasingly share many of the open intellectual property requirements critical to the reusability of data.

While we focus on editorial and peer review services on data contributions, we work closely with colleagues at the University of California, California Digital Library, an institution that provides us with essential digital repository & persistent identity services. With Open Context, we are grateful for grant support from the National Endowment for the Humanities, particularly the Office of Digital Humanities, the National Science Foundation (see current work), and private foundations. We’re one example of how the lines between the humanities and sciences are increasingly blurred, and that’s a good thing.

In receiving support from multiple federal agencies, I think coordination across agencies is vital. Research suffers when stove-piped in artificial silos. Similarly, other agencies also support and even mandate research, especially to enforce laws in historical preservation and environmental protection. Data practices relating to compliance-oriented research also need to be harmonized with agencies that support mainly academically oriented research.

Based on over 10 years of experience promoting greater data openness and professionalism in archaeology, I think it critical for policy-making to promote dynamism and innovation in the management of data. Data needs are diverse and ever evolving. We need to encourage that dynamism by welcoming new entrants with new ideas and approaches to data management, data preservation, dissemination and reuse.

There’s often a tacit assumption that data are a “residue” of research, and a researcher’s primary responsibility with respect to data centers mainly on preservation. I think that is limiting, and in some circumstances, data can and should be valued as a primary outcome of research. To borrow a phrase from my colleagues at the California Digital Library, data can also be a “first class citizen” of scholarly production. Data can also play a central role in new modes of scholarly communications, with approaches like “data sharing as publication”, or exhibition, or even data sharing as a kind of open-source release cycle. The point is, data can play many and expanding roles in researcher communications. Policy should not assume that data should only play the role of a secondary, supplemental outcome to research.

The need to foster dynamism also needs to inform thinking about financial sustainability. Public policy needs to recognize that the sustainability of particular organizations and practices in the research endeavor is only a means to an end in promoting the public good. Sustainability of particular interests should not be an end in itself. “Resiliency” may be a better term, since it may better capture our obligations for data and knowledge stewardship without lock-in to a particular set of institutions or practices.

In other words, notions of data “openness” need to expand beyond technical and licensing concerns to include the organizations and people participating in the research community’s information ecosystem, especially the next generation of students who will have their own needs and priorities with respect to data. True resiliency will require real funding, an issue where the OSTP policy memo falls short. And I urge agencies to work with the research community, libraries, and others to honestly understand funding requirements. We need this to make a clear case to the American public about investing in unlocking the richness of research data.

Earlier this week, the NRC sponsored a related meeting to hear comments on the other part of the OSTP memo, relating to public access to publications resulting from federally-funded research. The AAI submitted comments for this meeting, as well, which you can read here in a PDF containing all mail-in responses.

Lessons in Data Reuse, Integration, and Publication

On April 17, members of the Central and Western Anatolian Neolithic Working Group met at Kiel University to participate in the International Open Workshop: Socio-Environmental Dynamics over the Last 12,000 Years: The Creation of Landscapes III. Working group participants presented their hot-off-the-press analyses of various aspects of integrated faunal datasets from over one dozen Anatolian archaeological sites spanning the Epipaleolithic through the Chalcolithic (a range of 10,000+ years). Several more sites will add data to the project in the coming months to ensure that the resulting collaborative publications are as comprehensive as possible.

These presentations took place in the session Into New Landscapes: Subsistence Adaptation and Social Change during the Neolithic Expansion in Central and Western Anatolia. The session, which was chaired by Benjamin Arbuckle (Department of Anthropology, Baylor University) and Cheryl Makarewicz (Institute of Pre- and Protohistoric Archaeology, CAU Kiel), included a panel of presentations followed by an open discussion.

A bit of background: Over the past five months, with enabling funding from the Encyclopedia of Life (EOL), we have worked with participants in this project to prepare their datasets for publication. Each participant contributed a dataset that would be edited and published in Open Context, and integrated with the other datasets. Rather than ask all participants to analyze the entire corpus of datasets, we asked each participant to address a specific topic. These topics (“sheep and goat age data”, “cattle biometrics”) each required access to a smaller set of relevant data, and the participants presented their analyses of those data at the Kiel conference.

The research community has very little experience with this kind of collaborative data integration. Archaeology rarely sees projects that go beyond conventional publication outcomes, to also emphasize the publication of high-quality, reusable structured data. After months of preparing datasets for shared analysis and publication, I was really looking forward to seeing the research outcomes unfold.

As an added bonus, our colleagues from the DIPIR project joined us there to document the data publishing and collaborative data reuse processes. We felt very fortunate that the DIPIR team members could apply highly rigorous methods to observing and studying how researchers grappled with integrating multiple datasets. We’re looking forward to learning from the DIPIR team as they synthesize their observations on how researchers collaborate with shared data.

In the meantime, we’d like to share some initial impressions and lessons on data reuse that emerged from this work:

Full data access can improve practice. We can learn a lot by looking at how others record data. Some may see sharing our databases and spreadsheets as opening ourselves up to criticism. However, such sharing can greatly improve the consistency in the way we record data, and therefore facilitate meaningful data integration. In this one-day workshop alone, we identified a few key areas where zooarchaeologists can improve their consistency in data recording.

An example of this from the workshop: Although all zooarchaeologists record age data based on the fusion stage of skeletal elements, some elaborate on their notations where others don’t. For example, an unfused calcaneus of a sheep might come from a newborn lamb or from a sheep up to about two years of age (when the calcaneus fuses). One researcher might put a note in a “Comments” field indicating that the bone is from a neonate. Another researcher, dealing with the same specimen, might leave the notation simply as “unfused.” Thus, two recording systems can lead to very different interpretations, one that recognizes the newborn lambs in the assemblage, and one that lumps them with the other “sub-adult” sheep. Such differences in aggregate can lead to vastly different interpretations of an assemblage. These recording discrepancies become apparent when data authors begin looking “under the hood” at each other’s datasets. Recognizing these discrepancies and their possible effects on interpretation can inform better practice in data recording, and thus work toward improving future comparability and integration of published datasets.
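
To make the effect concrete, here is a small sketch that tallies the same four specimens under the two recording conventions. The field names, the `age_class` rule, and the records themselves are hypothetical and were invented for this illustration; they are not drawn from any of the workshop datasets.

```python
from collections import Counter


def age_class(record):
    """Assign a coarse age class from fusion state plus any free-text comment."""
    if record["fusion"] == "fused":
        return "adult"
    if "neonate" in record.get("comments", "").lower():
        return "neonate"
    return "sub-adult"


# Convention A: the analyst notes neonates in a comments field.
with_comments = [
    {"element": "calcaneus", "fusion": "unfused", "comments": "neonate"},
    {"element": "calcaneus", "fusion": "unfused", "comments": "neonate"},
    {"element": "calcaneus", "fusion": "unfused", "comments": ""},
    {"element": "calcaneus", "fusion": "fused", "comments": ""},
]

# Convention B: the same specimens, recorded with fusion state only.
without_comments = [
    {"element": r["element"], "fusion": r["fusion"]} for r in with_comments
]

tally_a = Counter(age_class(r) for r in with_comments)
tally_b = Counter(age_class(r) for r in without_comments)
```

Under convention A the tally separates two neonates from one sub-adult and one adult; under convention B the neonates disappear into a sub-adult count of three, even though the bones are identical.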

While data preservation is a good motivation for better data management, we think a professional expectation for data will help motivate researchers to create better data in the first place. The discussions provoked by this study help us to better understand what “better data” may mean in zooarchaeology.

Documenting data in anticipation of reuse. I think we can all agree that datasets must contain certain critical information or they will not be useful to future researchers. But here’s the catch: Information deemed “critical” for one project is not the same for another project. Sure, there may be a baseline of key information that applies to all projects (location, date, author, etc.), but there is a much larger amount of discipline-specific or even project-specific information that needs to be documented to enable reuse. To complicate things, the absence of this documentation may only be noticed upon reuse. That is, the project may appear well-documented until an expert attempts to reuse the dataset.

An example: Some datasets in this study contained a large number of mollusks. From the perspective of a data re-user wanting to integrate multiple datasets, this poses a big question: Does an absence of mollusks at the other sites mean that the ancient inhabitants did not exploit marine resources? Or is their absence simply a result of the mollusks having not been included in the analysis (either not collected or perhaps set aside for analysis by another specialist)? Understanding this absence of data is critical for any reuse of the dataset.

This highlights the important role of data editors and reviewers, who can work with data authors to identify and gather this key information at the time the dataset is disseminated (rather than having questions come up years later upon reuse). Furthermore, not just anybody can review the dataset. Knowing if a dataset is documented sufficiently requires in-depth knowledge of the subject matter, and the ability to project potential applications of the data to anticipate questions that might arise with future use.

The benefits of peer-review via data reuse. Data publication is still in its infancy. There is a lot of exploration taking place as to what “data publication” means and how it should be carried out. If it mimics conventional publication, peer-review of datasets would occur before their publication. However, our data reuse studies are showing that, in fact, the most comprehensive peer-review of data occurs upon its reuse. It is only at the time of reuse that a dataset is tested and scrutinized to the point where key data documentation questions emerge. This may only be an issue in today’s data-sharing world. Perhaps future data authors, accustomed to full and expected data dissemination, will practice exhaustive documentation from the get-go. But what do we do now? How does post-publication peer-review, which appears to be so critical to documenting datasets properly, fit with models of data publication?

This work is supported by a Computable Data Challenge Grant from the Encyclopedia of Life, as well as by funds from the National Endowment for the Humanities and the American Council of Learned Societies. Any views, findings, conclusions, or recommendations expressed in this post do not necessarily reflect those of the funding organizations.