
Florida and Georgia Site Files Launch DINAA Project

The continent of North America has a long and rich history of human occupation spanning more than 13,000 years. The Digital Index of North American Archaeology (DINAA) is a multi-institutional project to help make the history of settlement in the Americas accessible to everyone. The two-year project, which is funded by the National Science Foundation, began in September 2012. DINAA’s central task is to publish records of archaeological and historic sites compiled by State Historic Preservation Offices (SHPOs) while keeping secure sensitive locational and ownership information about these properties. SHPOs do a tremendous public service by safeguarding our national heritage. They protect everything from historic battlefields to sites with evidence of how the first Americans explored and settled the continent over 13,000 years ago. DINAA provides greater access to cultural resource inventory data so that researchers, students, and the public can learn from our rich history.

DINAA uses advanced data management techniques to integrate site records together so that researchers and the public can get the “big picture” and explore where people lived across North America from the earliest times to the present. This is done in a way that accommodates large-scale research, while still protecting primary information. We are beginning public “beta testing” of the first two SHPO databases in the DINAA project: Florida and Georgia. These two states alone represent nearly 100,000 sites spanning some 10,000 years. Publication of these datasets involves significant user interface and visualization challenges. We welcome community feedback to improve how the data are presented and navigated.

Here are some starting points:

  • Faceted search of periods. On the left, different periods are listed as search options. The map shows sites in Georgia and Florida, with counts of sites visualized in geospatial tiles. The geospatial tiles are similar to those used by many Web mapping systems to enable efficient indexing and retrieval of geospatial information at a range of scales. We are using a QuadTree approach to define geospatial tiles and assign each archaeological site to a tile using the methods defined here. To reduce risks associated with sensitive location data, we limit the precision of the geospatial tiles. DINAA limits geospatial resolution to squares of only 15–20 km.
  • Faceted search of Archaic sub-periods. The DINAA project annotates all data with a controlled vocabulary to facilitate data integration. The controlled vocabulary defines common concepts applied to each SHPO dataset and enables searching, browsing, and visualization across these datasets. Currently, the DINAA controlled vocabulary (“ontology” if you want to sound fancy) only covers major time periods / archaeological cultures. As the DINAA project continues, we will expand the controlled vocabulary to cover other important aspects of SHPO data.
  • Morrow Mountain Projectile Point / Knife. This displays a map of a diagnostic artifact type in Georgia.
  • Keyword search for the phrase “Chattahoochee Brushed”. This displays a map of another diagnostic artifact type in Florida and Georgia. The map is similar to this resource at the University of Georgia developed by Mark Williams, Victor Thompson, and Lloyd E. Schroder.
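The QuadTree tiling described in the first bullet above can be sketched in a few lines. This is an illustrative reimplementation, not DINAA’s actual code: the function name, the quadrant encoding, and the depth of 11 (chosen only because it yields tiles on the order of 15–20 km) are our own assumptions.

```python
def quadtree_tile(lat, lon, depth=11):
    """Assign a lat/lon point to a quadtree tile key.

    Each level splits the current bounding box into four quadrants
    (0 = SW, 1 = SE, 2 = NW, 3 = NE).  At depth 11 a tile spans
    360 / 2**11 ~= 0.176 degrees of longitude, on the order of
    15-20 km, which caps the published locational precision.
    """
    south, north = -90.0, 90.0
    west, east = -180.0, 180.0
    key = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        quadrant = 0
        if lat >= mid_lat:
            quadrant += 2
            south = mid_lat   # keep the northern half
        else:
            north = mid_lat   # keep the southern half
        if lon >= mid_lon:
            quadrant += 1
            west = mid_lon    # keep the eastern half
        else:
            east = mid_lon    # keep the western half
        key.append(str(quadrant))
    return "".join(key)

# Two nearby points share a tile key, while distant points fall in
# different tiles, so exact site locations stay hidden.
print(quadtree_tile(30.44, -84.28))
```

Because a site would be published only as its tile key, no consumer of the data can recover coordinates more precisely than the tile itself.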

Posted in Data publications, News, Projects.

Recommendations on Ethics, Sustainability, and Open Access in Archaeology

In the article On Ethics, Sustainability, and Open Access in Archaeology available in the September 2013 issue of the SAA Archaeological Record, co-authors Eric Kansa, Sarah Whitcher Kansa, and Lynne Goldstein provide recommendations to the SAA for improving access to research results in archaeology.

The authors welcome comments on the following five recommendations:

  • Gain experience with Open Access.
    The SAA needs to better understand the opportunities and costs associated with Open Access. It needs to experiment and learn exactly how to run a sustainable peer-reviewed Open Access publishing service. This experience will give the SAA the needed understanding to better articulate policy recommendations to our financial backers. The SAA need not do this alone. It can partner with other societies, university library groups developing scholarly communications infrastructure, or other commercial or nonprofit Open Access publishers.
  • Refrain from lobbying against or weakening Open Access.
    Both the AAA and the AIA joined with monopolistic publishers like Elsevier in lobbying against Open Access (Kansa 2012). These actions debase these scholarly societies and put them into the camp of commercial giants that promote oppressive intellectual property laws that further commoditize knowledge; harm research, teaching, and free expression; and endanger their own memberships.
  • Seek legal protections for researchers, students, and the public.
    The SAA also can make a public statement calling for a more equitable and just balance in computer-security and copyright law and in the interpretation of such laws with regard to scholarly works. Legal frameworks governing publication need to better reflect our values and protect researchers, instructors, students, and other members of the public in accessing and using published research.
  • Encourage quality and prestige in Open Access archaeology.
    Even if the SAA does not launch its own Open Access titles, the SAA leadership should encourage greater professionalism and professional recognition for Open Access. The SAA should encourage senior scholars to join editorial boards of Open Access journals and should provide peer-review and other services to Open Access titles to increase their prestige, acceptance, and quality.
  • Publicly endorse Open Access as a goal to work toward.
    The SAA can issue a public statement that Open Access represents a goal for the organization, even if it is currently not financially feasible. The SAA needs to investigate funding and organizational requirements to sustain quality Open Access publishing and make it a goal to build the public support and financial resources needed to adopt publication models that better promote the common good of public knowledge. In other words, if we cannot finance Open Access with currently available funding, the SAA needs to make sustainable Open Access to peer-reviewed publications the goal of future fundraising and public policy campaigns.

Kansa, Eric. 2012. Openness and Archaeology’s Information Ecosystem. World Archaeology 44(4):498–520.

Posted in Policy, Publication.

DINAA Project at Digital Humanities 2013: Open Context and North American Site Files


While Sarah and Eric Kansa are busy finishing up field work and “data wrangling” at Poggio Civitate, our colleagues and collaborators will discuss the Digital Index of North American Archaeology (DINAA) project at the 2013 Digital Humanities Conference.

Josh Wells (PI) will take part in a session Current Research & Practice in Digital Archaeology (organized by Ethan Watrall) to give an overview of DINAA and our progress thus far. The DINAA presentation is titled: An Introduction to the Practices and Initial Findings of the Digital Index of North American Archaeology (DINAA).

We’ve made some good progress in enhancing Open Context to support map-based browsing of data. This will be an important feature for navigating and visualizing data compiled by the DINAA project. We’re testing (WARNING! NOT READY FOR PRIME-TIME) some of these map-based browsing features here:

  • Example 1: Counts of items classified as cattle (Bos taurus)
  • Example 2: Percentage of cattle (Bos taurus) at sites in the Near East compared to all items with a biological classification

Again, we’re still only in early-stage testing now, and there are lots of interaction bugs and issues to solve to make this feature more useful and less frustrating. This feature uses Leaflet (an open-source Web mapping library) and GeoJSON (a very popular and open geospatial data format).
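For readers unfamiliar with the format, here is a minimal GeoJSON payload of the kind a Leaflet map can render. The feature, its coordinates, and the count property are invented for illustration; note that GeoJSON orders coordinates as [longitude, latitude].

```python
import json

# A minimal GeoJSON FeatureCollection: each feature pairs a geometry
# with arbitrary properties (here, a hypothetical item count that a
# Leaflet layer could use to scale a map symbol).
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [35.44, 36.96]},
            "properties": {"label": "Example site", "count": 120},
        }
    ],
}

geojson_text = json.dumps(feature_collection, indent=2)
print(geojson_text)
```

On the client side, Leaflet’s L.geoJSON layer can consume a payload like this directly.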

We should also note that GeoJSON has some of its roots in developments Sean Gillies made for the Pleiades gazetteer of ancient world places. GeoJSON is just one of the great outcomes of Pleiades, and a major contribution of the digital humanities toward Web technologies. With DINAA, we’re very excited to follow in these footsteps!

Posted in Events, News, Projects.


Open Context Honored by White House as a Contribution to Open Science

We’re proud to announce that today the White House is recognizing Eric Kansa as a “Champion of Change” in Open Science. We are honored and gratified that the White House has chosen to recognize the research community in the humanities and social sciences, including archaeology, the discipline where we focus most of our efforts. We are also honored that the White House chose to recognize an “#AltAc” (alternative academic), a growing global community of scholars working outside of the traditional academic career path.

“Openness” is still struggling to take root in many areas of the humanities and social sciences. Many of the models, like PLoS, that have proved so effective in other areas of the sciences still need to be adapted to fit the funding and professional context of the humanities and social sciences. We hope this kind of recognition helps to further galvanize efforts to improve accessibility and equity in the humanities and social sciences, and we hope this helps to build bridges with other fields of research.

Over the past 10 years, we at the AAI have promoted data sharing in archaeology, and we have developed Open Context to that end. However, our efforts are not limited to Open Context. We have also worked to improve communication in archaeology and other areas of the humanities and social sciences, and have joined in a much larger community, rich with talent, humor, and energy to make research more accessible and ethical.

Eric joins 12 others at the White House today. The event is being streamed live, and a YouTube video will be available after the event. The White House has also issued a press release.

Posted in Events, News, Policy.

Comments on OSTP Open Data Policy

Today, Open Context’s Eric Kansa spoke (via phone) at the meeting on Public Access to Federally-Supported Research and Development Data and Publications: Data, hosted by the National Research Council of the National Academies. The meeting, taking place May 16-17, is hearing invited and public comments on the White House OSTP memo on expanding access to data resulting from federally-funded research, with the aim of informing agencies as they develop policies in response to the memo.

The NRC has posted a video of the meeting online. In addition, you can read the AAI’s comments in a document that includes responses from various individuals/organizations, some of whom also spoke at the meeting. In the meantime, here are Eric’s comments:

My name is Eric Kansa, and I manage and direct Open Context, an open access, open licensed data publication venue for archaeology and related fields. I’ve also participated in text-mining in the digital humanities. Text-mining really shows that the boundaries between text and data are increasingly blurred, and that texts (publications) increasingly share many of the open intellectual property requirements critical to the reusability of data.

While we focus on providing editorial and peer-review services for data contributions, we work closely with colleagues at the California Digital Library, a University of California institution that provides us with essential digital repository and persistent identifier services. With Open Context, we are grateful for grant support from the National Endowment for the Humanities, particularly the Office of Digital Humanities, the National Science Foundation (see current work), and private foundations. We’re one example of how the lines between the humanities and sciences are increasingly blurred, and that’s a good thing.

Because we receive support from multiple federal agencies, I think coordination across agencies is vital. Research suffers when stove-piped in artificial silos. Similarly, other agencies also support and even mandate research, especially to enforce laws in historic preservation and environmental protection. Data practices relating to compliance-oriented research also need to be harmonized with those of agencies that mainly support academically oriented research.

Based on over 10 years’ experience promoting greater data openness and professionalism in archaeology, I think it is critical for policy-making to promote dynamism and innovation in the management of data. Data needs are diverse and ever-evolving. We need to encourage that dynamism by welcoming new entrants with new ideas and approaches to data management, preservation, dissemination, and reuse.

There’s often a tacit assumption that data are a “residue” of research, and a researcher’s primary responsibility with respect to data centers mainly on preservation. I think that is limiting, and in some circumstances, data can and should be valued as a primary outcome of research. To borrow a phrase from my colleagues at the California Digital Library, data can also be a “first class citizen” of scholarly production. Data can also play a central role in new modes of scholarly communications, with approaches like “data sharing as publication”, or exhibition, or even data sharing as a kind of open-source release cycle. The point is, data can play many and expanding roles in researcher communications. Policy should not assume that data should only play the role of a secondary, supplemental outcome to research.

The need to foster dynamism also needs to inform thinking about financial sustainability. Public policy needs to recognize that the sustainability of particular organizations and practices in the research endeavor is only a means to an end in promoting the public good. Sustainability of particular interests should not be an end in itself. “Resiliency” may be a better term, since it may better capture our obligations for data and knowledge stewardship without lock-in to a particular set of institutions or practices.

In other words, notions of data “openness” need to expand beyond technical and licensing concerns to include the organizations and people participating in the research community’s information ecosystem, especially the next generation of students, who will have their own needs and priorities with respect to data. True resiliency will require real funding, an issue where the OSTP policy memo falls short. I urge agencies to work with the research community, libraries, and others to honestly understand funding requirements. We need this to make a clear case to the American public about investing in unlocking the richness of research data.

Earlier this week, the NRC sponsored a related meeting to hear comments on the other part of the OSTP memo, relating to public access to publications resulting from federally-funded research. The AAI submitted comments for this meeting, as well, which you can read here in a PDF containing all mail-in responses.

Posted in Events, News, Policy.

Lessons in Data Reuse, Integration, and Publication

On April 17, members of the Central and Western Anatolian Neolithic Working Group met at Kiel University to participate in the International Open Workshop: Socio-Environmental Dynamics over the Last 12,000 Years: The Creation of Landscapes III. Working group participants presented their hot-off-the-press analyses of various aspects of integrated faunal datasets from over one dozen Anatolian archaeological sites spanning the Epipaleolithic through the Chalcolithic (a range of 10,000+ years). Several more sites will add data to the project in the coming months to ensure that the resulting collaborative publications are as comprehensive as possible.

These presentations took place in the session Into New Landscapes: Subsistence Adaptation and Social Change during the Neolithic Expansion in Central and Western Anatolia. The session, which was chaired by Benjamin Arbuckle (Department of Anthropology, Baylor University) and Cheryl Makarewicz (Institute of Pre- and Protohistoric Archaeology, CAU Kiel), included a panel of presentations followed by an open discussion.

A bit of background: Over the past five months, with enabling funding from the Encyclopedia of Life (EOL), we have worked with participants in this project to prepare their datasets for publication. Each participant contributed a dataset that would be edited and published in Open Context, and integrated with the other datasets. Rather than ask all participants to analyze the entire corpus of datasets, we asked each participant to address a specific topic. These topics (“sheep and goat age data”, “cattle biometrics”) required access to a smaller set of relevant data, their analysis of which the participants presented at the Kiel conference.

The research community has very little experience with this kind of collaborative data integration. Archaeology rarely sees projects that go beyond conventional publication outcomes, to also emphasize the publication of high-quality, reusable structured data. After months of preparing datasets for shared analysis and publication, I was really looking forward to seeing the research outcomes unfold.

As an added bonus, our colleagues from the DIPIR project joined us there to document the data publishing and collaborative data reuse processes. We felt very fortunate that the DIPIR team members could apply highly rigorous methods to observing and studying how researchers grappled with integrating multiple datasets. We’re looking forward to learning from the DIPIR team as they synthesize their observations on how researchers collaborate with shared data.

In the meantime, we’d like to share some initial impressions and lessons on data reuse that emerged from this work:

Full data access can improve practice. We can learn a lot by looking at how others record data. Some may see sharing our databases and spreadsheets as opening ourselves up to criticism, but such openness can greatly improve the consistency with which we record data, and therefore facilitate meaningful data integration. In this one-day workshop alone, we identified a few key areas where zooarchaeologists can improve their consistency in data recording.

An example of this from the workshop: Although all zooarchaeologists record age data based on the fusion stage of skeletal elements, some elaborate on their notations where others don’t. For example, an unfused calcaneus of a sheep might come from a newborn lamb or from a sheep up to about two years of age (when the calcaneus fuses). One researcher might put a note in a “Comments” field indicating that the bone is from a neonate. Another researcher, dealing with the same specimen, might leave the notation simply as “unfused.” Thus, two recording systems can lead to very different interpretations, one that recognizes the newborn lambs in the assemblage, and one that lumps them with the other “sub-adult” sheep. Such differences in aggregate can lead to vastly different interpretations of an assemblage. These recording discrepancies become apparent when data authors begin looking “under the hood” at each others’ datasets. Recognizing these discrepancies and their possible effects on interpretation can inform better practice in data recording, and thus work toward improving future comparability and integration of published datasets.

While data preservation is a good motivation for better data management, we think a professional expectation for data will help motivate researchers to create better data in the first place. The discussions provoked by this study help us to better understand what “better data” may mean in zooarchaeology.

Documenting data in anticipation of reuse. I think we can all agree that datasets must contain certain critical information or they will not be useful to future researchers. But here’s the catch: Information deemed “critical” for one project is not the same for another project. Sure, there may be a baseline of key information that applies to all projects (location, date, author, etc.), but there is a much larger amount of discipline-specific or even project-specific information that needs to be documented to enable reuse. To complicate things, the absence of this documentation may only be noticed upon reuse. That is, the project may appear well-documented until an expert attempts to reuse the dataset.

An example: Some datasets in this study contained a large number of mollusks. From the perspective of a data re-user wanting to integrate multiple datasets, this poses a big question: Does an absence of mollusks at the other sites mean that the ancient inhabitants did not exploit marine resources? Or is their absence simply a result of the mollusks having not been included in the analysis (either not collected or perhaps set aside for analysis by another specialist)? Understanding this absence of data is critical for any reuse of the dataset.

This highlights the important role of data editors and reviewers, who can work with data authors to identify and gather this key information at the time the dataset is disseminated (rather than having questions come up years later upon reuse). Furthermore, not just anybody can review the dataset. Knowing if a dataset is documented sufficiently requires in-depth knowledge of the subject matter, and the ability to project potential applications of the data to anticipate questions that might arise with future use.

The benefits of peer-review via data reuse. Data publication is still in its infancy. There is a lot of exploration taking place as to what “data publication” means and how it should be carried out. If it mimics conventional publication, peer-review of datasets would occur before their publication. However, our data reuse studies are showing that, in fact, the most comprehensive peer-review of data occurs upon its reuse. It is only at the time of reuse that a dataset is tested and scrutinized to the point where key data documentation questions emerge. This may only be an issue in today’s data-sharing world. Perhaps future data authors, accustomed to full and expected data dissemination, will practice exhaustive documentation from the get-go. But what do we do now? How does post-publication peer-review, which appears to be so critical to documenting datasets properly, fit with models of data publication?

This work is supported by a Computable Data Challenge Grant from the Encyclopedia of Life, as well as by funds from the National Endowment for the Humanities and the American Council of Learned Societies. Any views, findings, conclusions, or recommendations expressed in this post do not necessarily reflect those of the funding organizations.

Posted in Data publications, Editorial Workflow, Events, Projects.

New data publications in Open Context highlight early globalization

What does a fragment of a Canton blue and white porcelain plate from the early 19th century in Alaska have in common with a stoneware jar from the mid-18th century Northern Mariana Islands? Give up? Both were published in Open Context this week!

The two projects these objects come from also share a common theme— documenting early globalization in the greater (much greater!) Pacific region. The Asian Stoneware Jars project, authored by Peter Grave of the University of New England (Australia), presents data on the likely provenance and production dynamics of large stoneware jars, many found in dozens of shipwrecks in the Pacific and Indian Oceans. Using a variety of analytical techniques to detect trace elements, Grave and his team identified that the stoneware vessels originated in at least seventeen discrete production zones ranging from southern China to Burma, providing insights on the transport of goods around the globe during the 14th–17th centuries.

The Mikt’sqaq Angayuk Finds project (authored by Amy Margaris, Fanny Ballantine-Himberg, Mark Rusk, and Patrick Saltonstall, in collaboration with the Alutiiq Museum) catalogs finds from an historic Alutiiq settlement of the early 19th century on Kodiak Island. The site was a springtime encampment occupied only briefly by a small number of individuals, likely Alutiiqs conscripted into service to provision the residents of Russia’s first colonial capital in Alaska (St. Paul Harbor, now the City of Kodiak). Ceramics of Russian, British, and Chinese origin, together with a variety of artifacts of local manufacture, reveal a settlement that saw the interface of two cultures and participation in an increasingly global economy.

Both publications currently carry a three star rating as they await external peer review. The star ratings are part of a new system Open Context uses to help users understand the editorial status of the publication (ranging from one star for demonstration projects to five stars for peer reviewed projects). Open Context’s Publishing page has more details on how the star ratings work.

Posted in Data publications.

Decoding Data: A View from the Trenches

This has been a busy data month for me, as I prepare zooarchaeological datasets for publication for a major data sharing project supported by the Encyclopedia of Life Computable Data Challenge award. The majority of my time has been spent decoding datasets, so I’ve had many quiet hours to mull over data publishing workflows. I’ve come up for air today to share my thoughts on what I see as some of the important issues in data decoding.

  • Decoding should happen ASAP. Opening a spreadsheet of 25,000 specimens all in code makes my blood pressure rise. What if the coding sheet is lost? That’s a lot of data down the drain. Even if the coding sheet isn’t lost, decoding is not a trivial task. Though much of it is a straightforward one-to-one pairing of code to term, there are often complicated rules on how to do the decoding. Though an individual with little knowledge of the field could do much of the initial decoding, one quickly arrives at a point where specialist knowledge is needed to make judgment calls about what the data mean. Furthermore, there are almost certainly going to be typos or misused codes that only the original analyst can correct. Decoding should be done by the original analyst whenever possible. If not, it should be done (or at least supervised) by someone with specialist knowledge.
  • Decoding is expensive. In fact, it is one of the biggest costs in the data publishing process. I’ve decoded five very large datasets over the past few weeks and they required about five to ten times more work than datasets authors submitted already decoded. The size of the dataset doesn’t matter—whether you have 800 records or 100,000 records, data decoding takes time. For example, one of the datasets I edited for the EOL project had over 125,000 specimens. It was decoded by the author before submission. Editing and preparing this dataset for publication in Open Context took about four hours. In comparison, another dataset of 15,000 specimens was in full code and took over 30 hours to translate and finalize for publication. This is something critical for those in the business of data dissemination to consider when estimating the cost of data management. Datasets need to be decoded to be useful, but decoding takes time. Should data authors be required to do that work as part of “good practice” for data management?
  • Coding sheet formats matter. Ask for coding sheets in a machine-readable format so you can easily automate some of the decoding. Though PDFs are pretty, they’re not great for decoding.
  • Decoding often has complicated (and sometimes implicit!) rules. Keep all the original codes until you are sure you have finished decoding. Otherwise, you may find you need a code from one field to interpret another field. For example, one researcher used four different codes that all translated to “mandible.” It turns out each code was associated with a certain set of measurements on the mandible. If you decode the elements first (as you would) and make all the mandibles just “mandible,” then you reach the measurements section and realize you still need that original code distinction.

Because of all of this complexity, in practice it is hard to totally automate decoding, even if you are lucky enough to have machine-readable “look-up” tables that relate specific codes to their meanings. In practice, codes may be inconsistently applied or applied according to some tacit set of rules that make them hard to understand. Mistakes happen when unpacking complicated coding schemes. It really helps to use tools like Google Refine / Open Refine that record and track all edits and changes and allow for the roll-back of mistakes.
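As a minimal illustration of lookup-table decoding that follows the "keep the original codes" rule described above (all field names, codes, and the flag for unmatched codes here are hypothetical):

```python
# Hypothetical per-field lookup tables from a machine-readable coding sheet.
ELEMENT_CODES = {"10": "mandible", "11": "mandible", "20": "calcaneus"}
FUSION_CODES = {"F": "fused", "U": "unfused"}

def decode_record(record):
    """Translate coded fields to terms, keeping the original codes.

    The originals stay in place because one field's raw code may be
    needed later to interpret another field (e.g., different mandible
    codes tied to different measurement sets).
    """
    decoded = dict(record)  # copy, so original codes are preserved
    decoded["element_term"] = ELEMENT_CODES.get(record["element"], "UNKNOWN CODE")
    decoded["fusion_term"] = FUSION_CODES.get(record["fusion"], "UNKNOWN CODE")
    return decoded

rows = [
    {"element": "11", "fusion": "U"},
    {"element": "99", "fusion": "F"},  # a mistyped code, flagged for review
]
decoded_rows = [decode_record(r) for r in rows]
print(decoded_rows)
```

Unmatched codes are flagged rather than silently dropped, so likely typos can be routed back to the original analyst for a judgment call.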

Finally, the issues around decoding help illustrate that treating data seriously has challenges and requires effort. One really needs to cross-check and validate the results of decoding efforts with data authors. That adds effort and expense to the whole data sharing process. It’s another illustration why, in many cases, data sharing requires similar levels of effort and professionalism as other more conventional forms of publication.

Decoding is necessary to use and understand data. Why not do it at the dissemination stage, when it only has to be done once and can be done in collaboration with the data author? Why make future researchers struggle through often complicated and incompletely documented coding systems?

Support for our research in data publishing also comes from the ACLS and the NEH. Any views, findings, conclusions, or recommendations expressed in this post do not necessarily reflect those of the funding organizations.

Posted in Data publications, Editorial Workflow, Projects.

New Publication on Open Access and Open Data

Eric Kansa’s hot-off-the-press paper Openness and Archaeology’s Information Ecosystem provides a timely discussion of how Open Access and Open Data models can help researchers move past some of the dysfunctions of conventional scholarly publishing. Rather than threatening quality and peer-review, these models can unlock new opportunities for finding, preserving and analyzing information that advance the discipline. The paper is published in an Open Archaeology-themed special issue of World Archaeology (ironically, a closed-access journal). For those who can’t get past the pay-wall, Eric has archived a preprint. Abstract:

The rise of the World Wide Web represents one of the most significant transitions in communications since the printing press or even since the origins of writing. To Open Access and Open Data advocates, the Web offers great opportunity for expanding the accessibility, scale, diversity, and quality of archaeological communications. Nevertheless, Open Access and Open Data face steep adoption barriers. Critics wrongfully see Open Access as a threat to peer review. Others see data transparency as naively technocratic, and lacking in an appreciation of archaeology’s social and professional incentive structure. However, as argued in this paper, the Open Access and Open Data movements do not gloss over sustainability, quality and professional incentive concerns. Rather, these reform movements offer much needed and trenchant critiques of the Academy’s many dysfunctions. These dysfunctions, ranging from the expectations of tenure and review committees to the structure of the academic publishing industry, go largely unknown and unremarked by most archaeologists. At a time of cutting fiscal austerity, Open Access and Open Data offer desperately needed ways to expand research opportunities, reduce costs and expand the equity and effectiveness of archaeological communication.

Posted in Publication.

Digital Humanities Conference in Berkeley

The 2012 Pacific Neighborhood Consortium (PNC) Annual Conference and Joint Meetings will take place at the School of Information at UC Berkeley from December 7th to December 9th, 2012. The conference is hosted by the Electronic Cultural Atlas Initiative (ECAI) and the School of Information at UC Berkeley. The main theme is New Horizons: Information Technology Connecting Culture, Community, Time, and Place. The program is packed with presentations on various digital heritage topics. Eric Kansa of the AAI will present Sunday morning on Applying Linked Open Data: Refining a Model of ‘Data Sharing as Publication’.

Posted in Events, News.
