Erik Kansa is Program Director at Open Context, the open access data publishing service of the Alexandria Archive Institute (AAI) in San Francisco, USA. He is responsible for research and development at Open Context and manages the technical aspects of publishing and archiving data, including system interoperability, data integration, and indexing.

He holds a doctorate in anthropology from Harvard University, and his research interests include research data informatics, research data policy, ethics, and the professional context of the digital humanities.

Can you give us some numbers about Open Context? How many datasets have been published to date, and which disciplines are represented?

We have 115 public projects, and another 18 in preparation. Most are archaeological, but we do have one project published from public health research.

And where do your “clients” come from (i.e. mostly from the US, or more globally)?

Most data come from individual specialist researchers, some from excavation teams (with multiple specialists). The majority of data we publish relevant to North American archaeology comes from US state government agencies that manage cultural resources. Most of the people who have published with us are American, and a few are from the UK. There are currently 4 datasets in preparation from EU-based researchers.

Open Context has established a successful business model for publishing open data in Archaeology. What are the key “ingredients” in your “recipe”?

We’re constantly working to secure adequate funding. The current political context in the US makes this especially challenging, as some key government funding sources have been threatened with total elimination, and we expect health-care insurance costs to rise drastically. So our main approach is to try to develop multiple sources of revenue. We’re very grateful for continued support from grants, but recently we’ve earned a larger portion of our income from data publishing fees. We have also done consulting and contracting work to raise revenue. Sometimes this works well, as it gives us a chance to work with a wider professional network, but we try to limit consulting work because it can distract us from our primary mission of publishing open archaeological data. We also don’t try to do everything in house. We use data preservation and repository services provided by other organizations, which means we don’t have to finance a much larger and more complex organization, with its own technology and workflows. So, by staying small and relatively flexible, and by working with a larger network of libraries, we can keep costs down.

Open Context’s 1-minute introductory video ends with the motto “Data is for discovery and inspiration”. What is your favorite example of an unexpected use of content published by Open Context?

We aspire to support all sorts of use, not just research outcomes. I’d love to see more artistic and creative uses of open data in archaeology, since I think making the past come to life in new forms of expression should be an important goal for archaeology. One area I’m excited to see is that some people are using Open Context in teaching. Some colleagues told me that they just had a paper accepted in Advances in Archaeological Practice (a journal published by the Society for American Archaeology) about this. But the most unexpected use I think comes from Shawn Graham’s “sonification” of data he obtained from text-mining excavation diaries from Kenan Tepe in Turkey. He used Open Context’s API to get all the field notes, ran some text-analysis algorithms on the content, and then transformed that to sound, in addition to more conventional data visualizations. I’m not sure it will be a musical classic, but it sure illustrates very surprising uses of archaeological data!

On a related note, a publication documenting the impacts of climate change on coastal archaeological sites in Eastern North America saw a great deal of coverage in the news media. A major US government report about climate change also cited this paper (see point #14 in the CNN report). Open Context hosts the data used in this paper.

What are the standard formats recommended by Open Context?

We accept data in a variety of standards and formats, especially common tabular or relational data (CSV, Excel, Access, Filemaker). We also get data from ESRI shapefiles, GeoJSON, or ESRI JSON exports. But all the data we accept go through an ETL (“extract, transform, load”) process so that we publish the data in common open formats in a common, though very abstract, schema. The formats that we offer to the public include GeoJSON, JSON, JSON-LD, RDF-XML, Turtle, and N3.
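To make the idea of an ETL step more concrete, here is a minimal Python sketch that turns a small CSV table of point finds into a GeoJSON FeatureCollection. It is only an illustration, not Open Context’s actual pipeline or schema: the function name `csv_to_geojson`, the column names `id`, `lat`, `lon`, and `material`, and the sample rows are all assumptions made up for this example.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, id_col="id", lat_col="lat", lon_col="lon"):
    """Convert CSV text with point coordinates into a GeoJSON FeatureCollection dict.

    The column names are hypothetical defaults chosen for this sketch.
    """
    features = []
    # Extract: read each CSV row as a dictionary keyed by column name.
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Transform: coordinates become numbers, the rest become properties.
        lat = float(row.pop(lat_col))
        lon = float(row.pop(lon_col))
        features.append({
            "type": "Feature",
            "id": row.pop(id_col),
            # GeoJSON orders point coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


# Invented sample data, for illustration only.
SAMPLE = "id,lat,lon,material\nfind-1,37.5,38.2,ceramic\nfind-2,37.6,38.3,bone\n"

collection = csv_to_geojson(SAMPLE)
# Load: here we simply serialize the result; a real service would store it.
print(json.dumps(collection, indent=2))
```

A real workflow would also need to handle missing coordinates, other geometry types, and mapping of source columns onto a shared vocabulary, which this sketch leaves out.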

Which licenses are used in Open Context?

We prefer researchers to publish their data with us under the CC BY license or as Public Domain CC0. But CC BY-SA is also used frequently. We also see some use of CC BY-NC, which we generally recommend against unless there are other ethical considerations (especially the views of indigenous communities).

I really do appreciate this word of caution about the use of open licenses that I have found on Open Context’s website:

“Open Context publishes open data, free of access and reuse restrictions. While open access and open licensing of research data are powerful tools for encouraging better and more collaborative scientific practice, they are not universally appropriate.”

Can you give us some concrete examples of cases where using an open license was deemed inappropriate?

Absolutely. I think many people appreciate how the ethics of open data depend on context and on power dynamics. I think many would agree that we can promote better social outcomes and greater justice by demanding more openness and transparency to hold the powerful accountable, while at the same time, privacy can help safeguard people who need protection from power. So, I don’t see a contradiction between promoting openness and privacy protections in different contexts. This helps shape our thinking about intellectual property policies with Open Context. Colonialism has a powerful and painful historical legacy impacting many people in many areas of the world, especially the Americas. We want to make sure open data is applied thoughtfully so that it does not reinforce or perpetuate colonialism. Intellectual property laws and norms vary cross-culturally, and so do expectations about privacy. So it’s important to work collaboratively, using approaches like “community archaeology”, to understand what can and should be shared appropriately in a venue like Open Context. Fundamentally, we think respecting people and building genuine partnerships across communities are much more important goals than simply adding more gigabytes of Creative Commons licensed content.

What would be your advice to researchers who are planning a new project and aim to archive and publish the project’s data as open data? What can they do, for example, to make sure they will later be in a position to share their data openly?

Good question. We’re actively researching this right now with the “Secret Life of Data” (SLO-data) project. This research involves some workplace ethnography to document and study how archaeologists create data and how these processes impact later archiving and reuse. In our recent publication in the journal Advances in Archaeological Practice with Ixchel Faniel, we share some specific advice: Faniel, I., Austin, A., Kansa, E., Kansa, S., France, P., Jacobs, J., Boytner, R., Yakel, E. (2018). Beyond the Archive: Bridging Data Creation and Reuse in Archaeology. Advances in Archaeological Practice, 6 (2), 105–116.

But in general terms, researchers should consider:

  • Build in data validation (ways to make data entry and creation less error-prone)
  • Set clear expectations about attribution and credit
  • Set clear expectations that project data will be shared (and archived)
  • Set expectations for how to merge different datasets together. We see lots of problems merging specialist data (zooarchaeological, lithics, etc.) with other data because of inconsistencies in identifiers.
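
The identifier problem in the last point can be illustrated with a short Python sketch. It normalizes identifiers before comparing them and reports which specialist records cannot be linked to an excavation context. The normalization rules, the function names `normalize_id` and `merge_report`, and the sample identifiers are all assumptions invented for this example; a real project would agree on its own conventions up front.

```python
import re


def normalize_id(raw):
    """Normalize an identifier so that trivially different spellings compare equal.

    Assumed rules for this sketch: trim whitespace, uppercase, turn spaces,
    underscores, dots, and slashes into hyphens, and drop leading zeros.
    """
    text = raw.strip().upper()
    text = re.sub(r"[\s_./]+", "-", text)
    parts = []
    for part in text.split("-"):
        if part.isdigit():
            # "012" and "12" should refer to the same context.
            part = part.lstrip("0") or "0"
        parts.append(part)
    return "-".join(parts)


def merge_report(context_ids, specialist_ids):
    """Split specialist identifiers into those that match a context and those that do not."""
    known = {normalize_id(i) for i in context_ids}
    matched, unmatched = [], []
    for raw in specialist_ids:
        if normalize_id(raw) in known:
            matched.append(raw)
        else:
            unmatched.append(raw)
    return matched, unmatched


# Invented identifiers, for illustration only.
excavation = ["Locus-012", "Locus-013", "Locus-020"]
zooarch = ["locus 12", "LOCUS_13", "Locus-21"]

matched, unmatched = merge_report(excavation, zooarch)
print("matched:", matched)
print("needs review:", unmatched)
```

Running a check like this during data entry, instead of at publication time, is one way to catch mismatches while the people who recorded the data can still resolve them.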

What do you see as the main obstacles or sources of resistance to wider adoption of an “open data” culture in the current academic as well as political climate?

I don’t see much overt hostility or resistance to data sharing any more. I mainly see that researchers are typically over-worked and burdened by many conflicting demands on their attention. While fewer researchers want to “hoard” data, many researchers feel like they don’t have enough time to focus on preparing data for publication or meaningful archiving. Most of the expectations and rewards are still focused on conventional publishing, and even there, researchers often feel like they can’t keep up.

So, we face important structural challenges. We need to make scholarly life less stressful and less of a frantic rat-race. Data curation can and should be an integral part of careful and rigorous research. Researchers need time to allocate to thoughtful engagement with data. Services like Open Context or specialist services in libraries can really help. As a policy matter, we have to realize that data work is work, and we need to provide the funding, time, and other supports needed to make data a more meaningful aspect of knowledge creation.



