Making the Resources Fit Together. Interconnection of Diverse Archaeological Document Collections

Øyvind Eide, The Museum Project, University of Oslo, Norway. Email: oyvind.eide@muspro.uio.no
Jon Holmen, The Museum Project, University of Oslo, Norway. Email: jon.holmen@muspro.uio.no
Anne Birgitte Høy-Petersen, Norwegian University of Science and Technology, Museum of Natural History and Archaeology, Norway. Email: gitte.hoy@vm.ntnu.no

Background

The Museum Project has been digitizing archaeological material since the early 1990s. An important task of this work has been the interconnection between the various archival documents, books and images. This paper describes the work we have done in order to connect information from these resources, including the problems we encountered due to the fact that the data from the various collections often do not fit together, at least not without extensive human interaction.

In order to discuss this topic, some of the central archives of the Norwegian archaeological collections must be described briefly. In Norway, many of the administrative tasks of four of the five archaeological regions have traditionally been handled by the university museums. This has resulted in large archives containing everything from excavation reports to police reports and other sites and monuments information in various forms, including printed books, as well as millions of artifacts with their printed catalogues and large image collections.

These collections have been undergoing digitization for the last 15 years. Much of this task was delegated by the university museums to the Museum Project and its forerunner. The result of this work is the following digital collections:

Traditionally, an important part of archaeological work has been to locate objects in paper-based collections and describe and interpret the relations between such objects. Today, with the widespread use of databases people sometimes assume that such relations are created by the computer. That, needless to say, is impossible. There are some relations that can be created relatively easily, some that require a great deal of work, and some that are impossible to create.

The Egge/Hegge case, and beyond: Sites and monuments vs. archives

As part of the Museum Project's work related to the EU-funded ARENA project, we have worked on the interconnection of material such as archive documents, collection object records and images concerning archaeological sites on the farms called Egge and Hegge in the central part of Norway. This has resulted in an integrated web system including a map-based interface, as reported on at the CAA in Prato last April (Eide forthcoming). In the map interface the user can navigate to various monuments in the area, where information from the various sources can be found.

In order to build this system, the connections between the locations on the map and the various sources had to be made by an archaeologist with thorough knowledge of the sources, working at the museum responsible for the area, the Museum of Natural History and Archaeology at the Norwegian University of Science and Technology. While this may look easy at first glance, numerous problems arose during the work, making it quite a time-consuming project.

A general problem with respect to the sites is the lack of a tradition for mapping the sites. It has been considered sufficient to know which particular farm the sites were located on, but when 40 or 50 monuments have been found at the same farm, things become complex. To make it even more difficult, the description of a monument could be "150 yards from the north corner of the pigsty". Today the pigsty no longer exists, a fact that makes the identification difficult because the history of the farm has to be tracked down in order to find out where the pigsty was at the time of the find.

The lack of a tradition for the mapping of sites may be related to the fact that Norway is a large country where travel is difficult in many areas, and consequently the mapping of the country has been a lengthy process. Many areas were not officially mapped when the earlier finds were made; in fact many desolate areas are not even mapped in detail today.

Compared to other sites, the farm Egge is well supplied with sketched maps. The problem is that the sketches are not linked to a defined geography and none of the older sketchers tried to compare their mapping with those of their predecessors. A systematic comparison was only initiated in 1971 by Anne Stalsberg and concluded in the early 1990s by Ingrid Smedstad. They both transferred features from the old sketched maps to modern maps, but this was only done for the farm Egge, whose fields are protected today.

For the farm Hegge we only have the old sketched maps and the descriptions made by Schøning (1770s) and Klüwer (1810s). In the fields of the farm there were more than 30 monuments, but the development of the nearby town of Steinkjer has been fast and severe. This shows how important it is to document before it is too late. Today, there are only a few monuments left, and we have no chance of identifying monuments that existed only 100 years ago.

A more technical archive problem has been to connect the right archive documents to the correct locations. In the document database as it is today there are no links between documents concerning the same monument at different times. The links are at the farm level or sometimes at the case level. If each find had a map reference number, it would have been quite easy to link the relevant documents to the individual find.

It is very tempting to add a cheap point: The official SMR identification numbers have been changed twice during the last ten years. While this is no major obstacle, as tables connecting the systems do exist, it certainly does not make our work any easier.
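The concordance tables mentioned above can be chained so that an identifier from any numbering generation resolves to the current one. The following is a minimal sketch only: the identifier formats and table contents are invented for illustration and do not reflect the actual SMR numbering.

```python
# Hypothetical concordance tables: each maps an identifier in one
# numbering generation to its successor in the next generation.
GEN1_TO_GEN2 = {"A-101": "B-2001", "A-102": "B-2002"}
GEN2_TO_GEN3 = {"B-2001": "C-30001", "B-2002": "C-30002"}

# Ordered from oldest to newest renumbering.
CONCORDANCES = [GEN1_TO_GEN2, GEN2_TO_GEN3]


def resolve_current_id(identifier):
    """Return the current identifier for an id from any generation.

    Returns None when the identifier cannot be traced to an id in the
    newest generation, so that unknown ids are never passed through.
    """
    current = identifier
    for table in CONCORDANCES:
        # Step forward one generation if this table knows the id.
        current = table.get(current, current)
    latest_ids = set(CONCORDANCES[-1].values())
    return current if current in latest_ids else None


print(resolve_current_id("A-101"))
print(resolve_current_id("B-2002"))
print(resolve_current_id("unknown"))
```

Returning None for an untraceable identifier, rather than echoing the input, keeps a stale number from being mistaken for a current one.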

    Figure 1

So, in most cases, we know the location of an area containing archaeological sites, but it is much more difficult to know which of the several sites are referred to in documents. As illustrated in figure 1, smaller geographical areas are more difficult to link.

Museum catalogues

In her seminal 1997 paper entitled "Linking Text and Image Databases in GENREG: A Multi-media Museum Management System at the National Museum of Denmark", Lene Rold describes the problems involved in creating a database based on the card-based inventory catalogues of the National Museum of Denmark, especially outlining the problems involved in linking images to the records. While it is quite easy to link a set of images to an inventory number, it is problematic to make more fine-grained matches. She discusses a proposed system where it is possible to link images to parts of the objects found at an inventory number and states in conclusion:

We certainly have no problem in programming such a solution, but we have to be aware that as the system becomes more sophisticated in dealing with various documentation problems, it also demands more specialist knowledge on the user's part. It is not evident that a photographer can distinguish between two different motives on an altarpiece. [...] So we are actually faced with a problem which cannot be solved by curational and documentary logic, but has to be considered from an organizational point of view - e.g. who are actually in charge of the documentation, and are these people the same who actually enter the information in the electronic system.

We have seen similar problems in our museum catalogue work in Norway. If an inventory number in the acquisition catalogues states that the objects in a particular container include two parts from a sword and three parts from a dagger, whereas another description of the same inventory number made 50 years later states that the objects in the very same container include three parts from a sword and one part each from two daggers, how do we know which is which? As illustrated in figure 2, it is quite easy to connect at the main catalogue number level, but more difficult at the artifact level.

    Figure 2

Solutions?

What we have seen in the examples above is that it is easier to link material at a higher, more general level than at a more detailed level. There are two ways to live with, or even get around, this problem with respect to old material:

  1. To link at such a high level that it is possible to do it correctly and automatically, i.e. at the level where we have unique identifiers. Examples: The main acquisition catalogue number with respect to artifacts, or the farm with respect to sites and monuments.
  2. To link manually at a lower level for selected materials. This is very time-consuming and requires evaluation by well qualified archaeologists. Example: We did so for Egge when making the links necessary for the web system (Eide forthcoming).
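The two approaches in the list above can be combined in a single lookup: use a manual link where an archaeologist has made one, and otherwise fall back to all monuments on the same farm. The sketch below assumes a hypothetical record layout in which `farm_id` is the only identifier shared by documents and monuments; all identifiers are invented.

```python
from collections import defaultdict

# Hypothetical records; farm_id is the only shared identifier.
DOCUMENTS = [
    {"doc_id": "D1", "farm_id": "farm-egge"},
    {"doc_id": "D2", "farm_id": "farm-egge"},
    {"doc_id": "D3", "farm_id": "farm-hegge"},
]
MONUMENTS = [
    {"monument_id": "M1", "farm_id": "farm-egge"},
    {"monument_id": "M2", "farm_id": "farm-egge"},
    {"monument_id": "M3", "farm_id": "farm-hegge"},
]

# Links made by hand by a qualified archaeologist (approach 2).
MANUAL_LINKS = {"D1": "M2"}

# Index monuments by farm once, for the automatic links (approach 1).
MONUMENTS_BY_FARM = defaultdict(list)
for monument in MONUMENTS:
    MONUMENTS_BY_FARM[monument["farm_id"]].append(monument["monument_id"])


def linked_monuments(document):
    """Return (monument ids, link level) for a document.

    A manual link yields exactly one monument. Otherwise every monument
    on the same farm is returned as a candidate, not as a match.
    """
    doc_id = document["doc_id"]
    if doc_id in MANUAL_LINKS:
        return [MANUAL_LINKS[doc_id]], "manual"
    return list(MONUMENTS_BY_FARM.get(document["farm_id"], [])), "farm-level"


for document in DOCUMENTS:
    print(document["doc_id"], linked_monuments(document))
```

Reporting the link level alongside the result lets an interface distinguish a verified match from a list of candidates.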

    Figure 3

In the future, it is important not to create new "black boxes" (as illustrated in figure 3) when changes are made in a database, if the information we need in order to avoid such "information gaps" is present at the time of registration. At the Museum Project, we are currently implementing a system where the user can specify exactly which objects are affected by a change in the system, aiming towards the situation shown in figure 4. In this work, we use an event-oriented model based on the CIDOC CRM standard. But, of course, the existence of a system is not enough - the methods used by archaeologists in their documentation work must be changed as well. And, of course, when building up databases we have to respect that sometimes we just do not know. It is better to get uncertain answers, such as "the monument may be the one you are looking for", than wrong answers.

    Figure 4
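The idea of recording a change as an event that names the affected objects, while leaving room for "we do not know", can be sketched as follows. This is an illustration under stated assumptions, not the Museum Project's implementation: the class, its fields and the part identifiers are hypothetical, and no actual CIDOC CRM classes or properties are modelled. The data mirror the sword and dagger example from the museum catalogue section.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RecatalogueEvent:
    """A re-description of an inventory number, kept as an event.

    derived_from maps each newly described part to the earlier part it
    continues, or to None when the registrar did not know.
    """
    year: int
    inventory_number: str
    earlier_parts: List[str]
    derived_from: Dict[str, Optional[str]]

    def origin_of(self, new_part: str) -> Tuple[List[str], bool]:
        """Return (candidate earlier parts, whether the link is certain)."""
        source = self.derived_from.get(new_part)
        if source is not None:
            return [source], True
        # Unknown origin: any earlier part may be the one, so say so.
        return list(self.earlier_parts), False


# Hypothetical data: two sword parts and three dagger parts were later
# described as three sword parts and one part each from two daggers.
event = RecatalogueEvent(
    year=1950,
    inventory_number="T1234",
    earlier_parts=["sword-a", "sword-b", "dagger-a", "dagger-b", "dagger-c"],
    derived_from={
        "sword-1": "sword-a",
        "sword-2": "sword-b",
        "sword-3": None,
        "dagger1-1": None,
        "dagger2-1": None,
    },
)

print(event.origin_of("sword-1"))
print(event.origin_of("sword-3"))
```

When the origin is unknown, the method returns every earlier part as a candidate together with a flag, which corresponds to an uncertain answer rather than a wrong one.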

Conclusion

So, we need identifiers. What those identifiers should be, or what level they should be connected to, will vary depending on the collections in question. For objects in collections, Lene Rold suggested the use of images, which may be a solution. As for monuments, geographical coordinates seem like a good solution in most cases. Even if such methods will not solve the problems in all cases, they will help to reduce them.

Postscript

At the early meetings of the ARENA project, ways of connecting digital sites and monument data from the six partners were discussed. The solution is described elsewhere in this edition: The hundreds and thousands of key words used in order to describe the objects in the national catalogues had to be grouped into 18 top level categories, as shown in figure 5, and translations of these 18 were made for the six languages in question.

    Figure 5

In other words, in order to be able to link the information, we had to use broader categories. This is quite similar to the first solution described above. The same pattern is common in many problem fields, in the humanities and beyond. We have seen similar problems related to work on geographical information (e.g. changes in borders between states in modern times are well documented, but this is not the case for changes in borders between municipalities or between farms) and collections of botanical samples, just to mention two examples.
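The grouping of national key words into shared top-level categories, as described in this postscript, can be sketched as a two-step lookup. The category names, key words and translations below are invented examples and are not the actual 18 ARENA categories.

```python
# Hypothetical mapping from national key words to top-level categories.
KEYWORD_TO_CATEGORY = {
    "gravhaug": "burial",      # Norwegian: burial mound
    "gravrøys": "burial",      # Norwegian: burial cairn
    "long barrow": "burial",
    "bygdeborg": "defence",    # Norwegian: hill fort
    "hill fort": "defence",
}

# Hypothetical translated labels for each top-level category.
CATEGORY_LABELS = {
    "burial": {"en": "Burial", "no": "Grav"},
    "defence": {"en": "Defence", "no": "Forsvar"},
}

# Hypothetical records from two national catalogues.
SAMPLE_RECORDS = [
    {"id": "NO-1", "keyword": "gravhaug"},
    {"id": "UK-7", "keyword": "long barrow"},
    {"id": "NO-2", "keyword": "bygdeborg"},
    {"id": "NO-3", "keyword": "ukjent"},  # key word with no category
]


def records_in_category(records, category):
    """Return ids of records whose key word maps to the given category."""
    return [
        record["id"]
        for record in records
        if KEYWORD_TO_CATEGORY.get(record["keyword"]) == category
    ]


print(CATEGORY_LABELS["burial"]["no"], records_in_category(SAMPLE_RECORDS, "burial"))
```

A record whose key word has no category is simply left out of every category search, which shows the cost of linking at the broader level: precision is traded for the ability to search across catalogues.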

References

(CIDOC-CRM 2004) Definition of the CIDOC Conceptual Reference Model. Version 4.0. / Nick Crofts, Martin Doerr, Tony Gill, Stephen Stead, Matthew Stiff (eds.) Produced by the ICOM/CIDOC Documentation Standards Group, continued by the CIDOC CRM Special Interest Group. April 2004.

(Eide forthcoming) Eide, Øyvind, Jon Holmen and Anne Birgitte Høy-Petersen: "Between the Book and the Exhibition: Creating Archaeological Presentations Based on Database Information." Proceedings of the CAA 2004 conference, Prato, Italy.

(Rold 1997) Rold, Lene: "Linking Text and Image Databases in GENREG: A Multi-media Museum Management System at the National Museum of Denmark". Museum Interactive Multimedia 1997: Culture Heritage Systems. Design and Interfaces. Selected Papers from ICHIM 97. Pittsburgh. 1997.