Roderic Crooks

    Using Hierarchy to Describe Getty Provenance Data



    Archival Inventories and Their Records Treemap



    How do digital records relate to the analog documents from which they derive evidentiary and explanatory power? “Meta-Metadata” explores the relationship between data in the Getty Provenance Index® Databases and the archival documents to which these electronic records relate. This proof-of-concept shows how the introduction of hierarchical relationships among the various fields that constitute the individual records in a database can be used to describe and visualize the global contents of that database. That is to say, by pinning down hierarchical relationships between certain fields, we can build richer descriptions of the data as a collection. Hierarchy here provides a sort of frame that supports a more meaningful exploration. This project uses records from the Archival Inventories database and a few specific hierarchical relationships, but such an approach could be expanded to more fields, more relationships, and more records. This exploratory project addresses descriptions of the Getty’s extensive analog and digital holdings, with implications for future interface and database design.

    Two Definitions of Provenance

    In the fields of art history, curatorship, and museum informatics, provenance refers to evidence that establishes the chain of custody of an artifact, the “history of ownership of a valued object, such as a work of art” (Getty Research Institute). For researchers working to establish the historical ownership of works currently owned by museums and collectors, a relatively small corpus of highly valuable archival documents held by prestigious institutions supplies needed information. The Getty’s databases contain “1.5 million records taken from source material such as archival inventories, auction catalogs, and dealer stock books” in order to make such data searchable and accessible (ibid.). These records are “taken” for the database via a process of transcription of the source material, a translation that converts analog material into digital form. This collection grants greater accessibility to distant users, but produces a new problem: the disassociation of the archival document from its intellectual contents. Digital collections that represent analog objects via electronic facsimiles might promote a misleading elision of an analog thing and its electronic surrogate, a conceptual slip that prevents users from understanding the wider ground upon which a given record stands. In some sense, the legibility of a collection of digital records relies on such a substitution. But the conceptual space there, between an analog thing and its digital surrogate, can be useful. In this case, I want to use that difference as an occasion for further interpretive work with the data, interpretive work that involves altering, shaping, and editing these data in the hopes that these efforts will produce a compelling story about the distant objects to which they refer.

    One way to take advantage of this space of difference is the introduction of another conception of provenance as articulated by archival science, the discipline, theory, and professional practice of the creation and maintenance of records of enduring value. In archival science, provenance refers to the intellectual, juridical, and social context of records taken as a collective body rather than as individual items. Gilliland-Swetland (2000) describes provenance as encompassing “hierarchy in records and their descriptions” (p. 12). This definition includes the related terms respect des fonds and original order, terms that in contemporary archival science offer ways to understand how records, taken together, mean more than they do individually. As Johnson (2008) writes,

    “In a digital format the archival context is system-dependent, as it is through the system that the user begins to understand and analyze the archival value of the digital object. This need for interpretation and translation by the user is in direct conflict with the whole principle of provenance and archival description” (p. 154).

    In short, this project proposes to use two definitions of provenance, the art historical and the archival, in concert: it deploys the latter via data visualization to promote the pursuit of the former. This visualization also seeks to return the data collected in these databases to their archival bodies, so to speak, to (re)associate individual records with the various documents that mediate their evidentiary power. Finally, this project seeks to remind viewers and users of the constructedness of data, of “the situated, partial, and constitutive character of knowledge production” (Drucker 2011, p. 2).

    Description of Process

    This project illustrates how individual records in the database relate to one another and to the analog documents from which they are translated. Individual records within the Archival Inventories database of the Getty Provenance Index® Databases are divided into two views, accessible via two different interfaces. Archival Inventories records describe documents, some held by the Getty and others housed in distant archives, and include predictable metadata such as date, time, place, owner, and so forth.


    View of Archival inventory, Getty Provenance Index.

    This record contains many elements critical for the establishment of provenance in the art historical sense. However, owing to the current design of the database, neither images of the document itself nor a listing of its contents can be reached from this view.

    From a separate portal, users may access a transcribed list of the individual items listed in this historical document. A link for “Inventory contents” leads to another view of records from the database:

    View of Contents Records, Getty Provenance Index


    These 1,145 records constitute a transcription of the individual items listed in this inventory along with some details culled from the previous view of Inventory I-3961. Again, no images of the analog parent document can be accessed from this view.

    First, for convenience, I collected a narrow sampling of records: 200 years’ worth of archival inventories, collectively representing all archival inventories in the database from the years 1400–1600. Because the current interface requires downloading records serially to avoid a system failure, I ran four searches, each covering a 50-year interval, and combined the results into a single spreadsheet. In total, 3,865 individual contents records were collected for visualization.
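
    The combining step can be sketched in a few lines of Python. This is only an illustration of merging several search exports into one table, not the procedure actually used for the project: the column names, identifiers, and sample rows below are hypothetical and do not reflect the Getty's export schema.

```python
import csv
import io

# Hypothetical stand-ins for the four downloaded search results, one per
# 50-year interval. Column names and identifiers are illustrative only.
EXPORTS = {
    "1400-1449": "inventory_id,city,item\nINV-A,Venice,Item 1\n",
    "1450-1499": "inventory_id,city,item\nINV-B,Rome,Item 1\nINV-B,Rome,Item 2\n",
    "1500-1549": "inventory_id,city,item\nINV-C,Venice,Item 1\n",
    "1550-1600": "inventory_id,city,item\nINV-D,Amsterdam,Item 1\nINV-D,Amsterdam,Item 2\n",
}


def combine_exports(exports):
    """Merge several CSV exports into one list of rows.

    Each row is tagged with the search interval it came from, so the
    combined table still records which download produced it.
    """
    combined = []
    for interval, text in exports.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["search_interval"] = interval
            combined.append(row)
    return combined


rows = combine_exports(EXPORTS)
print(len(rows), "contents records combined")
```

    With real downloads, each in-memory string would be replaced by a file opened with `open(path, newline="")`; keeping the interval as a column makes it possible to check later that no download was skipped.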

    Next, missing data or blank fields were identified using OpenRefine. Particularly problematic was the field that identified the city of origin of a particular archival inventory. In some cases, supplemental research had to be conducted within other parts of the database to determine the city and country that corresponded to a particular archival inventory. The descriptive data for Archival Inventories is quite rich, so in all cases it was possible to determine the city of origin of a particular inventory. In many cases, documents were associated with a famous residence.
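
    For readers who want to reproduce this kind of check without OpenRefine, the following is a rough plain-Python analogue. It is not OpenRefine's own interface, and the field names, identifiers, and "researched" cities are invented for illustration.

```python
# Hypothetical records; an empty or whitespace-only "city" counts as missing.
records = [
    {"inventory_id": "INV-A", "city": "Venice"},
    {"inventory_id": "INV-B", "city": ""},
    {"inventory_id": "INV-C", "city": "   "},
]

# Cities established through supplemental research (illustrative values).
researched_city = {"INV-B": "Rome", "INV-C": "Amsterdam"}


def missing_city(records):
    """Return the identifiers of records whose city field is blank."""
    return [r["inventory_id"] for r in records if not (r.get("city") or "").strip()]


def fill_city(records, lookup):
    """Return copies of the records with blank cities filled from a lookup table."""
    filled = []
    for r in records:
        city = (r.get("city") or "").strip() or lookup.get(r["inventory_id"], "")
        filled.append({**r, "city": city})
    return filled


print(missing_city(records))
cleaned = fill_city(records, researched_city)
print(missing_city(cleaned))
```

    Keeping the researched values in a separate lookup table, rather than typing them into the records directly, leaves a trace of which cities came from the original record and which were supplied later.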

    Hierarchical Relationship

    This visualization then distills the more complex, complete record into a few hierarchical relationships. This hierarchy represents only one possible arrangement of entities and should be considered a supplement to the data and visualizations already produced: no attempt is made here to claim that all data within the database should be subjected to a strict hierarchy. Rather, this visualization merely explores what kinds of information about the records and their sources might be foregrounded productively. A general, root entity is established (“Global”), under which will be nested other fields in the following order: Global, City, Archival Inventory, and Contents Record. Global is parent to City; City is parent to Archival Inventory; and so forth.

    Global > City > Archival Inventory > Contents Record
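
    As a minimal sketch, assuming each contents record carries a city and an inventory identifier, the flat records could be nested into this hierarchy as follows. The field names and sample values are hypothetical; the nested name/children shape is simply one common way to represent a tree for visualization.

```python
from collections import defaultdict

# Hypothetical flat contents records.
records = [
    {"city": "Venice", "inventory_id": "INV-A", "record_id": "r1"},
    {"city": "Venice", "inventory_id": "INV-A", "record_id": "r2"},
    {"city": "Venice", "inventory_id": "INV-B", "record_id": "r3"},
    {"city": "Rome", "inventory_id": "INV-C", "record_id": "r4"},
]


def build_hierarchy(records):
    """Nest flat records as Global > City > Archival Inventory > Contents Record."""
    by_city = defaultdict(lambda: defaultdict(list))
    for r in records:
        by_city[r["city"]][r["inventory_id"]].append(r["record_id"])

    tree = {"name": "Global", "children": []}
    for city in sorted(by_city):
        inventories = [
            {"name": inv, "children": [{"name": rec} for rec in recs]}
            for inv, recs in sorted(by_city[city].items())
        ]
        tree["children"].append({"name": city, "children": inventories})
    return tree


def count_leaves(node):
    """Count the contents records (leaf nodes) beneath a node."""
    if "children" not in node:
        return 1
    return sum(count_leaves(child) for child in node["children"])


tree = build_hierarchy(records)
for city in tree["children"]:
    print(city["name"], count_leaves(city))
```

    Because every contents record ends up as exactly one leaf, the leaf count under any city or inventory gives the quantity that a visualization can then map to area.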

    This visualization returns records to the archival documents that spawned them, here represented as a treemap. The treemap visualization builds on the imposed hierarchy to indicate the relative area (or magnitude or depth or any other quantified characteristic) of one entity relative to another and relative to a total. The hierarchical relationship imposed on the data could also be visualized in other formats, such as a dendrogram, but for this proof of concept, I wanted to focus on a numerical representation of records. The region covered by each constituent rectangle representing an Archival Inventory gives a relative indication of how many individual Contents Records are contained within a given document. The colors of the regions that represent archival inventories indicate the country of origin of a given document, blue for the Netherlands and orange for Italy (although clearly any color can be used).
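
    To make the area encoding concrete, here is a small sketch of the simple "slice-and-dice" treemap layout, which splits a rectangle among a node's children in proportion to their sizes and alternates the split direction at each level. It is not necessarily the layout algorithm behind the treemap shown here, and the inventory identifiers and record counts are invented for illustration.

```python
# Hypothetical hierarchy: each Archival Inventory leaf carries the number of
# contents records it holds ("size"). Sizes are assumed to be positive.
tree = {
    "name": "Global",
    "children": [
        {"name": "Venice", "children": [
            {"name": "INV-A", "size": 6},
            {"name": "INV-B", "size": 2},
        ]},
        {"name": "Rome", "children": [
            {"name": "INV-C", "size": 4},
        ]},
    ],
}


def size(node):
    """Total number of contents records at or beneath a node."""
    if "children" not in node:
        return node["size"]
    return sum(size(child) for child in node["children"])


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Give each leaf a rectangle (name, x, y, w, h) with area proportional to its size."""
    if out is None:
        out = []
    if "children" not in node:
        out.append((node["name"], x, y, w, h))
        return out
    total = size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node["children"]:
        frac = size(child) / total
        if depth % 2 == 0:
            # Even depth: divide the width among the children.
            slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, out)
        else:
            # Odd depth: divide the height among the children.
            slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out


rects = slice_and_dice(tree, 0, 0, 120, 100)
for name, x, y, w, h in rects:
    print(name, round(w * h))
```

    On a 120 by 100 canvas, an inventory holding half of all records receives half of the total area, which is the property the treemap relies on to convey relative magnitude.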

    Relative distribution of number of unique records per inventory document.


    Documents are indexed according to their Getty identifiers. Labels can be toggled on or off to indicate the city corresponding to a given record. This view, taken alongside a number of other visualizations to describe the databases, gives a sense of the relative distribution of individual records within archival inventories and, by extension, within their parent entities.

    Archival Inventories and Contents Records by City.


    This project shows that in the 200-year period sampled here, the greatest density of records is clearly within Italy (3,692 records), and within Italy in Venice (1,872) and Rome (1,631).


    What emerges from this experiment with provenance data is an application of Drucker’s (2011) caution that digital records are in fact capta, that they are taken (captured, recorded, annotated, and otherwise constructed) rather than given (produced by nature, observer-independent). This central theoretical insight can be used to enrich interface design. Similar visualizations could help researchers without expertise in these collections understand the coverage of the database better and might also remind users that the electronic records they are viewing relate to analog documents. Future research and exploration could involve connecting database views to documents and records via such visualizations, in effect creating a cue to remind users of the contextual properties of records.

    Works Cited

    Drucker, J. (2011). Humanities approaches to graphical display. Digital Humanities Quarterly, 5(1).

    Getty Research Institute. (n.d.). Collecting and provenance research. Retrieved from

    Gilliland-Swetland, A. J. (2000). Enduring paradigm, new opportunities: The value of the archival perspective in the digital environment. Council on Library and Information Resources.

    Johnson, A. (2008). Users, use and context: Supporting interaction between users and digital archives. In L. Craven (Ed.), What are archives?: Cultural and theoretical perspectives: A reader (pp. 145–164). Aldershot, England; Burlington, VT: Ashgate.