When we use information, we need to understand what we’re looking at. We do this by framing that information: sharing details about what it is and how we can use it. For museum collections that connect data points across centuries of artworks and objects, institutions are turning to new tools to share and communicate that data. Here, we look at how four institutions use GitHub as a platform to share collections data, as an opportunity to parse current practice in this area: the Metropolitan Museum of Art; the Museum of Modern Art (MoMA); Cooper Hewitt, Smithsonian Design Museum; and the Tate collection.
GitHub is a platform for sharing and collaborating on code repositories. In a GitHub repository, the README functions as an overview of the repository and its contents. In the museum context, the README may act as a guide for how institutions have chosen to share their collections data. In identifying what information is commonly included in the README, we can map commonalities in which elements institutions have selected to frame and contextualize their collections data.
The Metropolitan Museum of Art Open Access
openaccess – The Metropolitan Museum of Art’s Open Access Initiative (github.com)
The README of the Metropolitan Museum of Art Open Access repository has two major sections: an untitled introductory section and a second section, Additional usage guidelines.
The introductory segment begins with context, bringing the reader into contact with the scope and history of the institution. Here, the README outlines the scope of data available and the terms that guide its use — referencing and linking to the Creative Commons CC0 license applied to that data, as well as pointing to a space of the repository dedicated to this information, titled LICENSING. This section is closed by addressing available data formats and encoding standards.
The second segment, titled Additional usage guidelines, includes five subheadings, each dedicated to one topic. The first, titled Images not included, notes what is not included in the repository. The second, Documentation in progress, addresses the status of the data and current work. The third, titled Pull requests, indicates that this repository does not accept pull requests (proposed changes to the data housed on GitHub) and explains how to contact the institution with proposed changes. The fourth, titled Attribution, outlines preferred methods for data citation. The last, titled Do not misrepresent the dataset, contains details for data use related to remix, reuse, and institutional trademarks and endorsement.
This README closes with a thank you: “The writers of these guidelines thank the The Museum of Modern Art, Tate, Cooper-Hewitt, and Europeana.”
The Museum of Modern Art (MoMA) Collection
The Museum of Modern Art (MoMA) collection data (github.com)
Mirroring our last example, the Museum of Modern Art (MoMA) collection README is based on two primary sections: an untitled introduction followed by Additional usage guidelines.
This introduction covers many points, from the contextual to the technical. First, it outlines the history and breadth of the collection, from the analog to the digital, and then describes how this collection is represented in the datasets shown in this repository. This includes addressing the scope and size of the dataset, the formats available, and the Creative Commons CC0 license applied to that data, as linked within the README text. Last, this section closes with a new data point: user stories centered around data usage.
Additional usage guidelines outlines the terms for using collections data through five subheadings. The first, Images not included, addresses what the dataset is missing and offers guidance on licensing that missing content. The second, Research in progress, outlines the current status of the dataset and offers advisory information for its use. The third, Pull requests, outlines restrictions on, and the process for, accepting changes to the dataset. The fourth, Give attribution to MoMA, asks users for attribution, but also touches on the how and why of data attribution, including a Digital Object Identifier (DOI) for easily referencing and citing the dataset. The fifth and last subheading, Do not misrepresent the dataset, outlines restrictions on data usage, including terms for representation and reuse. To conclude the README, the Museum of Modern Art (MoMA) includes a short message: “The writers of these guidelines thank the Tate, Cooper-Hewitt, and Europeana.”
Cooper Hewitt, Smithsonian Design Museum
collection – Collection Data for Cooper Hewitt, Smithsonian Design Museum (github.com)
The Cooper Hewitt, Smithsonian Design Museum README is made up of seven sections. The first section, Collection, functions as an introduction to the README by highlighting their digital collections and collections data.
The second section, Instructions, directs the user to a wiki for using the dataset.
The third, Usage Guidelines, identifies a license for the dataset (CC0) along with additional guidelines, which include: “Give attribution to Cooper Hewitt, Smithsonian Design Museum”, “Contribute back any modifications or improvements”, “Do not mislead others or misrepresent the Metadata or its sources”, “Be responsible”, and “Understand that they use the data at their own risk.” This section closes by indicating what is not included in the dataset, directing the user again to the associated wiki for more information.
The fourth section, Collections items as JSON files, provides a brief scope for data available in this format.
The fifth, Objects as JSON files (in Git(Hub)), acts as a disclaimer surrounding the functionality of GitHub and large datasets.
The sixth, See Anything Wrong?, outlines how users may inform the institution if they find a problem in the data.
The seventh and last section, Licensing, contains the legal code for the license applied to the dataset, Creative Commons CC0 1.0 Universal.
The Tate Collection
collection – Tate Collection metadata (github.com)
The Tate collection README is structured around four main sections: an untitled introduction, Examples, Repository Contents, and Usage guidelines for open data.
The introduction of the README is a snapshot of the collections data stored in this GitHub repository — outlining the scope of the dataset and providing a short summary of licensing terms under Creative Commons CC0.
The Examples section of the README contains “examples of Tate data usage in the wild”, which each represent how users are sourcing this collection data in their own projects — from data visualization to Twitter bots. This section includes the call to action, “Please submit a pull request with your creation added to this list” as a path for users to have their work based on Tate Gallery collections data included in this area — highlighting how people are using the dataset with tangible examples. Last, this section invites users to contribute to strengthening the dataset — creating a path for users to connect with their data and improve data quality.
The following section, Repository Contents, outlines the technical scope of the data with attention to formats — containing two subheadings which define scope for both JSON and CSV.
The last section, Usage guidelines for open data, houses a total of five subheadings. Introducing this larger section, Tate presents a CC0 license dedication and requests attribution by users. The first subheading, Give attribution to Tate, requests that users maintain the visibility of assigned licensing information and use terms. The second, Metadata is dynamic, is a disclaimer that the data shown is in continued flux as new information is gradually incorporated. The third, Mention your modifications of the Metadata and contribute your modified Metadata back, is a request to make evident any changes to the dataset in reuse and to release the product under open terms, a request which shares qualities with the practice of “share alike”. The fourth, Be responsible, requests that users do not suggest institutional affiliation or endorsement. The fifth, Ensure that you do not mislead others or misrepresent the Metadata or its sources, is a disclaimer to the user to “Ensure that your use of the Metadata does not breach any national legislation based thereon, notably concerning (but not limited to) data protection, defamation or copyright”. To end this README, the Tate extends a thank you to supporting institutions, writing in conclusion, “The writers of these guidelines are deeply indebted to the Smithsonian Cooper-Hewitt, National Design Museum; and Europeana.”
Shared Practice in Four Cases
In the above cases defined by the Metropolitan Museum of Art, Museum of Modern Art (MoMA), Cooper Hewitt, Smithsonian Design Museum, and the Tate collection, we can begin threading together shared practices in how institutions are using GitHub to share their collections data.
Across the four cases reviewed, the following themes emerge:
- 4/4 README examples contain usage guidelines.
- 4/4 README examples apply a Creative Commons CC0 license to datasets.
- 3/4 README examples include a link to the license description maintained by Creative Commons.
- 4/4 README examples request attribution for data usage.
- 3/4 README examples extend gratitude to influential peer institutions including the Tate, Cooper Hewitt, Smithsonian Design Museum, and Europeana.
- 4/4 README examples include format information.
- 4/4 README examples include a relevant disclaimer related to dataset contents.
- 4/4 README examples present institutional context for released datasets.
Within this group, we also find practices that, while not universal, bear value in the presentation of collections data and the promotion of its usage:
- Tate collection and MoMA: Examples of data use and reuse. The inclusion of examples of data reuse in the Tate collection and MoMA READMEs highlights both the larger guiding practice of reuse in releasing openly licensed data and the value of their own open datasets.
- Museum of Modern Art (MoMA): Digital Object Identifiers (DOIs). While all four institutions request attribution in uses of their open collections data, MoMA additionally provides a Digital Object Identifier (DOI), which helps support data attribution and citation across disciplines.
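For readers who want to reproduce this kind of comparison, the tallies above can be sketched in a few lines of Python. This is a minimal illustration, not a tool used for this review: the feature labels (such as `usage_guidelines` or `doi`) are hypothetical names chosen here, and the feature sets are transcribed by hand from the summaries in this article. The license-link theme is left out because the summaries do not make clear which README lacks the link.

```python
# Features shared by all four READMEs, per the summaries in this article.
# The label names are illustrative assumptions, not terms used by the museums.
COMMON = {
    "usage_guidelines",
    "cc0_license",
    "requests_attribution",
    "format_information",
    "dataset_disclaimer",
    "institutional_context",
}

# One feature set per README, transcribed by hand from the case descriptions.
readmes = {
    "The Met": COMMON | {"thanks_peer_institutions"},
    "MoMA": COMMON | {"thanks_peer_institutions", "reuse_examples", "doi"},
    "Cooper Hewitt": set(COMMON),
    "Tate": COMMON | {"thanks_peer_institutions", "reuse_examples"},
}


def tally(feature_sets):
    """Return {feature: number of READMEs that include it}."""
    counts = {}
    for features in feature_sets.values():
        for feature in features:
            counts[feature] = counts.get(feature, 0) + 1
    return counts


counts = tally(readmes)
total = len(readmes)

# Print the most widely shared features first, then alphabetically.
for feature in sorted(counts, key=lambda name: (-counts[name], name)):
    print(f"{counts[feature]}/{total} {feature}")
```

Keeping each README as a plain set makes it easy to add a fifth institution or a new feature later and rerun the tally without changing the counting logic.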
On GitHub, the README may act as a guidebook to how data is shared. For museum institutions, these READMEs provide a key to how they contextualize and share their collections data on this platform. By reviewing active examples of GitHub READMEs maintained by the Metropolitan Museum of Art; the Museum of Modern Art (MoMA); Cooper Hewitt, Smithsonian Design Museum; and the Tate collection, we can evaluate how museums are choosing to frame their collections for use beyond institutional bounds.