Today on the blog we’re tackling one of our most frequently asked questions: “Why don’t you digitize everything?” and its related runner-up, “When will you be putting all your records on the web?”
As archivists we like these questions because they tell us that people are eager for access to archival records. They also show that people realize that not everything is digitized. Indeed, only a tiny fraction of the world’s primary sources are available digitally. This doesn’t mean that undigitized records are inaccessible or not worth consulting, but you will need to visit us archivists to use them.
In fact, archivists and librarians themselves are behind the abundance of primary sources already available on the internet. From rare books to official records and from diaries to sound recordings, digitized resources have spread the word (literally) that the past informs our present and our future. In the meantime, both non-profit and commercial organizations whose main mission includes digitizing material (like the Internet Archive, or Ancestry.com) have raised public expectations about access to historical resources.
In this post we’ll share some of the behind-the-scenes realities of digitizing and uploading rare materials. We hope this boosts awareness about some important facets of document digitization and sharing. One is the vast army of largely anonymous labourers out there whose work makes these valuable resources available. Another is the existence of the original records behind the images, which archivists continue to steward.
We also hope that people who are informed about digitization will advocate for archives in the opportunities and challenges they face.
But first, a basic question.
Why do archivists digitize records?
It’s important to understand what digitization can and can’t do. A common assumption is that digitization preserves analogue (non-digital) archival records. In some cases – say, when the record is in imminent danger of becoming unusable – this is true, in a way. Think about a paper map disintegrating into fragments, a letter faded almost to illegibility, or a cassette tape turning brittle and unplayable. In such cases digitization – the production of an electronic image of these records – saves information gleaned from the record. But it doesn’t produce a clone of the record (more on this later). At best it results in a digital “surrogate,” an approximation (even if a very good one) of a dimension of the record.
Archivists commonly digitize records to facilitate access. Easily copied electronic files help people consult records at a distance in multiple locations. Of course, consulting digital files instead of originals also aids preservation by sparing originals from repeated physical handling – a vital function that was once (and still is) served by microfilming records.
What’s special about digitization in archives?
Archivists will often say that mass digitization in particular is costly, both in money and time (which is also money). Sometimes people are skeptical about this. After all, it’s so easy to take a picture of your high school yearbook and share it on Facebook, or to throw some old postcards on a scanner and upload them to a blog.
Below is an overview of some of the factors archivists deal with in digitizing records. Here we’ll concentrate on two-dimensional archival records like paper documents and photographs. Some of these challenges relate to the complexity of the material itself; others are due to the digitization process. All show that large-scale digitization in an institutional setting is not your average home scanning operation. And the challenges for analogue media like old sound recordings or film are even more acute (for one thing, it’s getting increasingly difficult to find equipment that will play old media).
Dealing with volume
As you read on, keep in mind the vast amount of material held by archives. Even a modestly sized archival institution measures its holdings in kilometres of shelf space. Each box on these shelves can hold anywhere from 700 to 1800 individual pieces of paper, and even more photographs, negatives, and slides. Digitizing even a small fonds (a type of archival collection) is a big commitment.
Dealing with dimension
Many archival record groups are not easy to scan quickly. The fastest way of scanning a stack of pages is with an automatic feeder; but feeders only work with same-sized pages in good condition. Even then, the benefits of speed have to be weighed against the risk of a one-of-a-kind document being mangled by a paper jam.
For unique or fragile records (and most archival records fall under one of these headings), manual scanning is one of the few responsible options. For each scanned item, there can be dozens of associated tasks, from removing staples and positioning the item, to processing images and entering metadata (see below). That adds up to a lot of work: scanning a single archival box of records can take days.
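To see how quickly the time adds up, here is a back-of-the-envelope estimate. The page counts are the box sizes mentioned above; the two minutes of handling per page and the seven scanning hours per day are purely illustrative assumptions on our part, not measured figures.

```python
# Rough estimate of manual scanning time for one archival box.
# minutes_per_page and hours_per_day are illustrative assumptions.

def scanning_days(pages: int,
                  minutes_per_page: float = 2.0,
                  hours_per_day: float = 7.0) -> float:
    """Return the working days needed to scan `pages` items by hand."""
    total_hours = pages * minutes_per_page / 60
    return total_hours / hours_per_day

# The smallest and largest box sizes mentioned in this post.
for pages in (700, 1800):
    print(f"{pages} pages: about {scanning_days(pages):.1f} working days")
```

Under these assumptions a single box works out to roughly three to nine working days, before any image processing or metadata entry.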
If the records in a file are of various sizes and shapes, constant readjustments to scanning parameters add even more time. If items are really large, they may have to be scanned in sections and digitally stitched together.
Sometimes the best option is to take a photograph instead, which then necessitates a high-quality photographic set-up including lighting, document holders, and a camera with an appropriate lens. Items that are torn, wrinkled, thick, or reflective will also require skilled handling and digital manipulation.
Capturing physical evidence
We mentioned above that scanning doesn’t produce an exact copy of a record but only an impression of certain aspects of it. As archives researchers often find out, records speak to us in ways that go beyond their original intended use.
For example, annotations accumulated, say, in the margins of reports or on the backs of photographs often provide essential or at least illuminating information. It’s important to think about whether (and how) these additions should be digitally captured. Physical characteristics like thickness and type of paper, marks of wear, and enclosures also can be “read” for evidence of the past, but are even harder to convey in a digital file.
And this brings us to the most under-recognized aspect of digitally capturing an archival record: linking its digital image to crucial information which tells us what it is. We call this information “metadata” (data about the data). Some metadata is technical information about the digital capture. Other metadata is part of the archival description of the record itself.
Some archival metadata is short and sweet such as the date a record was created. Other information is more complex, such as the story of the person or organization that created it. Most complex of all is a description of the place the records occupy in nested groups of records.
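For readers who like to see structure, the two kinds of metadata can be sketched as a simple record. This is only an illustration: the class names, field names, and sample values below are hypothetical and do not reflect any particular archival standard or the system we actually use.

```python
from dataclasses import dataclass, field

@dataclass
class TechnicalMetadata:
    """Information about the digital capture (hypothetical fields)."""
    file_name: str
    resolution_dpi: int
    file_format: str

@dataclass
class DescriptiveMetadata:
    """Part of the archival description of the record (hypothetical fields)."""
    title: str
    date_created: str
    creator: str
    # Nested groups the record belongs to, from broadest to narrowest.
    hierarchy: list[str] = field(default_factory=list)

@dataclass
class DigitizedItem:
    """Links one digital image to both kinds of metadata."""
    technical: TechnicalMetadata
    descriptive: DescriptiveMetadata

    def context_path(self) -> str:
        """Show where the item sits within its nested record groups."""
        return " > ".join(self.descriptive.hierarchy + [self.descriptive.title])

# Entirely made-up example values.
item = DigitizedItem(
    technical=TechnicalMetadata("item_0001.tif", 600, "TIFF"),
    descriptive=DescriptiveMetadata(
        title="Letter to a cousin",
        date_created="1923",
        creator="Example family",
        hierarchy=["Example Family fonds", "Correspondence series", "Letters, 1923"],
    ),
)
print(item.context_path())
```

The point of the sketch is that the image file is only one field among many; the rest has to be written by someone who knows the records.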
Archivist Laura Millar gives us a great example of how context can turn the most mundane slip of paper into an important source of insight. She points out that a single sticky note scrawled with the words “Meet Joe” tells us so little that by itself, divorced from any other context, it would have little value. But if the sticky note is attached to a page in the day timer of Barack Obama’s secretary a few weeks prior to Obama’s announcement of Joe Biden as his running mate, the same simple note reveals an important moment in the history of the US government. [i] (Incidentally, sticky notes are a digitization nightmare.)
An individual record within an archival collection does not tell us its whole story. And here lies the part of digitization that many people never see: the prior work of organizing and documenting collections to make them intelligible and searchable in the first place. Without this vital descriptive work the electronic files produced by digitization would be little more than an undifferentiated and unusable mass of thousands of files.
For other examples of the importance of context and how archivists describe it, check out our posts “How do archivists organize records?” and “How do archivists describe records?”
Because digitization involves an investment of time and resources, we need to make sure we get it right – that the electronic files we produce are adequately representing the archival originals. That means our process will need to incorporate quality control checks.
Quality results depend on a host of factors from scanning resolutions to photographic skill to typing accuracy. Quality isn’t just a matter of aesthetics. As archivists, we’re responsible for making sure that people are getting a reliable and authentic view of records. Important decisions may (and often do) rely on the information found in them.
Maintaining digital files
It’s tempting to think of digitization as a fix-and-forget proposition: that once information is captured digitally, it’s automatically pinned down for the long-term. It isn’t. And this means digitization presents archivists with a new set of files to maintain.
Because stored digital files are in a way intangible – huge amounts can fit on tiny thumb drives and multiple identical copies can be piped through electrical wires – it’s easy to think of them as non-physical and incorruptible. In fact, digital files are physical states of physical things, and they’re subject to decay and disorder just like their analogue counterparts. Digital data fundamentally exists as millions of minute magnetic or electric charges. A tiny shift on a subatomic level can cause a cascade of errors. Even data just sitting unused on a drive is subject to “bit rot,” random degradation over time.
Besides the problem of data degradation, archivists also have to think about the future readability of current file formats. There’s no point in investing a lot of time in digitizing a body of material if no one will be able to open the files once today’s software and hardware inevitably become obsolete.
Archivists themselves are at the forefront of pushing the boundaries of digital longevity. Technologies that can neutralize errors are improving; agreed-on standards for file formats are being developed. And refreshing, migrating, and copying digital data can help protect it. Even so, the average lifespan of a hard or flash drive is a fraction of that of a piece of paper stored in optimal conditions (and incidentally, digital media have temperature and humidity requirements too).
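One widely used safeguard against silent corruption is a “fixity check”: a checksum is recorded for each file when it is created, then recalculated later and compared. The sketch below uses Python's standard hashlib module with SHA-256. The file names and contents are made up for illustration, and it works on in-memory bytes rather than real files; an actual workflow would read from storage and keep the recorded checksums in a separate manifest.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of some file content as hex text."""
    return hashlib.sha256(data).hexdigest()

def failed_fixity(manifest: dict[str, str], current: dict[str, bytes]) -> list[str]:
    """List files that are missing or whose checksum no longer matches."""
    failed = []
    for name, recorded in manifest.items():
        data = current.get(name)
        if data is None or sha256_of(data) != recorded:
            failed.append(name)
    return failed

# Hypothetical files, checksummed at the time of digitization.
original = {
    "map_001.tif": b"scanned map image data",
    "letter_002.tif": b"scanned letter image data",
}
manifest = {name: sha256_of(data) for name, data in original.items()}

# Simulate bit rot: flip a single bit in one stored copy.
damaged = bytearray(original["letter_002.tif"])
damaged[0] ^= 0x01
stored = {**original, "letter_002.tif": bytes(damaged)}

print(failed_fixity(manifest, original))  # nothing has changed
print(failed_fixity(manifest, stored))    # the damaged file is reported
```

A check like this only detects damage. Repairing it still depends on having another good copy to restore from, which is why copying and refreshing matter.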
So when archivists digitize anything they commit to maintaining that file as well as the original on which it’s based. This labour needs to be factored into decisions. (For all the reasons outlined above, archivists seldom use digitization alone as a reason for disposing of originals.)
Digitization depends on significant quantities of technical equipment and human labour.
High-resolution scanners and cameras that can adequately capture large materials or negative images are very expensive. Image processing software can also be costly, as can adequate secure digital storage.
To make a dent in an average archival collection, a scanner (or several of them) needs to be working every day, all day, sometimes for months – and often that scanner is also needed for everyday operations. Some large archives maintain digitization units staffed by specialists. Archivists at smaller institutions fit digitization in where they can amid their other duties. This is why digitization of record groups is often conducted as discrete projects funded by grants or partnerships.
In this post we’re aware that we’re representing archivists in general, so we feel it’s also appropriate to point out a reality about staffing in the 21st century. Around the world today the number of staff in many archival institutions is often no greater (and is sometimes smaller) than in the pre-digital era. This means archivists need to carefully manage their limited resources while taking care of perennial core tasks like accessioning and processing records, and helping researchers. And as populations increase, the number of incoming records continues to increase exponentially.
Even after a set of records is digitized, responsibly sharing them on the web also calls for a process and resources.
First, archivists have to make sure that they are free to share the records at all. Some donors of archival records don’t want them to be available for a certain period of time; other records (such as those the government keeps about you) are kept private for legislated periods; and sensitive information about still-living people might be tucked away in personal papers. Copyright (ownership of the intellectual property in the records) may also prohibit widespread sharing.
We’ve already seen that the full meaning of records isn’t necessarily available in the image of those records. It has to be provided by archivists and linked to that image. That information will also need to be conveyed to users on the internet, so the archivist will have to arrange for software or online platforms that can manage this function, and for internet servers to mediate it.
Digitization as a process
Given the factors we’ve outlined, it’s no wonder that archivists approach digitization projects methodically. Rather than running into unexpected problems with any of these challenges (from removing paper clips to clearing copyright), we usually assess an archival collection beforehand to see whether it’s a good candidate for digitization and sharing. Of course this process also takes time, so even if we can mobilize an inexpensive pool of labour, digitization is still a big investment of time and resources.
We hope this overview helps explain why digitization within archival institutions proceeds the way it does – and why we may never, in fact, digitize everything. The triumphs and trials of digitization are themselves a constantly unfolding process. New models are being explored and old ones reconsidered. Nevertheless, access is important to archivists, so digitization is too. You can be sure that archivists in institutions large and small will continue to grapple with this immensely powerful way to broadcast the knowledge we steward.