Terra Foundation for American Art Digitization Project
- Project Background
- Scanning Collections
- EAD Finding Aids
- Digital Collections Database and Collections Online
- List of Collections
Project Background
In February 2005, the Archives of American Art received an award of $3.6 million to dramatically increase the accessibility of its resources on the web. This support is funding a comprehensive, six-year program to digitize and make available on the Archives’ website a substantial cross-section of the Archives’ most important collections, including the papers of a highly diverse range of artists and arts-related figures from the eighteenth century to today. At the end of the program, an estimated 1.2 million digital files will be available to the public.
The Archives has been scanning selected items from collections for years. Each individual item is cataloged and entered into our Digital Collections Database and can be accessed using the Search Images interface. Collections Online, however, does not follow this approach. Instead, entire collections are digitized with equipment designed specifically for increased levels of production.
Collections Online also differs in that it provides access to the digitized documents at the folder level rather than the item level. All descriptive metadata is derived from the collection’s finding aid, which is encoded in XML (Extensible Markup Language) using EAD (Encoded Archival Description) tags. The finding aid thus serves as the metadata structure to which the digital image files are linked and through which they are presented online.
Scanning Collections
In August 2005, the Archives began scanning entire collections using a planetary scanner, the Zeutschel Omni Scan 10000A1. The equipment is capable of scanning in black and white, grayscale, and color at resolutions from 25 to 800 dpi. The Archives’ default setting is grayscale mode at 300 dpi.
Grayscale mode was selected because it captures and displays the wide variety of tones found in older manuscripts and the nuances of handwritten documents. This format often suppresses the typical bleed-through of handwritten documents on older and thinner papers. The Archives uses the color mode to scan sketches, vintage photographs, rare publications, and illustrated letters. In July 2007, the Archives purchased a second Zeutschel scanner, designed as a “tabletop” model 10000TT, and now has two digital imaging technicians working full-time.
Because the collections consist of historical documents, many of the original items are discolored, faded, stained, or fragile as a result of age and past handling. Their corresponding digital images depict the same conditions as the original documents; no attempt to digitally enhance documents has been made.
When documents are scanned from the collection with the new Zeutschel equipment, the scanning technician saves the digital files according to a file structure that matches the collection’s naming code and its box and folder numbers, essentially the equivalent of the finding aid container listing. Master uncompressed TIFF format files are archived in an offline Digital Asset Preservation System. Low-resolution derivative JPEG format images are automatically generated for web presentation in three sizes: thumbnail, large (400 pixels), and full view (1000 pixels). An Archives of American Art watermark is automatically integrated into the full-size image.
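The file structure and derivative sizes described above can be illustrated with a minimal Python sketch. It is not the Archives’ actual software. The directory naming pattern (`box_001/folder_001`), the collection code `examcoll`, the 150-pixel thumbnail size, and the choice to measure the 400 and 1000 pixel targets along the longer side are all assumptions made for illustration; the source states only the 400 and 1000 pixel figures.

```python
from pathlib import PurePosixPath

# Assumed longest-side targets. The source gives 400 and 1000 pixels but does
# not state the thumbnail size or which dimension is measured.
DERIVATIVE_LONG_SIDE = {"thumbnail": 150, "large": 400, "full": 1000}


def folder_path(collection_code: str, box: int, folder: int) -> PurePosixPath:
    """Build a directory path that mirrors a finding aid container listing.

    The naming pattern is hypothetical; only the idea of matching the
    collection code, box number, and folder number comes from the text.
    """
    return PurePosixPath(collection_code) / f"box_{box:03d}" / f"folder_{folder:03d}"


def scaled_size(width: int, height: int, long_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals long_side.

    The aspect ratio is preserved and each dimension is at least 1 pixel.
    """
    scale = long_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # A letter-size page scanned at 300 dpi is 2550 x 3300 pixels.
    print(folder_path("examcoll", 2, 15))
    for name, side in DERIVATIVE_LONG_SIDE.items():
        print(name, scaled_size(2550, 3300, side))
```

Keeping the path derivable from the box and folder numbers alone means a scan can be matched to its finding aid entry without any per-item cataloging.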
EAD (Encoded Archival Description) Finding Aids
Each collection selected for scanning is first processed according to current archival standards, and an EAD finding aid to the collection is created by the Project Archivist. The Terra Foundation grant supports three full-time and one part-time Project Archivists.
All of the Archives’ EAD finding aids are encoded in XML by the archivists using the text editor NoteTab Pro. The finding aids contain the typical EAD tags for descriptive biographical or historical notes, scope and content notes, and narrative series descriptions. Detailed container listings with numbered box and folder headings are also included. It is this box and folder listing that provides both the file structure in which the scanning technician saves the digital files and the primary descriptive metadata for discovery of those files.
Digital Collections Database and Collections Online
Using ColdFusion programming, each EAD XML file is passed through a parser that transforms the XML EAD data into an EAD Document Object, which is then transformed into a Finding Aid Record in a SQL Server Digital Collections Database. The Finding Aid Record in the database contains all of the EAD descriptive and component information, such as series, sub-series, folder headings, box numbers, and folder numbers.
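The parsing step can be sketched in Python as a rough analogue of the ColdFusion parser, which is not shown in the source. The sample fragment uses standard EAD 2002 elements (`dsc`, `c01`, `c02`, `did`, `container`, `unittitle`) without a namespace, but its content is invented, and the `FolderRecord` class and `parse_folders` function are hypothetical names rather than parts of the Archives’ system.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Invented sample data using standard EAD 2002 element names.
SAMPLE_EAD = """
<dsc>
  <c01 level="series">
    <did><unittitle>Correspondence</unittitle></did>
    <c02 level="file">
      <did>
        <container type="box">1</container>
        <container type="folder">1</container>
        <unittitle>Letters, 1901-1905</unittitle>
      </did>
    </c02>
    <c02 level="file">
      <did>
        <container type="box">1</container>
        <container type="folder">2</container>
        <unittitle>Letters, 1906-1910</unittitle>
      </did>
    </c02>
  </c01>
</dsc>
"""


@dataclass
class FolderRecord:
    """One folder-level row, roughly what a database record might hold."""
    series: str
    box: str
    folder: str
    heading: str


def parse_folders(ead_xml: str) -> list[FolderRecord]:
    """Flatten series and folder components into a list of folder records."""
    root = ET.fromstring(ead_xml.strip())
    records = []
    for series in root.findall("c01"):
        series_title = (series.findtext("did/unittitle") or "").strip()
        for component in series.findall("c02"):
            did = component.find("did")
            if did is None:
                continue  # skip components without descriptive identification
            containers = {
                c.get("type"): (c.text or "").strip()
                for c in did.findall("container")
            }
            records.append(
                FolderRecord(
                    series=series_title,
                    box=containers.get("box", ""),
                    folder=containers.get("folder", ""),
                    heading=(did.findtext("unittitle") or "").strip(),
                )
            )
    return records


if __name__ == "__main__":
    for record in parse_folders(SAMPLE_EAD):
        print(record)
```

This sketch handles only two levels of hierarchy. A real finding aid may include sub-series and deeper nesting, which would call for a recursive walk.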
In addition, the same Digital Collections Database holds the digital files from the scans as Digital Resource Records. Storing all of the EAD XML data and the digital files in one relational database allows for flexible output of the stored data for many different resources. It also allows the data to be linked with the other records or resources in the database.
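How the two kinds of record might be linked can be sketched with Python’s built-in `sqlite3` module, used here as a stand-in for SQL Server. The table names, column names, and sample rows are all hypothetical; the source does not describe the actual database schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical tables and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE finding_aid_folder (
            folder_id INTEGER PRIMARY KEY,
            series    TEXT,
            box       INTEGER,
            folder    INTEGER,
            heading   TEXT
        );
        CREATE TABLE digital_resource (
            resource_id INTEGER PRIMARY KEY,
            folder_id   INTEGER REFERENCES finding_aid_folder(folder_id),
            filename    TEXT,
            sequence    INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO finding_aid_folder VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Correspondence", 1, 1, "Letters, 1901-1905"),
            (2, "Correspondence", 1, 2, "Letters, 1906-1910"),
        ],
    )
    conn.executemany(
        "INSERT INTO digital_resource VALUES (?, ?, ?, ?)",
        [
            (1, 1, "img_0002.jpg", 2),
            (2, 1, "img_0001.jpg", 1),
            (3, 2, "img_0003.jpg", 1),
        ],
    )
    return conn


def images_in_folder(conn: sqlite3.Connection, box: int, folder: int):
    """Return (series, heading, filename) rows for one folder, in page order."""
    return conn.execute(
        """
        SELECT f.series, f.heading, r.filename
        FROM finding_aid_folder AS f
        JOIN digital_resource AS r ON r.folder_id = f.folder_id
        WHERE f.box = ? AND f.folder = ?
        ORDER BY r.sequence
        """,
        (box, folder),
    ).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in images_in_folder(db, 1, 1):
        print(row)
```

Because each image row points to a folder row, a single join returns the scans together with their series and folder headings, which is the kind of contextual output that storing both record types in one relational database makes possible.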
Again using ColdFusion programming, the Finding Aid Record and the Digital Resource Record stored in the database are then loaded into the Collections Online template and interface. The resulting web presentation allows users to view and navigate the digital files within their archival context and hierarchy.
For further information about this project, contact:
Karen Weiss, Project Director, weissk@si.edu
Barbara Aikens, Chief of Collections Processing, aikensb@si.edu
List of Collections included in the Terra Project
The following is a list of the collections that are included in the Archives’ six-year Terra Foundation for American Art Digitization Grant, which began in July 2005. Each will be processed and described by an archivist and descriptions for all collections will be posted online as a Finding Aid. When completed, over 100 collections will be digitized in their entirety either from existing microfilm or from originals, and made accessible via Collections Online.
- Collections which have already been fully digitized are in bold
- Collections which are scheduled to be digitized have an * after the name