Digitizing Entire Collections: Chapter 3, For Processing Archivists

Collections that are scheduled to be scanned in their entirety must be processed to a more detailed level than other collections, particularly in the descriptive detail the finding aid provides at the folder level. Through the arrangement of the collection and the EAD finding aid, the processing archivist is preparing the primary online access and navigation tool for the scanned collection.

The processing archivist must also prepare a set of scanning instructions for the scanning technician, complete the Scanning Information Worksheet, meet with the scanning technician, and review and approve the Collections Online site for the collection. The overall goal of the Collections Online workflow is to repurpose traditional archival methodologies and EAD finding aids for large-scale digitization.

Page Contents

Deciding what not to scan

Creating and Filing Instructions for the Scanning Technician

Flagging Items for Scanning by Another Technician

Integrating Existing High-Resolution Scans

Removal Notices

Notation in EAD Finding Aids About Materials Scanned or Not Scanned

Arrangement and Description

Letter Collections and Other Small Collections

Oversized References in Folder Headings

Deciding what not to scan

If a collection is prioritized and scheduled for scanning, most of the material within the collection should be scanned. The decision not to scan materials should NOT be made on an item-by-item basis, but rather at the folder and/or series/subseries level. For the most part, it is simply easier to scan the materials than to flag them and write notes about not scanning them. However, not everything in a collection can or should be scanned.

As the processing archivist, you will use your archival appraisal skills to determine series and folders that should not be scanned due to minimal research value, privacy and ethical issues, inappropriate content or images, and copyright concerns.

If significant portions of a collection will not be scanned, this decision will have to be approved by the Chief of Collections Processing. In some cases, a decision not to scan significant portions may have been made before processing begins.

The processing archivist should review the collection materials for potential copyright and trademark issues, as well as inappropriate or sensitive material. It is not the archivist’s job to determine who owns the copyright, but simply to be aware of the issue and to discuss it with the Chief of Collections Processing and the Information Resources Manager if anything seems highly questionable, such as photographs stamped with the photographer’s name and a statement indicating that the photograph cannot be reproduced.

For most collections, the donor is also the creator and has signed over their literary rights to AAA for scholarship. However, it is a rare collection in which the donor/creator is the sole copyright owner. In general, AAA accepts a relatively high level of copyright risk in order to support access and scholarly study. Thus, we routinely digitize materials for which we do not own the copyright and have not sought permission. Exceptions generally include unannotated photographs of works of art, published materials in their entirety, and photographs clearly stamped by the photographer with restrictions on reproduction.

AAA does not want to scan and make available online inappropriate materials, such as pornography and lascivious materials that are clearly not source materials for the artist or that contribute nothing to an understanding of the creator’s life. However, this is not always a clear judgment. Use your archival appraisal skills to determine which of these materials support an understanding of the creator’s life or work and should be scanned, and which contribute less and should not be scanned but merely noted in the finding aid. Remember, you are not disposing of material by not scanning it; it remains fully accessible to researchers via the finding aid. Consult your supervisor or another member of the “Terra Team” when these issues arise. Also remember that we are not trying to censor our materials or restrict access to them. However, we are a federal institution that is accountable to the public.

Be aware of legal privacy issues. Generally avoid having mental or other health records, personnel files, personal tax records, or banking records scanned. Do not have documents that include payroll information or social security numbers digitized.

Large volumes of routine non-archival printed materials, such as clippings, auction catalogs, and exhibition catalogs, might be passed over for scanning or scanned selectively. Do not have entire published books digitized. Instead, flag the cover, title page, and relevant pages within for scanning.

DO have rare printed materials scanned, such as rare unpublished catalogs and announcements, printed manuscripts, etc.

In summary, you will most often review the following types of materials and possibly flag them not to be scanned:

  • Duplicates.
  • Large quantities of news clippings and magazine clippings, entire newspapers.
  • Photographs of works of art that are not annotated with research notes.
  • Contemporary published books and contemporary exhibition catalogs (those that may be readily available in a library). Most likely, scan the cover, title page, and any relevant pages within.
  • Pornography and lascivious materials.
  • Some nude photographs, including ALL nude photographs of children and nude photographs of any person who has not signed the deed of gift on file granting AAA literary rights. Review any nude photographs of the creator to ensure that the image is not pornographic or lascivious. Consult with your supervisor if you have questions.
  • Research files composed primarily of photocopies, including photocopies of original material from other repositories.
  • Multiple versions of edited manuscripts, unless these versions are highly annotated or significantly different from the final manuscript version.
  • Sensitive financial, personal, and legal materials, such as tax returns, banking records, personnel records, medical records, and any records with social security numbers, particularly if the information is relatively current or belongs to someone other than the person who signed the deed.
  • Routine materials, such as equipment manuals and utility bills. Review with supervisor whether these may be disposed of.
  • Slides, negatives, and transparencies. If these are significant enough to scan, talk to the digital assets manager about special arrangements.
  • Materials that are too large or bulky to be scanned.
  • Materials that are too fragile to be scanned.
  • Materials having little or no research value. Review with supervisor whether these may be disposed of.

Creating and Filing Instructions for the Scanning Technician

It is the processing archivist’s responsibility to prepare the collection for scanning by writing very clear and concise instructions and flags for the scanning technician at the folder and item level, and filing these notes within the collection. Instructions and notes filed within the collection should be written on brightly colored paper, such as pink, yellow, or lime green. They do not have to be typed, just legible.

AAA increasingly uses vendors for scanning. It is critical that all scanning notes and instructions be very specific and clearly written.

Once the collection has been scanned, review the contents and remove any instructions that the scanning technician did not remove.

  • Clearly flag materials that are not to be scanned, including duplicates. Don’t make the scanning technician identify duplicates; this is your job. If there is a large amount of material, mark the beginning and end clearly. In some cases, it is helpful to wrap a piece of paper around the folders or items not to be scanned.
  • Write clear instructions about which pages to scan in publications, such as “Scan cover, title page, and pages 15-18.” If no written directions are included, the scanning technician will most likely scan the entire publication.
  • It is helpful to flag blank pages in bound volumes, so that the scanning technician does not have to go through each page looking for text or writing. This is especially critical for outside vendors.
  • Flag particularly fragile documents, noting either to handle with care or not to scan.
  • Write clear instructions that pinpoint where oversized items are located (container number) and file them in the folder. Clearly mark the oversized material to be scanned in the oversized container with a reference to the specific box and folder from which it was separated and into which it should be integrated. See the instructions below for referencing oversized materials in the EAD finding aid, as well as AAA’s Guidelines for Creating Finding Aids.
  • Notify the digital assets manager that the collection is ready to be scanned and that the file directory can be uploaded. Let her know the physical location of every part of the collection.
  • Before turning the collection over to be scanned, fill out the Scanning Information Worksheet and meet with the scanning technician and digital assets manager to discuss any issues and/or concerns.
  • After scanning is complete, the digital assets manager will notify you when the images are deployed to the internal workflow, so that you can complete the final online review and checklist worksheet. (See Section VI. Reviewing and Finalizing Site)

Flagging Items for Scanning by Another Technician

Clearly flag items that need to be scanned by Marv and note the respective boxes and folders on the Scanning Information Worksheet. Typically, this includes significant negatives for which there are no corresponding prints, as well as transparencies and slides.

When the items have been clearly flagged, the scanning technician will simply skip this material, keeping the file structure intact. Once the collection has been completely scanned for CO, the scanning technician or digital assets manager will turn the relevant box/es and the portable drive over to Marv. Marv will scan the designated items and “drop” the images into the file structure.

Integrating Existing High-Resolution Scans

Most often, the scanning technician will simply scan the entire collection, including any individual items that Marv scanned at an earlier time. This is fine, but there may be a rare item for which an existing scan should be inserted instead. This is appropriate mainly for particularly rare or fragile items that should not be handled again, oversized items that have already been “stitched” together, and items that are out on loan or exhibition when the collection is being scanned. You can check the DCD for any existing high-resolution scans that might be inserted. Clearly flag the folder and item with an instruction to the scanning technician not to scan the item because an existing scan from the DCD will be inserted. Also, make a note on the Scanning Worksheet.


Removal Notices

When you encounter a current removal notice with an attached photocopy, write an instruction for the scanning technician to scan the front of the removal notice. The images generated for this item(s) will be inserted prior to image processing.

For missing items with old Removal Notices, instruct the scanning technician not to scan the removal notice or leave a placeholder; most likely, the item will not turn up anytime soon. Also notify the Registrar if you encounter this situation.

Removal Notices for both situations (current and missing) should be noted on the Scanning Worksheet.

Notation in EAD Finding Aids About Materials Scanned or Not Scanned

The collection-level Alternate Formats Note <altformsavail> in the EAD finding aid should include a statement that the collection is available in digital format, as well as a brief sentence about the types of materials generally not scanned. For example:

The bulk of the collection was digitized in 2008 and is available via the Archives of American Art’s website. Blank pages, blank versos of photographs, photographs of artwork, duplicates, as well as select financial documents have not been scanned. In most cases, only the cover, title page, and individual relevant pages have been scanned from published materials.

At each series and subseries level, include a sentence in a separate <arrangement> note** specifying whether the series has been scanned in its entirety, partially scanned, or not scanned. If the series or subseries has been partially scanned, specify which materials have not been scanned. For example:

  <scopecontent><p>narrative descriptive text</p></scopecontent>
  <arrangement><p>Personal papers are arranged in alphabetical order according to format. This series is partially scanned; early academic records, tax records, duplicates, and poems written by others have not been scanned.</p></arrangement>

  <scopecontent><p>narrative descriptive text</p></scopecontent>
  <arrangement><p>This series has been scanned in its entirety.</p></arrangement>

  <scopecontent><p>narrative descriptive text</p></scopecontent>
  <arrangement><p>This series has not been scanned.</p></arrangement>

** Using the <arrangement> note for this information is not fully EAD-compliant and does not meet national best practices; we will soon change our practices and start using <altformsavail> notes at component levels to note scanned status.
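The series-level sentence follows a simple rule that can be sketched in Python. This is a minimal illustration, not an AAA tool: the function `series_scan_sentence` and the `ScanStatus` values are hypothetical, and the sketch assumes each folder in the series has already been marked as fully scanned, partially scanned, or not scanned.

```python
from enum import Enum
from typing import Optional, Sequence


class ScanStatus(Enum):
    """Hypothetical per-folder scan status recorded by the processing archivist."""
    FULL = "full"
    PARTIAL = "partial"
    NONE = "none"


def join_with_serial_comma(items: Sequence[str]) -> str:
    """Join phrases as 'a, b, and c' (serial comma, as in the examples)."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + ", and " + items[-1]


def series_scan_sentence(level: str, folder_statuses: Sequence[ScanStatus],
                         not_scanned: Optional[Sequence[str]] = None) -> str:
    """Build the scanned-status sentence for a series or subseries note.

    level: "series" or "subseries"
    folder_statuses: one ScanStatus per folder (assumed input)
    not_scanned: material types to name when the series is partially scanned
    """
    if not folder_statuses:
        raise ValueError("a series must contain at least one folder")
    if all(s is ScanStatus.FULL for s in folder_statuses):
        return f"This {level} has been scanned in its entirety."
    if all(s is ScanStatus.NONE for s in folder_statuses):
        return f"This {level} has not been scanned."
    # Any mix of statuses counts as partially scanned.
    sentence = f"This {level} is partially scanned"
    if not_scanned:
        sentence += "; " + join_with_serial_comma(not_scanned) + " have not been scanned"
    return sentence + "."


print(series_scan_sentence(
    "series",
    [ScanStatus.FULL, ScanStatus.PARTIAL, ScanStatus.NONE],
    ["early academic records", "tax records", "duplicates", "poems written by others"],
))
```

The resulting sentence is then placed inside the series-level <arrangement><p> note; the list of material types still has to come from the archivist's own appraisal.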

In the EAD finding aid, also add the notation “partially scanned” in the <physdesc> tag at the folder level. It may be combined with other <physdesc> information, separated by semicolons. DO NOT use a “partially scanned” notation when the only materials not scanned are duplicates. In 2011, AAA deprecated the use of a “not scanned” notation for folders that are not scanned in their entirety.

For example:

  <c02>
    <did>
      <container type="box">1</container>
      <container type="folder">1-2</container>
      <unittitle>Newsclippings, <unitdate>1920-1930</unitdate></unittitle>
      <physdesc>(2 folders; partially scanned)</physdesc>
    </did>
  </c02>
  <c02>
    <did>
      <container type="box">1</container>
      <container type="folder">3-4</container>
      <unittitle>Newsclippings, <unitdate>1936-1940</unitdate></unittitle>
      <physdesc>(2 folders; oversized items housed in Box 14; partially scanned)</physdesc>
    </did>
  </c02>
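When folder lists are generated from a spreadsheet or database rather than typed by hand, component markup like this can be produced programmatically. A minimal sketch using Python's standard-library xml.etree.ElementTree; `folder_component` is a hypothetical helper, and the sketch does not validate against the EAD schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def folder_component(box: str, folders: str, title: str, date: str,
                     physdesc: Optional[str] = None) -> ET.Element:
    """Build a <c02> folder component shaped like the examples.

    folders may be a single number ("1") or a range ("1-2").
    physdesc, if given, is the parenthesized text,
    e.g. "(2 folders; partially scanned)".
    """
    c02 = ET.Element("c02")
    did = ET.SubElement(c02, "did")
    ET.SubElement(did, "container", type="box").text = box
    ET.SubElement(did, "container", type="folder").text = folders
    unittitle = ET.SubElement(did, "unittitle")
    unittitle.text = f"{title}, "  # title text precedes the nested <unitdate>
    ET.SubElement(unittitle, "unitdate").text = date
    if physdesc:
        ET.SubElement(did, "physdesc").text = physdesc
    return c02


component = folder_component("1", "1-2", "Newsclippings", "1920-1930",
                             "(2 folders; partially scanned)")
print(ET.tostring(component, encoding="unicode"))
```

Serializing with `encoding="unicode"` returns a string rather than bytes, which is convenient when pasting the result into a larger finding aid file.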


Arrangement and Description

Correspondence should be arranged in alphabetical order by name of correspondent when possible. This arrangement provides access points without the need to create a separate index, which is not accessible from the online image viewer.

If you must arrange correspondence in chronological order because alphabetical arrangement would be too time-consuming, you must have your supervisor’s approval. If approved, do not group a large number of folders under one folder heading. Remember, the folder title is a link to the contents of each folder, and it is too confusing to link to a large number of folders with the exact same heading. The breaks should fall where materials bulk the most or at regular intervals.

  <c02>
    <did>
      <container type="box">1</container>
      <container type="folder">1</container>
      <unittitle>Letters, <unitdate>1942</unitdate></unittitle>
      <physdesc></physdesc>
    </did>
  </c02>
  <c02>
    <did>
      <container type="box">1</container>
      <container type="folder">2-3</container>
      <unittitle>Letters, <unitdate>1943-1944</unitdate></unittitle>
      <physdesc>(2 folders)</physdesc>
    </did>
  </c02>
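Where chronological arrangement is approved, one way to keep a heading from covering too many folders is to group consecutive years until a folder limit is reached. The sketch below is a hypothetical illustration: `chronological_headings` and its `max_folders` limit are assumed names and values, not an AAA rule, and the input is assumed to be the number of folders already filled for each year.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, str, str]  # (folder number or range, folder title, physdesc text)


def _make_row(start: int, years: Sequence[int], count: int, label: str) -> Row:
    """Format one heading covering `count` folders that start at folder `start`."""
    folders = str(start) if count == 1 else f"{start}-{start + count - 1}"
    dates = str(years[0]) if len(years) == 1 else f"{years[0]}-{years[-1]}"
    physdesc = f"({count} folders)" if count > 1 else ""
    return folders, f"{label}, {dates}", physdesc


def chronological_headings(folders_per_year: Dict[int, int], max_folders: int = 3,
                           first_folder: int = 1, label: str = "Letters") -> List[Row]:
    """Group consecutive years under one heading until max_folders is reached.

    A single year that fills more than max_folders folders still gets one
    heading in this sketch; splitting within a year is not handled.
    """
    rows: List[Row] = []
    start = first_folder
    years: List[int] = []
    count = 0
    for year in sorted(folders_per_year):
        n = folders_per_year[year]
        if n <= 0:
            continue  # years without letters get no folders
        if years and count + n > max_folders:
            # Close the current heading before it covers too many folders.
            rows.append(_make_row(start, years, count, label))
            start += count
            years, count = [], 0
        years.append(year)
        count += n
    if years:
        rows.append(_make_row(start, years, count, label))
    return rows


for folders, title, physdesc in chronological_headings({1942: 2, 1943: 1, 1944: 1}, max_folders=2):
    print(folders, title, physdesc)
```

This implements only the "regular intervals" option; breaking where materials bulk the most still calls for the archivist's judgment.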

Letter Collections and Other Small Collections

When describing a very small collection (less than 0.2 linear feet) composed primarily of letters, provide the number of letters/documents in the collection in both the abstract note and the scope and content note. You may also choose to create a folder title for each item, but only if the collection contains fewer than 20 items.
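The two thresholds in this paragraph can be expressed as simple checks. A minimal sketch; the `SmallCollection` class and the function names are hypothetical, and reading "primarily letters" as "more than half of the items" is an assumption, not a stated AAA rule.

```python
from dataclasses import dataclass


@dataclass
class SmallCollection:
    """Hypothetical summary of a small collection's extent and contents."""
    linear_feet: float
    letters: int
    other_documents: int

    @property
    def item_count(self) -> int:
        return self.letters + self.other_documents


def is_small_letter_collection(c: SmallCollection) -> bool:
    # "Very small" means under 0.2 linear feet (from the guidelines).
    # ASSUMPTION: "primarily letters" is read as more than half of the items.
    return c.linear_feet < 0.2 and c.letters * 2 > c.item_count


def may_title_each_item(c: SmallCollection) -> bool:
    # Item-level folder titles are optional, and only for fewer than 20 items.
    return is_small_letter_collection(c) and c.item_count < 20


example = SmallCollection(linear_feet=0.1, letters=12, other_documents=3)
print(example.item_count, is_small_letter_collection(example), may_title_each_item(example))
```

Either way, the item count itself still belongs in the abstract and the scope and content note.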

Oversized References in Folder Headings

References to oversized materials are made in the <physdesc> tag.

At the folder heading where items have been removed, use the following language, as appropriate, in parentheses.

<physdesc>(Oversized items housed in Box ## or OV##)</physdesc>

At the folder heading where the oversized items are physically housed, use the following language in parentheses.

<physdesc>(Oversized items scanned with Box ##, F##)</physdesc>

At the folder heading where an entire folder has been removed and filed in an oversized box or oversized folder, and you have created a dummy folder, use the following language, as appropriate, in parentheses.

<physdesc>(Oversized folder housed in Box ## or OV##)</physdesc>

At the folder heading where the entire oversized folder is now physically housed, use the following language in parentheses.

<physdesc>(Oversized folder scanned with Box ##, F##)</physdesc>

When combining multiple pieces of information in a <physdesc>, the order of elements should be 1) the number of folders, if more than one; 2) oversized references, if any; and 3) any notes about materials not scanned or partially scanned. The elements should be enclosed in only one set of parentheses and separated by semicolons.
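The ordering rule for combined <physdesc> text can be sketched as a small Python function. The function name and parameters are hypothetical; it returns only the text that goes inside <physdesc>, and the caller is assumed to supply oversized and scanning phrases already worded according to these guidelines.

```python
from typing import Optional


def combine_physdesc(folder_count: int = 1,
                     oversized: Optional[str] = None,
                     scan_note: Optional[str] = None) -> str:
    """Return combined <physdesc> text in the prescribed order.

    Order: 1) number of folders (only if more than one), 2) oversized
    reference, 3) scanning note. All parts share one set of parentheses
    and are separated by semicolons. Returns "" when there is nothing to say.
    """
    parts = []
    if folder_count > 1:
        parts.append(f"{folder_count} folders")
    if oversized:
        parts.append(oversized)
    if scan_note:
        parts.append(scan_note)
    return f"({'; '.join(parts)})" if parts else ""


print(combine_physdesc(2, "oversized items housed in Box 14", "partially scanned"))
```

Building the string from ordered parts keeps the single set of parentheses and the semicolon separators consistent, whichever elements are present.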

Oversized references in folder headings for materials not scanned do not include the word “scanned” when referring back to other boxes. Simply state <physdesc>(Oversized items from Box ##, F##)</physdesc>.
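The four oversized templates, plus the wording for material that was not scanned, can be captured in a few small functions. This is a hypothetical sketch: the function names are illustrative, and applying the "from Box ##, F##" wording to whole oversized folders (the guidelines mention only items) is an assumption.

```python
from xml.sax.saxutils import escape

VALID_UNITS = ("items", "folder")


def _check_unit(unit: str) -> None:
    if unit not in VALID_UNITS:
        raise ValueError(f"unit must be one of {VALID_UNITS}")


def removed_reference(unit: str, location: str) -> str:
    """Phrase for the heading the oversized material was removed from
    (or for the dummy folder left in its place).

    location is the full container label, e.g. "Box 14" or an OV number.
    """
    _check_unit(unit)
    return f"Oversized {unit} housed in {location}"


def housed_reference(unit: str, box: int, folder: int, scanned: bool = True) -> str:
    """Phrase for the heading where the oversized material is now housed.

    When the material was not scanned, the word "scanned" is dropped.
    ASSUMPTION: the not-scanned wording also applies to whole folders.
    """
    _check_unit(unit)
    verb = "scanned with" if scanned else "from"
    return f"Oversized {unit} {verb} Box {box}, F{folder}"


def physdesc_element(phrase: str) -> str:
    """Wrap a single phrase in parentheses inside a <physdesc> tag."""
    return f"<physdesc>({escape(phrase)})</physdesc>"


print(physdesc_element(removed_reference("items", "Box 14")))
print(physdesc_element(housed_reference("items", 3, 7)))
```

When a heading also needs a folder count or a scanning note, the phrase goes inside the same set of parentheses rather than through `physdesc_element` on its own.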

