Processed Archives
Processed Archives: Finding Aid Component packages from Forensic Toolkit
Processed Archives packages are packages of files created by the Archival Processing Unit using either Forensic Toolkit (FTK) or a file manager, and subsequently exported by the Digital Archives team. They are also known as Finding Aid Component (FA component) packages. More information on the processing steps can be found on the Digital Archives documentation website. The following diagram shows what this type of package may look like.
The top-level folder should be named after the finding aid component ID, also known as the Electronic Records Identifier, e.g. M1234_ER_0001. This top-level folder should contain a “metadata” folder and an “objects” folder. The “metadata” folder should contain a CSV file named with the same finding aid component ID. The “objects” folder should contain one or more files.
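The layout rules above can be checked mechanically. The following is a minimal sketch, not the Library's own lint_er.py: the function name `check_package`, the ID pattern (inferred only from the example M1234_ER_0001), and the exact CSV filename `<ID>.csv` are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import List

# Assumed ID pattern, inferred only from the example "M1234_ER_0001".
COMPONENT_ID = re.compile(r"^M\d+_(ER|DI|EM)_\d+$")


def check_package(package: Path) -> List[str]:
    """Return a list of layout problems; an empty list means none were found."""
    problems = []

    # The top-level folder should be named after the component ID.
    if not COMPONENT_ID.match(package.name):
        problems.append(f"{package.name}: name does not look like a component ID")

    # A "metadata" folder holding a CSV named with the same ID
    # (the ".csv" suffix on the bare ID is an assumption).
    metadata = package / "metadata"
    if not metadata.is_dir():
        problems.append(f"{package.name}: missing metadata folder")
    elif not (metadata / f"{package.name}.csv").is_file():
        problems.append(f"{package.name}: missing metadata/{package.name}.csv")

    # An "objects" folder holding one or more files (searched recursively).
    objects = package / "objects"
    if not objects.is_dir():
        problems.append(f"{package.name}: missing objects folder")
    elif not any(p.is_file() for p in objects.rglob("*")):
        problems.append(f"{package.name}: objects folder contains no files")

    return problems
```

Calling `check_package(Path("M1234_ER_0001"))` on a well-formed package would return an empty list; each missing piece adds one message.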
Data Model
The Born-Digital Archives Data Model was created to accommodate a wide range of content collected by the NYPL Digital Archives program. It is designed to be adaptable to legacy collections as well as Digital Archives’ future acquisitions.
Data Model Description
The following data model describes how an FA component package is structured after it is ingested into the digital repository software, Preservica.
Each component forms a Structural Object (SO), named “DI/EM/ER Container”, which can be understood as a folder. DI stands for Digital Image, EM for Email, and ER for Electronic Record. Each DI/EM/ER Container must have one metadata SO, named “(original folder title)_metadata”, and one contents SO, named “(original folder title)_contents”. The metadata SO may be empty, or it may hold one or more metadata files. The contents SO can hold Information Objects (IOs), also known as assets, and/or a file and folder hierarchy, depending on the original content structure.
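To make the naming rules concrete, here is a small sketch that builds the expected SO tree for one component. It is an illustration only: the `StructuralObject` class and `expected_structure` function are hypothetical, the container is assumed to be titled with the original folder title, and any nested folder hierarchy inside the contents SO is flattened for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuralObject:
    """A folder-like node; `assets` lists the titles of the IOs it holds."""
    title: str
    children: List["StructuralObject"] = field(default_factory=list)
    assets: List[str] = field(default_factory=list)


def expected_structure(folder_title: str,
                       metadata_files: List[str],
                       content_files: List[str]) -> StructuralObject:
    """Build the container with its metadata SO and contents SO."""
    # The metadata SO may be empty or may hold metadata file(s).
    metadata_so = StructuralObject(f"{folder_title}_metadata",
                                   assets=list(metadata_files))
    # The contents SO holds the assets (hierarchy flattened in this sketch).
    contents_so = StructuralObject(f"{folder_title}_contents",
                                   assets=list(content_files))
    # Assumption: the container itself carries the original folder title.
    return StructuralObject(folder_title, children=[metadata_so, contents_so])
```

For a folder titled M1234_ER_0001, the two child SOs would be titled M1234_ER_0001_metadata and M1234_ER_0001_contents.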
Process
The Born-Digital Archives Ingest instructions document how Digital Preservation (DP) staff move FA component packages from temporary storage locations to the Library’s digital repository hosted on Preservica.
Step-by-step ingest instructions
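- Locate packages
  - Choose a collection to work with
  - Create a Trello ticket to log the work
- Upload the collection to the source folder with rsync
  rsync -arP /source/folder/* DA_Source/folder
  Argument explanation:
  - a is archive mode, which preserves permissions, timestamps, and symbolic links and already implies recursion
  - r is recursive (redundant when a is given)
  - P shows transfer progress and keeps partially transferred files so that an interrupted transfer can resume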
- Validate and update packages
  Normally, a linter is a static analysis tool that flags errors, bugs, and other potential problems in source code. In our context, lint_er.py is a Python script that confirms each Electronic Record (ER) package conforms to the structure expected by the packaging and ingest processes.
  - Log in to a virtual machine (VM)
  - Run the linter on all of the packages from the collection
    python3 lint_er.py -... /source/digarch/path/to/collection ...
  - After the linting process, go through the log file
  - Fix each package that has error(s) individually
    - If a repair can be carried out, do so
    - If further help is needed, contact other Digital Preservation or Digital Archives staff
  - Document common issues found and how they were resolved
  - Continue linting the packages until all packages pass
- Repackage and ingest
  Packages that conform to the data model structure are ready to be ingested into Preservica. First, they must be repackaged according to Preservica’s expectations.
  - Log in to a VM
  - Switch user to preservica
    su preservica
  - Change directory to DA_Scripts, which has a pyenv environment for Python version control
  - Run the packaging script
    python3 DigArch_NYPL_Uploader.py
  - Follow the instructions to create pre-ingest containers for all packages
    - Select 1 to ingest content to the PRODUCTION tenant, or select 2 to ingest content to the TEST tenant
    - Clear the process list? Choose N
    - Select the process you would like to run. Select 1 to Create New Container
    - Select the workflow type. Enter 1 for DigArch
  - After all containers are created, ingest all packages to the instance of your choice
    - Select 1 to ingest content to the PRODUCTION tenant, or select 2 to ingest content to the TEST tenant
    - Clear the process list? Choose N
    - Select the process you would like to run. Select 2 to Ingest Container
    - Enter ALL to upload all packages; to ingest only a particular container, specify which container to ingest instead
  - Monitor the ingest progress on the Preservica user interface
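The rsync upload step in the list above could also be scripted, for example to preview a transfer before running it. The sketch below is an assumption-laden illustration rather than part of the documented workflow: `build_rsync_command` and `upload` are hypothetical helper names, and the paths are the placeholders from the step itself. It passes `-aP` only, because archive mode already implies recursion, and adds rsync's `--dry-run` flag on request.

```python
import subprocess
from typing import List


def build_rsync_command(source: str, destination: str,
                        dry_run: bool = False) -> List[str]:
    """Build the rsync argument list without running anything."""
    # -a is archive mode (implies -r); -P is --partial --progress.
    command = ["rsync", "-aP"]
    if dry_run:
        # --dry-run reports what would be transferred without copying.
        command.append("--dry-run")
    # A trailing slash makes rsync copy the contents of the source folder.
    command += [source.rstrip("/") + "/", destination]
    return command


def upload(source: str, destination: str, dry_run: bool = False) -> None:
    """Run rsync and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_rsync_command(source, destination, dry_run), check=True)
```

One design difference from the command in the step: a trailing slash on the source also transfers hidden files, whereas a shell glob such as `/source/folder/*` typically skips names that begin with a dot. A preview run would look like `upload("/source/folder", "DA_Source/folder", dry_run=True)`.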
Ingest confirmation
Confirm on the Preservica website that the packages were ingested correctly.