Pre-processed Archives
Pre-processed Archives: Data from original media carriers
Pre-processed Archives packages contain files transferred from digital carriers. Several techniques are used to transfer the files, including disk images, logical file transfers, and cloud transfers, and packages may also be referred to by these names. The Digital Archives team conducts the transfer process. More information can be found on the Digital Archives Transfers documentation page.
The following diagram shows what a typical file transfer package may look like.
The top-level folder should be named after the digital carrier ID, e.g. ACQ_1234_123456, and should contain a “metadata” folder and an “objects” folder. The “metadata” folder may hold multiple plaintext log files, which store information about the transfer process. The “objects” folder is a bag structure: a “data” folder alongside bag-info.txt, bagit.txt, manifest-md5.txt and tagmanifest-md5.txt. The “data” folder should contain one or more files, which may be arranged in subfolders.
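The layout described above can be checked with a short script. The following is a minimal sketch only and is not the lint_ft.py script used later in the ingest process. The function names check_package_layout and check_md5_manifest are hypothetical, and the sketch assumes each line of manifest-md5.txt has the form "checksum, whitespace, path relative to the objects folder".

```python
import hashlib
from pathlib import Path

# Bag files named in the package description; all other names here are assumptions.
BAG_FILES = ["bag-info.txt", "bagit.txt", "manifest-md5.txt", "tagmanifest-md5.txt"]


def check_package_layout(package: Path) -> list[str]:
    """Return a list of layout problems; an empty list means nothing was found."""
    problems = []
    for folder in ("metadata", "objects"):
        if not (package / folder).is_dir():
            problems.append(f"missing folder: {folder}")
    objects = package / "objects"
    for name in BAG_FILES:
        if not (objects / name).is_file():
            problems.append(f"missing bag file: objects/{name}")
    data = objects / "data"
    if not data.is_dir():
        problems.append("missing folder: objects/data")
    elif not any(p.is_file() for p in data.rglob("*")):
        problems.append("objects/data contains no files")
    return problems


def md5_of(path: Path) -> str:
    """Compute the MD5 of a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5_manifest(package: Path) -> list[str]:
    """Compare each entry in manifest-md5.txt with the file it names."""
    objects = package / "objects"
    manifest = objects / "manifest-md5.txt"
    if not manifest.is_file():
        return ["missing bag file: objects/manifest-md5.txt"]
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Assumed line format: "<md5 checksum> <path relative to objects/>"
        expected, rel_path = line.split(maxsplit=1)
        target = objects / rel_path
        if not target.is_file():
            problems.append(f"listed but missing: {rel_path}")
        elif md5_of(target) != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

For example, check_package_layout(Path("ACQ_1234_123456")) returns an empty list when both folders, the four bag files and at least one data file are present.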
Process
The Pre-processed Archives Ingest instructions document how Digital Preservation (DP) staff move pre-processed archives packages from temporary storage locations to the Library’s digital repository hosted on Preservica.
Step-by-step ingest instructions
- Locate packages
- Choose a collection to work with
- Create a Trello ticket to log the work
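- Upload the collection to the source folder with rsync
  rsync -arP /source/folder/* DA_Source/folder
  Argument explanation:
  - a is archive mode (recursive, preserving permissions, timestamps and symbolic links)
  - r is recursive (already implied by a)
  - P shows progress and keeps partially transferred files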
- Validate and update packages
  A linter is normally a static analysis program that flags errors, bugs and other potential problems in source code. In our context, lint_ft.py is a Python script that confirms each pre-processed archives package conforms to the structure expected by the packaging and ingest processes.
  - Log in to a virtual machine (VM)
  - Run the linter on all of the packages from the collection.
    python3 lint_ft.py -... /source/digarch/path/to/collection ...
  - After the linting process, go through the log file
  - Fix each package that has error(s) individually
    - If a repair can be carried out, do so
    - If further help is needed, contact other Digital Preservation or Digital Archives staff
  - Document common issues found and how they were resolved
  - Continue linting the packages until all packages pass
- Repackage and ingest
  Packages that conform to the data model structure are ready to be ingested into Preservica. First, they must be repackaged according to Preservica’s expectations.
  - Log in to a VM
  - Switch user to preservica
    su preservica
  - Change directory to DA_Scripts, which has a pyenv environment for Python version control
  - Run the packaging script
    python3 ingest_digarch_ft.py
  - Follow the instructions to create pre-ingest containers for all packages
    - Select 1 to ingest content to PRODUCTION tenant or select 2 to ingest content to TEST tenant
    - Clear the process list? Choose N
    - Select the process you would like to run. Select 1 to Create New Container
    - Select the workflow type. Enter 2 for FileTransfer
  - After all containers are created, ingest all packages to the instance of your choice
    - Select 1 to ingest content to PRODUCTION tenant or select 2 to ingest content to TEST tenant
    - Clear the process list? Choose N
    - Select the process you would like to run. Select 2 to Ingest Container
    - Enter ALL to upload all packages, or specify a single container if only that one should be ingested
  - Monitor the ingest progress on the Preservica user interface
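The rsync upload step in the list above can also be wrapped in a small script so that the same flags are used every time. This is a minimal sketch under stated assumptions: the function names build_rsync_command and upload are hypothetical, and because no shell is involved the sketch passes the source folder with a trailing slash instead of the `/*` glob. That copies the folder's contents in the same way, but it also includes hidden files, which the glob would skip. The `--dry-run` flag is a standard rsync option that lists what would be transferred without copying anything.

```python
import shlex
import subprocess


def build_rsync_command(source_dir: str, destination: str, dry_run: bool = False) -> list[str]:
    """Build the rsync argument list for copying the contents of source_dir."""
    # A trailing slash tells rsync to copy the folder's contents, not the folder itself.
    source = source_dir.rstrip("/") + "/"
    command = ["rsync", "-arP"]  # same flags as the documented command
    if dry_run:
        command.append("--dry-run")  # report what would be copied, copy nothing
    command += [source, destination]
    return command


def upload(source_dir: str, destination: str, dry_run: bool = True) -> None:
    """Run rsync and raise CalledProcessError if it exits with an error."""
    command = build_rsync_command(source_dir, destination, dry_run=dry_run)
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)
```

Calling upload("/source/folder", "DA_Source/folder") performs a dry run by default; passing dry_run=False carries out the actual transfer.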
Ingest confirmation
Confirm packages are ingested correctly on the Preservica website.