Ingests should be run from dedicated virtual machines (VMs) to provide optimal bandwidth for file transfers and uploads. A VM might be dedicated to a specific kind of ingest, such as born-digital archives or digitized audio and moving image. Additional VMs may be necessary to parallelize the workflow and increase bandwidth utilization, or to support a new ingest process. All VMs are managed by the Information Technology Group (ITG) and should be configured identically when possible.
- File a Jira Ticket with ITG to create a new virtual machine. Include the following:
- required processor cores, 4-16
- required RAM, 16+ GB
- required working storage, 4+ TB if possible
- required mounts of other storage clusters, such as Isilon or workgroup storage
- sudo privileges for your account
- a list of users to create
- a list of software to install (screen, tmux, nano, python3, python3-pip, git)
- Confirm that the VM meets your needs.
- Check connectivity from both the office hardwired connections and wireless VPN connections.
- Check your account’s sudo privileges.
- Check that all software is installed.
- Check that all user accounts were created.
- Set up additional users, software, and mounts if required.
- Contact users to test their connections.
- Add the VM to the Keeper list of workstations.
- Close the ticket with ITG.
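The software check in the list above can be sketched as a small shell function. This is a minimal sketch, not an established part of the workflow: `missing_tools` is a hypothetical helper name, and it assumes `pip3` is the command provided by the `python3-pip` package.

```shell
#!/bin/sh
# Hypothetical helper: print each requested command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the ticket; pip3 is assumed to come from python3-pip.
missing_tools screen tmux nano python3 pip3 git
```

No output means every tool was found. For the sudo check, running `sudo -l` lists the commands your account is allowed to run.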
Create a new user account
sudo useradd <username> -m -p <pw> -s /bin/bash -G ingest
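<username>: sets the username for the account
-p <pw>: sets a temporary password of your choosing; useradd expects the already-hashed value here (for example, the output of openssl passwd -6), not the plain text
-m: creates a home directory
-s /bin/bash: sets the login shell
-G ingest: sets the secondary group to ingest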
Send the login information to the user. Ask them to test the connection and also change the password.
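One way to make sure the temporary password is actually changed is to expire it at creation time. The sketch below only prints the commands so they can be reviewed before being run by hand. `new_user_commands` is a hypothetical helper, and setting the password interactively with `passwd` instead of `-p` is an assumption about what is acceptable on these VMs.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands for a new ingest account.
new_user_commands() {
    user="$1"
    # Create the account with a home directory, bash shell, and ingest group.
    printf 'sudo useradd -m -s /bin/bash -G ingest %s\n' "$user"
    # Set the temporary password interactively.
    printf 'sudo passwd %s\n' "$user"
    # Expire the password so the first login forces a change.
    printf 'sudo chage -d 0 %s\n' "$user"
}

new_user_commands exampleuser
```

Printing rather than executing keeps the account creation a deliberate, manual step.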
Delete the account
sudo userdel -r <username>
sudo userdel: command to delete the account
-r: deletes the home directory and other user data
<username>: name of the account to delete
Most VMs require the same software, available from either apt or pip.
sudo apt-get install screen tmux nano
sudo apt-get install python3 python3-pip git
sudo pip3 install --system lxml boto3
VMs also require the custom Python packaging scripts. Instructions are being investigated to install these as system-wide scripts. In the meantime, clone the repository to your home folder and use that version.
git clone https://github.com/NYPL/prsv-tools.git
All storage mounts should be made into one of the following directories:
/source/: for read-only mounts to data source locations
/data/: for read/write mounts to working storage for the script
/ifs/preservica/development: for mounts to the upload/storage/download directories for the Preservica test instance
/ifs/preservica/production: for mounts to the upload/storage/download directories for the Preservica production instance
Create a directory for mounting and change its ownership to the ingest group.
sudo mkdir /path/to/mountpoint
sudo chgrp -R ingest /path/to/mountpoint
Update the file system table file with the location and characteristics of the storage to mount. Use existing entries in the file as models.
sudo nano /etc/fstab
# example read-only mount
storage.cluster.url:path/to/folder /path/to/mountpoint nfs4 ro,rsize=65536 1 1
# example read-write mount
storage.cluster.url:path/to/folder /path/to/mountpoint nfs4 rw,rsize=65536,wsize=65536 1 1
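A quick field-count check on the edited file can catch a dropped or merged column. This is a rough sketch: `check_fstab_fields` is a hypothetical helper, and it assumes every entry uses all six fields like the examples here, although fstab itself allows the last two to be omitted.

```shell
#!/bin/sh
# Hypothetical helper: print "line-number: text" for every non-comment,
# non-blank line that does not have exactly six whitespace-separated fields.
check_fstab_fields() {
    awk '!/^[[:space:]]*(#|$)/ && NF != 6 { printf "%d: %s\n", NR, $0 }' "${1:-/etc/fstab}"
}

# Check the system table if it is readable; no output means no suspect lines.
if [ -r /etc/fstab ]; then
    check_fstab_fields /etc/fstab
fi
```

The check only counts fields. It does not confirm that the storage cluster address or mount options are correct.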
Mount the drives. If any drive fails to mount, investigate with ITG.
sudo mount -a
Comment out the appropriate line in the file system table.
sudo nano /etc/fstab
Unmount the source
sudo umount /path/to/mountpoint
Ensure the source is unmounted by checking that it is not in the list of current mounts.
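One way to check the list of current mounts is to look for the mountpoint in `/proc/mounts`. The sketch below assumes a Linux VM. `is_mounted` is a hypothetical helper, and its optional second argument (an alternate mount table file) exists only to make the function easy to test.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the path is listed as a mount point, 1 if not.
# The second argument overrides the mount table and defaults to /proc/mounts.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if is_mounted /path/to/mountpoint; then
    echo "still mounted"
else
    echo "not mounted"
fi
```

The comparison is an exact string match, so pass the mountpoint without a trailing slash.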
Ensure the directory used as the mountpoint is empty and delete it.
ls /path/to/mountpoint
rmdir /path/to/mountpoint