Ingests should be run from dedicated virtual machines (VMs) to provide optimal bandwidth for file transfers and uploads. VMs might be dedicated to a specific kind of ingest, such as born-digital archives or digitized audio and moving image. Additional VMs may be necessary to parallelize the workflow and increase bandwidth utilization, or to support a new ingest process. All VMs are managed by the Information Technology Group (ITG) and should be configured identically where possible.
Create a New Virtual Machine
- File a Jira Ticket with ITG to create a new virtual machine. Include the following:
- required processor cores, 4-16
- required RAM, 16+ GB
- required working storage, 4+ TB if possible
- required mounts of other storage clusters, such as Isilon or workgroup storage
- sudo privileges for your account
- a list of users to create
- a list of software to install (screen, tmux, nano, python3, python3-pip, git)
- Confirm that the VM meets your needs.
- Check connectivity from both the office hardwired connections and wireless VPN connections.
- Check your account’s sudo privileges: sudo -v
- Check all software is installed.
- Check all user accounts were created: ls /home
- Set up additional users, software, and mounts if required.
- Contact users to test their connections.
- Add the VM to the Keeper list of workstations.
- Close the ticket with ITG.
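The software check in the confirmation step can be scripted. The following is a minimal sketch, not an ITG-provided tool: check_tools is a hypothetical helper name, and it only tests whether each command is on the PATH.

```shell
# Report which of the requested commands are missing from PATH.
# Prints one line per missing command and returns non-zero if any are absent.
# check_tools is a hypothetical helper name used for illustration.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_tools screen tmux nano python3 pip3 git covers the software list above, assuming the python3-pip package provides the pip3 command.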
User management
Create new user accounts
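- Create a new user account.
sudo useradd <username> -m -p <pw> -s /bin/bash -G ingest
<username>: sets the username for the account
-p <pw>: sets a temporary password of your choosing; useradd expects this value to be already hashed, so if you only have a plain-text password, omit -p and run sudo passwd <username> afterwards
-m: creates a home directory
-s /bin/bash: sets the login shell
-G ingest: adds the account to the ingest group as a secondary group
- Send the login information to the user. Ask them to test the connection and change the password.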
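When several accounts are needed at once, the same command can be wrapped in a loop. This is a rough sketch under stated assumptions: create_ingest_users and the DRY_RUN variable are hypothetical names, and the sketch leaves out -p and sets the password interactively with passwd instead. By default it only prints the commands it would run.

```shell
# Print (dry run) or execute the useradd command for each username given.
# DRY_RUN defaults to 1, so nothing is changed unless it is set to 0.
# create_ingest_users and DRY_RUN are hypothetical names for illustration.
create_ingest_users() {
    local user
    for user in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "sudo useradd -m -s /bin/bash -G ingest $user"
        else
            sudo useradd -m -s /bin/bash -G ingest "$user"
            # Prompt interactively for the temporary password.
            sudo passwd "$user"
        fi
    done
}
```

Review the dry-run output first, then rerun with DRY_RUN=0 to create the accounts.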
Delete user accounts
- Delete the account.
sudo userdel -r <username>
sudo userdel: command to delete the account
-r: deletes the home directory and other user data
<username>: name of the account to delete
Install software
Most VMs require the same software, available from either apt-get or pip.
sudo apt-get install screen tmux nano
sudo apt-get install python3 python3-pip git
sudo pip3 install --system lxml boto3
VMs also require the custom Python packaging scripts. Instructions for installing these as system-wide scripts are still being worked out. In the meantime, clone the repository to your home folder and use that version.
cd ~
git clone https://github.com/NYPL/prsv-tools.git
python3 ~/prsv-tools/path/to/script.py
Mount Drives
Add new mount
All storage mounts should be made in one of the following directories:
- /source/ for read-only mounts to data source locations
- /data/ for read/write mounts to working storage for the script
- /ifs/preservica/development for mounts to the upload/storage/download directories for the Preservica test instance
- /ifs/preservica/production for mounts to the upload/storage/download directories for the Preservica production instance
- Create a directory for mounting and change its group ownership to the ingest group.
sudo mkdir /path/to/mountpoint
sudo chgrp -R ingest /path/to/mountpoint
- Update the file system table file with the location and characteristics of the storage to mount. Use existing entries in the file as models.
sudo nano /etc/fstab
# example read-only mount
storage.cluster.url:path/to/folder /path/to/mountpoint nfs4 ro,rsize=65536 1 1
# example read-write mount
storage.cluster.url:path/to/folder /path/to/mountpoint nfs4 rw,rsize=65536,wsize=65536 1 1
- Mount the drives. If drives fail to mount, investigate with IT.
sudo mount -a
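To keep new entries consistent with the examples above, the line can be generated rather than typed by hand. The following is a sketch only: fstab_entry is a hypothetical helper, and the options and trailing "1 1" fields are copied from the example entries, not from an ITG standard. It prints the line and does not modify /etc/fstab.

```shell
# Build an /etc/fstab line for an NFSv4 mount, mirroring the example entries.
# Usage: fstab_entry <source> <mountpoint> <ro|rw>
# fstab_entry is a hypothetical helper; it only prints the line.
fstab_entry() {
    local src="$1" target="$2" mode="$3" opts
    case "$mode" in
        ro) opts="ro,rsize=65536" ;;
        rw) opts="rw,rsize=65536,wsize=65536" ;;
        *)  echo "mode must be ro or rw" >&2; return 1 ;;
    esac
    printf '%s %s nfs4 %s 1 1\n' "$src" "$target" "$opts"
}
```

Paste the printed line into /etc/fstab with the editor as described, then run sudo mount -a.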
Remove a mount
- Comment out the appropriate line in the file system table.
sudo nano /etc/fstab
- Unmount the source.
sudo umount /path/to/mountpoint
- Ensure the source is unmounted by checking that it is not in the list of current mounts.
sudo mount
- Ensure the directory used as the mountpoint is empty and delete it.
ls /path/to/mountpoint
rmdir /path/to/mountpoint
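The last two checks can be combined into one guard before deleting the directory. This is a minimal sketch, assuming a Linux host where /proc/mounts lists the current mounts; safe_to_remove is a hypothetical helper name, and the path should be given without a trailing slash so that it matches the mounts table.

```shell
# Succeed only if the directory exists, is not listed as a mountpoint in the
# given mounts table (default: /proc/mounts on Linux), and is empty.
# safe_to_remove is a hypothetical helper name used for illustration.
safe_to_remove() {
    local dir="$1" table="${2:-/proc/mounts}"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    # Field 2 of each line in the mounts table is the mountpoint.
    if awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$table"; then
        echo "still mounted: $dir" >&2
        return 1
    fi
    # ls -A lists hidden files too, so any output means the directory has content.
    if [ -n "$(ls -A "$dir")" ]; then
        echo "not empty: $dir" >&2
        return 1
    fi
    return 0
}
```

A possible use is safe_to_remove /path/to/mountpoint && rmdir /path/to/mountpoint, which deletes the directory only when both checks pass.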