PRONOM is the digital preservation file format registry of choice, but it’s only as good as the file format signatures that we put into it. There are already great resources about how to do this. Jenny Mitcham from the University of York included links to most of these resources in her blog post about creating a signature, and there is a trove of OPF blog posts on the subject by Ross Spencer, Andrea Byrne, Becky McGuiness, and others.
This post will be a very fast flyover of building a signature for Microsoft’s ACS format, aka Clippy’s format.
Gather some sample data.
The more examples of a format you have, the more confidence you have that you’re finding a good signature. Luckily for ACS, the Microsoft Agent tool had a small community that built more of these demonic helpers. I even found one resource with over 300 examples of agents. So let’s download all of them.
Look around for magic numbers
A lot of formats use a semi-unique series of bytes at the start of the file to identify themselves. We can see if this is the case by printing out the first 16 bytes of all of the ACS files.
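As a rough sketch of that step, the Python below prints the leading bytes of every sample as hex. The `samples` folder name, the `*.acs` pattern, and the function names are assumptions made for illustration; adjust them to wherever your downloads landed.

```python
from pathlib import Path


def read_header(path, length=16):
    """Return the first `length` bytes of a file as an uppercase hex string."""
    with open(path, "rb") as handle:
        return handle.read(length).hex().upper()


def print_headers(folder, pattern="*.acs", length=16):
    """Print the leading bytes of every file under `folder` that matches `pattern`."""
    # Note: the pattern match is case-sensitive on most Linux filesystems,
    # so files named *.ACS would need their own pattern.
    for path in sorted(Path(folder).rglob(pattern)):
        print(f"{read_header(path, length)}  {path.name}")
```

Calling `print_headers("samples")` lists one line per file, which makes a shared prefix easy to spot by eye.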
Test any patterns
The first four bytes (0xC3ABCDAB) look like a good candidate, so let’s see how many files have that as the first 4 bytes.
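One way to run that check, sketched in Python: count how many samples begin with the candidate bytes. The folder layout, the `*.acs` pattern, and the helper names are hypothetical; only the byte string 0xC3ABCDAB comes from the step above.

```python
from pathlib import Path

# Candidate magic number taken from the first four bytes seen in the samples.
ACS_MAGIC = bytes.fromhex("C3ABCDAB")


def starts_with(path, magic=ACS_MAGIC):
    """Return True if the file begins with the given byte string."""
    with open(path, "rb") as handle:
        return handle.read(len(magic)) == magic


def count_matches(folder, pattern="*.acs", magic=ACS_MAGIC):
    """Return (number of matching files, total number of files examined)."""
    paths = list(Path(folder).rglob(pattern))
    matched = sum(1 for path in paths if starts_with(path, magic))
    return matched, len(paths)
```

If the two numbers returned by `count_matches("samples")` differ, the files that did not match are the ones worth a closer look.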
You can try comparing more or fewer bytes, but 0xC3ABCDAB looks like the longest common byte string. Any files that start with 0xD0CF11E0 instead are a different case: search for that value and you’ll find that it is the signature for Microsoft OLE Compound File, a general file structure used as the basis for many formats like .doc Word files and .ppt PowerPoint files. These OLE ACS files probably deserve more research.
Research to back up your hypothesis
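To see every leading byte string in the collection at once, including outliers such as the OLE header, a tally along these lines can help. This is only a sketch; the folder name, pattern, and function name are assumptions.

```python
from collections import Counter
from pathlib import Path


def tally_prefixes(folder, pattern="*.acs", length=4):
    """Count how often each leading byte string (as uppercase hex) occurs."""
    counts = Counter()
    for path in Path(folder).rglob(pattern):
        with open(path, "rb") as handle:
            counts[handle.read(length).hex().upper()] += 1
    return counts
```

Printing `tally_prefixes("samples").most_common()` ranks the prefixes by frequency, and raising `length` shows whether the common run of bytes extends any further.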
It’s always good to back up your work with data from other people. One potential source is the TrID format registry. In this case there is a signature for ACS files, and it matches our signature.
There’s even someone who wrote an unofficial spec for the format. Remy Lebeau’s work also matches our signature.
Make and test a PRONOM signature file
Ross Spencer built this excellent tool so that you don’t have to write the XML by hand. Both DROID and Siegfried can load the custom signature file for testing.
Submit the signature.
Package up any of the information and resources you have found and submit them to PRONOM. The folks at the National Archives will look over the signature and contact you with any comments.