In the last post, I used some bash tools to derive a file format signature for the ACS format used to store Clippy. During that analysis, 32 of the files had an OLE format signature, 0xD0CF11E0A1B11AE1.
OLE, also known as the Compound File Binary Format, served as the container structure for many different formats from software like Microsoft Office 97-2003, SPSS, Minitab, and Caseware. It was also used to package the animations and character data used for Microsoft Agents.
On this documentation page, you can see the three formats Microsoft used for agents: ACS, ACF, and AAC. An ACS file can store the entire character, but Microsoft also wanted to allow people to build websites and network applications with Agents. ACF and AAC made this easier, since an application would only have to download the character data and animations it needed. By contrast, Clippy.ACS took 6+ minutes to download over a 56k modem.
Examples of ACF and AAC are actually difficult to find on the web. Microsoft used to host them at locations like http://agent.microsoft.com/agent2/chars/genie/genie.acf. As far as I can tell, all of these resources are now down and weren’t collected by a web archive. However, the OLE files from http://www.ponx.org/msagent/Acs/ have some promise.
This OLE file has at least one ACF file and AAF file embedded in it. The trick is extracting these files from inside the OLE file.
Collect all the OLE files from ponx
Honestly, I would probably use Python for this, but it was a fun exercise to work out as a shell-scripting ‘one-liner’. There should be 33 files in there.
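For comparison, a Python version of this step might look roughly like the sketch below. It is not the one-liner from this post. It assumes the files from ponx have already been downloaded into a local folder, and the function names `is_ole` and `collect_ole` are made up for illustration. The only fact it relies on is the OLE signature from the previous post.

```python
import shutil
from pathlib import Path

# OLE / Compound File Binary signature noted in the previous post.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole(path):
    """Return True if the file starts with the 8-byte OLE signature."""
    with open(path, "rb") as handle:
        return handle.read(len(OLE_MAGIC)) == OLE_MAGIC


def collect_ole(source_dir, target_dir):
    """Copy every OLE file found directly in source_dir into target_dir.

    Returns the sorted list of file names that were copied.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for candidate in sorted(Path(source_dir).iterdir()):
        if candidate.is_file() and is_ole(candidate):
            shutil.copy2(candidate, target / candidate.name)
            copied.append(candidate.name)
    return copied
```

Calling something like `collect_ole("acs", "ole")` (both folder names are assumptions) and checking the length of the returned list gives a quick way to confirm the file count.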
Extract ACF and AAC files from OLE
This requires some tools that can read the OLE format. Since it is the basis of the 97-2003 Office formats, there’s already a very good Java library to manipulate the formats, Apache POI. Even better, Ross Spencer wrote a nice command-line tool with the Java library to extract the files.
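If Java is not an option, the same idea can be sketched in Python with the third-party `olefile` package. This swaps in a different library and is not the POI-based tool described above. The helper names, the `_streams` folder suffix, and the name-cleaning rule are all assumptions made for this example.

```python
from pathlib import Path


def safe_name(entry):
    """Flatten an OLE stream path (a list of names) into one file name.

    Anything other than letters, digits, dot, dash and underscore is
    replaced with "_", since stream names can contain control characters.
    """
    joined = "_".join(entry)
    return "".join(c if c.isalnum() or c in "._-" else "_" for c in joined)


def extract_streams(ole_path, out_root):
    """Write every stream in one OLE file to out_root/<file name>_streams/."""
    import olefile  # third-party package: pip install olefile

    out_dir = Path(out_root) / (Path(ole_path).name + "_streams")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    ole = olefile.OleFileIO(str(ole_path))
    try:
        # listdir() yields each stream as a list of path components.
        for entry in ole.listdir(streams=True, storages=False):
            data = ole.openstream(entry).read()
            target = out_dir / safe_name(entry)
            target.write_bytes(data)
            written.append(target)
    finally:
        ole.close()
    return written
```

Whether the extracted streams keep recognisable `.acf` or `.aaf` names depends on how each OLE file was built, so the output is worth inspecting by hand before trusting it.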
Find signatures for ACF and AAC files
The ole folder should be filled with subfolders now, each with the extracted ACF and AAC files. Follow the same process as the post on the ACS files to analyze them for magic numbers. You can use a small tweak to peek into all the subdirectories with one command.
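As an alternative to the shell approach, the tally across subdirectories could be sketched in Python as follows. This is not the command used in the earlier post. It assumes the extracted files kept their extensions, that four leading bytes are enough to compare, and the function names are illustrative.

```python
from collections import Counter, defaultdict
from pathlib import Path


def leading_bytes(root, length=4):
    """Tally the first `length` bytes of every file under root, per extension.

    Returns a dict mapping a lowercased extension to a Counter of
    uppercase hex strings.
    """
    tallies = defaultdict(Counter)
    for path in Path(root).rglob("*"):  # rglob descends into subfolders
        if path.is_file():
            with open(path, "rb") as handle:
                head = handle.read(length)
            tallies[path.suffix.lower()][head.hex().upper()] += 1
    return tallies


def report(tallies):
    """Format the tallies as lines of 'extension 0xMAGIC count'."""
    lines = []
    for ext in sorted(tallies):
        for magic, count in tallies[ext].most_common():
            lines.append(f"{ext or '(none)'} 0x{magic} {count}")
    return lines
```

A byte sequence that shows up for nearly every file of one extension, and for no other extension, is a reasonable candidate for a magic number.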
I ended up with 0x1F000100 and 0x1E000100 for AAF and 0xC1ABCDAB for ACF, although I’m doing a little more research before submitting these to PRONOM.
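To sanity-check candidates like these against more samples, a small lookup is enough. The sketch below hard-codes the provisional values from this post. The `CANDIDATES` table and `guess_format` are illustrative names, and the signatures should be treated as unconfirmed.

```python
# Provisional signatures quoted in this post; not yet confirmed.
CANDIDATES = {
    "AAF": [bytes.fromhex("1F000100"), bytes.fromhex("1E000100")],
    "ACF": [bytes.fromhex("C1ABCDAB")],
}


def guess_format(data):
    """Return the label whose candidate signature prefixes data, else None."""
    for label, magics in CANDIDATES.items():
        if any(data.startswith(magic) for magic in magics):
            return label
    return None
```

Running `guess_format` over the first few bytes of every extracted file and comparing the result with the file extension would show quickly whether a candidate produces false positives or misses.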