6.6. TNEF Attachments
So MIME is our standard, but we have to make allowances for uuencoding, which came before it. After receiving a few thousand emails, you're bound to come across one like this:
Content-type: application/ms-tnef; name=winmail.dat
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=winmail.dat

CBAAAAYFAOEXzQfBwAAABgMAAOEAAgCwBMEAAgAAAOIAAABcAHAAEAAATmlybWFsYSBBbmlzZXR0
aSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIEIAAgCwBGEBAgAAAMABAAA9AQIA
AQCcAAIADgAZAAIAAAASAAIAAAATAAIAAACvAQIAAAC8AQIAAAA9ABIAaAE8AFss/CE4AAAAAAAB
AFgCQAACAAAAjQACAAAAIgACAAAADgACAAEAtwECAAAA2gACAAAAMQAaAMgAAAD/f5ABAAAAAACj
BQFBAHIAaQBhAGwAMQAaAMgAAAD/f5ABAAAAAACjBQFBAHIAaQBhAGwAMQAaAMgAAAD/f5ABAAAA
It looks like some kind of garbage file. At first you might want to discard it, as you would do with other unknown file attachment types (whenever you receive email into an application, you'll receive spam, viruses, and the like). Then a user will start complaining that they attached an image to their email and it didn't get processed. A little investigation reveals that there was no image attached, but there was this mysterious winmail.dat file. Sometimes the file is called unknown.001, but always with the media type of application/ms-tnef.
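Rather than discarding these parts, we can pick them out by media type and hand the decoded bytes to a TNEF parser. The following is a minimal sketch using Python's standard email package; the function name find_tnef_payloads is made up for this example, and it assumes the raw message is available as bytes.

```python
from email import message_from_bytes

TNEF_MEDIA_TYPE = "application/ms-tnef"


def find_tnef_payloads(raw_message: bytes) -> list:
    """Return the decoded bytes of every application/ms-tnef part."""
    msg = message_from_bytes(raw_message)
    payloads = []
    # walk() visits the message itself and, for multipart mail, every subpart.
    for part in msg.walk():
        # get_content_type() returns the media type in lowercase.
        if part.get_content_type() == TNEF_MEDIA_TYPE:
            # decode=True undoes the Content-Transfer-Encoding (base64 here).
            data = part.get_payload(decode=True)
            if data:
                payloads.append(data)
    return payloads
```

Matching on the media type rather than the filename means that both winmail.dat and unknown.001 are caught.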
Some research reveals that TNEF is a Microsoft proprietary format for bundling attachments and metadata into a single file. TNEF stands for Transport Neutral Encapsulation Format and was designed to allow attachments and Outlook-specific metadata to be sent via both email and the protocol used by Microsoft Exchange.
The TNEF format is used only by Microsoft Outlook (Microsoft Outlook Express can't read it) and is generated under a hazy set of conditions. The winmail.dat file contains a packed list of files and metadata, such as calendar events. Once you know the format, extracting the files is fairly trivial. Unfortunately, finding the specification isn't so easy, but it can be inferred by looking at how open source email clients deal with it. The inferred version of the specification is as follows.
The TNEF block starts with a six-byte preamble:
4 byte signature - 0x223e9f78
2 byte key (an identifier assigned by the sender; we don't need it to read the objects)
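Checking the preamble is a cheap way to reject files that merely claim to be TNEF. The sketch below assumes that multi-byte integers are stored little-endian, which the description above does not state; the function name read_preamble is invented for the example. It returns the two-byte field that follows the signature without interpreting it.

```python
import struct

TNEF_SIGNATURE = 0x223E9F78
PREAMBLE_SIZE = 6  # 4-byte signature + 2-byte field


def read_preamble(data: bytes) -> int:
    """Validate the signature and return the 2-byte field that follows it."""
    if len(data) < PREAMBLE_SIZE:
        raise ValueError("too short to be a TNEF block")
    # "<IH": little-endian (an assumption), unsigned 32-bit, then unsigned 16-bit.
    signature, second_field = struct.unpack_from("<IH", data, 0)
    if signature != TNEF_SIGNATURE:
        raise ValueError("not a TNEF block: bad signature")
    return second_field
```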
This preamble is followed by zero or more objects of the following format:
1 byte object type
    0x01 - TNEF_LVL_MESSAGE
    0x02 - TNEF_LVL_ATTACHMENT
4 byte sub-type
4 byte data length
x bytes of data
2 byte checksum
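Because every object carries its own length, walking the whole block is a simple loop. The sketch below yields each object as a (type, sub-type, data) tuple. It assumes little-endian integers, and the optional checksum test assumes the checksum is the sum of the data bytes modulo 65536; neither detail is stated above, so verification is off by default. The name iter_objects is made up for this example.

```python
import struct
from typing import Iterator, Tuple

TNEF_LVL_MESSAGE = 0x01
TNEF_LVL_ATTACHMENT = 0x02
PREAMBLE_SIZE = 6  # 4-byte signature + 2-byte field
HEADER_SIZE = 9    # 1-byte type + 4-byte sub-type + 4-byte length


def iter_objects(data: bytes, verify: bool = False) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (object type, sub-type, payload) for each object after the preamble."""
    offset = PREAMBLE_SIZE
    while offset < len(data):
        if offset + HEADER_SIZE > len(data):
            raise ValueError("truncated object header")
        level, subtype, length = struct.unpack_from("<BII", data, offset)
        offset += HEADER_SIZE
        end = offset + length
        if end + 2 > len(data):
            raise ValueError("truncated object data")
        payload = data[offset:end]
        (checksum,) = struct.unpack_from("<H", data, end)
        # Assumption: checksum == sum of payload bytes, modulo 65536.
        if verify and checksum != sum(payload) & 0xFFFF:
            raise ValueError("checksum mismatch")
        offset = end + 2
        yield level, subtype, payload
```

Unknown sub-types need no special handling here: the loop steps over them using the length field, and the caller decides which ones to keep.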
The TNEF_LVL_MESSAGE objects can be discarded; these contain rich text versions of the message, which has already been provided in both text and HTML formats as regular MIME chunks. The TNEF_LVL_ATTACHMENT objects are the ones we're interested in: several of them in sequence define an attached file. There are a number of subtypes of attachment objects, but we're only concerned with a few of these. Because they all take the same format, we can easily skip over the ones we don't know or care about. The ones we should concern ourselves with are:
0x00069002 - TNEF_ARENDDATA - marks the start of a new attachment
0x00018010 - TNEF_AFILENAME - a filename for the attachment
0x0006800f - TNEF_ATTACHDATA - the attached file data
0x00069005 - TNEF_AMAPIATTRS - file attributes
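Turning the stream of objects into a list of files is then a matter of starting a new record at each TNEF_ARENDDATA and filling it in from the objects that follow. This sketch takes any iterable of (type, sub-type, data) tuples; the function name, the dictionary keys, and the choice to treat the filename as a NUL-terminated Latin-1 string are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

TNEF_LVL_ATTACHMENT = 0x02
TNEF_ARENDDATA = 0x00069002
TNEF_AFILENAME = 0x00018010
TNEF_ATTACHDATA = 0x0006800F
TNEF_AMAPIATTRS = 0x00069005


def group_attachments(objects: Iterable[Tuple[int, int, bytes]]) -> List[Dict]:
    """Collect attachment-level objects into one dict per attached file."""
    attachments: List[Dict] = []
    for level, subtype, payload in objects:
        if level != TNEF_LVL_ATTACHMENT:
            continue  # message-level objects are discarded
        if subtype == TNEF_ARENDDATA:
            # Start of a new attachment.
            attachments.append({"filename": None, "data": b"", "mapi": b""})
        elif not attachments:
            continue  # attachment object before any ARENDDATA: ignore it
        elif subtype == TNEF_AFILENAME:
            # Assumption: NUL-terminated bytes; Latin-1 never fails to decode.
            attachments[-1]["filename"] = payload.rstrip(b"\x00").decode("latin-1")
        elif subtype == TNEF_ATTACHDATA:
            attachments[-1]["data"] = payload
        elif subtype == TNEF_AMAPIATTRS:
            attachments[-1]["mapi"] = payload  # raw block, parsed separately
        # Any other sub-type is silently skipped.
    return attachments
```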
The TNEF_AMAPIATTRS object contains a data type, attribute, and value: in essence, a key-value pair with an extra flag to denote the data type. The attributes data block starts with a four-byte count of the number of attributes, and then a repeating pattern of type, name, and data. The data element's length depends on the data type:
4 bytes - number of attributes
2 byte type
    0x0002 - TNEF_MAPI_SHORT - 2 bytes
    0x0003 - TNEF_MAPI_INT - 4 bytes
    0x000b - TNEF_MAPI_BOOLEAN - 4 bytes
    0x0004 - TNEF_MAPI_FLOAT - 4 bytes
    0x0005 - TNEF_MAPI_DOUBLE - 8 bytes
    0x0040 - TNEF_MAPI_SYSTIME - 8 bytes
    0x001e - TNEF_MAPI_STRING - special
    0x001f - TNEF_MAPI_UNICODE_STRING - special
    0x0102 - TNEF_MAPI_BINARY - special
2 byte name
x bytes of data
The three "special" types (string, Unicode string, and binary) act a little differently. Each value starts with a four-byte segment count. For each segment, a four-byte length is followed by that many bytes of data, padded out to the next four-byte boundary.
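Putting the attribute layout into code makes the padding rule concrete. The sketch below returns each attribute as raw bytes keyed by its two-byte name. The fixed sizes are copied from the table above and should be checked against real files; little-endian integers are again an assumption, and parse_mapi_attrs is a name invented for the example. It stops at the first type it doesn't recognize, because without a known size there is no way to find the next attribute.

```python
import struct
from typing import Dict, Tuple

# Fixed-width types and their sizes, as listed in the table above.
FIXED_SIZES = {0x0002: 2, 0x0003: 4, 0x000B: 4, 0x0004: 4, 0x0005: 8, 0x0040: 8}
# String, Unicode string, and binary values are stored as counted segments.
SEGMENTED_TYPES = {0x001E, 0x001F, 0x0102}


def parse_mapi_attrs(block: bytes) -> Dict[int, Tuple[int, bytes]]:
    """Return {name: (type, raw value bytes)} for a TNEF_AMAPIATTRS block."""
    (count,) = struct.unpack_from("<I", block, 0)
    offset = 4
    attrs: Dict[int, Tuple[int, bytes]] = {}
    for _ in range(count):
        attr_type, name = struct.unpack_from("<HH", block, offset)
        offset += 4
        if attr_type in FIXED_SIZES:
            size = FIXED_SIZES[attr_type]
            value = block[offset:offset + size]
            offset += size
        elif attr_type in SEGMENTED_TYPES:
            (segments,) = struct.unpack_from("<I", block, offset)
            offset += 4
            parts = []
            for _segment in range(segments):
                (length,) = struct.unpack_from("<I", block, offset)
                offset += 4
                parts.append(block[offset:offset + length])
                # Round the length up to the next multiple of four.
                offset += (length + 3) & ~3
            value = b"".join(parts)
        else:
            break  # unknown type: its size is unknown, so we can't continue
        attrs[name] = (attr_type, value)
    return attrs
```

For example, a six-byte string segment occupies eight bytes in the block, so the next attribute starts two bytes after the string's terminating NUL.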
Once we've extracted the attributes' data, the following types are of use to us:
0x3707 - TNEF_MAPI_ATTACH_LONG_FILENAME - better version of the filename
0x370E - TNEF_MAPI_ATTACH_MIME_TAG - mime type for the file
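With the attributes extracted, we can prefer the long filename over the one from TNEF_AFILENAME and attach a media type to each file. The sketch below uses the property IDs as they are defined in the MAPI headers (0x3707 for the long filename, 0x370E for the MIME tag). The function name, the Latin-1 decoding, and the application/octet-stream fallback are assumptions chosen for the example.

```python
from typing import Dict, Optional, Tuple

TNEF_MAPI_ATTACH_LONG_FILENAME = 0x3707
TNEF_MAPI_ATTACH_MIME_TAG = 0x370E
DEFAULT_MIME_TYPE = "application/octet-stream"  # fallback chosen for this sketch


def _text(raw: bytes) -> str:
    # Assumption: NUL-terminated single-byte text; Latin-1 never fails to decode.
    return raw.rstrip(b"\x00").decode("latin-1")


def describe_attachment(
    short_name: Optional[str], attrs: Dict[int, bytes]
) -> Tuple[Optional[str], str]:
    """Pick the best filename and MIME type; attrs maps attribute name to raw bytes."""
    name = short_name
    mime_type = DEFAULT_MIME_TYPE
    if TNEF_MAPI_ATTACH_LONG_FILENAME in attrs:
        name = _text(attrs[TNEF_MAPI_ATTACH_LONG_FILENAME])
    if TNEF_MAPI_ATTACH_MIME_TAG in attrs:
        mime_type = _text(attrs[TNEF_MAPI_ATTACH_MIME_TAG])
    return name, mime_type
```

Once each file has a name, a media type, and its data, it can go through the same processing as any ordinary MIME attachment.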
A comprehensive parser in PHP can be written in a couple of hundred lines, and open source products such as Horde (http://www.horde.org/) have already implemented it. If you're using Perl, then your life is made even easier by the Convert-TNEF module on CPAN.