6.6. TNEF Attachments

So MIME is our standard, though we have to make allowances for uuencoding, which predates it. There is one more format to watch for: after receiving a few thousand emails, you're bound to come across one like this:

Content-type: application/ms-tnef;
        name=winmail.dat
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
        filename=winmail.dat
CBAAAAYFAOEXzQfBwAAABgMAAOEAAgCwBMEAAgAAAOIAAABcAHAAEAAATmlybWFsYSBBbmlzZXR0
aSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIEIAAgCwBGEBAgAAAMABAAA9AQIA
AQCcAAIADgAZAAIAAAASAAIAAAATAAIAAACvAQIAAAC8AQIAAAA9ABIAaAE8AFss/CE4AAAAAAAB
AFgCQAACAAAAjQACAAAAIgACAAAADgACAAEAtwECAAAA2gACAAAAMQAaAMgAAAD/f5ABAAAAAACj
BQFBAHIAaQBhAGwAMQAaAMgAAAD/f5ABAAAAAACjBQFBAHIAaQBhAGwAMQAaAMgAAAD/f5ABAAAA

It looks like some kind of garbage file. At first you might be tempted to discard it, as you would other unknown attachment types (whenever you receive email into an application, you'll receive spam, viruses, and the like). Then a user will start complaining that they attached an image to their email and it never got processed. A little investigation reveals that no image was attached, only this mysterious winmail.dat file. Sometimes the file is called unknown.001, but it always has the media type application/ms-tnef.
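Since the filename varies, it is safer to spot these parts by media type. A minimal sketch using Python's standard `email` package is shown here; the function name `find_tnef_parts` is hypothetical, and it simply returns the base64-decoded bytes of every application/ms-tnef part so they can be handed to a TNEF parser.

```python
import email
from email import policy

TNEF_MEDIA_TYPE = "application/ms-tnef"


def find_tnef_parts(raw_message: bytes) -> list[bytes]:
    """Return the decoded bodies of every application/ms-tnef part."""
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    blobs = []
    for part in msg.walk():
        # Match on the media type, not the filename: the same data may
        # arrive as winmail.dat or unknown.001.
        if part.get_content_type() == TNEF_MEDIA_TYPE:
            payload = part.get_payload(decode=True)  # undoes the base64
            if payload:
                blobs.append(payload)
    return blobs
```

`get_content_type()` lowercases the type, so a header written as `Content-type: application/MS-TNEF` still matches.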

Some research reveals that TNEF is a Microsoft proprietary format for bundling attachments and metadata into a single file. TNEF stands for Transport Neutral Encapsulation Format and was designed to allow attachments and Outlook-specific metadata to be sent via both email and the protocol used by Microsoft Exchange.

TNEF is used only by Microsoft Outlook (Outlook Express can't read it) and is generated under a hazy set of conditions. The winmail.dat file contains a packed list of files and metadata, such as calendar events. Once you know the format, extracting the files is fairly trivial. Unfortunately, the specification isn't easy to find, but it can be inferred from the way open source email clients handle the format. The inferred version of the specification is as follows.

The TNEF block starts with a six-byte preamble (like every multibyte integer in TNEF, its fields are stored little-endian):

4 byte signature - 0x223e9f78
2 byte object count
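Checking the preamble is a cheap way to reject files that aren't TNEF at all. The sketch below assumes little-endian byte order (so the signature appears on disk as the bytes 78 9f 3e 22); `read_preamble` is a hypothetical helper name. It returns the count field as read, but a parser can just as well read objects until the data runs out rather than trusting it.

```python
import struct

TNEF_SIGNATURE = 0x223E9F78


def read_preamble(data: bytes) -> tuple[int, int]:
    """Validate the six-byte preamble.

    Returns (count_field, offset_of_first_object). Assumes the
    little-endian integers TNEF uses throughout.
    """
    if len(data) < 6:
        raise ValueError("too short to be TNEF")
    # "<IH": unsigned 32-bit signature, unsigned 16-bit count, no padding.
    signature, count = struct.unpack_from("<IH", data, 0)
    if signature != TNEF_SIGNATURE:
        raise ValueError(f"bad TNEF signature 0x{signature:08x}")
    return count, 6
```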

This preamble is followed by zero or more objects of the following format:

1 byte object type
        0x01 - TNEF_LVL_MESSAGE
        0x02 - TNEF_LVL_ATTACHMENT
4 byte sub-type
4 byte data length
x bytes of data
2 byte checksum
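Because every object shares this layout, a single loop can walk them all. The following is a sketch under two assumptions the text doesn't spell out: integers are little-endian, and the checksum is the sum of the data bytes truncated to 16 bits. The names `TnefObject` and `iter_objects` are hypothetical.

```python
import struct
from typing import Iterator, NamedTuple


class TnefObject(NamedTuple):
    level: int    # 0x01 = TNEF_LVL_MESSAGE, 0x02 = TNEF_LVL_ATTACHMENT
    subtype: int  # e.g. 0x00069002 for TNEF_ARENDDATA
    data: bytes


def iter_objects(data: bytes, offset: int = 6) -> Iterator[TnefObject]:
    """Walk the objects that follow the six-byte preamble."""
    while offset < len(data):
        # 1-byte type, 4-byte sub-type, 4-byte length (9 bytes, no padding).
        if offset + 9 > len(data):
            raise ValueError("truncated object header")
        level, subtype, length = struct.unpack_from("<BII", data, offset)
        offset += 9
        end = offset + length
        if end + 2 > len(data):
            raise ValueError("truncated object data")
        body = data[offset:end]
        (checksum,) = struct.unpack_from("<H", data, end)
        # Assumed checksum rule: byte sum of the data, modulo 65536.
        if checksum != sum(body) & 0xFFFF:
            raise ValueError(f"checksum mismatch in object 0x{subtype:08x}")
        yield TnefObject(level, subtype, body)
        offset = end + 2
```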

The TNEF_LVL_MESSAGE objects can be discarded: they contain a rich text version of the message, which has already been provided in both text and HTML formats as regular MIME chunks. The TNEF_LVL_ATTACHMENT objects are the ones we're interested in, since several of them in sequence define an attached file. There are a number of subtypes of attachment objects, but we're only concerned with a few of them. Because they all share the same format, we can easily skip over the ones we don't know or care about. The ones we should concern ourselves with are:

0x00069002 - TNEF_ARENDDATA  - marks the start of a new attachment
0x00018010 - TNEF_AFILENAME  - a filename for the attachment
0x0006800f - TNEF_ATTACHDATA - the attached file data
0x00069005 - TNEF_AMAPIATTRS - file attributes
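Putting those rules together, attachment objects can be folded into a list of files. This sketch takes (level, sub-type, data) tuples such as an object walker would produce; the `Attachment` class and `group_attachments` function are hypothetical, and decoding TNEF_AFILENAME as a NUL-terminated Latin-1 string is an assumption about the sender's code page.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

TNEF_LVL_ATTACHMENT = 0x02
TNEF_ARENDDATA = 0x00069002   # marks the start of a new attachment
TNEF_AFILENAME = 0x00018010   # a (short) filename
TNEF_ATTACHDATA = 0x0006800F  # the file contents
TNEF_AMAPIATTRS = 0x00069005  # MAPI attribute block, kept raw here


@dataclass
class Attachment:
    filename: Optional[str] = None
    data: bytes = b""
    mapi_attrs: bytes = b""


def group_attachments(objects: Iterable[Tuple[int, int, bytes]]) -> List[Attachment]:
    """Fold (level, subtype, data) objects into a list of attachments."""
    attachments: List[Attachment] = []
    for level, subtype, data in objects:
        if level != TNEF_LVL_ATTACHMENT:
            continue  # message-level objects duplicate the MIME body
        if subtype == TNEF_ARENDDATA:
            attachments.append(Attachment())
        elif not attachments:
            continue  # attachment data before any ARENDDATA: ignore it
        elif subtype == TNEF_AFILENAME:
            # Assumed encoding: NUL-terminated 8-bit text, read as Latin-1.
            attachments[-1].filename = data.rstrip(b"\x00").decode("latin-1")
        elif subtype == TNEF_ATTACHDATA:
            attachments[-1].data = data
        elif subtype == TNEF_AMAPIATTRS:
            attachments[-1].mapi_attrs = data
        # Any other sub-type is skipped; the shared layout makes that safe.
    return attachments
```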

The TNEF_AMAPIATTRS object contains a list of attributes, each made up of a data type, a name, and a value: in essence, a key-value pair with an extra flag to denote the data type. The attribute data block starts with a four-byte count of the attributes, followed by a repeating pattern of type, name, and data. The length of the data element depends on the data type:

4 bytes - number of attributes
then, for each attribute:
  2 byte type
    0x0002 - TNEF_MAPI_SHORT - 2 bytes
    0x0003 - TNEF_MAPI_INT - 4 bytes
    0x000b - TNEF_MAPI_BOOLEAN - 4 bytes
    0x0004 - TNEF_MAPI_FLOAT - 4 bytes
    0x0005 - TNEF_MAPI_DOUBLE - 8 bytes
    0x0040 - TNEF_MAPI_SYSTIME - 8 bytes
    0x001e - TNEF_MAPI_STRING - special
    0x001f - TNEF_MAPI_UNICODE_STRING - special
    0x0102 - TNEF_MAPI_BINARY - special
  2 byte name
  x bytes of data

The special types act a little differently. Each string or binary value starts with a four-byte segment count. For each segment, a four-byte length is followed by that many bytes of data, padded out to a four-byte boundary.
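The attribute layout and the segment rule can be sketched as one loop. This is illustrative only: it uses the fixed sizes exactly as listed in the table, treats both string types and binary as segmented, and raises on any type the table doesn't cover. `parse_mapi_attrs` is a hypothetical name, and values are returned as raw bytes (or a list of byte segments) for the caller to decode.

```python
import struct
from typing import List, Tuple, Union

# Fixed data sizes, taken from the table of MAPI types.
FIXED_SIZES = {
    0x0002: 2,  # TNEF_MAPI_SHORT
    0x0003: 4,  # TNEF_MAPI_INT
    0x000B: 4,  # TNEF_MAPI_BOOLEAN
    0x0004: 4,  # TNEF_MAPI_FLOAT
    0x0005: 8,  # TNEF_MAPI_DOUBLE
    0x0040: 8,  # TNEF_MAPI_SYSTIME
}
# String, Unicode string, and binary: count of length-prefixed segments.
SEGMENTED = {0x001E, 0x001F, 0x0102}

Value = Union[bytes, List[bytes]]


def parse_mapi_attrs(block: bytes) -> List[Tuple[int, int, Value]]:
    """Return (type, name, raw value) triples from a TNEF_AMAPIATTRS block."""
    (count,) = struct.unpack_from("<I", block, 0)
    offset = 4
    attrs: List[Tuple[int, int, Value]] = []
    for _ in range(count):
        attr_type, name = struct.unpack_from("<HH", block, offset)
        offset += 4
        value: Value
        if attr_type in FIXED_SIZES:
            size = FIXED_SIZES[attr_type]
            value = block[offset:offset + size]
            offset += size
        elif attr_type in SEGMENTED:
            (segments,) = struct.unpack_from("<I", block, offset)
            offset += 4
            value = []
            for _ in range(segments):
                (length,) = struct.unpack_from("<I", block, offset)
                offset += 4
                value.append(block[offset:offset + length])
                # Advance past the data, rounded up to a multiple of four.
                offset += (length + 3) & ~3
        else:
            raise ValueError(f"unhandled MAPI type 0x{attr_type:04x}")
        attrs.append((attr_type, name, value))
    return attrs
```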

Once we've extracted the attributes, the following attribute names are of use to us:

0x3707 - TNEF_MAPI_ATTACH_LONG_FILENAME - better version of the filename
0x370E - TNEF_MAPI_ATTACH_MIME_TAG - mime type for the file
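Applying those two attributes to an attachment might look like the sketch below. It takes attributes as (type, name, raw value) triples with string values as lists of segments. Several choices here are assumptions rather than anything the format guarantees: Unicode strings are decoded as UTF-16LE, 8-bit strings as Latin-1, and the fallback media type is application/octet-stream. `pick_name_and_type` is a hypothetical helper.

```python
from typing import List, Optional, Tuple, Union

TNEF_MAPI_STRING = 0x001E
TNEF_MAPI_UNICODE_STRING = 0x001F
TNEF_MAPI_ATTACH_LONG_FILENAME = 0x3707
TNEF_MAPI_ATTACH_MIME_TAG = 0x370E

Value = Union[bytes, List[bytes]]
Attr = Tuple[int, int, Value]


def _as_text(attr_type: int, value: Value) -> Optional[str]:
    """Decode the first segment of a string attribute, else return None."""
    if attr_type not in (TNEF_MAPI_STRING, TNEF_MAPI_UNICODE_STRING):
        return None
    if not isinstance(value, list) or not value:
        return None
    raw = value[0]
    if attr_type == TNEF_MAPI_UNICODE_STRING:
        text = raw.decode("utf-16-le", errors="replace")  # assumed encoding
    else:
        text = raw.decode("latin-1")  # assumed 8-bit code page
    return text.rstrip("\x00")  # strings are NUL-terminated


def pick_name_and_type(attrs: List[Attr], fallback_name: Optional[str]) -> Tuple[Optional[str], str]:
    """Prefer the long filename and MIME tag when the attributes carry them."""
    name, mime = fallback_name, "application/octet-stream"  # assumed default
    for attr_type, attr_name, value in attrs:
        text = _as_text(attr_type, value)
        if text is None:
            continue
        if attr_name == TNEF_MAPI_ATTACH_LONG_FILENAME:
            name = text
        elif attr_name == TNEF_MAPI_ATTACH_MIME_TAG:
            mime = text
    return name, mime
```

Passing the short TNEF_AFILENAME value as `fallback_name` keeps a usable name when the long-filename attribute is missing.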

A comprehensive parser in PHP can be written in a couple of hundred lines, and open source products such as Horde (http://www.horde.org/) already include one. If you're using Perl, your life is made even easier by the Convert-TNEF module on CPAN.