April 23-27, 2002 - Version 1 - Draft 2
hypothetic.org

MSN Instant Messenger Protocol


Connections

Socket Information

The MSN Messenger protocol uses only TCP sockets. With the exception of "client to client" connections such as file transfer and voice chat, all connections are outgoing. The default port is 1863, although the server can specify another port.

Types of Servers

There are three types of servers that are used in the protocol.

Protocol Basics

Overview

The MSN Messenger protocol is made up of plain text - specifically, the UTF-8 flavour of Unicode. UTF-8 is an ASCII-compatible multi-lingual format that represents non-ASCII characters as sequences of two or more bytes. If you want to handle languages other than English, you'll need to see what kind of support your programming language has for Unicode.

A single command may be split up into several packets or many commands may be joined together into a single packet, so the only reliable way to distinguish between them is by message lengths and newlines. There are two types of data sent between a client and a server.

Commands

Most of the data is sent as normal commands. Every normal command is made up of a three-character command identifier, followed by any parameters included with the command, and ends with a newline (always \r\n). All parameters are separated by spaces.
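As a rough illustration of the layout described above, a normal command can be split into its identifier and parameters like this. This is only a sketch: the function name parse_command and the sample line are made up for the example.

```python
from typing import List, Tuple


def parse_command(line: bytes) -> Tuple[str, List[str]]:
    """Split one newline-terminated command into (identifier, parameters)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("a normal command must end with \\r\\n")
    # Drop the trailing newline and decode the UTF-8 text.
    text = line[:-2].decode("utf-8")
    # Parameters are separated by spaces; the first word is the identifier.
    words = text.split(" ")
    return words[0], words[1:]


# Illustrative input only; the parameters here are placeholders.
name, params = parse_command(b"VER 0 MSNP7 MSNP6\r\n")
print(name, params)
```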

If you want to send a space or a newline without MSN Messenger thinking it's a separator (for example, as part of a friendly-name), you must URL-encode it. This involves replacing the character with "%HH", where HH is the hexadecimal value of the character (available, for example, from www.asciitable.com). URL encoding was originally defined in RFC 1738.
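Most languages already ship a URL-encoding routine, so you rarely need to do the "%HH" substitution by hand. Here is a small sketch using Python's standard library. The helper names encode_param and decode_param are invented for this example, and escaping every reserved character (safe="") is an assumption; the protocol may accept some of them unescaped.

```python
from urllib.parse import quote, unquote


def encode_param(value: str) -> str:
    """URL-encode a parameter so it contains no spaces or newlines."""
    # safe="" escapes every reserved character, including "/".
    # The text is encoded as UTF-8 before being converted to %HH form.
    return quote(value, safe="")


def decode_param(word: str) -> str:
    """Reverse encode_param: turn %HH sequences back into characters."""
    return unquote(word)


print(encode_param("My Friendly Name"))
print(decode_param("My%20Friendly%20Name"))
```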

Messages

Messages are a special type of command that works slightly differently. Their three-character command is MSG. Because messages contain newline characters, they need another way of marking where they end. The last word of the first line of every message (the fourth word) is a number giving the number of bytes in the message that follows (MIME header and body together). Every newline (\r\n) is counted as two bytes, and the initial line containing MSG is not included in the count. A client knows the message has ended once it has counted that many bytes of data after the first newline.
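Because commands can be split across packets or joined together, a client usually keeps a buffer and pulls complete commands out of it. The sketch below shows one way to do that under the rules described so far. The class name CommandReader is made up, and it only treats MSG as length-prefixed; any other special cases are ignored here.

```python
from typing import List, Tuple


class CommandReader:
    """Accumulates raw bytes and yields complete commands."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, data: bytes) -> List[Tuple[str, bytes]]:
        """Add received bytes; return (first_line, payload) for each complete command."""
        self._buffer += data
        commands = []
        while True:
            end = self._buffer.find(b"\r\n")
            if end == -1:
                break  # first line not complete yet
            line = self._buffer[:end]
            words = line.split(b" ")
            consumed = end + 2  # first line plus its \r\n
            payload = b""
            if words[0] == b"MSG":
                # The fourth word is the byte count of header + body.
                length = int(words[3])
                if len(self._buffer) < consumed + length:
                    break  # wait for the rest of the message
                payload = self._buffer[consumed:consumed + length]
                consumed += length
            commands.append((line.decode("utf-8"), payload))
            self._buffer = self._buffer[consumed:]
        return commands
```

Call feed() with whatever each socket read returns; it gives back an empty list until at least one whole command has arrived.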

After the first line of the message comes the MIME header. This should always begin with MIME-Version: 1.0 followed by a newline. The second line defines the type of message being sent, and usually looks like this: Content-Type: */*; charset=UTF-8, where */* represents the type of message. The ; charset=UTF-8 is completely optional, but is always used by the official MSN Messenger client and the servers. Additional lines can be used depending on the type of message. The final MIME header line is followed by two newlines to distinguish the header from the body of the message. The body of the message can contain anything.
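To make the byte counting concrete, here is a sketch that assembles a message payload and prepends the first line with the correct length. The function names, the text/plain content type, and the three leading words passed to frame_message are placeholders chosen for the example; this section does not define what the second and third words of the MSG line should be.

```python
from typing import Dict, List, Optional


def build_payload(body: str, content_type: str = "text/plain",
                  extra_headers: Optional[Dict[str, str]] = None) -> bytes:
    """Build the MIME header plus body as UTF-8 bytes."""
    headers = ["MIME-Version: 1.0",
               "Content-Type: %s; charset=UTF-8" % content_type]
    for key, value in (extra_headers or {}).items():
        headers.append("%s: %s" % (key, value))
    # Two newlines separate the last header line from the body.
    text = "\r\n".join(headers) + "\r\n\r\n" + body
    return text.encode("utf-8")


def frame_message(first_words: List[str], payload: bytes) -> bytes:
    """Prepend the first line; its last word is the payload length in bytes."""
    first_line = " ".join(first_words + [str(len(payload))])
    return first_line.encode("utf-8") + b"\r\n" + payload


payload = build_payload("Hello")
# "MSG", "1" and "N" are illustrative placeholders for the first three words.
print(frame_message(["MSG", "1", "N"], payload))
```

Note that the length is taken after encoding to UTF-8, so a non-ASCII character in the body counts as more than one byte.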

Fields

Some commands convey information useful to the client (for example, a user's profile information or formatting information in a message). These are transmitted in the form of fields, one per line. Every field has a key and a value, for example:

preferredEmail: example@passport.com

The last character of a key must be a colon, and will be followed by a space. The key can't have a colon anywhere else in it, but the value can have several or none. For example, in file transfer, one client might send:

Request-Data: IP-Address:

And the other client might come back with:

IP-Address: 10.44.102.65

A command which sends fields will usually send many of them. You can't generally expect that they will always be sent in the same order, or that only the fields mentioned here will be sent. For example, GAIM will include a field something like User-Agent: Gaim/0.59 in all of its messages.
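Since the key can't contain a colon but the value can, splitting each line at the first colon is enough to parse fields. The following is a sketch, with parse_fields being a name invented for the example. It keeps unknown fields and makes no assumption about their order; if a key appears twice, the later value wins, which is a simplification.

```python
from typing import Dict


def parse_fields(block: str) -> Dict[str, str]:
    """Parse 'Key: value' lines (one per line) into a dictionary."""
    fields = {}
    for line in block.split("\r\n"):
        if not line:
            continue  # skip blank lines
        # Split at the first colon only; the value may contain more colons.
        key, colon, value = line.partition(":")
        if not colon:
            continue  # not a field line
        # The colon is followed by a single space before the value.
        if value.startswith(" "):
            value = value[1:]
        fields[key] = value
    return fields


print(parse_fields("Request-Data: IP-Address:\r\nIP-Address: 10.44.102.65\r\n"))
```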

Challenges

The only other exception to the "commands always end with a newline" rule is a challenge (CHL). Since they are only sent from the client and never from the server, this should not make parsing any more difficult. Challenges will be explained in the session section.

Errors

When something goes wrong, the server sends an error command. These are just regular commands, but the command name is always a three-digit number. The number is followed by the transaction ID of the command it is replying to. An error can be sent from any of the three types of servers. Below is a list of error numbers and what they represent.
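Since an error is just a command whose name is numeric, a client can spot one by looking at the first word. A minimal sketch follows; parse_error is a hypothetical helper and the number in the sample line is only illustrative.

```python
from typing import Optional, Tuple


def parse_error(line: str) -> Optional[Tuple[int, Optional[int]]]:
    """Return (error_code, transaction_id) if the line is an error, else None."""
    words = line.rstrip("\r\n").split(" ")
    name = words[0]
    # An error command name is exactly three ASCII digits.
    if len(name) != 3 or any(ch not in "0123456789" for ch in name):
        return None
    # The transaction ID of the failed command follows the number.
    trid = int(words[1]) if len(words) > 1 and words[1].isdigit() else None
    return int(name), trid


print(parse_error("500 12\r\n"))      # illustrative error line
print(parse_error("MSG 1 N 5\r\n"))   # not an error
```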

Error Codes

A * means that this error is not in the official IETF protocol document by Microsoft and is basically just a guess as to what it means by context.

Transaction IDs

Every command sent from a client to a server must contain a transaction ID. When a server sends a reply to one of these messages, it will contain the same exact transaction ID, so that the client knows which command the server was replying to. A transaction ID is always a number between 0 and 4294967295 (2^32 - 1). It would probably be best for a client to start at 0 and increment the ID every time it sends a command.

The transaction ID is always sent right after the three character command name and before the parameters, with spaces in between. According to the official MSN Messenger Service 1.0 Protocol draft, when the server sends a command to a client that is not in response to a client command, it should use a transaction ID of 0. However, the server has never sent any transaction ID in those commands to me, so it's possible the statement in the draft was incorrect.
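One simple way to follow the advice above is a counter that hands out IDs starting at 0. This is a sketch: the class and function names are made up, the command in the usage line is a placeholder, and wrapping back to 0 after 4294967295 is an assumption, since the text does not say what should happen at the upper limit.

```python
class TransactionCounter:
    """Hands out transaction IDs starting at 0."""

    MAX_ID = 2 ** 32 - 1  # 4294967295

    def __init__(self) -> None:
        self._next = 0

    def next_id(self) -> int:
        trid = self._next
        # Assumption: wrap around to 0 after the maximum value.
        self._next = 0 if trid == self.MAX_ID else trid + 1
        return trid


def format_command(name: str, trid: int, *params: str) -> str:
    """Place the transaction ID right after the command name."""
    return " ".join([name, str(trid), *params]) + "\r\n"


counter = TransactionCounter()
# "ABC" and its parameters are placeholders, not real protocol commands.
print(repr(format_command("ABC", counter.next_id(), "one", "two")))
```

Keeping a dictionary from each ID to the command that used it makes it easy to match a reply, or an error, to the request that caused it.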

Screen Names

All screen names on MSN are URL-quoted so that they fit in one word and can be parsed easily. URL quoting replaces spaces with %20 and other special characters with their corresponding %HH codes.

MD5

Whenever a password is sent to the server, it is hashed using the MD5 algorithm. The result is always lowercase hexadecimal. If you want to check that you are encoding correctly, try hashing the string 1013928519.693957190 followed by mypassword, or in other words, 1013928519.693957190mypassword. The result should be 6f3963009fc8a9d2b2ff137da0905c55.
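If your language has an MD5 library, the check described above takes only a few lines. Here is a sketch in Python; md5_response is a name made up for the example, and encoding the string as UTF-8 before hashing is an assumption (it makes no difference for plain ASCII input like this).

```python
import hashlib


def md5_response(prefix: str, password: str) -> str:
    """Return the lowercase hexadecimal MD5 digest of prefix + password."""
    data = (prefix + password).encode("utf-8")
    # hexdigest() already returns lowercase hexadecimal.
    return hashlib.md5(data).hexdigest()


digest = md5_response("1013928519.693957190", "mypassword")
# Compare this against the expected value given in the text above.
print(digest)
```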

Copyright ©2002-2003 to Mike Mintz.