All connections to MSN servers take place over TCP/IP. The client always makes the outgoing connections to the servers. The official port for MSN Messenger is 1863, although there are many places in the protocol where alternate ports could be specified, so this may be subject to change.
The connection to the server must be considered asynchronous - you can send many commands to the server without waiting for a reply, and the server won't necessarily reply to your commands in the order you sent them. The server may also send messages that are not in reply to any particular message from the client. However, sometimes (for example, when logging into a notification server) the protocol requires you to send one command then receive one command, and so on.
There are also several OOB (out-of-band) protocols that take place directly between clients and do not involve the server. These protocols are described in their respective sections, and are not necessary for basic functionality of a client.
At the lowest level, computers can only send 1s and 0s to each other. In order for two computers to communicate, they must agree on what the 1s and 0s represent. In MSN Messenger (except in file transfer), they represent characters, such as "Latin capital letter A", "Digit Four", or "Runic letter short-twig-sol". Hence, MSN Messenger is a text-based protocol.
Note: how these characters become squiggles on a computer screen is another matter entirely.
There are many different standards for representing characters using series of 1s and 0s. ASCII (the American Standard Code for Information Interchange) is the most famous and most widely used. It says that every byte with a decimal value between 0 and 127 inclusive represents a character - for example, a byte with value 90 represents "Latin capital letter Z", and a byte with value 32 represents "Space". ASCII says nothing about what to do with bytes whose value is greater than 127. The complete ASCII table can be found in many places online, including www.asciitable.com.
Because ASCII was primarily designed for use in America, people in other parts of the world have defined other standards - mostly, these are just extensions of ASCII to use bytes with values higher than 127, but some languages are so unlike English (e.g. Japanese) that they've started over from scratch.
One popular standard is ISO 8859-1, which extends ASCII to make it more useful in Western Europe - for example, it says that a byte with value 163 represents the British currency symbol (£), and a byte with value 226 represents "Latin small letter A with circumflex" (â).
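As a quick illustration of how ISO 8859-1 extends ASCII, the following Python sketch decodes a few single bytes. The helper name is_plain_ascii is made up for this example and is not part of any protocol.

```python
# Byte values 0..127 mean the same thing in ASCII and in ISO 8859-1.
assert bytes([90]).decode("ascii") == "Z"
assert bytes([90]).decode("iso-8859-1") == "Z"
assert bytes([32]).decode("ascii") == " "

# Byte values above 127 only have a meaning in the extension.
a_circumflex = bytes([226]).decode("iso-8859-1")  # Latin small letter A with circumflex


def is_plain_ascii(data: bytes) -> bool:
    """Hypothetical helper: True if every byte is in the ASCII range 0..127."""
    return all(b < 128 for b in data)
```

Trying bytes([226]).decode("ascii") raises UnicodeDecodeError, which is Python's way of saying that ASCII has nothing to say about that byte.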
A more ambitious standard - which is on course to eventually replace ASCII - is Unicode, a single standard that can represent every character used in every language in the world, including quite a few ancient languages. It seems the world is enough, though, as they decided against Klingon language support.
Unfortunately, the increased ambition of Unicode must translate to increased complexity - most importantly, no direct translation of bytes to characters can possibly please everybody. Instead, every character is represented by a "Unicode value", and that value is represented by one or more bytes using a "Unicode Transformation Format", such as UTF-8 or UTF-16.
The most popular format is UTF-8, which was designed to be compatible with ASCII - like ISO 8859-1, it adds meanings for bytes with values greater than 127. There are over a million Unicode values, and only 256 values can be stored in a single byte, so unlike ISO 8859-1, UTF-8 uses more than one byte for most characters: up to four bytes per character under the current definition (RFC 3629), although the original design allowed up to six.
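To see the variable-length encoding in practice, this small Python sketch counts how many bytes UTF-8 needs for a few sample characters. The function name utf8_length and the choice of sample characters are only for illustration.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode the given text."""
    return len(ch.encode("utf-8"))


# ASCII characters are encoded as the same single byte they have in ASCII.
assert "A".encode("utf-8") == b"A"

samples = {
    "A": utf8_length("A"),                    # an ASCII letter
    "\u00e2": utf8_length("\u00e2"),          # Latin small letter A with circumflex
    "\u16cc": utf8_length("\u16cc"),          # a character from the Runic block
    "\U0001f600": utf8_length("\U0001f600"),  # a character beyond U+FFFF
}
```

The values in samples come out as 1, 2, 3 and 4 bytes respectively.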
Data sent to and from MSN servers is encoded as UTF-8, an ASCII-compatible multi-lingual format that represents non-ASCII characters as sequences of two or more bytes. You need only worry about the difference between UTF-8 and ASCII if your program will need to print multi-lingual characters on-screen, in which case you should find out whether your programming environment supports Unicode (and if so, how well).
If you send or receive plaintext MSGs without a charset=UTF-8 parameter, you should assume the message is sent in ISO 8859-1, an ASCII-compatible format for use in Western Europe that represents non-ASCII characters as single bytes. The official client always includes charset=UTF-8 in messages it sends, and so should you.
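The fallback rule for plaintext messages might be sketched in Python as follows. This assumes the charset arrives as a parameter of a MIME-style content type value such as "text/plain; charset=UTF-8"; that format, and the function name decode_msg_body, are assumptions made for this example.

```python
def decode_msg_body(content_type: str, body: bytes) -> str:
    """Decode a plaintext message body, assuming ISO 8859-1 when no charset is given.

    content_type is assumed to look like "text/plain; charset=UTF-8".
    """
    charset = "iso-8859-1"  # the default when no charset parameter is present
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "charset" and value:
            charset = value.strip('"')
    return body.decode(charset)
```

When sending, the simplest approach is to always encode the body with str.encode("utf-8") and always state charset=UTF-8.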
In many places in the protocol, text needs to be URL-encoded (also referred to as URL-quoted). Nicknames, friendly names, group names, and phone numbers are all examples of things that must be URL-encoded. URL-encoding is used to make sure a particular parameter does not contain any spaces, newlines, or otherwise invalid characters.
URL-encoding is defined in section 2.2 of RFC 1738. Basically, it replaces every special character with a percent symbol followed by the two-digit hexadecimal representation of the character. For example, a space becomes %20, a percent symbol becomes %25, a linefeed becomes %0A, and a carriage return becomes %0D. You can find a list of hexadecimal ASCII values at asciitable.com.
Where URL-encoding applies, the RFC says that everything except numbers, letters, and the special characters $-_.+!*'(),;/?:@=& should be URL-encoded. In practice, you need only encode those characters whose hexadecimal value is 20 or below (i.e. space and below), and the "%" character, which has a hexadecimal value of 25. However, you may encode any other character if you wish.
Because the official client decodes every character, and not just required ones (like %25 and %20), other clients must decode every character.
If you really want to URL-encode multi-byte UTF-8 characters, you should do it one byte at a time, so a character whose UTF-8 encoding is the two bytes D7 86 (hexadecimal) would be represented as "%D7%86", not "%D786". For the same reason, when decoding a string, first URL-decode it into bytes, and then interpret those bytes as UTF-8.
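The encoding and decoding rules above can be sketched in Python like this. The function names and the encode_non_ascii flag are invented for this example; the decoder relies on the standard library's urllib.parse.unquote_to_bytes, which decodes every %XX sequence it finds.

```python
from urllib.parse import unquote_to_bytes


def msn_url_encode(text: str, encode_non_ascii: bool = False) -> str:
    """Encode bytes 0x20 and below plus '%'; optionally encode non-ASCII bytes too."""
    out = bytearray()
    for byte in text.encode("utf-8"):
        must_encode = byte <= 0x20 or byte == 0x25
        if must_encode or (encode_non_ascii and byte >= 0x80):
            # One byte at a time, so a two-byte character becomes two %XX groups.
            out += ("%%%02X" % byte).encode("ascii")
        else:
            out.append(byte)
    return out.decode("utf-8")


def msn_url_decode(text: str) -> str:
    """URL-decode every %XX sequence into bytes first, then read the bytes as UTF-8."""
    return unquote_to_bytes(text).decode("utf-8")
```

For example, msn_url_encode("a b%") gives "a%20b%25", and msn_url_decode("%D7%86") gives the single character U+05C6.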
Data sent to and from the server is divided into commands as explained in the Commands page. The server and client can send multiple commands in a packet, or spread out one command over multiple packets. A client must therefore not assume that one packet holds exactly one command; treat the connection as a continuous stream of bytes instead.
All normal commands end with a newline. Special types of commands specify the length of the body in bytes. When writing functions to parse commands, you will need to separate commands by newlines, except when receiving special commands with specified lengths.
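A stream-oriented parser along these lines could look like the Python sketch below. Several details are assumptions made for the example rather than facts stated on this page: that the newline is the two bytes CR LF, that MSG is a command with a body, and that the body length is the last parameter on the command line. See the Commands page for the real rules.

```python
class CommandBuffer:
    """Hypothetical helper that splits a byte stream into (line, body) commands."""

    NEWLINE = b"\r\n"           # assumption: commands end with CR LF
    PAYLOAD_COMMANDS = {"MSG"}  # assumption: commands that carry a body

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data: bytes):
        """Add received bytes and return every command that is now complete."""
        self._buf += data
        commands = []
        while True:
            end = self._buf.find(self.NEWLINE)
            if end < 0:
                break  # the command line itself is not complete yet
            line = bytes(self._buf[:end])
            consumed = end + len(self.NEWLINE)
            body = b""
            name = line.split(b" ")[0].decode("ascii", "replace")
            if name in self.PAYLOAD_COMMANDS:
                # Assumption: the last parameter is the body length in bytes.
                length = int(line.split(b" ")[-1])
                if len(self._buf) < consumed + length:
                    break  # wait for the rest of the body
                body = bytes(self._buf[consumed:consumed + length])
                consumed += length
            del self._buf[:consumed]
            commands.append((line, body))
        return commands
```

Call feed() with whatever recv() returns, however the data happens to be split across packets; incomplete commands simply stay in the buffer until the rest arrives.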
There are two different types of servers used in the protocol: notification servers (NS) and switchboards (SB).
The connection to a notification server is the basis of an MSN Messenger session, as it represents your online presence: if you are disconnected from the notification server, you are no longer online to your buddies. The purpose of the notification server is to authenticate you, let you view and make changes to your contact list, notify you when users go online and offline, initiate conversations, and provide various other services.
The NS basically does everything but hold conversations. It can only redirect you to the switchboard server where the conversations actually take place.
Note: The original draft refers to a third type of server known as a Dispatch Server, or DS for short. However, throughout our documentation, a Dispatch Server is just a default notification server.
The switchboard holds conversations between users. Each individual conversation corresponds to a separate connection to the switchboard. Direct connections to other users are not used in conversations, and the switchboard acts as a proxy between you and the people you are chatting with.
The SB is also where invitations to other services such as file transfer and NetMeeting are sent and received. Mobile paging is one of the only forms of communication that does not take place over a switchboard server.
Note that the SB and the NS are not very tightly integrated. For example, when a user in a switchboard session changes his or her friendly name, the switchboard still sends out messages and other commands with the old friendly name. In addition, when a user disconnects from the NS, all switchboard sessions remain open until the client explicitly closes them.
In order to use MSN in some highly restrictive environments, it is possible to wrap up an MSN Messenger session in HTTP requests and responses. This is slow and wasteful of bandwidth, so end-users should be discouraged from using it when normal connections are possible. This type of connection is explained in the HTTP Connections page.
A proxy server is a program which takes messages from one side of a connection, processes them, and may or may not forward them to the destination. Proxies can be used to reduce bandwidth (e.g. by providing a local cache for web pages), increase security (e.g. by requiring a username and password before allowing connections), or for any number of other reasons.
A good proxy server should be invisible (or almost invisible) to the destination server, so you are free to support whichever proxy servers you like. The official client supports SOCKS versions 4 and 5 proxies for use with normal connections, and HTTP proxies for use with HTTP connections.
SOCKS (which just stands for SOCKetS) is a very mature protocol, designed for use by general-purpose proxy servers on networks that want to control access to the public Internet. Unlike most application protocols, which are used for some particular purpose (e.g. MSN Messenger is used for instant messaging), SOCKS is just used to tunnel connections.
SOCKS version 4 is specified at http://archive.socks.permeo.com/protocol/socks4.protocol, the version 4a extension is specified at http://archive.socks.permeo.com/protocol/socks4a.protocol, and version 5 is specified in RFC 1928. SOCKS4 only supports TCP connections, while SOCKS5 adds support for UDP, authentication, domain names, and IPv6 addresses. You should be able to find a SOCKS library for your chosen operating system and programming language. If not, you'll have to write it yourself based on the protocol specifications above.
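If you do end up writing it yourself, the SOCKS4 handshake is small. The Python sketch below builds a version 4 CONNECT request and checks the 8-byte reply, following the field layout in the SOCKS4 specification; the function names are made up for this example, and error handling is left out.

```python
import socket
import struct


def socks4_connect_request(ip: str, port: int, user_id: bytes = b"") -> bytes:
    """Build a SOCKS4 CONNECT request for an IPv4 address and port.

    Layout: version (4), command (1 = CONNECT), 2-byte port, 4-byte address,
    then the user ID terminated by a null byte.
    """
    return struct.pack(">BBH", 4, 1, port) + socket.inet_aton(ip) + user_id + b"\x00"


def socks4_reply_granted(reply: bytes) -> bool:
    """Check the 8-byte SOCKS4 reply: first byte 0, second byte 90 means granted."""
    return len(reply) == 8 and reply[0] == 0 and reply[1] == 90
```

After connecting to the proxy, you would send the request, read exactly 8 bytes, and if the request was granted, carry on using the same socket as though it were connected straight to the MSN server.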
If you're wrapping MSN in an HTTP connection, you can use an HTTP proxy. HTTP version 1.1 (defined in RFC 2616) was designed with proxy servers in mind. Requests sent to a proxy must include the complete URL of the resource being requested, so using an HTTP proxy server normally just means opening a connection to your proxy server instead of the server you really want to contact - the proxy server will detect where your request is supposed to go based on the URL.
Some HTTP proxy servers may require you to include special headers - for example, to authenticate yourself with the proxy server. RFC 2616 defines these headers, and the HTTP Connections page discusses the behavior of the official client.
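As a rough sketch of what a request through an HTTP proxy looks like, the Python function below builds a request with the complete URL on the request line and an optional Proxy-Authorization header. The use of POST, the Basic authentication scheme, and the function and parameter names are assumptions made for this example; the HTTP Connections page describes what the official client really sends.

```python
import base64
from urllib.parse import urlsplit


def build_proxied_post(url: str, body: bytes,
                       proxy_user=None, proxy_password=None) -> bytes:
    """Build an HTTP/1.1 request suitable for sending to a proxy server."""
    lines = [
        f"POST {url} HTTP/1.1",            # complete URL, not just the path
        f"Host: {urlsplit(url).netloc}",
        f"Content-Length: {len(body)}",
    ]
    if proxy_user is not None:
        # Assumption: the proxy accepts Basic credentials.
        credentials = f"{proxy_user}:{proxy_password or ''}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        lines.append(f"Proxy-Authorization: Basic {token}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

The bytes returned are written to a socket connected to the proxy server, not to the server named in the URL.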