If the internet is the information highway, then the path for email is a narrow ravine. Only very small carts can pass.
The transport system of email is designed for plain ASCII text only. Trying to send text in other languages or arbitrary files is like getting a truck through the ravine.
How Does the Big Truck Go Through the Ravine?
So how do you send a big truck through a small ravine? You take it to pieces at one end, transport the pieces through the ravine, and rebuild the truck from the pieces at the other end.
The same happens when you send a file attachment via email. In a process known as encoding, the binary data is transformed into ASCII text, which can be transported in email without problems. On the recipient's end, the data is decoded and the original file is rebuilt.
One method of encoding arbitrary data as plain ASCII text is Base64. It is one of the techniques employed by the MIME standard to send data other than plain text.
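The round trip described above can be sketched with Python's standard `base64` module. This is only an illustration of the idea, not how any particular mail program does it; the sample bytes are arbitrary.

```python
import base64

# Arbitrary binary data, including bytes outside the ASCII range.
original = bytes([0, 155, 162, 233, 255])

# Sender's end: the binary data is turned into plain ASCII text.
encoded = base64.b64encode(original)

# Recipient's end: the text is decoded and the original bytes are rebuilt.
decoded = base64.b64decode(encoded)

print(encoded.decode("ascii"))
print(decoded == original)
```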
Base64 to the Rescue
The first step is to convert three bytes to four numbers of six bits. Three bytes hold 24 bits, which divide evenly into four groups of six. Each character in the ASCII standard consists of seven bits, but Base64 uses only 6 bits (corresponding to 2^6 = 64 characters) so that every value maps to a printable character. None of the special characters available in ASCII are used. The 64 characters (hence the name Base64) are 26 uppercase letters, 26 lowercase letters and 10 digits as well as '+' and '/'.
If, for example, the three bytes are 155, 162 and 233, the corresponding (and frightening) bit stream is 100110111010001011101001, which in turn corresponds to the 6-bit values 38, 58, 11 and 41.
These numbers are converted to ASCII characters in the second step using the Base64 encoding table. The 6-bit values of our example translate to the ASCII sequence "m6Lp".
- 155 -> 10011011
- 162 -> 10100010
- 233 -> 11101001
- 100110 -> 38
- 111010 -> 58
- 001011 -> 11
- 101001 -> 41
- 38 -> m
- 58 -> 6
- 11 -> L
- 41 -> p
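The two steps can be reproduced in a few lines of Python. This is a minimal sketch for a single 3-byte group; the helper name `encode_group` is made up for this example, and the result is compared against the standard library as a sanity check.

```python
import base64

# The Base64 encoding table in order: A-Z (0-25), a-z (26-51), 0-9 (52-61), '+', '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_group(b0, b1, b2):
    """Encode exactly three bytes as four Base64 characters (hypothetical helper)."""
    # Step 1: join the three bytes into one 24-bit number,
    # then cut it into four 6-bit values.
    bits = (b0 << 16) | (b1 << 8) | b2
    sextets = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Step 2: look up each 6-bit value in the encoding table.
    text = "".join(ALPHABET[s] for s in sextets)
    return sextets, text

sextets, text = encode_group(155, 162, 233)
print(sextets)  # [38, 58, 11, 41]
print(text)     # m6Lp

# The standard library produces the same four characters.
stdlib_text = base64.b64encode(bytes([155, 162, 233])).decode("ascii")
print(stdlib_text == text)
```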
This two-step process is applied to the whole sequence of bytes to be encoded. To ensure the encoded data can be properly printed and does not exceed any mail server's line length limit, line breaks are inserted into the output so that no line is longer than 76 characters. These inserted line breaks are not part of the data and are ignored when decoding. Newline characters that occur in the original data, on the other hand, are encoded like all other bytes.
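Python's standard library offers `base64.encodebytes`, which wraps its output for MIME use. The sketch below checks that no output line is longer than 76 characters and that the input survives the round trip, including the newline bytes it contains. Note that `encodebytes` inserts a bare `\n`; a mail program would use its own line-ending convention on the wire.

```python
import base64

# 512 bytes of sample data covering every byte value,
# including newline bytes (value 10).
data = bytes(range(256)) * 2

# encodebytes() inserts a line break after every 76 characters of output.
wrapped = base64.encodebytes(data)
lines = wrapped.splitlines()

print(max(len(line) for line in lines))

# Decoding skips the inserted line breaks and restores the original bytes.
restored = base64.decodebytes(wrapped)
print(restored == data)
```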
Solving the Endgame
At the end of the encoding process we might run into a problem. If the size of the original data in bytes is a multiple of three, everything works fine. If it is not, the final group holds only one or two bytes. For proper encoding, however, we need exactly three bytes.
The solution is to append enough bytes with a value of zero to create a 3-byte group. Two such bytes are appended if we have one extra byte of data, one is appended for two extra bytes.
Of course, these artificial trailing zeros cannot be encoded using the encoding table, because the decoder could not tell them apart from real data. Each output character that would consist only of filler bits is instead represented by a 65th character. The Base64 padding character is '='. Naturally, it can only ever appear at the end of encoded data.
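The padding rule can be sketched in Python as follows. The helper `encode_tail` is a made-up name for this example and handles only a single final group of one to three bytes; the standard library is used to cross-check the result.

```python
import base64

# The Base64 encoding table in order: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_tail(tail):
    """Encode a final group of 1 to 3 bytes, padding with '=' (hypothetical helper)."""
    missing = 3 - len(tail)
    # Fill the group up to three bytes with zero-valued bytes.
    padded = tail + b"\x00" * missing
    bits = int.from_bytes(padded, "big")
    chars = [ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
    # Characters made up entirely of filler bits are replaced by '='.
    return "".join(chars[:4 - missing]) + "=" * missing

data = bytes([155, 162, 233])
print(encode_tail(data))      # m6Lp
print(encode_tail(data[:2]))  # m6I=
print(encode_tail(data[:1]))  # mw==

# Cross-check against the standard library.
matches = all(
    encode_tail(data[:n]) == base64.b64encode(data[:n]).decode("ascii")
    for n in (1, 2, 3)
)
print(matches)
```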