Unicode

By , About.com Guide

Definition: Unicode is a way to represent characters and symbols in computers with support for most of the world's writing systems (including African, Arabic, Asian and Western).

Why Unicode?

Computers represent letters and characters as numbers. Historically, conventions mapping characters to numbers have been limited to certain writing systems or computing platforms. Encoding systems often conflicted: the same character could be represented by different numbers in different systems, and the same number could stand for entirely different characters depending on the system.
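The conflict is easy to reproduce with legacy code pages. The following is a minimal sketch in Python; the three code pages (Latin-1, Windows-1251 and the IBM PC code page 437) are picked only for illustration and are not named in the article.

```python
import unicodedata

# One byte value, decoded under three legacy code pages.
raw = bytes([0xE9])
decoded = {codec: raw.decode(codec) for codec in ("latin-1", "cp1251", "cp437")}
for codec, char in decoded.items():
    # Print the character's official name so the output stays plain ASCII.
    print(f"byte 0xE9 in {codec}: {unicodedata.name(char)}")

# One character (e with acute accent), encoded under two of those code pages.
encoded = {codec: "\u00e9".encode(codec) for codec in ("latin-1", "cp437")}
for codec, data in encoded.items():
    print(f"e-acute in {codec}: 0x{data[0]:02X}")
```

The first loop yields a Latin, a Cyrillic and a Greek letter for the same byte; the second shows the same letter stored as two different byte values.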

Unicode is an attempt to unify these encoding systems. Every character of every writing system currently in use is assigned exactly one number, called a code point, no matter the operating system or locale. Historical scripts are covered only in part.
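To see the one-number-per-character idea in practice, here is a small Python sketch. The helper name `describe` is made up for this example; `ord`, `chr` and `unicodedata.name` are part of Python's standard library.

```python
import unicodedata

def describe(ch):
    """Return the code point (in U+XXXX notation) and official name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in ("A", "\u00e9", "\u20ac"):
    print(describe(ch))

# chr() is the inverse of ord(): it turns the number back into the character.
euro = chr(0x20AC)
print(ord(euro) == 0x20AC)
```

The numbers are the same on every platform, which is the point of the standard.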

A character, in Unicode, is independent of how it appears on screen or in print. In some writing systems, letters take a different shape depending on their position in a word; Arabic letters, for example, look different at the end of a word. In Unicode, such positional variants are the same character.

How Many Characters Does Unicode Hold?

Unicode has room for 1,114,112 character numbers (code points), running from 0 through 1,114,111. Several encodings can be used to represent them, most commonly UTF-8, UTF-16 and UTF-32. They differ in how a character's number is turned into bytes, but each supports all Unicode characters, and converting between them is easy and loses nothing.
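A short Python sketch of how the three encodings store the same text in different numbers of bytes, and how one converts to another. The sample string is arbitrary, and the big-endian codec variants are used only so that no byte order mark is added to the output.

```python
import sys

text = "Hi \u20ac"  # three ASCII characters plus the euro sign

# Same four characters, different byte counts per encoding.
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}
print(sizes)  # {'utf-8': 6, 'utf-16-be': 8, 'utf-32-be': 16}

# Converting between encodings: decode to text, then encode again.
utf8_bytes = text.encode("utf-8")
utf16_bytes = utf8_bytes.decode("utf-8").encode("utf-16-be")
print(utf16_bytes.decode("utf-16-be") == text)  # True

# The highest code point Python accepts is Unicode's upper limit.
print(sys.maxunicode)  # 1114111
```

UTF-8 is the most compact here because the three ASCII characters take one byte each; for text made up mostly of East Asian characters, UTF-16 would usually be smaller.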

The Unicode 5.1 standard defines 100,713 characters.

Unicode and Email

Emails using Unicode are typically sent using UTF-8 or UTF-7. The latter is yet another representation of Unicode characters, designed specifically for email. UTF-7 is not in widespread use, but for email it has the advantage that it uses only 7-bit ASCII characters and so can be sent through old mail servers without complications. UTF-8 uses 8-bit bytes and therefore often has to be additionally encoded using Base64 or quoted-printable.
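The difference can be sketched in Python using the standard `base64` and `quopri` modules and the built-in `utf-7` codec. The sample word and the helper `is_7bit` are assumptions made for this example. It shows only the encoding step; a real mail program would use a mail library that also sets the matching message headers.

```python
import base64
import quopri

text = "Gr\u00fc\u00dfe"  # German greeting with two non-ASCII letters

def is_7bit(data):
    """True if every byte fits into 7 bits and is safe for old mail servers."""
    return all(byte < 0x80 for byte in data)

utf8 = text.encode("utf-8")          # contains 8-bit bytes
utf7 = text.encode("utf-7")          # 7-bit by design
b64 = base64.b64encode(utf8)         # UTF-8 wrapped in Base64
qp = quopri.encodestring(utf8)       # UTF-8 wrapped in quoted-printable

for label, data in (("UTF-8", utf8), ("UTF-7", utf7),
                    ("UTF-8 + Base64", b64), ("UTF-8 + quoted-printable", qp)):
    print(f"{label}: 7-bit safe = {is_7bit(data)}, bytes = {data!r}")
```

Only the raw UTF-8 bytes fail the 7-bit check. Quoted-printable keeps the ASCII letters readable, which makes it a common choice for mostly-Western text, while Base64 suits text that is largely non-ASCII.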

©2012 About.com. All rights reserved.

A part of The New York Times Company.