HTML Charsets

HTML Charsets: Ensuring Proper Character Display

HTML charsets define how the characters in a web document are represented as bytes. Declaring the correct character encoding ensures that text appears correctly across different devices and platforms.

The <meta> tag's charset attribute is used to specify which character encoding the HTML document uses. By setting the charset, we ensure proper rendering of special characters, symbols, and text.


Common Character Encodings

1. ASCII

The American Standard Code for Information Interchange (ASCII) is a 7-bit character encoding standard. Most C and C++ implementations use it to encode the basic character set.

It defines 128 characters: 33 control characters and 95 printable characters, which include the digits (0-9), the English letters (A-Z and a-z), the space, and special symbols like +, -, *, /, (, ), @, etc.
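The mapping between ASCII characters and their numbers can be checked with a short Python sketch. The helper name is_ascii is hypothetical and used only for illustration; ord, chr, and str.encode are Python built-ins.

```python
# ASCII assigns the numbers 0-127 to characters.
print(ord("A"), ord("a"), ord("0"))   # 65 97 48
print(chr(64))                        # @

# Every ASCII character fits in 7 bits, so each one encodes to a single byte.
encoded = "A+b".encode("ascii")
print(list(encoded))                  # [65, 43, 98]


def is_ascii(text):
    """Hypothetical helper: True when every character has a code below 128."""
    return all(ord(ch) < 128 for ch in text)


print(is_ascii("Hello"))       # True
print(is_ascii("caf\u00e9"))   # False, U+00E9 (e with acute accent) is outside ASCII
```

Text that fails this check needs one of the larger encodings described in the following entries.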

2. ANSI (Windows-1252)

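Windows-1252 is a single-byte character encoding with 256 code positions. It is commonly called "ANSI" after the American National Standards Institute, although the name is a historical misnomer: the code page comes from Microsoft and was never published as an ANSI standard. It has long been the default legacy character set in Western-language versions of Microsoft Windows.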

3. ISO-8859-1

ISO-8859-1 (also called Latin-1) is widely described as the default character set for HTML 4 and also has 256 code positions. The International Organization for Standardization (ISO) defines standard character sets for different alphabets/languages. ISO-8859-1 contains numbers, upper and lowercase English letters, accented letters used in Western European languages, and some special characters.
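Windows-1252 and ISO-8859-1 are easy to confuse because they agree on most byte values. The following Python sketch, which relies only on the standard "cp1252" and "iso-8859-1" codecs, illustrates where they differ; the variable names are chosen for illustration.

```python
euro = "\u20ac"  # the euro sign

# Windows-1252 stores the euro sign in byte 0x80.
print(euro.encode("cp1252"))   # b'\x80'

# ISO-8859-1 has no euro sign, so encoding fails.
try:
    euro.encode("iso-8859-1")
except UnicodeEncodeError:
    print("no euro sign in ISO-8859-1")

# Collect the byte values that decode to different characters.
# errors="replace" covers the few bytes that Python's cp1252 codec leaves undefined.
differing = [
    b for b in range(256)
    if bytes([b]).decode("cp1252", errors="replace") != bytes([b]).decode("iso-8859-1")
]
print(hex(differing[0]), hex(differing[-1]), len(differing))   # 0x80 0x9f 32
```

Only the range 0x80 to 0x9F differs: ISO-8859-1 reserves it for control codes, while Windows-1252 uses it for printable characters such as the euro sign and curly quotes.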

4. UTF-8

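The Unicode Standard was developed by the Unicode Consortium because the ISO-8859 character sets are limited and not compatible in a multilingual environment. UTF-8 and UTF-16 are encodings of Unicode. UTF-8 can represent every Unicode character, covering the letters, punctuation, and symbols of almost all writing systems, and uses one to four bytes per character. It is the recommended encoding for HTML5.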
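As a rough illustration of how UTF-8 stores characters in a variable number of bytes, the Python sketch below encodes four sample characters. The samples are arbitrary choices, written as escape sequences so the snippet stays plain ASCII.

```python
samples = [
    "A",            # U+0041, Latin capital letter A
    "\u00e9",       # U+00E9, e with acute accent
    "\u20ac",       # U+20AC, euro sign
    "\U0001F600",   # U+1F600, grinning face emoji
]

for ch in samples:
    data = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in data)
    print(f"U+{ord(ch):04X} -> {len(data)} byte(s): {hex_bytes}")

# Output:
# U+0041 -> 1 byte(s): 41
# U+00E9 -> 2 byte(s): c3 a9
# U+20AC -> 3 byte(s): e2 82 ac
# U+1F600 -> 4 byte(s): f0 9f 98 80

# ASCII text is already valid UTF-8: the bytes are identical.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))   # True
```

Because the first 128 code points encode to the same single bytes as ASCII, existing ASCII documents do not change when they are declared as UTF-8.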


Specifying the Character Set

A web browser must know the character encoding standard used in the HTML page. This is specified using the <meta> tag, placed inside the <head> element.

Example:

HTML 4:

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

HTML 5:

<meta charset="UTF-8">
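To illustrate why the declaration matters, the Python sketch below reads the charset from either form of the <meta> tag and then decodes the same bytes with the declared encoding and with a wrong one. CharsetSniffer and declared_charset are hypothetical names used for illustration. This is a simplification and not how browsers work internally: real browsers also consider byte order marks and HTTP headers.

```python
from html.parser import HTMLParser


class CharsetSniffer(HTMLParser):
    """Hypothetical helper: records the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if attrs.get("charset"):
            # HTML5 form: <meta charset="UTF-8">
            self.charset = attrs["charset"].strip().lower()
        elif (attrs.get("http-equiv") or "").lower() == "content-type":
            # HTML 4 form: content="text/html;charset=ISO-8859-1"
            for part in (attrs.get("content") or "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.strip().lower() == "charset" and value:
                    self.charset = value.strip().lower()


def declared_charset(markup):
    """Return the charset named in the markup's <meta> tag, or None."""
    sniffer = CharsetSniffer()
    sniffer.feed(markup)
    return sniffer.charset


# A tiny page saved as UTF-8; \u00e9 is "e with acute accent".
page_bytes = '<meta charset="UTF-8"><p>caf\u00e9</p>'.encode("utf-8")

# Pre-scan with ISO-8859-1, which maps every byte to one character and never fails.
charset = declared_charset(page_bytes.decode("iso-8859-1"))
print(charset)                     # utf-8

text = page_bytes.decode(charset)          # correct: ends in caf\xe9
wrong = page_bytes.decode("iso-8859-1")    # garbled: ends in caf\xc3\xa9
print(ascii(text))
print(ascii(wrong))
```

Decoding with the wrong encoding turns the single accented letter into two unrelated characters, which is the garbled text users see when a page's declared and actual encodings disagree.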


Why is Character Encoding Important?


Character Sets for Different Encoding Standards

The following list shows different character encoding standards with their characters and their assigned number codes.

Table 1 (ASCII Device Control Characters)

This table contains characters which are designed to control hardware devices. These are also known as control characters.
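The control characters can also be listed programmatically. The Python sketch below uses the standard unicodedata module to pick out the ASCII code points in the Unicode "Cc" (control) category. The NAMES dictionary is a hand-written subset of the conventional abbreviations, included only for illustration.

```python
import unicodedata

# ASCII code points whose Unicode general category is "Cc" (control).
control_codes = [
    code for code in range(128)
    if unicodedata.category(chr(code)) == "Cc"
]
print(len(control_codes))   # 33: codes 0-31 plus 127

# Conventional abbreviations for a few well-known control characters.
NAMES = {
    0: "NUL", 7: "BEL", 8: "BS", 9: "HT",
    10: "LF", 13: "CR", 27: "ESC", 127: "DEL",
}

for code in control_codes:
    if code in NAMES:
        print(f"{code:3d}  0x{code:02X}  {NAMES[code]}")
```

The remaining 95 ASCII codes, 32 through 126, are the printable characters.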