UTF-8 Character Reference Tool

Understanding UTF-8 Encoding

UTF-8 (Unicode Transformation Format - 8-bit) is a variable-length character encoding that can represent any character in the Unicode standard. It uses 1 to 4 bytes per character, making it efficient for text that contains mostly ASCII characters while still supporting the full Unicode range.

How UTF-8 Encoding Works:

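1 byte: Code points 0-127 (U+0000-U+007F, ASCII) - stored as-is
2 bytes: Code points 128-2047 (U+0080-U+07FF; Latin extensions, Greek, Cyrillic, etc.)
3 bytes: Code points 2048-65535 (U+0800-U+FFFF; most other scripts in everyday use, many symbols)
4 bytes: Code points 65536-1114111 (U+10000-U+10FFFF; emoji, rare symbols)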
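
The byte-length rules above can be sketched as a hand-written encoder. This is a minimal illustration, not how this tool is implemented; the function name utf8_encode is hypothetical, and in practice Python's built-in str.encode("utf-8") does the same job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (illustrative sketch)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not valid in UTF-8")

    if code_point <= 0x7F:
        # 1 byte: 0xxxxxxx (ASCII, stored as-is)
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])
```

Each continuation byte carries 6 bits of the code point behind a fixed 10 prefix, which is why the ranges break at 2^7, 2^11, and 2^16.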

Chart Column Explanations:

Char: The actual Unicode character as it appears

Decimal: The Unicode code point in base-10 (0-1114111)

Hex: The Unicode code point in hexadecimal (0x0-0x10FFFF)

Binary: The Unicode code point in binary representation

UTF-8 Bytes: The actual bytes used to store this character in UTF-8 encoding

Category: The general type of character (Letter, Digit, Punctuation, etc.)

Description: The official Unicode name or description of the character
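
As a rough sketch of how one chart row could be computed, the columns above map onto Python's standard library fairly directly. The function name chart_row and the dictionary layout are assumptions made for illustration. Note that unicodedata.category returns two-letter general category codes (such as Lu or Sc) rather than the spelled-out labels this chart shows.

```python
import unicodedata


def chart_row(char: str) -> dict:
    """Build one chart row for a single character (illustrative sketch)."""
    code_point = ord(char)  # raises TypeError unless char is exactly one character
    utf8 = char.encode("utf-8")
    return {
        "Char": char,
        "Decimal": code_point,
        "Hex": f"0x{code_point:X}",
        "Binary": f"{code_point:b}",
        "UTF-8 Bytes": " ".join(f"{byte:02X}" for byte in utf8),
        # Two-letter general category code, e.g. "Lu" for uppercase letter
        "Category": unicodedata.category(char),
        # Official Unicode name; fall back to a placeholder if none is defined
        "Description": unicodedata.name(char, "<unnamed>"),
    }
```

For example, chart_row("\u20ac") describes the euro sign, with "UTF-8 Bytes" equal to "E2 82 AC".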

Examples:

'A' (65) = 1 byte: 41
'é' (233) = 2 bytes: C3 A9
'€' (8364) = 3 bytes: E2 82 AC
'👍' (128077) = 4 bytes: F0 9F 91 8D
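
The examples above can be checked in the other direction, by turning the listed hex bytes back into a character. A minimal sketch using only built-in Python calls; the helper name decode_hex_bytes is hypothetical.

```python
def decode_hex_bytes(hex_bytes: str) -> str:
    """Decode a space-separated hex byte string (e.g. "E2 82 AC") as UTF-8."""
    return bytes.fromhex(hex_bytes).decode("utf-8")


# The four examples from the list above: hex bytes -> expected code point
EXAMPLES = {
    "41": 65,
    "C3 A9": 233,
    "E2 82 AC": 8364,
    "F0 9F 91 8D": 128077,
}

for hex_bytes, expected in EXAMPLES.items():
    char = decode_hex_bytes(hex_bytes)
    byte_count = len(bytes.fromhex(hex_bytes))
    print(f"{hex_bytes} -> {char!r} (code point {ord(char)}, {byte_count} byte(s))")
    assert ord(char) == expected
```

Invalid sequences, such as a lone continuation byte like "A9", raise UnicodeDecodeError instead of returning a character.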

See also: Online Hex Dump - View the raw bytes of any text or file
