UTF-8 Character Reference

Explore Unicode characters with their decimal and hexadecimal values

Character Inspector

Char Decimal Hex Binary UTF-8 Bytes Category Description

Understanding UTF-8 Encoding

UTF-8 (Unicode Transformation Format - 8-bit) is a variable-length character encoding that can represent any character in the Unicode standard. It uses 1 to 4 bytes per character, making it efficient for text that contains mostly ASCII characters while still supporting the full Unicode range.

How UTF-8 Encoding Works:

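  • 1 byte: Code points 0-127 (ASCII) - stored as-is
  • 2 bytes: Code points 128-2047 (Latin extensions, Greek, Cyrillic, Hebrew, Arabic, etc.)
  • 3 bytes: Code points 2048-65535 (the rest of the Basic Multilingual Plane: most other scripts, including common CJK, and many symbols; the surrogate range 55296-57343 is not encodable)
  • 4 bytes: Code points 65536-1114111 (emoji, historic scripts, rare symbols)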
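The byte-length ranges above can be sketched as a small hand-written encoder. This is a minimal illustration, not how any particular library implements it; the function name `utf8_encode` is an assumption made for this example. In practice Python's built-in `str.encode("utf-8")` does the same job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (illustrative sketch)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not valid in UTF-8")

    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([
            0xC0 | (code_point >> 6),
            0x80 | (code_point & 0x3F),
        ])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (code_point >> 12),
            0x80 | ((code_point >> 6) & 0x3F),
            0x80 | (code_point & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (code_point >> 18),
        0x80 | ((code_point >> 12) & 0x3F),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])
```

The leading byte announces the sequence length through its high bits, and every continuation byte starts with the bits 10, which is why a decoder can resynchronize from any position in a byte stream.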

Chart Column Explanations:

Char: The actual Unicode character as it appears
Decimal: The Unicode code point in base-10 (0-1114111)
Hex: The Unicode code point in hexadecimal (0x0-0x10FFFF)
Binary: The Unicode code point in binary representation
UTF-8 Bytes: The actual bytes used to store this character in UTF-8 encoding
Category: The general type of character (Letter, Digit, Punctuation, etc.)
Description: The official Unicode name or description of the character
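As a rough sketch, one row of the chart could be computed with Python's standard library. The helper name `describe` and the dictionary layout are assumptions made for this example, not the tool's actual implementation. Note that `unicodedata.category` returns two-letter codes such as `Lu` (uppercase letter) rather than words like "Letter".

```python
import unicodedata


def describe(char: str) -> dict:
    """Build one chart row for a single character (hypothetical helper)."""
    code_point = ord(char)
    return {
        "Char": char,
        "Decimal": code_point,
        "Hex": f"0x{code_point:X}",
        "Binary": f"{code_point:b}",
        # bytes.hex(sep) requires Python 3.8 or newer
        "UTF-8 Bytes": char.encode("utf-8").hex(" ").upper(),
        "Category": unicodedata.category(char),
        # Some code points (e.g. control characters) have no name
        "Description": unicodedata.name(char, "<unnamed>"),
    }
```

For example, `describe("A")` yields decimal 65, hex `0x41`, binary `1000001`, bytes `41`, category `Lu`, and the name `LATIN CAPITAL LETTER A`.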

Examples:

  • 'A' (65) = 1 byte: 41
  • 'é' (233) = 2 bytes: C3 A9
  • '€' (8364) = 3 bytes: E2 82 AC
  • '👍' (128077) = 4 bytes: F0 9F 91 8D
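The examples above can also be checked in the other direction, from bytes back to a code point. A minimal sketch using only built-in calls; the helper name `decode_hex` is an assumption made for this example.

```python
def decode_hex(hex_bytes: str) -> str:
    """Decode a space-separated hex string such as 'E2 82 AC' as UTF-8."""
    # bytes.fromhex skips ASCII whitespace between byte pairs
    return bytes.fromhex(hex_bytes).decode("utf-8")


# Code point -> expected UTF-8 bytes, taken from the examples above
examples = {
    65: "41",
    233: "C3 A9",
    8364: "E2 82 AC",
    128077: "F0 9F 91 8D",
}

for code_point, hex_bytes in examples.items():
    char = decode_hex(hex_bytes)
    assert ord(char) == code_point
    byte_count = len(bytes.fromhex(hex_bytes))
    print(f"U+{code_point:04X} ({code_point}) = {byte_count} byte(s): {hex_bytes}")
```

Decoding with the wrong encoding is what produces garbled output (mojibake), so a round trip like this is a quick way to confirm that a byte sequence really is UTF-8.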

See also: Online Hex Dump - View the raw bytes of any text or file