Unicode / Emoji Inspector

Break any string into code points, categories, escapes (JS / HTML / URL), and NFC/NFD/NFKC/NFKD normalization. In your browser.

  • [FREE]
  • [NO_SIGNUP]
  • [NO_UPLOAD]

A Unicode inspector shows exactly what’s inside a string: every code point, its escapes, and how it normalizes, all in your browser.

What it shows

  • Counts: graphemes (what you see), code points, UTF-16 units (JS .length), and UTF-8 bytes. These are the four numbers that confuse everyone.
  • Per character: the U+ code point, decimal value, category (Letter/Number/Symbol…), astral flag (set for code points above U+FFFF), and UTF-8 byte size. Invisible/control characters are flagged.
  • Escapes: JS \u, HTML decimal &#…;, HTML hex &#x…;, URL percent-encoding, and a bare code-point list, each copyable with one click.
  • Normalization: NFC, NFD, NFKC, NFKD side by side, with code-point counts so you can see composition differences.
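
The escape formats listed above can be reproduced with standard JavaScript string methods. The sketch below is only an illustration: escapesFor is a hypothetical helper, not the tool's actual code, and it assumes a non-empty input whose first code point is the one to escape.

```javascript
// Hypothetical helper (not the tool's actual code): escape forms for the
// first code point of `char`. Assumes `char` is non-empty.
function escapesFor(char) {
  const cp = char.codePointAt(0);
  const ch = String.fromCodePoint(cp);
  const hex = cp.toString(16).toUpperCase();

  // JS \u escapes are written per UTF-16 unit, so an astral character
  // (above U+FFFF) needs two of them: a surrogate pair.
  let js = "";
  for (let i = 0; i < ch.length; i++) {
    js += "\\u" + ch.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0");
  }

  return {
    codePoint: "U+" + hex.padStart(4, "0"),
    js,
    htmlDecimal: "&#" + cp + ";",
    htmlHex: "&#x" + hex + ";",
    // encodeURIComponent percent-encodes the UTF-8 bytes. It leaves
    // unreserved ASCII as-is and throws on a lone surrogate.
    url: encodeURIComponent(ch),
  };
}

// U+1F600 is the grinning-face emoji.
console.log(escapesFor("\u{1F600}"));
```

The surrogate-pair form is used for the JS escape because it works in any JavaScript string literal; the \u{…} form would also work in ES2015 and later.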

Why it’s handy

Debug “why is my string length wrong,” strip zero-width characters, catch homoglyph spoofing, prep text for reliable search/compare, or generate escapes for code. Runs entirely client-side.

Frequently asked questions

What is the difference between code points, UTF-16 units, and bytes?

A code point is one Unicode character (e.g. 😀 is one). UTF-16 units are how JavaScript counts string .length: any character above U+FFFF, which includes most emoji, takes two units (a surrogate pair), which is why "😀".length is 2. UTF-8 bytes are how the text is stored/transmitted; 😀 is 4 bytes. Graphemes are what a human sees as one character (an emoji with a skin-tone modifier is several code points but one grapheme). This tool shows all four.
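
For the curious, the four counts can be computed with built-in APIs. This is a minimal sketch, assuming a runtime that provides Intl.Segmenter and TextEncoder (current browsers and recent Node.js versions do); countAll is a hypothetical name used for illustration only.

```javascript
// Hypothetical helper: the four counts for one string.
function countAll(text) {
  // Grapheme segmentation follows the Unicode text segmentation rules.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    graphemes: [...segmenter.segment(text)].length,
    codePoints: [...text].length, // string iteration yields code points
    utf16Units: text.length, // what JS .length reports
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

// Thumbs-up (U+1F44D) followed by a skin-tone modifier (U+1F3FD).
console.log(countAll("\u{1F44D}\u{1F3FD}"));
// → { graphemes: 1, codePoints: 2, utf16Units: 4, utf8Bytes: 8 }
```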

Why does my string have a different length than I expect?

Usually invisible characters (zero-width spaces, joiners, BOM) or composed vs decomposed accents. The per-character table marks invisibles, and the normalization section shows whether é is one code point (NFC) or e + a combining accent (NFD).
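
If you want to run the same kind of check in your own code, here is one possible approach. It is a sketch, not the tool's implementation: findInvisibles and stripZeroWidth are hypothetical helpers, and treating the Unicode categories Cf (format) and Cc (control) as "invisible" is an assumption that happens to cover zero-width spaces, joiners, and the BOM.

```javascript
// Hypothetical helper: list format (Cf) and control (Cc) characters.
// `index` is the UTF-16 offset, so it lines up with JS string indexing.
function findInvisibles(text) {
  const hits = [];
  let index = 0;
  for (const ch of text) {
    if (/^[\p{Cf}\p{Cc}]$/u.test(ch)) {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint: "U+" + hex });
    }
    index += ch.length;
  }
  return hits;
}

// Hypothetical helper: remove common zero-width characters.
// Caution: U+200D (zero-width joiner) also glues emoji sequences
// together, so stripping it can split one emoji into several.
function stripZeroWidth(text) {
  return text.replace(/[\u200B-\u200D\u2060\uFEFF]/g, "");
}

console.log(findInvisibles("ab\u200Bc\uFEFF"));
// → [ { index: 2, codePoint: 'U+200B' }, { index: 4, codePoint: 'U+FEFF' } ]
```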

What is normalization (NFC/NFD/NFKC/NFKD) for?

The same visible text can be encoded multiple ways. Normalizing to NFC (composed) or NFD (decomposed) makes comparisons reliable. The NFK forms additionally fold compatibility characters (e.g. the ﬁ ligature → fi, full-width ＡＢＣ → ABC), which is useful for search and de-spoofing.
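
The side-by-side comparison can be approximated with the standard String.prototype.normalize method. In this sketch, normalizationTable is a hypothetical helper name and the sample input is chosen only to make the four forms differ.

```javascript
// Hypothetical helper: each normalization form with its code-point count.
function normalizationTable(text) {
  return ["NFC", "NFD", "NFKC", "NFKD"].map((form) => {
    const normalized = text.normalize(form);
    return { form, normalized, codePoints: [...normalized].length };
  });
}

// Input: precomposed é (U+00E9) followed by the ﬁ ligature (U+FB01).
for (const row of normalizationTable("\u00E9\uFB01")) {
  console.log(row.form, row.codePoints);
}
// NFC 2, NFD 3, NFKC 3, NFKD 4

// Comparing after normalization treats both spellings of é as equal.
console.log("e\u0301".normalize("NFC") === "\u00E9".normalize("NFC")); // true
```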

Can it spot homoglyph / spoofing characters?

Indirectly: paste a suspicious string and the per-character breakdown shows the real code points, so a Cyrillic "а" (U+0430) standing in for Latin "a" (U+0061) is obvious. NFKC also collapses many look-alike compatibility forms.
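
One way to automate a rough version of this check is to look for mixed scripts in a single string. The sketch below is an assumption about how you might do that yourself, not a feature of the tool: scriptsUsed is a hypothetical helper, and it only tests for Latin, Cyrillic, and Greek. Note that NFKC does not help in this particular case, because Cyrillic "а" is not a compatibility form of Latin "a".

```javascript
// Hypothetical helper: which of a few scripts appear in the text.
// Uses Unicode property escapes, which require the `u` regex flag.
function scriptsUsed(text) {
  const scripts = {
    Latin: /\p{Script=Latin}/u,
    Cyrillic: /\p{Script=Cyrillic}/u,
    Greek: /\p{Script=Greek}/u,
  };
  const found = new Set();
  for (const ch of text) {
    for (const [name, pattern] of Object.entries(scripts)) {
      if (pattern.test(ch)) found.add(name);
    }
  }
  return [...found];
}

// "paypal" with the first "a" replaced by Cyrillic U+0430.
console.log(scriptsUsed("p\u0430ypal")); // → [ 'Latin', 'Cyrillic' ]

// NFKC leaves the Cyrillic letter unchanged.
console.log("\u0430".normalize("NFKC") === "\u0430"); // true
```

A mixed-script result is only a hint, since legitimate text can mix scripts too.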

Is anything uploaded?

No. All inspection, escaping, and normalization runs in your browser. Nothing you paste leaves the page.