Unicode / Emoji Inspector
Break any string into code points, categories, escapes (JS / HTML / URL), and NFC/NFD/NFKC/NFKD normalization. In your browser.
- [FREE]
- [NO_SIGNUP]
- [NO_UPLOAD]
A Unicode inspector explains exactly what's inside a string: every code point, its escapes, and how it normalizes, all in your browser.
What it shows
- Counts: graphemes (what you see), code points, UTF-16 units (JS .length), and UTF-8 bytes: the four numbers that confuse everyone.
- Per character: the U+ code point, decimal value, category (Letter/Number/Symbol…), astral flag, and byte size. Invisible/control characters are flagged.
- Escapes: JS \u, HTML decimal &#…;, HTML hex &#x…;, URL percent-encoding, and a bare code-point list, each one click to copy.
- Normalization: NFC, NFD, NFKC, NFKD side by side, with code-point counts so you can see composition differences.
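The escape formats listed above can be approximated with a few built-in string methods. This is a minimal sketch, not the tool's actual source: the helper name `escapesFor` and the shape of its return value are assumptions made for illustration.

```javascript
// Hypothetical helper (not the tool's code): build each escape form for a string.
function escapesFor(text) {
  const hex = (n, width = 0) => n.toString(16).toUpperCase().padStart(width, "0");
  // Iterating a string with Array.from walks code points, not UTF-16 units.
  const codePoints = Array.from(text, (ch) => ch.codePointAt(0));
  // JS \uXXXX escapes are per UTF-16 unit, so astral characters yield a surrogate pair.
  const utf16Units = Array.from({ length: text.length }, (_, i) => text.charCodeAt(i));
  return {
    js: utf16Units.map((u) => "\\u" + hex(u, 4)).join(""),
    htmlDecimal: codePoints.map((cp) => "&#" + cp + ";").join(""),
    htmlHex: codePoints.map((cp) => "&#x" + hex(cp) + ";").join(""),
    url: encodeURIComponent(text), // percent-encodes the UTF-8 bytes
    codePointList: codePoints.map((cp) => "U+" + hex(cp, 4)).join(" "),
  };
}

// "\u00E9" is é and "\u{1F600}" is the grinning-face emoji.
console.log(escapesFor("\u00E9\u{1F600}"));
// js:            \u00E9\uD83D\uDE00
// htmlDecimal:   &#233;&#128512;
// htmlHex:       &#xE9;&#x1F600;
// url:           %C3%A9%F0%9F%98%80
// codePointList: U+00E9 U+1F600
```

Note that `encodeURIComponent` throws a `URIError` on a lone surrogate, so a real inspector would likely need to guard that case.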
Why itβs handy
Debug "why is my string length wrong," strip zero-width characters, catch homoglyph spoofing, prep text for reliable search/compare, or generate escapes for code. Runs entirely client-side.
Frequently asked questions
What is the difference between code points, UTF-16 units, and bytes?
A code point is one Unicode character (e.g. 😀 is one). UTF-16 units are how JavaScript counts string .length: an emoji outside the Basic Multilingual Plane is two units (a surrogate pair), which is why "😀".length is 2. UTF-8 bytes are how the text is stored/transmitted: 😀 is 4 bytes. Graphemes are what a human sees as one character (an emoji with a skin-tone modifier is several code points but one grapheme). This tool shows all four.
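As a rough illustration, the four counts can be computed in JavaScript as below. This is a sketch under assumptions: the function name `countAll` is hypothetical, and it relies on `Intl.Segmenter` and `TextEncoder`, which are available in current browsers and recent Node.js versions but not in older runtimes.

```javascript
// Hypothetical helper: the four "lengths" of one string.
function countAll(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    graphemes: Array.from(segmenter.segment(text)).length, // user-perceived characters
    codePoints: Array.from(text).length,                   // string iteration is per code point
    utf16Units: text.length,                               // what JS .length reports
    utf8Bytes: new TextEncoder().encode(text).length,      // size when encoded as UTF-8
  };
}

// Thumbs-up (U+1F44D) followed by a skin-tone modifier (U+1F3FD).
console.log(countAll("\u{1F44D}\u{1F3FD}"));
// { graphemes: 1, codePoints: 2, utf16Units: 4, utf8Bytes: 8 }
```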
Why does my string have a different length than I expect?
Usually invisible characters (zero-width spaces, joiners, BOM) or composed vs decomposed accents. The per-character table marks invisibles, and the normalization section shows whether é is one code point (NFC) or e + a combining accent (NFD).
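One way to locate such invisible characters is to test each code point against the Unicode general categories Cc (control) and Cf (format), which cover zero-width spaces, joiners, and the BOM. The sketch below assumes that definition of "invisible"; the tool's own rule may differ, and `findInvisibles` is a hypothetical name.

```javascript
// Hypothetical helper: report control (Cc) and format (Cf) characters with their offsets.
function findInvisibles(text) {
  const invisible = /^[\p{Cc}\p{Cf}]$/u; // Unicode property escapes need the "u" flag
  const hits = [];
  let index = 0; // offset in UTF-16 units, matching JS string indexing
  for (const ch of text) {
    if (invisible.test(ch)) {
      const codePoint = "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint });
    }
    index += ch.length; // 1 for BMP characters, 2 for astral ones
  }
  return hits;
}

// "ab", a zero-width space, "c", then a BOM: .length is 5, but only 3 characters are visible.
console.log(findInvisibles("ab\u200Bc\uFEFF"));
// [ { index: 2, codePoint: 'U+200B' }, { index: 4, codePoint: 'U+FEFF' } ]
```

This check also reports ordinary tabs and newlines, which are in category Cc. Blindly stripping every Cf character would break emoji sequences joined with U+200D, so removal is better done selectively.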
What is normalization (NFC/NFD/NFKC/NFKD) for?
The same visible text can be encoded multiple ways. Normalizing to NFC (composed) or NFD (decomposed) makes comparisons reliable. The NFKC and NFKD forms additionally fold compatibility characters (e.g. ﬁ ligature → fi, full-width ＡＢＣ → ABC), which is useful for search and de-spoofing.
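The side-by-side comparison can be reproduced with the standard `String.prototype.normalize` method. The following is a small sketch; the helper name `normalizationTable` and its row shape are assumptions for illustration.

```javascript
// Hypothetical helper: normalize a string in all four forms and count code points.
function normalizationTable(text) {
  return ["NFC", "NFD", "NFKC", "NFKD"].map((form) => {
    const normalized = text.normalize(form);
    return { form, normalized, codePoints: Array.from(normalized).length };
  });
}

// Precomposed é (U+00E9) followed by the fi ligature (U+FB01).
for (const row of normalizationTable("\u00E9\uFB01")) {
  console.log(row.form, row.codePoints);
}
// NFC 2   (é, ligature)
// NFD 3   (e, combining acute, ligature)
// NFKC 3  (é, f, i)
// NFKD 4  (e, combining acute, f, i)

// Comparing after normalization makes the two spellings of é equal.
console.log("\u00E9" === "e\u0301");                                   // false
console.log("\u00E9".normalize("NFC") === "e\u0301".normalize("NFC")); // true
```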
Can it spot homoglyph / spoofing characters?
Indirectly: paste a suspicious string and the per-character breakdown shows the real code points, so a Cyrillic "а" (U+0430) standing in for Latin "a" (U+0061) is obvious. NFKC also collapses many look-alike compatibility forms.
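A per-character breakdown of this kind can be sketched with `codePointAt` plus Unicode script property escapes. The script column and the mixed-script check are additions for illustration, not features the page describes, and the names `breakdown` and `mixedScripts` are hypothetical. Only three scripts are checked here to keep the example short.

```javascript
// Illustrative subset of scripts; a fuller check would cover many more.
const SCRIPTS = [
  ["Latin", /^\p{Script=Latin}$/u],
  ["Cyrillic", /^\p{Script=Cyrillic}$/u],
  ["Greek", /^\p{Script=Greek}$/u],
];

// Hypothetical helper: one row per code point with its U+ value and script.
function breakdown(text) {
  return Array.from(text, (ch) => {
    const match = SCRIPTS.find(([, pattern]) => pattern.test(ch));
    return {
      char: ch,
      codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
      script: match ? match[0] : "Other",
    };
  });
}

// Hypothetical helper: true when more than one of the listed scripts appears.
function mixedScripts(text) {
  const seen = new Set(breakdown(text).map((row) => row.script));
  seen.delete("Other");
  return seen.size > 1;
}

// The second letter is Cyrillic U+0430, not Latin U+0061.
console.log(breakdown("p\u0430ypal")[1]); // codePoint: 'U+0430', script: 'Cyrillic'
console.log(mixedScripts("p\u0430ypal")); // true
console.log(mixedScripts("paypal"));      // false
```

NFKC leaves U+0430 unchanged, because cross-script look-alikes are not compatibility equivalents; that is why inspecting the code points directly matters for this case.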
Is anything uploaded?
No. All inspection, escaping, and normalization runs in your browser. Nothing you paste leaves the page.