How to use the Unicode & Invisible Character Inspector
Paste any text — JSON, YAML, CSV, source code, a URL, or a token — and instantly see hidden or risky characters that are invisible on screen but can break parsers, comparisons, and copy-pasted code. The tool flags invisible and zero-width characters, bidirectional controls, typographic look-alikes like smart quotes and em dashes, and words that mix scripts (a common homoglyph trick). One click cleans the text to safe ASCII. Everything runs locally in your browser; nothing is uploaded.
What it does
- Detects zero-width spaces, the byte order mark (BOM, U+FEFF), soft hyphens, word joiners, and other invisible formatting characters.
- Flags bidirectional override characters used in "Trojan Source" tricks.
- Converts smart quotes, en/em dashes, and ellipses to plain ASCII.
- Highlights words that mix Latin with Cyrillic or Greek look-alike letters.
- Shows a per-character breakdown and all four Unicode normalization forms (NFC, NFD, NFKC, NFKD).
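To illustrate the kind of check described in this list, here is a minimal Python sketch that scans a string for invisible and bidirectional control characters and reports each one by position and code point. The `INVISIBLE` and `BIDI_CONTROLS` sets and the `inspect` function are hypothetical names chosen for this example. The tool's actual tables and implementation are not documented here and may differ.

```python
import unicodedata

# Hypothetical watch lists for illustration; not the tool's real tables.
INVISIBLE = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (used as the BOM)
    "\u00ad",  # SOFT HYPHEN
}
# Embedding, override, and isolate controls plus the LRM/RLM marks.
BIDI_CONTROLS = set(
    "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069\u200e\u200f"
)

def inspect(text):
    """Return (index, code point, Unicode name, kind) per suspicious character."""
    findings = []
    for i, ch in enumerate(text):
        if ch in BIDI_CONTROLS:
            kind = "bidi-control"
        elif ch in INVISIBLE:
            kind = "invisible"
        else:
            continue
        name = unicodedata.name(ch, "UNKNOWN")
        findings.append((i, f"U+{ord(ch):04X}", name, kind))
    return findings

# The sample hides a zero-width space and a right-to-left override.
sample = "user\u200bname = \u202e'admin'"
for finding in inspect(sample):
    print(finding)
```

Writing the characters as `\u` escapes keeps the example itself free of hidden characters, which is also a reasonable habit for test fixtures.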
When to use it
- A JSON or YAML file fails to parse but looks correct.
- A string comparison fails even though two values look identical.
- Code copied from a doc, PDF, or chat won't compile.
- You suspect a phishing domain, token, or identifier uses look-alike letters.
- You need to sanitize pasted content before committing or sharing it.
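The "looks identical but fails" cases above can be reproduced in a few lines. This is a small Python sketch with made-up values (`api_key`, `secret`), assuming a zero-width space was picked up during a copy and paste:

```python
import json

visible = "api_key"
pasted = "api_key\u200b"  # trailing ZERO WIDTH SPACE, invisible when displayed

# The two strings render the same but are different sequences of code points.
print(visible == pasted)
print(len(visible), len(pasted))

# The same problem inside a JSON document: the key carries the hidden character.
config = json.loads('{"api_key\\u200b": "secret"}')
print("api_key" in config)

# ascii() escapes every non-ASCII character, which makes the stowaway visible.
print(ascii(pasted))
```

Printing with `ascii()` (or `repr()` on bytes) is a quick manual check when the inspector is not at hand.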
How to use it
- Paste your text into the input box.
- Review the Findings badges and the annotated view.
- Choose which fixes to apply, click Clean, then copy the safe output.
Tips & pitfalls
- Cleaning converts an em dash to two hyphens and curly quotes to straight quotes — disable that toggle if you want to preserve typography.
- Mixed-script detection is a heuristic for Latin/Cyrillic/Greek; legitimate multilingual text can mix scripts intentionally.
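- Compatibility normalization (NFKC) can change a character's meaning: for example, it folds the ligature ﬁ (U+FB01) into the two letters fi and the superscript ² (U+00B2) into a plain 2. Review the forms before using them as keys.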
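The cleaning and normalization behavior in these tips can be sketched in Python as follows. The replacement table and the function names are assumptions made for this example. In particular, mapping an en dash to a single hyphen is a guess; only the em dash and curly quote conversions are stated above.

```python
import unicodedata

# Hypothetical replacement table; the tool's real mapping may differ.
TYPOGRAPHIC = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-",                 # en dash (assumed mapping)
    "\u2014": "--",                # em dash becomes two hyphens
    "\u2026": "...",               # horizontal ellipsis
})

def to_plain_ascii_punctuation(text):
    """Replace typographic punctuation with plain ASCII equivalents."""
    return text.translate(TYPOGRAPHIC)

def normalization_forms(text):
    """Return the text in each of the four Unicode normalization forms."""
    return {
        form: unicodedata.normalize(form, text)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }

print(to_plain_ascii_punctuation("\u201cwait\u2026\u201d \u2014 ok"))

# U+FB01 is the "fi" ligature and U+00B2 is superscript two.
# Only the compatibility forms (NFKC, NFKD) rewrite them.
for form, value in normalization_forms("\ufb01le\u00b2").items():
    print(form, ascii(value))
```

Comparing the four forms side by side like this shows why NFKC output deserves a review before it is used as a dictionary key or identifier: two inputs that were distinct can become equal.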
FAQ
- What is a zero-width space? A zero-width space (U+200B) is an invisible character with no width. It can break string comparisons, JSON keys, and copy-pasted code even though nothing appears on screen.
- What is a homoglyph attack? A homoglyph attack uses characters from another script that look identical to Latin letters, such as the Cyrillic а (U+0430), which is visually indistinguishable from the Latin a (U+0061), to disguise a different identifier, domain, or token.
- Is my text uploaded anywhere? No. All analysis and cleaning run entirely in your browser. Nothing you paste is sent to any server.
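As a rough illustration of the homoglyph answer above, the following Python sketch flags words that mix Latin, Cyrillic, or Greek letters. It is a heuristic built on Unicode character names, and `scripts_in` and `mixed_script_words` are hypothetical helpers written for this example rather than the tool's actual logic.

```python
import re
import unicodedata

SCRIPTS = ("LATIN", "CYRILLIC", "GREEK")

def scripts_in(word):
    """Collect which of the three scripts appear among a word's letters."""
    found = set()
    for ch in word:
        if not ch.isalpha():
            continue  # digits and underscores carry no script information here
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                found.add(script)
    return found

def mixed_script_words(text):
    """Return the words that contain letters from more than one script."""
    return [w for w in re.findall(r"\w+", text) if len(scripts_in(w)) > 1]

# The first "paypal" hides a Cyrillic a (U+0430) in second position.
print(ascii(mixed_script_words("login at p\u0430ypal.com or paypal.com")))
```

Text that is entirely in one script per word, such as a Russian word next to an English one, is not flagged, but a legitimately mixed word would be, so the result is a prompt for review and not a verdict.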