Invisible Character Remover Examples
The Invisible Character Remover detects and strips hidden Unicode characters that cause mysterious bugs in text processing, string comparisons, and data pipelines. Below are real-world examples of where invisible characters appear and how to clean them.
Common Invisible Characters and Their Unicode Code Points
- U+200B — Zero-Width Space (ZWSP): invisible but breaks word wrapping
- U+200C — Zero-Width Non-Joiner (ZWNJ): affects character joining in scripts
- U+200D — Zero-Width Joiner (ZWJ): joins characters like emoji sequences
- U+00A0 — Non-Breaking Space (NBSP): looks like a space but prevents line breaks
- U+FEFF — Byte Order Mark (BOM): often prepended to UTF-8 files
- U+00AD — Soft Hyphen: invisible hyphenation hint
- U+200E — Left-to-Right Mark (LRM): controls text direction
- U+200F — Right-to-Left Mark (RLM): controls text direction
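The list above can be turned into a small lookup table for scanning text. This is a minimal sketch, not the tool's actual implementation; the names `INVISIBLE_NAMES` and `findInvisible` are hypothetical, and positions are reported as UTF-16 indices.

```javascript
// Hypothetical lookup table built from the code points listed above.
const INVISIBLE_NAMES = new Map([
  [0x200b, "Zero-Width Space"],
  [0x200c, "Zero-Width Non-Joiner"],
  [0x200d, "Zero-Width Joiner"],
  [0x00a0, "Non-Breaking Space"],
  [0xfeff, "Byte Order Mark"],
  [0x00ad, "Soft Hyphen"],
  [0x200e, "Left-to-Right Mark"],
  [0x200f, "Right-to-Left Mark"],
]);

// Returns one entry per invisible character found, with its UTF-16 index.
function findInvisible(text) {
  const hits = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const name = INVISIBLE_NAMES.get(code);
    if (name !== undefined) {
      hits.push({
        index: i,
        codePoint: "U+" + code.toString(16).toUpperCase().padStart(4, "0"),
        name,
      });
    }
  }
  return hits;
}

console.log(findInvisible("Hello\u200B World"));
```

All eight code points sit in the Basic Multilingual Plane, so `charCodeAt` is enough here; a table that included supplementary-plane characters would need `codePointAt` instead.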
Example: Text Copied from a Web Page
When you copy text from a website, invisible characters often come along for the ride. Here is what the raw string might look like when inspected:
Input (appears normal):
"Hello World"
Actual bytes (hex):
48 65 6C 6C 6F E2 80 8B 20 57 6F 72 6C 64
The E2 80 8B sequence is U+200B (Zero-Width Space) hiding between "Hello" and the space.
After removal:
"Hello World"
Bytes: 48 65 6C 6C 6F 20 57 6F 72 6C 64
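To reproduce the byte listings above yourself, something like the following should work in any runtime that provides `TextEncoder` (modern browsers and Node.js). The helper name `toHex` is an assumption for illustration.

```javascript
// Encode a string as UTF-8 and print each byte as two hex digits.
function toHex(text) {
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes, (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

const raw = "Hello\u200B World"; // U+200B written as an escape so it is visible
console.log(toHex(raw));
// 48 65 6C 6C 6F E2 80 8B 20 57 6F 72 6C 64

console.log(toHex(raw.replace(/\u200B/g, "")));
// 48 65 6C 6C 6F 20 57 6F 72 6C 64
```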
Example: String Comparison Failure
This is one of the most frustrating bugs invisible characters cause. Two strings look identical but fail equality checks.
// JavaScript example
// Minimal stand-in for the tool's cleaning step (handles U+200B only):
const removeInvisible = (s) => s.replace(/\u200B/g, "");
const fromDatabase = "username"; // clean string
const fromUserInput = "username\u200B"; // U+200B after the last 'e', written as an escape so it stays visible
console.log(fromDatabase === fromUserInput); // false!
console.log(fromDatabase.length); // 8
console.log(fromUserInput.length); // 9 ← reveals the hidden character
// After running through the remover:
const cleaned = removeInvisible(fromUserInput);
console.log(fromDatabase === cleaned); // true
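When a length mismatch hints at a hidden character, it helps to make that character visible before deciding what to do with it. The sketch below is one possible approach; `reveal` and `sameVisibleText` are hypothetical helpers, and the character class covers only the code points listed earlier on this page.

```javascript
// Code points from the list earlier on this page.
const INVISIBLE = /[\u00A0\u00AD\u200B-\u200F\uFEFF]/g;

// Replace each invisible character with a visible \uXXXX escape for debugging.
function reveal(text) {
  return text.replace(
    INVISIBLE,
    (c) => "\\u" + c.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Compare two strings while ignoring the invisible characters above.
function sameVisibleText(a, b) {
  return a.replace(INVISIBLE, "") === b.replace(INVISIBLE, "");
}

console.log(reveal("username\u200B"));                      // username\u200B
console.log(sameVisibleText("username", "username\u200B")); // true
```

Note that `sameVisibleText` drops non-breaking spaces entirely rather than converting them to regular spaces, which may not be what you want for multi-word values.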
Example: Non-Breaking Space in Data
Non-breaking spaces (U+00A0) are common in text copied from HTML pages. They look exactly like regular spaces but break many text processing operations.
Input text (copied from a webpage):
"Price: $42 00"
↑ this space is U+00A0, not U+0020
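Problem: a pattern that expects a literal space, such as /\$\d+ \d+/, does not match across the NBSP
Problem: split(" ") doesn't split on NBSP
Problem: trim() does not remove NBSP in some environments (Java's String.trim() and PHP's trim() leave it in place; JavaScript's trim() does remove it)
After removal (replace NBSP with regular space):
"Price: $42 00"
Space-based patterns match and split(" ") splits as expected.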
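The NBSP behavior described above can be checked with a few lines. This is a sketch under the assumption that replacing U+00A0 with U+0020 is the desired fix; `normalizeSpaces` is a hypothetical name.

```javascript
// NBSP written as an escape so the difference is visible in source.
const input = "Price: $42\u00A000";

// split(" ") only splits on U+0020, so the NBSP keeps "$42" and "00" together.
console.log(input.split(" ").length); // 2

// Replace every NBSP with a regular space.
function normalizeSpaces(text) {
  return text.replace(/\u00A0/g, " ");
}

const normalized = normalizeSpaces(input);
console.log(normalized.split(" ")); // [ 'Price:', '$42', '00' ]
```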
Example: BOM at Start of File
UTF-8 files sometimes start with a Byte Order Mark (U+FEFF). This causes issues when the file content is read and processed as a string.
CSV file content (raw):
name,age,city
Alice,30,New York
The invisible character at the start of the first line is U+FEFF (BOM).
Problem: The first column header reads as "\uFEFFname", not "name"
Problem: CSV parsers may fail or produce wrong column names
Problem: JSON.parse() fails if BOM is present
After BOM removal:
name,age,city
Alice,30,New York
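A leading BOM is simple to strip once the file is in memory as a string. The following is a minimal sketch; `stripBom` is a hypothetical helper, and the CSV is split by hand only to keep the example self-contained.

```javascript
// Remove a single leading U+FEFF if present; leave everything else untouched.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

const rawCsv = "\uFEFFname,age,city\nAlice,30,New York";
const rawHeader = rawCsv.split("\n")[0].split(",")[0];
const cleanHeader = stripBom(rawCsv).split("\n")[0].split(",")[0];

console.log(rawHeader === "name", rawHeader.length);     // false 5
console.log(cleanHeader === "name", cleanHeader.length); // true 4

// JSON.parse rejects a leading BOM because it is not JSON whitespace.
let parseFailed = false;
try {
  JSON.parse('\uFEFF{"ok":true}');
} catch (err) {
  parseFailed = true;
}
console.log(parseFailed);                                  // true
console.log(JSON.parse(stripBom('\uFEFF{"ok":true}')).ok); // true
```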
Example: Bidirectional Control Characters (Security Risk)
Attackers can use bidirectional Unicode characters to make malicious code look harmless in code reviews.
Visible in editor:
// Check if admin
if (user.role == "admin") { grantAccess(); }
Actual string with hidden RLM/LRM characters injected:
// Check if admin
if (user.role == "admin") { grantAccess(); }
The literal that looks like "admin" contains hidden marks, so it never equals the plain string "admin". Access is never granted,
but the code looks correct in a review.
After removing bidirectional control characters:
// Check if admin
if (user.role == "admin") { grantAccess(); }
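A review step can flag such lines automatically. The sketch below is an illustration rather than the tool's own check: `findBidiLines` is a hypothetical name, and the character class goes beyond LRM and RLM to include the embedding, override, and isolate controls (U+202A to U+202E and U+2066 to U+2069), which is an assumption about what you would want to catch.

```javascript
// LRM, RLM, plus bidi embeddings/overrides and isolates (an added assumption).
const BIDI_CONTROLS = /[\u200E\u200F\u202A-\u202E\u2066-\u2069]/;

// Return the 1-based line numbers that contain any bidi control character.
function findBidiLines(source) {
  const flagged = [];
  source.split("\n").forEach((line, i) => {
    if (BIDI_CONTROLS.test(line)) flagged.push(i + 1);
  });
  return flagged;
}

const source = [
  "// Check if admin",
  'if (user.role == "ad\u200Fmin") { grantAccess(); }', // RLM hidden inside the literal
].join("\n");

console.log(findBidiLines(source));     // [ 2 ]
console.log("ad\u200Fmin" === "admin"); // false
```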
Example: Cleaning API Response Data
Raw API response field:
{
"username": "johndoe", ← U+200B between john and doe
"email": "john@example.com"
}
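Database lookup for "johndoe" fails because the stored value does not contain the U+200B.
Email validation regex fails in the same way if the address carries a hidden character.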
After cleaning:
{
"username": "johndoe",
"email": "john@example.com"
}
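For API payloads, it is usually more convenient to clean every string field in one pass than to clean fields individually. Here is a minimal recursive sketch; `cleanStrings` is a hypothetical helper, and it removes only zero-width characters and the BOM.

```javascript
// Zero-width space, non-joiner, joiner, and BOM.
const ZERO_WIDTH = /[\u200B-\u200D\uFEFF]/g;

// Walk a parsed JSON value and strip zero-width characters from every string.
function cleanStrings(value) {
  if (typeof value === "string") return value.replace(ZERO_WIDTH, "");
  if (Array.isArray(value)) return value.map((item) => cleanStrings(item));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, v]) => [key, cleanStrings(v)])
    );
  }
  return value; // numbers, booleans, null
}

const response = { username: "john\u200Bdoe", email: "john@example.com" };
const cleanedResponse = cleanStrings(response);
console.log(cleanedResponse.username === "johndoe"); // true
```

Object keys are left untouched in this sketch; if keys can also arrive with hidden characters, they would need the same treatment.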
Example: Soft Hyphen in Product Names
Input: "SuperWidget Pro"
↑ U+00AD (soft hyphen) between Super and Widget
Problem: search for "SuperWidget" returns no results
Problem: slug generation produces "super-widget-pro" unexpectedly
After removal: "SuperWidget Pro"
Slug: "superwidget-pro" ✓
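The slug problem above is easy to reproduce with a typical slug function. The `slugify` below is a hypothetical, deliberately simple implementation that treats every non-alphanumeric character as a separator, which is why the soft hyphen turns into a dash.

```javascript
// Simple slug: lowercase, collapse non-alphanumerics to "-", trim dashes.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

const productName = "Super\u00ADWidget Pro"; // soft hyphen written as an escape

console.log(productName.includes("SuperWidget")); // false
console.log(slugify(productName));                // super-widget-pro

const cleanedName = productName.replace(/\u00AD/g, "");
console.log(cleanedName.includes("SuperWidget")); // true
console.log(slugify(cleanedName));                // superwidget-pro
```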
Selective Removal Options
The tool lets you choose which character types to remove, since some invisible characters are intentional:
- Remove zero-width spaces: yes (almost never intentional in data)
- Remove non-breaking spaces: optional (may be intentional in HTML content)
- Remove BOM: yes (almost always a problem in data processing)
- Remove ZWJ: optional (needed for emoji sequences like family emoji)
- Remove bidirectional marks: yes (security risk in code and identifiers; rarely needed outside right-to-left text)
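In code, the same choices could be expressed as an options object. This is only a sketch of the idea: the option names, the defaults (taken from the yes/optional recommendations above), and `removeSelected` itself are all assumptions, not the tool's API.

```javascript
// Defaults mirror the list above: "yes" maps to true, "optional" to false.
const DEFAULT_OPTIONS = {
  zeroWidthSpace: true,
  nbsp: false,
  bom: true,
  zwj: false,
  bidiMarks: true,
};

function removeSelected(text, options = {}) {
  const opts = { ...DEFAULT_OPTIONS, ...options };
  let out = text;
  if (opts.zeroWidthSpace) out = out.replace(/\u200B/g, "");
  if (opts.nbsp) out = out.replace(/\u00A0/g, " "); // replaced, not deleted
  if (opts.bom) out = out.replace(/\uFEFF/g, "");
  if (opts.zwj) out = out.replace(/\u200D/g, "");
  if (opts.bidiMarks) out = out.replace(/[\u200E\u200F]/g, "");
  return out;
}

// Family emoji: man + ZWJ + woman + ZWJ + girl.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log(removeSelected(family + "\u200B") === family);  // true: ZWJ kept by default
console.log(removeSelected(family, { zwj: true }).length);  // 6: sequence split into three emoji
```

Keeping ZWJ off by default avoids breaking emoji sequences, at the cost of letting a stray ZWJ through in plain-text fields.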
Statistics Output Example
Input text: 1,247 characters
Invisible characters found: 8
- Zero-Width Space (U+200B): 5 occurrences at positions 42, 87, 203, 441, 892
- Non-Breaking Space (U+00A0): 2 occurrences at positions 156, 670
- Byte Order Mark (U+FEFF): 1 occurrence at position 0
Output text: 1,239 characters (8 characters removed)
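A report like the one above could be produced along these lines. The sketch assumes positions are 0-based UTF-16 indices and that every tracked character is deleted (so the output length is input length minus matches); whether the tool counts the same way is not stated here. `invisibleStats` and `TRACKED` are hypothetical names.

```javascript
// Only the three types shown in the sample report are tracked here.
const TRACKED = new Map([
  [0x200b, "Zero-Width Space"],
  [0x00a0, "Non-Breaking Space"],
  [0xfeff, "Byte Order Mark"],
]);

// Collect the positions of each tracked character, grouped by code point.
function invisibleStats(text) {
  const byType = new Map();
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (!TRACKED.has(code)) continue;
    if (!byType.has(code)) byType.set(code, []);
    byType.get(code).push(i);
  }
  const found = [...byType.values()].reduce((n, p) => n + p.length, 0);
  return {
    inputLength: text.length,
    found,
    byType,
    outputLength: text.length - found,
  };
}

const stats = invisibleStats("\uFEFFa\u200Bb\u00A0c");
console.log(`Input text: ${stats.inputLength} characters`);
console.log(`Invisible characters found: ${stats.found}`);
for (const [code, positions] of stats.byType) {
  const hex = code.toString(16).toUpperCase().padStart(4, "0");
  console.log(
    `- ${TRACKED.get(code)} (U+${hex}): ${positions.length} at positions ${positions.join(", ")}`
  );
}
console.log(`Output text: ${stats.outputLength} characters`);
```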
Paste your text into the tool, review the highlighted invisible characters, choose which types to remove, and get clean output instantly. This is especially useful before storing user input in a database or sending data to an external API.