Use Text Normalizer

Unicode Normalization Forms

// The letter "é" can be represented two ways in Unicode:
// Precomposed: U+00E9 (single character)
// Decomposed:  U+0065 + U+0301 (e + combining accent)

// Both look identical but are different byte sequences:
"\u00e9"        // NFC form — precomposed é
"e\u0301"       // NFD form — e + combining accent

// NFC normalization (most common for storage/exchange):
Input:  "café" (decomposed)
Output: "café" (precomposed, NFC)

// NFD normalization (for sorting/comparison):
Input:  "café" (precomposed)
Output: "cafe\u0301" (decomposed, NFD)

// NFKC normalization (compatibility + composition):
Input:  "\ufb01le" (begins with the fi ligature U+FB01)
Output: "file" (ligature replaced by f + i)
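
The forms above can be checked with the built-in String.prototype.normalize method. This is a minimal sketch; the variable names are illustrative only.

```javascript
// Two encodings of "é": precomposed vs. "e" + combining acute accent.
const precomposed = "\u00e9";
const decomposed = "e\u0301";

// They render identically but are not equal as strings.
const rawEqual = precomposed === decomposed;

// After normalizing both to the same form, they compare equal.
const nfcEqual = precomposed.normalize("NFC") === decomposed.normalize("NFC");

// NFD splits the precomposed character into two code units.
const nfdLength = precomposed.normalize("NFD").length;

// NFKC also replaces compatibility characters such as the fi ligature.
const ligatureExpanded = "\ufb01le".normalize("NFKC");

console.log(rawEqual, nfcEqual, nfdLength, ligatureExpanded);
// → false true 2 file
```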

Whitespace Normalization

// Input with various whitespace issues
"Hello   World"          // multiple spaces
"  leading spaces"       // leading whitespace
"trailing spaces   "     // trailing whitespace
"tab\there"              // tab character
"non\u00a0breaking"      // non-breaking space (U+00A0)
"thin\u2009space"        // thin space (U+2009)

// After whitespace normalization:
"Hello World"            // collapsed to single space
"leading spaces"         // leading whitespace trimmed
"trailing spaces"        // trailing whitespace trimmed
"tab here"               // tab → space
"non breaking"           // non-breaking space → regular space
"thin space"             // thin space → regular space

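The whitespace rules above can be sketched in one small function. It relies on the fact that JavaScript's \s class already matches tabs, the non-breaking space (U+00A0), and the thin space (U+2009). The name normalizeWhitespace is hypothetical, not part of the tool.

```javascript
// Collapse every run of whitespace (including tabs, U+00A0, U+2009)
// into a single regular space, then trim both ends.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, " ").trim();
}

console.log(normalizeWhitespace("Hello   World"));       // → "Hello World"
console.log(normalizeWhitespace("  leading spaces"));    // → "leading spaces"
console.log(normalizeWhitespace("non\u00a0breaking"));   // → "non breaking"
```

Because \s also matches line breaks, this sketch flattens multi-line text onto one line. Restrict the character class if line structure must be preserved.
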
Case Normalization

// Lowercase (for search indexing)
Input:  "The Quick Brown Fox"
Output: "the quick brown fox"

// Uppercase
Input:  "hello world"
Output: "HELLO WORLD"

// Title Case
Input:  "the quick brown fox"
Output: "The Quick Brown Fox"

// Sentence case
Input:  "hello world. this is a test."
Output: "Hello world. This is a test."

// Unicode-aware lowercase (handles non-ASCII)
Input:  "ÜBER STRASSE"
Output: "über strasse"   // lowercasing "SS" gives "ss"; it does not restore "ß"
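
Lowercase and uppercase map directly to toLowerCase and toUpperCase. Title case and sentence case need a little logic. The sketch below is one simple approach, assuming words are separated by whitespace and sentences end with ".", "!" or "?". The function names are hypothetical.

```javascript
// Uppercase the first letter of every whitespace-separated word.
function toTitleCase(text) {
  return text
    .toLowerCase()
    .replace(/(^|\s)(\p{L})/gu, (match, sep, letter) => sep + letter.toUpperCase());
}

// Uppercase the first letter of the text and of each new sentence.
function toSentenceCase(text) {
  return text
    .toLowerCase()
    .replace(/(^\s*|[.!?]\s+)(\p{L})/gu, (match, prefix, letter) => prefix + letter.toUpperCase());
}

console.log(toTitleCase("the quick brown fox"));             // → "The Quick Brown Fox"
console.log(toSentenceCase("hello world. this is a test.")); // → "Hello world. This is a test."
console.log("ÜBER STRASSE".toLowerCase());                   // → "über strasse"
```

This simple title case capitalizes every word, including short words such as "a" and "of" that many style guides leave lowercase.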

Punctuation Normalization

// Curly quotes → straight quotes (for code/data)
Input:  “Hello” and ‘World’   (curly quotes U+201C/U+201D and U+2018/U+2019)
Output: "Hello" and 'World'

// Straight quotes → curly quotes (for typography)
Input:  "Hello" and 'World'
Output: “Hello” and ‘World’   (curly quotes U+201C/U+201D and U+2018/U+2019)

// Em dash normalization
Input:  "word—word"   (em dash U+2014)
Output: "word - word" (spaced hyphen)

// Ellipsis normalization
Input:  "Wait…"       (ellipsis character U+2026)
Output: "Wait..."     (three periods)

// Apostrophe normalization
Input:  "it\u2019s"   (right single quotation mark)
Output: "it's"        (ASCII apostrophe)
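
The typographic-to-ASCII direction of these rules can be sketched as a lookup table. The table and function names are hypothetical, and the replacements mirror the examples above (including the spaced hyphen for the em dash).

```javascript
// Typographic characters mapped to ASCII equivalents.
const PUNCTUATION_MAP = {
  "\u2018": "'",   // left single quotation mark
  "\u2019": "'",   // right single quotation mark / apostrophe
  "\u201c": '"',   // left double quotation mark
  "\u201d": '"',   // right double quotation mark
  "\u2014": " - ", // em dash → spaced hyphen
  "\u2026": "...", // ellipsis → three periods
};

function normalizePunctuation(text) {
  return text.replace(/[\u2018\u2019\u201c\u201d\u2014\u2026]/g, (ch) => PUNCTUATION_MAP[ch]);
}

console.log(normalizePunctuation("word\u2014word")); // → "word - word"
console.log(normalizePunctuation("Wait\u2026"));     // → "Wait..."
console.log(normalizePunctuation("it\u2019s"));      // → "it's"
```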

Line Ending Normalization

// Windows CRLF → Unix LF
Input:  "line1\r\nline2\r\nline3\r\n"
Output: "line1\nline2\nline3\n"

// Old Mac CR → Unix LF
Input:  "line1\rline2\rline3\r"
Output: "line1\nline2\nline3\n"

// Mixed → Unix LF
Input:  "line1\r\nline2\nline3\r"
Output: "line1\nline2\nline3\n"

// Unix LF → Windows CRLF (for Windows compatibility)
Input:  "line1\nline2\nline3\n"
Output: "line1\r\nline2\r\nline3\r\n"
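
Both directions can be sketched with two short helpers (hypothetical names). Converting to LF first makes the CRLF conversion safe for input that already contains CRLF or mixed endings.

```javascript
// CRLF and lone CR both become LF. The optional \n keeps "\r\n" as one match.
function toLF(text) {
  return text.replace(/\r\n?/g, "\n");
}

// Normalize to LF first so existing CRLF pairs are not doubled.
function toCRLF(text) {
  return toLF(text).replace(/\n/g, "\r\n");
}

console.log(JSON.stringify(toLF("line1\r\nline2\nline3\r")));
// → "line1\nline2\nline3\n"
console.log(JSON.stringify(toCRLF("line1\nline2\nline3\n")));
// → "line1\r\nline2\r\nline3\r\n"
```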

Diacritic Removal (ASCII Folding)

// Remove accent marks for URL slugs and search
Input:  "café résumé naïve"
Output: "cafe resume naive"

Input:  "Ångström über Straße"
Output: "Angstrom uber Strasse"   // ß carries no accent mark; it needs an explicit ß → ss mapping

Input:  "Ñoño señor"
Output: "Nono senor"

// Useful for:
// - Generating URL slugs from titles
// - Creating ASCII-safe identifiers
// - Search indexing for accent-insensitive search
// - Normalizing names for comparison
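
A common way to implement this is to decompose with NFD, then strip the combining marks. Letters such as ß, ø, æ and ł have no decomposition, so the sketch below adds a small explicit table for them. The table is illustrative and far from complete, and the names are hypothetical.

```javascript
// Letters that NFD cannot split into base letter + combining mark.
const SPECIAL_FOLDS = {
  "ß": "ss", "ẞ": "SS",
  "ø": "o",  "Ø": "O",
  "æ": "ae", "Æ": "AE",
  "ł": "l",  "Ł": "L",
};

function removeDiacritics(text) {
  return text
    .normalize("NFD")                 // é → e + U+0301
    .replace(/\p{M}/gu, "")           // drop all combining marks
    .replace(/[ßẞøØæÆłŁ]/g, (ch) => SPECIAL_FOLDS[ch]);
}

console.log(removeDiacritics("café résumé naïve"));     // → "cafe resume naive"
console.log(removeDiacritics("Ångström über Straße"));  // → "Angstrom uber Strasse"
console.log(removeDiacritics("Ñoño señor"));            // → "Nono senor"
```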

Search Index Normalization

// Normalize text before indexing for search
function normalizeForSearch(text) {
  return text
    .normalize('NFKD')                      // Compatibility decomposition
    .replace(/[\u0300-\u036f]/g, '')        // Remove combining marks
    .toLowerCase()                          // Lowercase
    .replace(/ß/g, 'ss')                    // ß has no decomposition; fold it explicitly
    .replace(/[^\p{L}\p{N}_\s]/gu, ' ')     // Punctuation → space (\w is ASCII-only and would drop non-ASCII letters)
    .replace(/\s+/g, ' ')                   // Collapse whitespace
    .trim();
}

normalizeForSearch("Café Résumé")   // → "cafe resume"
normalizeForSearch("Hello, World!") // → "hello world"
normalizeForSearch("über straße")   // → "uber strasse"

Data Cleaning for Database Import

// Raw data from CSV export
"  Alice Smith  "    → "Alice Smith"      (trim)
"BOB JONES"          → "Bob Jones"        (title case)
"carol\u00a0white"   → "carol white"      (non-breaking space)
"dave\r\njohnson"    → "dave johnson"     (line ending in field)
"Eve\u2019s"         → "Eve's"            (curly apostrophe)

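The cleanup rules above can be combined into a single per-field helper. This is a sketch under stated assumptions: cleanField and its titleCase option are hypothetical names, and title casing is opt-in because it is only appropriate for some columns (names, for example).

```javascript
function cleanField(value, { titleCase = false } = {}) {
  let cleaned = value
    .normalize("NFC")                 // consistent Unicode form
    .replace(/[\u2018\u2019]/g, "'")  // curly apostrophes → ASCII
    .replace(/\s+/g, " ")             // NBSP, CR/LF inside a field, runs of spaces
    .trim();
  if (titleCase) {
    cleaned = cleaned
      .toLowerCase()
      .replace(/(^|\s)(\p{L})/gu, (match, sep, letter) => sep + letter.toUpperCase());
  }
  return cleaned;
}

console.log(cleanField("  Alice Smith  "));                 // → "Alice Smith"
console.log(cleanField("BOB JONES", { titleCase: true }));  // → "Bob Jones"
console.log(cleanField("dave\r\njohnson"));                 // → "dave johnson"
```
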
Normalization Options Reference

Paste your text into the Text Normalizer, select the normalization operations you need, and get clean, consistent output ready for processing.

Full Normalization Pipeline Example

// Raw input from user form submission
"  Hello,   World!\u00a0 It\u2019s a \u201cgreat\u201d day\u2026  "

// Step 1: Unicode NFC normalization
"  Hello,   World!\u00a0 It\u2019s a \u201cgreat\u201d day\u2026  "

// Step 2: Normalize special Unicode characters
"  Hello,   World!  It's a \"great\" day...  "

// Step 3: Trim leading/trailing whitespace
"Hello,   World!  It's a \"great\" day..."

// Step 4: Collapse multiple spaces
"Hello, World! It's a \"great\" day..."

// Step 5: Lowercase
"hello, world! it's a \"great\" day..."

// Final normalized output:
"hello, world! it's a \"great\" day..."

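The five steps above can be sketched as an ordered list of small functions applied in sequence. The name normalizePipeline is hypothetical, and step 2 covers only the characters that appear in this example.

```javascript
function normalizePipeline(text) {
  const steps = [
    // Step 1: Unicode NFC normalization
    (s) => s.normalize("NFC"),
    // Step 2: replace special Unicode characters with ASCII equivalents
    (s) => s
      .replace(/\u00a0/g, " ")
      .replace(/[\u2018\u2019]/g, "'")
      .replace(/[\u201c\u201d]/g, '"')
      .replace(/\u2026/g, "..."),
    // Step 3: trim leading/trailing whitespace
    (s) => s.trim(),
    // Step 4: collapse multiple spaces
    (s) => s.replace(/ {2,}/g, " "),
    // Step 5: lowercase
    (s) => s.toLowerCase(),
  ];
  return steps.reduce((acc, step) => step(acc), text);
}

const raw = "  Hello,   World!\u00a0 It\u2019s a \u201cgreat\u201d day\u2026  ";
console.log(normalizePipeline(raw));
// → hello, world! it's a "great" day...
```

Keeping each step as a separate function makes it easy to reorder or drop steps, for example skipping the lowercase step when the original casing must be preserved.
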
Frequently Asked Questions

Simply enter your data, click the process button, and get instant results. All processing happens in your browser for maximum privacy and security.

Yes! Text Normalizer is completely free to use with no registration required. All processing is done client-side in your browser.

Absolutely! All processing happens locally in your browser. Your data never leaves your device, ensuring complete privacy and security.