Use Unicode Normalizer

What Is Unicode Normalization?

Unicode normalization ensures that equivalent character sequences are represented identically. The same visible character can be stored as different sequences of code points, and therefore as different bytes. Normalization converts them to one consistent form, which is essential for correct string comparison, search indexing, and database deduplication.

The Core Problem: Multiple Representations

The letter é can be stored in two ways that look identical on screen but consist of different bytes:

Precomposed (NFC):
  Character: é
  Code point: U+00E9
  UTF-8 bytes: C3 A9
  Byte length: 2

Decomposed (NFD):
  Characters: e + ◌́
  Code points: U+0065 + U+0301
  UTF-8 bytes: 65 CC 81
  Byte length: 3

String comparison without normalization:
  "café" (NFC) === "café" (NFD)  →  FALSE  ❌
  
After normalizing both to NFC:
  "café" (NFC) === "café" (NFC)  →  TRUE   ✓

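The difference can be checked directly. The following is a minimal Python sketch using only the standard library; the variable names are illustrative.

```python
import unicodedata

nfc = "caf\u00e9"    # precomposed é (U+00E9)
nfd = "cafe\u0301"   # e followed by combining acute accent (U+0301)

# Both strings render the same, but their code points and UTF-8 bytes differ.
print(len(nfc), nfc.encode("utf-8").hex(" "))   # 4 63 61 66 c3 a9
print(len(nfd), nfd.encode("utf-8").hex(" "))   # 5 63 61 66 65 cc 81

print(nfc == nfd)                                # False
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

The `bytes.hex(" ")` separator argument assumes Python 3.8 or later.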
NFC — Canonical Decomposition, then Composition

NFC is the most common form for web and application storage. It produces precomposed characters:

Input (NFD, decomposed):
  e + ́  →  é   (U+0065 + U+0301 → U+00E9)
  a + ̈  →  ä   (U+0061 + U+0308 → U+00E4)
  n + ̃  →  ñ   (U+006E + U+0303 → U+00F1)

After NFC normalization:
  Precomposed wherever Unicode defines a composite: usually a single code point per accented character
  Recommended for: HTML, JSON, databases, APIs
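
The compositions listed above can be reproduced with a short Python sketch. The helper name `to_nfc` is hypothetical. The last line illustrates that NFC only composes when a precomposed code point exists; x with an acute accent is used here as a pair without one.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Return text in NFC (canonical decomposition, then composition)."""
    return unicodedata.normalize("NFC", text)

# Base letter + combining mark pairs from the list above.
for decomposed in ["e\u0301", "a\u0308", "n\u0303"]:
    composed = to_nfc(decomposed)
    # Each pair collapses to one precomposed code point.
    print(f"U+{ord(composed):04X}", len(decomposed), "->", len(composed))

# No precomposed "x with acute" exists, so two code points remain.
print(len(to_nfc("x\u0301")))  # 2
```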

NFD — Canonical Decomposition

NFD decomposes all precomposed characters into base + combining marks:

Input (NFC, precomposed):
  é  →  e + ́   (U+00E9 → U+0065 + U+0301)
  ä  →  a + ̈   (U+00E4 → U+0061 + U+0308)
  ñ  →  n + ̃   (U+00F1 → U+006E + U+0303)

After NFD normalization:
  All decomposed — base character + combining accent
  Recommended for: text processing, accent stripping, HFS+ file names on macOS (which uses a variant of NFD)
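
Accent stripping is a common reason to decompose. A minimal sketch, assuming that "accent" means any character with a non-zero combining class; the function name `strip_accents` is hypothetical.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose to NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# "café señor Zürich" written with escapes to make the input form explicit.
print(strip_accents("caf\u00e9 se\u00f1or Z\u00fcrich"))  # cafe senor Zurich
```

This only removes marks that Unicode decomposes canonically. Letters such as ø or ł have no decomposition and pass through unchanged.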

NFKC — Compatibility Decomposition, then Composition

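NFKC applies canonical normalization and also replaces compatibility characters, which are semantically equivalent to a plain character but visually different. This conversion is lossy: the original form cannot be recovered afterwards.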

Compatibility mappings applied by NFKC:

Ligatures:
  ﬁ (fi ligature, U+FB01)  →  fi
  ﬀ (ff ligature, U+FB00)  →  ff

Fullwidth characters:
  Ａ (fullwidth A, U+FF21)  →  A
  １ (fullwidth 1, U+FF11)  →  1

Superscripts/subscripts:
  ² (superscript 2, U+00B2)  →  2
  ₃ (subscript 3, U+2083)    →  3

Mathematical variants:
  𝐀 (math bold A, U+1D400)  →  A
  𝑨 (math bold italic A, U+1D468)  →  A

Recommended for: search indexing, username normalization, password hashing
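
The compatibility mappings above can be verified with a small Python sketch. The `samples` dictionary is illustrative and uses escapes so the input code points are unambiguous.

```python
import unicodedata

# Compatibility character -> expected NFKC result
samples = {
    "\ufb01": "fi",       # fi ligature
    "\uff21": "A",        # fullwidth A
    "\u00b2": "2",        # superscript two
    "\u2083": "3",        # subscript three
    "\U0001d400": "A",    # mathematical bold capital A
}

for original, expected in samples.items():
    folded = unicodedata.normalize("NFKC", original)
    print(f"U+{ord(original):04X} -> {folded!r}", folded == expected)

# The folding is lossy: "x²" becomes "x2" and cannot be restored.
print(unicodedata.normalize("NFKC", "x\u00b2"))  # x2
```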

NFKD — Compatibility Decomposition

NFKD applies all compatibility decompositions without recomposition — the most decomposed form:

Input: ﬁnancé  (begins with the ﬁ ligature, U+FB01)

NFKD output:
  ﬁ → f + i  (ligature decomposed)
  é → e + ́   (accent decomposed)

Result: f + i + n + a + n + c + e + ́
  (7 base characters + 1 combining accent)

Recommended for: text analysis, character-level processing
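
The count above can be checked with a short Python sketch, assuming the input word starts with the ﬁ ligature (U+FB01) and ends with a precomposed é (U+00E9).

```python
import unicodedata

word = "\ufb01nanc\u00e9"   # ligature + "nanc" + precomposed é
nfkd = unicodedata.normalize("NFKD", word)

# Split the result into base characters and combining marks.
bases = [ch for ch in nfkd if not unicodedata.combining(ch)]
marks = [ch for ch in nfkd if unicodedata.combining(ch)]

print(len(word), len(nfkd), len(bases), len(marks))  # 6 8 7 1
print([f"U+{ord(ch):04X}" for ch in nfkd])
```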

Normalization in Practice: String Comparison

// JavaScript — normalize before comparing
const str1 = "caf\u00E9";        // NFC: café (precomposed)
const str2 = "cafe\u0301";       // NFD: café (decomposed)

console.log(str1 === str2);                    // false ❌
console.log(str1.normalize('NFC') === str2.normalize('NFC')); // true ✓ (NFC is also the default)

// Python
import unicodedata
s1 = "caf\u00E9"
s2 = "cafe\u0301"
print(unicodedata.normalize('NFC', s1) == unicodedata.normalize('NFC', s2))
# True ✓
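
The comparison can be wrapped in a small helper that takes the normalization form as a parameter. This is a sketch; `same_text` is a hypothetical name, and `unicodedata.is_normalized` assumes Python 3.8 or later.

```python
import unicodedata

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(same_text("caf\u00e9", "cafe\u0301"))       # True
print(same_text("\ufb01", "fi"))                  # False: NFC keeps the ligature
print(same_text("\ufb01", "fi", "NFKC"))          # True: NFKC folds it

# Check whether a string is already in a given form without converting it.
print(unicodedata.is_normalized("NFC", "cafe\u0301"))  # False
```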

Normalization for Search Indexing

-- PostgreSQL 13+ (UTF8 database): normalize text before indexing
CREATE OR REPLACE FUNCTION normalize_text(input TEXT)
RETURNS TEXT AS $$
BEGIN
  -- NFKC normalization plus lowercasing for search
  RETURN lower(normalize(input, NFKC));
END;
$$ LANGUAGE plpgsql IMMUTABLE;  -- index expressions require IMMUTABLE functions

-- Index on normalized form
CREATE INDEX idx_products_name 
ON products (normalize_text(name));

-- Search matches regardless of normalization form
SELECT * FROM products
WHERE normalize_text(name) = normalize_text('Café');
-- Matches: "Café", "Cafe\u0301", "CAFÉ"
-- Does not match unaccented "cafe": NFKC and lower() keep the accent
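
The same matching rule can be sketched in Python to see which inputs share a key. This mirrors the SQL function only approximately, since lowercasing rules in PostgreSQL depend on the database locale. Note that NFKC plus lowercasing does not remove accents, so an unaccented "cafe" produces a different key.

```python
import unicodedata

def normalize_text(value: str) -> str:
    """Approximate the SQL helper: NFKC, then lowercase."""
    return unicodedata.normalize("NFKC", value).lower()

query = normalize_text("Caf\u00e9")

# Precomposed, decomposed, uppercase, and unaccented variants.
candidates = ["Caf\u00e9", "Cafe\u0301", "CAF\u00c9", "cafe"]
for candidate in candidates:
    print(ascii(candidate), normalize_text(candidate) == query)
# The first three print True; the unaccented 'cafe' prints False.
```

If accent-insensitive search is wanted, an extra accent-stripping step would be needed on top of normalization.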

Password Hashing with Normalization

Normalize passwords before hashing to prevent authentication failures across devices:

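// Different platforms and input methods can emit different normalization
// forms (for example NFC vs. NFD) for the same typed password.
// Without normalization, the same password can then fail to verify.

// Normalize to NFKC before hashing. A fast digest such as SHA-256 is not
// suitable for passwords, so this uses scrypt with a per-user salt.
import { randomBytes, scryptSync } from 'crypto';

function hashPassword(password: string, salt: Buffer = randomBytes(16)): string {
  const normalized = password.normalize('NFKC');
  const derived = scryptSync(normalized, salt, 64);
  // Store the salt next to the derived key so the hash can be re-checked
  return `${salt.toString('hex')}:${derived.toString('hex')}`;
}

// With the same salt, "café" entered as NFD or as NFC now produces the same hash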
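
The same idea can be sketched in Python with the standard library. This version uses PBKDF2 via `hashlib.pbkdf2_hmac`, which is an assumption made for illustration rather than the method used above; the function name, salt size, and iteration count are hypothetical and should be chosen to match current security guidance.

```python
import hashlib
import hmac
import os
import unicodedata

def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Normalize to NFKC, then derive a key with PBKDF2-HMAC-SHA256."""
    normalized = unicodedata.normalize("NFKC", password)
    return hashlib.pbkdf2_hmac("sha256", normalized.encode("utf-8"), salt, iterations)

salt = os.urandom(16)                            # per-user random salt
stored = hash_password("caf\u00e9", salt)        # password entered as NFC
attempt = hash_password("cafe\u0301", salt)      # same password entered as NFD

# Constant-time comparison of the two derived keys.
print(hmac.compare_digest(stored, attempt))      # True
```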

Cross-Platform Normalization Issues

# Git normalization config
git config core.precomposeunicode true   # macOS: store as NFC in repo
git config core.quotepath false          # show Unicode in file paths
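
Alongside these settings, it can help to detect file names that are not in NFC before they reach a repository. A minimal sketch; `non_nfc_names` and the sample names are hypothetical, and `unicodedata.is_normalized` assumes Python 3.8 or later.

```python
import unicodedata

def non_nfc_names(names):
    """Return the names that are not already in NFC."""
    return [name for name in names if not unicodedata.is_normalized("NFC", name)]

# Sample names; in practice this list could come from os.listdir(".").
names = ["caf\u00e9.txt", "cafe\u0301.txt", "readme.md"]

# ascii() shows the escapes, so the decomposed name is visible as such.
print([ascii(name) for name in non_nfc_names(names)])
```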

Normalization Form Quick Reference

Form   Full Name                              Use Case
----   ---------                              --------
NFC    Canonical Decomposition + Composition  Web, databases, APIs (default)
NFD    Canonical Decomposition                Text processing, macOS FS
NFKC   Compatibility Decomp + Composition     Search, usernames, passwords
NFKD   Compatibility Decomposition            Analysis, character processing
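
To see how the four forms differ on one input, the following Python sketch normalizes a sample word that contains both a compatibility character (the ﬁ ligature) and a precomposed accent. The sample string is an illustrative choice.

```python
import unicodedata

sample = "\ufb01nanc\u00e9"   # ﬁ ligature + "nanc" + precomposed é

lengths = {}
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    result = unicodedata.normalize(form, sample)
    lengths[form] = len(result)
    # Print the form, its length, and the code points it contains.
    print(form, len(result), " ".join(f"{ord(ch):04X}" for ch in result))

print(lengths)  # {'NFC': 6, 'NFD': 7, 'NFKC': 7, 'NFKD': 8}
```

The canonical forms (NFC, NFD) leave the ligature intact, while the compatibility forms (NFKC, NFKD) split it into f and i.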

Frequently Asked Questions

How do I use Unicode Normalizer?
Simply enter your data, click the process button, and get instant results. All processing happens in your browser for maximum privacy and security.

Is Unicode Normalizer free to use?
Yes! Unicode Normalizer is completely free to use with no registration required. All processing is done client-side in your browser.

Is my data secure?
Absolutely! All processing happens locally in your browser. Your data never leaves your device, ensuring complete privacy and security.