What Is Unicode Encoding?
Unicode encoding converts characters into other text representations: escape sequences, HTML entities, UTF-8 hex bytes, and percent-encoded URLs. The Unicode Encoder/Decoder handles all of these formats, making it easy to prepare text for different programming environments and to debug encoding issues.
Unicode Escape Sequences (JavaScript / Java)
Encode non-ASCII characters as \uXXXX escape sequences (four hex digits each):
Input text: Hello, 世界! ©
JavaScript/Java escape sequences:
Hello, \u4e16\u754c! \u00a9
ES6 code-point escapes for characters above U+FFFF (JavaScript only):
Emoji 😀 → \u{1F600}
Emoji 🎉 → \u{1F389}
Surrogate pairs for characters above U+FFFF (required in Java, also valid in JavaScript):
😀 → \uD83D\uDE00
🎉 → \uD83C\uDF89
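As a rough illustration, the escaping rules above can be sketched in Python. The function `js_escape` is a hypothetical helper, not the tool's code. It leaves ASCII untouched and emits lowercase hex digits (JavaScript and Java accept either case). It splits characters above U+FFFF into a UTF-16 surrogate pair unless the ES6 form is requested.

```python
def js_escape(text: str, es6: bool = False) -> str:
    """Escape non-ASCII characters as JavaScript/Java-style \\u escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                    # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")        # BMP: one 4-digit escape
        elif es6:
            out.append(f"\\u{{{cp:x}}}")      # ES6 code-point escape, e.g. \u{1f600}
        else:
            v = cp - 0x10000                  # 20-bit offset past the BMP
            high = 0xD800 + (v >> 10)         # top 10 bits -> high surrogate
            low = 0xDC00 + (v & 0x3FF)        # bottom 10 bits -> low surrogate
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


print(js_escape("Hello, 世界! ©"))   # Hello, \u4e16\u754c! \u00a9
print(js_escape("😀"))               # \ud83d\ude00
print(js_escape("😀", es6=True))     # \u{1f600}
```

The surrogate arithmetic is the same for every supplementary-plane character: subtract 0x10000, then put the high 10 bits after 0xD800 and the low 10 bits after 0xDC00.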
HTML Entity Encoding
Convert characters to HTML entities for safe use in HTML markup:
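Input: <script>alert("XSS")</script> © 2024
Named entities:
&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt; &copy; 2024
Decimal numeric entities:
&#60;script&#62;alert(&#34;XSS&#34;)&#60;/script&#62; &#169; 2024
Hexadecimal numeric entities:
&#x3C;script&#x3E;alert(&#x22;XSS&#x22;)&#x3C;/script&#x3E; &#xA9; 2024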
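Here is a minimal Python sketch of the three entity styles, assuming only the standard library. `named_entities` and `numeric_entities` are hypothetical helpers. The named form relies on `html.entities.codepoint2name`, which covers only HTML 4 entity names, so characters without a name (such as 世) pass through unchanged there. For everyday escaping of just `&`, `<`, `>` and quotes, `html.escape` from the same library is the usual choice.

```python
from html.entities import codepoint2name

UNSAFE = set('&<>"')  # characters that can break HTML markup or attribute values

def named_entities(text: str) -> str:
    # Replace every character that has an HTML 4 entity name (&lt;, &copy;, ...)
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)

def numeric_entities(text: str, hexadecimal: bool = False) -> str:
    # Encode markup characters and all non-ASCII characters by code point
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in UNSAFE or cp > 0x7F:
            out.append(f"&#x{cp:X};" if hexadecimal else f"&#{cp};")
        else:
            out.append(ch)
    return "".join(out)

s = '<script>alert("XSS")</script> © 2024'
print(named_entities(s))
print(numeric_entities(s))
print(numeric_entities(s, hexadecimal=True))
```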
UTF-8 Hex Byte Encoding
See the raw UTF-8 bytes for any character:
Character   Code Point   UTF-8 Bytes (hex)
---------   ----------   -----------------
A           U+0041       41
é           U+00E9       C3 A9
€           U+20AC       E2 82 AC
世          U+4E16       E4 B8 96
😀          U+1F600      F0 9F 98 80
Full string: "Héllo"
UTF-8 hex: 48 C3 A9 6C 6C 6F
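You can reproduce the table in Python by encoding each string to UTF-8 and formatting the resulting bytes. `utf8_hex` is a hypothetical helper name used only for this sketch.

```python
def utf8_hex(text: str) -> str:
    # Encode the string as UTF-8 and show each byte as two uppercase hex digits
    return " ".join(f"{b:02X}" for b in text.encode("utf-8"))

for ch in ["A", "é", "€", "世", "😀"]:
    print(f"{ch}  U+{ord(ch):04X}  {utf8_hex(ch)}")

print(utf8_hex("H\u00e9llo"))   # "Héllo" with precomposed é -> 48 C3 A9 6C 6C 6F
```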
Percent Encoding (URL Encoding)
Encode characters for safe use in URLs:
Input: https://example.com/search?q=hello world&lang=中文
Percent-encoded:
https://example.com/search?q=hello%20world&lang=%E4%B8%AD%E6%96%87
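Individual character encoding (& and = are encoded only when they appear inside a parameter value; as delimiters they stay literal):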
Space → %20
& → %26
= → %3D
中 → %E4%B8%AD (3 UTF-8 bytes)
文 → %E6%96%87 (3 UTF-8 bytes)
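Python's standard `urllib.parse` module can produce the encoded URL above. This sketch passes `quote_via=quote` so that spaces become `%20` as shown. The default, `quote_plus`, writes spaces as `+` in the form-encoding style.

```python
from urllib.parse import quote, urlencode

# Percent-encode each key and value separately so the & and = delimiters survive.
# quote_via=quote writes spaces as %20; the default, quote_plus, writes "+".
params = {"q": "hello world", "lang": "中文"}
url = "https://example.com/search?" + urlencode(params, quote_via=quote)
print(url)  # https://example.com/search?q=hello%20world&lang=%E4%B8%AD%E6%96%87

# Inside a value, reserved characters are encoded too (safe="" also encodes "/")
print(quote("a&b=c", safe=""))  # a%26b%3Dc
```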
CSS Unicode Escapes
Use Unicode characters in CSS content properties and font icon declarations:
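/* CSS escape format: a backslash plus 1 to 6 hex digits (\XXXXXX); one whitespace character after the escape is consumed as its terminator */
/* Font Awesome icon in CSS (renders only when the Font Awesome font applies) */
.icon-home::before {
  content: "\f015"; /* home icon */
}
/* Special characters in generated content */
.quote::before {
  content: "\201C"; /* left double quotation mark “ */
}
.quote::after {
  content: "\201D"; /* right double quotation mark ” */
}
/* Copyright symbol: a separate string keeps the space, which "\00A9 2024" would swallow as the escape terminator */
.footer::after {
  content: "\00A9" " 2024"; /* © 2024 */
}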
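The escape rule can be sketched in Python to generate CSS string content. `css_escape` is a hypothetical helper. It assumes the CSS Syntax rule that an escape is a backslash plus 1 to 6 hex digits, optionally followed by one whitespace character that the parser consumes. The sketch therefore appends a terminating space whenever the next character is a hex digit or whitespace.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

def css_escape(text: str) -> str:
    """Escape non-ASCII characters for use inside a CSS string."""
    out = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if ch in '"\\':
            out.append("\\" + ch)            # quote and backslash get a plain backslash
        elif cp < 0x80:
            out.append(ch)                   # other ASCII passes through
        else:
            esc = f"\\{cp:X}"                # backslash + 1 to 6 hex digits
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # A following hex digit would be read as part of the escape, and one
            # following whitespace would be swallowed, so add a terminating space.
            if nxt in HEX_DIGITS or nxt.isspace():
                esc += " "
            out.append(esc)
    return "".join(out)

print(css_escape("\u201cCafe\u201d"))  # \201C Cafe\201D
print(css_escape("© 2024"))            # \A9  2024
```

Without the terminating space, `\201CCafe` would be read as the six-digit escape `\201CCa` followed by `fe`.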
Decoding Encoded Text
Paste any encoded text and the decoder identifies the format and converts it back:
Input (HTML entities):
&lt;h1&gt;Caf&eacute;&lt;/h1&gt;
Decoded:
<h1>Café</h1>
---
Input (Unicode escapes):
\u0048\u0065\u006C\u006C\u006F
Decoded:
Hello
---
Input (percent-encoded):
%48%65%6C%6C%6F%20%57%6F%72%6C%64
Decoded:
Hello World
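Each of the three decodings above maps to a standard-library call in Python. This sketch does not attempt the tool's automatic format detection; the caller chooses the decoder. `decode_js_escapes` is a hypothetical helper, and it assumes ASCII-only input without ES6 `\u{...}` escapes.

```python
import html
from urllib.parse import unquote

def decode_js_escapes(s: str) -> str:
    # "unicode_escape" understands \uXXXX (either hex case) but returns a
    # surrogate pair as two lone surrogates; a UTF-16 round trip with
    # "surrogatepass" joins them into one code point.
    decoded = s.encode("ascii").decode("unicode_escape")
    return decoded.encode("utf-16", "surrogatepass").decode("utf-16")

print(html.unescape("&lt;h1&gt;Caf&eacute;&lt;/h1&gt;"))    # <h1>Café</h1>
print(decode_js_escapes(r"\u0048\u0065\u006C\u006C\u006F"))  # Hello
print(unquote("%48%65%6C%6C%6F%20%57%6F%72%6C%64"))          # Hello World
```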
Unicode Normalization Forms
The same character can be represented by different code point sequences. The encoder shows how the normalization forms differ:
Character: é (e with acute accent)
NFC (precomposed):
Single code point: U+00E9
UTF-8: C3 A9
NFD (decomposed):
Two code points: U+0065 U+0301
UTF-8: 65 CC 81
(base 'e' + combining acute accent)
Both look identical when rendered but are different byte sequences.
Use NFC for storage and interchange; it is the most common form on the web.
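Python's `unicodedata.normalize` converts between the forms, which makes the byte-level difference easy to check. `describe` is a hypothetical helper for printing code points and bytes.

```python
import unicodedata

def describe(s: str) -> str:
    # List the code points and the UTF-8 bytes of a string
    points = " ".join(f"U+{ord(c):04X}" for c in s)
    utf8 = " ".join(f"{b:02X}" for b in s.encode("utf-8"))
    return f"{points} (UTF-8: {utf8})"

nfc = unicodedata.normalize("NFC", "e\u0301")  # compose e + COMBINING ACUTE ACCENT
nfd = unicodedata.normalize("NFD", "\u00e9")   # decompose precomposed é

print("NFC:", describe(nfc))                     # NFC: U+00E9 (UTF-8: C3 A9)
print("NFD:", describe(nfd))                     # NFD: U+0065 U+0301 (UTF-8: 65 CC 81)
print(nfc == nfd)                                # False: different code points
print(unicodedata.normalize("NFC", nfd) == nfc)  # True once both are NFC
```

Comparing strings only after normalizing both to the same form avoids false mismatches between visually identical text.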
Bidirectional Text Control Characters
The encoder reveals invisible Unicode control characters in bidirectional text:
Input: "Hello مرحبا World"
Common bidirectional control characters the encoder reveals (none of them render visibly):
U+200F RIGHT-TO-LEFT MARK
U+200E LEFT-TO-RIGHT MARK
U+202A LEFT-TO-RIGHT EMBEDDING
U+202B RIGHT-TO-LEFT EMBEDDING
U+202C POP DIRECTIONAL FORMATTING
Encoded to reveal the RLM and LRM hidden in the input:
Hello \u200Fمرحبا\u200E World
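The sketch below shows how such characters might be found and made visible in Python. The set `BIDI_CONTROLS` is an assumption of this example. It lists the directional marks and the explicit embedding, override and isolate characters, a broader set than the five shown above. `find_bidi` and `reveal_bidi` are hypothetical helpers.

```python
import unicodedata

# Directional marks and explicit directional formatting characters
BIDI_CONTROLS = {
    "\u200E", "\u200F", "\u061C",                      # LRM, RLM, ALM
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}

def find_bidi(text: str) -> list:
    # Report position, code point, and Unicode name of each control character
    return [(i, f"U+{ord(c):04X}", unicodedata.name(c))
            for i, c in enumerate(text) if c in BIDI_CONTROLS]

def reveal_bidi(text: str) -> str:
    # Replace each control character with a visible \uXXXX escape
    return "".join(f"\\u{ord(c):04X}" if c in BIDI_CONTROLS else c for c in text)

sample = "Hello \u200F\u0645\u0631\u062D\u0628\u0627\u200E World"
for hit in find_bidi(sample):
    print(hit)
print(reveal_bidi(sample))
```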
Practical Use Cases
- Preparing special characters for HTML templates without breaking markup
- Debugging encoding issues in API responses or log files
- Converting internationalized text for JavaScript string literals
- Encoding non-ASCII characters in URL query parameters
- Inserting Unicode symbols in CSS-generated content
- Understanding how emoji are stored in UTF-8 databases
- Detecting hidden bidirectional control characters in text
Quick Reference: Common Encodings
Character   HTML Named   HTML Decimal   JS Escape   URL Encoded
---------   ----------   ------------   ---------   -----------
&           &amp;        &#38;          \u0026      %26
<           &lt;         &#60;          \u003C      %3C
>           &gt;         &#62;          \u003E      %3E
"           &quot;       &#34;          \u0022      %22
©           &copy;       &#169;         \u00A9      %C2%A9
€           &euro;       &#8364;        \u20AC      %E2%82%AC