Common Encoding Issues and Solutions
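- Garbled accented characters (for example, é shown as Ã©): usually UTF-8 bytes decoded as latin1/ISO-8859-1 or Windows-1252; encode the garbled text back to bytes with that encoding, then decode as UTF-8
- Question marks or replacement characters (U+FFFD) in place of letters: the text was converted to an encoding that cannot represent them, a multi-byte UTF-8 sequence was truncated, or latin1 bytes were decoded as UTF-8; re-read the original bytes with the correct source encoding
- BOM characters at start of file: strip the BOM (bytes EF BB BF in UTF-8) when processing UTF-8 files in scripts
- Curly quotes appearing as boxes: Windows-1252 data read as another encoding; convert from Windows-1252 to UTF-8
- Japanese/Chinese characters garbled: identify the source encoding (Shift-JIS, GB2312, etc.) and convert to UTF-8
- Database storing wrong bytes: ensure the connection charset matches the data encoding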
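For the BOM item in the list above, a minimal Python sketch of two ways to drop a leading UTF-8 BOM. The helper name strip_utf8_bom and the sample CSV header are hypothetical; codecs.BOM_UTF8 and the utf-8-sig codec are part of the Python standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file content: a BOM followed by a CSV header line
raw = codecs.BOM_UTF8 + "name,city\n".encode("utf-8")

print(strip_utf8_bom(raw))               # b'name,city\n'

# Alternative: the utf-8-sig codec removes the BOM while decoding
print(repr(raw.decode("utf-8-sig")))     # 'name,city\n'

# A plain utf-8 decode keeps the BOM as U+FEFF at the start of the text
print(repr(raw.decode("utf-8")[0]))      # '\ufeff'
```

When reading files directly, passing encoding="utf-8-sig" to open() has the same effect as the second approach.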
The Encoding Converter on TechConverter.me supports all major character encodings, shows byte-level representations for debugging, handles BOM issues, and converts between any two encodings instantly — making it an essential tool for any developer working with international text data.
Examples
Example 1: Fixing Mojibake (Garbled Text)
A developer migrates data from a legacy MySQL database that used latin1 encoding to a new system using UTF-8. Customer names with accented characters appear garbled:
Stored in database (latin1): José García
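Displayed incorrectly (UTF-8 bytes decoded as latin1): José GarcÃa
The converter identifies the issue: at some point in the migration the text was encoded as UTF-8 but decoded as ISO-8859-1 (latin1), so each two-byte UTF-8 character appears as two latin1 characters. The é (bytes C3 A9) becomes Ã©, and the í (bytes C3 AD) becomes Ã followed by an invisible soft hyphen (U+00AD). The fix is to encode the garbled string back to bytes as ISO-8859-1 and decode those bytes as UTF-8: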
Input encoding: ISO-8859-1
Output encoding: UTF-8
Input: José GarcÃa (the garbled string)
Output: José García (correctly decoded)
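A minimal Python sketch of this kind of repair, assuming the garbled text came from UTF-8 bytes that were decoded as latin-1 (the usual cause of the Ã© pattern). The function name repair_mojibake is hypothetical, and this is not necessarily how the converter works internally.

```python
def repair_mojibake(garbled: str, wrong_encoding: str = "latin-1") -> str:
    """Undo UTF-8 text that was mistakenly decoded with a single-byte encoding."""
    # Step 1: recover the original bytes by reversing the wrong decode
    original_bytes = garbled.encode(wrong_encoding)
    # Step 2: decode those bytes with the encoding they were really written in
    return original_bytes.decode("utf-8")


# Reproduce the bug: UTF-8 bytes decoded as latin-1
garbled = "José García".encode("utf-8").decode("latin-1")
print(len("José García"), len(garbled))   # 11 13 (each accented letter became two characters)

print(repair_mojibake(garbled))           # José García
```

If the text was garbled through Windows-1252 rather than ISO-8859-1, passing wrong_encoding="cp1252" is the analogous repair. The round trip only works when no bytes were lost along the way; text that already contains question marks or U+FFFD cannot be recovered this way.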
Example 2: Converting Windows-1252 to UTF-8
A CSV file exported from a Windows application uses Windows-1252 encoding. When opened in a UTF-8 environment, special characters like curly quotes and em dashes appear as question marks or strange symbols:
Windows-1252 characters that differ from ISO-8859-1:
\x80 → € (Euro sign)
\x91 → ‘ (left single quotation mark)
\x92 → ’ (right single quotation mark)
\x93 → “ (left double quotation mark)
\x94 → ” (right double quotation mark)
\x96 → – (en dash)
\x97 → — (em dash)
The converter handles these Windows-1252-specific characters correctly, producing proper UTF-8 output where all characters render as intended.
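The same conversion can be sketched in Python with the standard cp1252 codec. The function name cp1252_to_utf8 and the sample bytes are hypothetical, chosen to exercise several of the characters listed above.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Decode Windows-1252 bytes and re-encode the text as UTF-8."""
    return data.decode("cp1252").encode("utf-8")


# Sample bytes: curly double quotes, an en dash, and the euro sign
raw = b"\x93Price\x94 \x96 \x80 5"

text = raw.decode("cp1252")
print(text)                            # “Price” – € 5

# Each of these characters takes three bytes in UTF-8
print(cp1252_to_utf8(raw).hex(" "))

# Decoding the same bytes as ISO-8859-1 yields C1 control characters instead
print(repr(raw.decode("latin-1")))
```

Note that Python's cp1252 codec raises UnicodeDecodeError for the five byte values Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so input containing them needs an explicit errors= policy.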
Example 3: Byte Representation Comparison
The converter shows the byte-level representation of text in different encodings, which is useful for debugging. For the string "café":
UTF-8 bytes: 63 61 66 C3 A9 (5 bytes — é is 2 bytes)
ISO-8859-1 bytes: 63 61 66 E9 (4 bytes — é is 1 byte)
UTF-16 LE bytes: 63 00 61 00 66 00 E9 00 (8 bytes — each char is 2 bytes)
UTF-32 LE bytes: 63 00 00 00 61 00 00 00 66 00 00 00 E9 00 00 00 (16 bytes)
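This visualization explains why a file encoded in ISO-8859-1 and read as UTF-8 produces garbled output: the byte 0xE9 is a valid single-byte character in ISO-8859-1 (é), but in UTF-8 it is the lead byte of a three-byte sequence. A UTF-8 decoder expects two continuation bytes (0x80 to 0xBF) after it; when ordinary text or the end of the file follows instead, the decoder reports an error or substitutes the replacement character U+FFFD.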
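The byte listings above can be reproduced with a short Python sketch. The helper name hex_bytes is hypothetical, and bytes.hex() with a separator argument assumes Python 3.8 or later.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated uppercase hex."""
    return text.encode(encoding).hex(" ").upper()


word = "caf\u00e9"  # "café", with é written as an escape to avoid source-encoding ambiguity

for enc in ("utf-8", "iso-8859-1", "utf-16-le", "utf-32-le"):
    print(f"{enc:>10}: {hex_bytes(word, enc)}")

# Reading the ISO-8859-1 bytes as UTF-8: strict decoding fails at 0xE9 ...
latin1_bytes = word.encode("iso-8859-1")
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed at byte offset", exc.start)

# ... while errors="replace" substitutes U+FFFD for the invalid byte
print(repr(latin1_bytes.decode("utf-8", errors="replace")))   # 'caf\ufffd'
```

Comparing the hex output of two candidate encodings against the raw bytes of a problem file is often the quickest way to tell which encoding the file actually uses.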