What Is a BOM?
A BOM (Byte Order Mark) is a special Unicode character (U+FEFF) that appears at the
very beginning of a text file. It was originally used to indicate the byte order (endianness)
of UTF-16 and UTF-32 encoded files. In UTF-8, byte order is irrelevant, but some tools —
notably Microsoft Notepad and Excel — still prepend a UTF-8 BOM (0xEF 0xBB 0xBF)
to UTF-8 files.
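For reference, Python's standard `codecs` module exposes the BOM byte sequences as constants. The short sketch below prints them; the name `bom_table` is chosen here for illustration only.

```python
import codecs

# BOM byte sequences for the common Unicode encodings (stdlib constants).
bom_table = {
    "UTF-8": codecs.BOM_UTF8,
    "UTF-16 LE": codecs.BOM_UTF16_LE,
    "UTF-16 BE": codecs.BOM_UTF16_BE,
    "UTF-32 LE": codecs.BOM_UTF32_LE,
    "UTF-32 BE": codecs.BOM_UTF32_BE,
}

for name, bom in bom_table.items():
    # bytes.hex(sep) needs Python 3.8 or newer
    print(f"{name:10} {bom.hex(' ')}")
```

In every case the bytes are just U+FEFF encoded in the corresponding encoding, which is why the UTF-8 form is three bytes long.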
The UTF-8 BOM is invisible in most text editors but causes real problems in code:
it breaks PHP scripts (the BOM is sent as output before headers), corrupts the first field of CSV imports, causes JSON parse errors,
and stops the system from recognizing a shebang (#!/bin/bash) on the first line of a shell script.
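The JSON case is easy to reproduce. The following is a minimal sketch using only Python's standard `json` module; the `payload` bytes are made up for illustration.

```python
import json

payload = b'\xef\xbb\xbf{"ok": true}'  # JSON bytes with a UTF-8 BOM prepended

# Decoding as plain UTF-8 keeps the BOM as U+FEFF, which the parser rejects.
try:
    json.loads(payload.decode("utf-8"))
    parsed_with_bom = True
except json.JSONDecodeError:
    parsed_with_bom = False

# The "utf-8-sig" codec drops a leading BOM if present and is a no-op otherwise.
data = json.loads(payload.decode("utf-8-sig"))
```

Reading text with `utf-8-sig` is a reasonable default whenever input may come from tools that add a BOM, since it also accepts files without one.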
Detecting a BOM
# Check if a file has a UTF-8 BOM
hexdump -C file.txt | head -1
# BOM shows as: ef bb bf at the start
# Using file command
file file.txt
# Output mentions "(with BOM)", e.g. "UTF-8 Unicode (with BOM) text"; exact wording varies by version
# Python: detect and strip BOM
import codecs

with open('file.txt', 'rb') as f:
    raw = f.read()
if raw.startswith(codecs.BOM_UTF8):  # b'\xef\xbb\xbf'
    print("BOM detected")
    raw = raw[3:]  # strip it
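If the files may also be UTF-16 or UTF-32, the same prefix check can be extended to report which BOM is present. This is a sketch under that assumption; `detect_bom` and `_BOMS` are hypothetical names, not part of any library.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

Only the first four bytes of a file are needed, so `detect_bom(f.read(4))` is enough. Note that a UTF-16 LE file whose first character after the BOM is U+0000 would be reported as UTF-32 LE; that ambiguity is inherent to BOM sniffing.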
Removing BOMs in Bulk
import codecs
import glob

def strip_bom(filepath):
    """Rewrite the file without its UTF-8 BOM; return True if one was removed."""
    with open(filepath, 'rb') as f:
        content = f.read()
    if content.startswith(codecs.BOM_UTF8):  # b'\xef\xbb\xbf'
        with open(filepath, 'wb') as f:
            f.write(content[3:])
        return True
    return False

# Strip BOM from all .csv files recursively
for path in glob.glob('**/*.csv', recursive=True):
    if strip_bom(path):
        print(f"Stripped BOM: {path}")
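Rewriting a file in place can leave it truncated if the process is interrupted mid-write. One possible variant, sketched here with the hypothetical name `strip_bom_atomic`, writes to a temporary file in the same directory and then swaps it in with `os.replace`.

```python
import codecs
import os
import shutil
import tempfile

def strip_bom_atomic(filepath):
    """Strip a leading UTF-8 BOM via a temp file and an atomic rename."""
    with open(filepath, "rb") as src:
        if src.read(3) != codecs.BOM_UTF8:
            return False  # no BOM, leave the file untouched
        rest = src.read()

    # The temp file must live on the same filesystem for os.replace to be atomic.
    directory = os.path.dirname(os.path.abspath(filepath))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(rest)
        shutil.copymode(filepath, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, filepath)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True
```

This still reads the whole file into memory, and it preserves permission bits but not ownership or extended attributes, so treat it as a starting point rather than a drop-in tool.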
Preventing BOMs
Configure your editor to save files without a BOM. In VS Code, open the Command Palette,
run "Change File Encoding", choose "Save with Encoding", and select "UTF-8" (not "UTF-8 with BOM").
In Notepad++, go to Encoding → "Encode in UTF-8" (without BOM).
For new files, add an .editorconfig to your project:
[*]
charset = utf-8
# Note: EditorConfig has no separate "no-bom" option; utf-8 means UTF-8
# without a BOM, and utf-8-bom is the value that requests one
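Editor settings only help for files edited locally, so a repository check can catch BOMs that slip in from other tools. The sketch below is one way to do that; `files_with_bom` and its default `patterns` are assumptions made for illustration, not an existing tool.

```python
import codecs
from pathlib import Path

def files_with_bom(root, patterns=("*.py", "*.json", "*.csv", "*.sh")):
    """Return a sorted list of files under root that start with a UTF-8 BOM."""
    hits = set()
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if not path.is_file():
                continue
            with path.open("rb") as f:
                # Only the first three bytes are needed for the check.
                if f.read(3) == codecs.BOM_UTF8:
                    hits.add(path)
    return sorted(hits)
```

A CI step or pre-commit hook could call `files_with_bom(".")`, print each path, and exit with a non-zero status when the list is not empty.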