When You Need to Merge CSV Files
CSV merging is a common data engineering task: combining monthly export files into a yearly dataset, merging data from multiple API responses, or consolidating reports from different teams. The key challenge is handling headers correctly — you want one header row in the output, not one per input file.
Merging CSV Files in Python
Python
import csv
import glob
import os


def merge_csv_files(input_pattern, output_file):
    # Skip the output file itself in case it also matches the input pattern
    out_path = os.path.abspath(output_file)
    files = sorted(f for f in glob.glob(input_pattern)
                   if os.path.abspath(f) != out_path)
    if not files:
        raise ValueError(f"No files found matching: {input_pattern}")

    with open(output_file, 'w', newline='', encoding='utf-8') as out:
        writer = None
        for filepath in files:
            with open(filepath, newline='', encoding='utf-8') as f:
                reader = csv.DictReader(f)
                if writer is None:
                    # Write the header once, using the first file's columns
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
            print(f"Merged: {os.path.basename(filepath)}")
    print(f"Output: {output_file}")


# Merge all monthly sales files
merge_csv_files('sales_2026_*.csv', 'sales_2026_full.csv')
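The function above takes its column names from the first file only, and csv.DictWriter raises a ValueError by default when a later row carries a column it was not told about. A quick pre-check of the header rows can catch that before any output is written. The following is a minimal sketch; the helper name `find_header_mismatches` is hypothetical, and it assumes every input file is UTF-8 with the header on its first line.

```python
import csv


def find_header_mismatches(paths):
    """Compare each file's header row with the first file's header.

    Returns a dict mapping path -> header for every file whose header differs.
    """
    mismatches = {}
    expected = None
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            header = next(csv.reader(f), [])  # [] if the file is empty
        if expected is None:
            expected = header  # the first file defines the reference header
        elif header != expected:
            mismatches[path] = header
    return mismatches
```

One way to use it is to call it on `sorted(glob.glob('sales_2026_*.csv'))` and stop if the returned dict is non-empty. The comparison is order-sensitive, so files with the same columns in a different order are also reported.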
Merging with pandas
Python
import glob
import pandas as pd

# Vertical merge (stack rows); sort the paths so the row order is reproducible
dfs = [pd.read_csv(f) for f in sorted(glob.glob('data_*.csv'))]
merged = pd.concat(dfs, ignore_index=True)
merged.to_csv('merged.csv', index=False)
# Horizontal merge (join on a key column)
df1 = pd.read_csv('users.csv')
df2 = pd.read_csv('orders.csv')
joined = pd.merge(df1, df2, on='user_id', how='left')
joined.to_csv('users_with_orders.csv', index=False)
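To see what the left join does with rows that have no match, the sketch below runs the same pd.merge call on a few in-memory rows. The sample data and its columns (`name`, `order_id`, `total`) are made up for illustration and stand in for users.csv and orders.csv; `indicator=True` is a standard pd.merge option that adds a `_merge` column saying where each row came from.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for users.csv and orders.csv
users = pd.read_csv(io.StringIO("user_id,name\n1,Ana\n2,Ben\n3,Cy\n"))
orders = pd.read_csv(
    io.StringIO("order_id,user_id,total\n10,1,25.0\n11,1,40.0\n12,2,15.5\n")
)

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join
joined = pd.merge(users, orders, on="user_id", how="left", indicator=True)

# Users without any order keep their row; the order columns are NaN for them
no_orders = joined[joined["_merge"] == "left_only"]

print(len(joined))                  # 4: two rows for user 1, one each for users 2 and 3
print(no_orders["name"].tolist())   # ['Cy']
```

Note that a left join can produce more rows than the left table: a user with two orders appears twice in the result.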
Common Pitfalls
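- Inconsistent column names: Check that all files have the same headers before merging. With pandas, a typo in one file's header creates an extra column that is NaN for every row from the other files; with csv.DictWriter, the unexpected column raises a ValueError.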
- Different encodings: Files from different sources may use UTF-8, Latin-1, or Windows-1252. Detect encoding with chardet before reading.
- Duplicate rows: If files overlap in date ranges, you may get duplicate records. Use df.drop_duplicates() after merging.
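For the encoding pitfall, a dependency-free alternative to chardet is to try a short list of candidate encodings in order and keep the first one that decodes cleanly. This is a different technique from statistical detection, and only a heuristic: a successful decode does not prove the guess is right, and latin-1 accepts any byte sequence, so it belongs last. The helper name `read_rows_with_fallback` and the candidate list are assumptions for illustration.

```python
import csv
import io

# Candidate encodings, tried in order. This list is an assumption;
# adjust it to the sources you actually receive files from.
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def read_rows_with_fallback(path, encodings=CANDIDATE_ENCODINGS):
    """Return (encoding_used, rows) for the first encoding that decodes the file."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate
        # newline="" leaves line endings untouched for the csv module
        rows = list(csv.DictReader(io.StringIO(text, newline="")))
        return enc, rows
    raise ValueError(f"Could not decode {path} with any of: {encodings}")
```

Using utf-8-sig as the first candidate also strips a byte order mark if one is present, which otherwise ends up glued to the first column name.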