Detecting the image format of a Base64 string in Python: check the Magic Bytes after decoding, with Pillow as a fallback

脚本专家 · posted 2 hours ago

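When handling images uploaded from a front end, the image is often sent as a Base64 string. Base64 is only an encoding: the string carries no filename, extension, or MIME type, so how can the back end tell whether it was originally a JPG, PNG, or WebP? This post collects five approaches with working Python code. The recommended one is to decode the string and check the Magic Bytes; a complete implementation that also validates with Pillow is included.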

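1. How file headers (Magic Bytes) work
Every image format starts its binary file with a fixed sequence of bytes, known as Magic Bytes. For example:
- PNG starts with hex 89 50 4E 47 0D 0A 1A 0A, which gives the Base64 prefix iVBORw0KGgo
- JPG/JPEG starts with FF D8 FF (commonly followed by E0 or E1), which gives the Base64 prefix /9j/ (/9j/4 for the E0 and E1 variants)
- GIF starts with 47 49 46 38 (GIF8), which gives R0lGOD
- WebP starts with 52 49 46 46 ... 57 45 42 50 (RIFF....WEBP), which gives UklGR; the WEBP tag at byte offset 8 does not appear as literal text in the Base64 string but encoded as XRUJQ
- BMP starts with 42 4D (BM), which gives Qk
- ICO starts with 00 00 01 00, which gives AAAB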

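Key point: Base64 changes only the representation, not the underlying binary content. After decoding, the first few bytes still contain the Magic Bytes, and that is what identifies the format.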
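To see where such Base64 prefixes come from, here is a small sketch that encodes each signature and keeps only the characters fully determined by it. The helper name stable_b64_prefix and the SIGNATURES table are made up for this illustration. Each Base64 character carries 6 bits, so a character that straddles the end of the signature also depends on the next byte of the file and is left out.

```python
import base64

def stable_b64_prefix(magic: bytes) -> str:
    """Return the Base64 characters that are fully determined by `magic`."""
    encoded = base64.b64encode(magic).decode("ascii")
    # Each Base64 character encodes 6 bits; keep only the whole characters.
    return encoded[: len(magic) * 8 // 6]

# Signatures taken from the list above (hypothetical table for this example)
SIGNATURES = {
    "PNG": b"\x89PNG\r\n\x1a\n",
    "JPEG": b"\xff\xd8\xff",
    "GIF": b"GIF8",
    "RIFF (WebP container)": b"RIFF",
    "BMP": b"BM",
    "ICO": b"\x00\x00\x01\x00",
}

for name, magic in SIGNATURES.items():
    print(f"{name:22} {magic.hex()}  ->  {stable_b64_prefix(magic)}")
```

For PNG this prints iVBORw0KGg: the eleventh character (o in real PNG files) also depends on the byte that follows the signature, which is why the sketch stops one character early.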

2. Five ways to detect the format
Option 1: check the Magic Bytes after decoding (recommended)
Accurate, fast, and free of third-party dependencies.

import base64

def detect_image_format(base64_string):
    """
    Detect the image format from its Magic Bytes.
    Supported formats: PNG, JPG, GIF, WebP, BMP, ICO.
    Returns (extension, format_name), or (None, reason) on failure.
    """
    # Strip a data URL prefix, if present
    if ',' in base64_string:
        base64_string = base64_string.split(',', 1)[1]
    try:
        # 16 Base64 characters decode to 12 bytes, enough for all common formats
        header = base64.b64decode(base64_string[:16])
    except Exception:
        return None, "decode failed"
    # PNG: 89 50 4E 47 0D 0A 1A 0A
    if header.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'png', 'PNG'
    # JPG: FF D8 FF
    if header.startswith(b'\xff\xd8\xff'):
        return 'jpg', 'JPEG'
    # GIF: GIF8
    if header.startswith(b'GIF8'):
        return 'gif', 'GIF'
    # WebP: RIFF....WEBP (the WEBP tag sits at bytes 8 to 11, inside the header)
    if header.startswith(b'RIFF') and b'WEBP' in header:
        return 'webp', 'WebP'
    # BMP: BM (only two bytes, so this is the weakest check)
    if header.startswith(b'BM'):
        return 'bmp', 'BMP'
    # ICO: 00 00 01 00
    if header.startswith(b'\x00\x00\x01\x00'):
        return 'ico', 'ICO'
    return None, "unrecognized"

# Usage example
b64_str = "iVBORw0KGgoAAAANSUhEUgAAAAUA..."  # Base64 of a PNG
fmt, name = detect_image_format(b64_str)
print(f"Format: {name}")  # Output: Format: PNG

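This approach is very accurate for well-formed files: the Magic Bytes act as the format's "ID card". Keep in mind that they only show which format the data claims to be; they do not prove that the rest of the file is intact.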
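The function above assumes clean input. As a rough sketch of how the header could be decoded more defensively, the helper below tolerates line-wrapped Base64, rejects characters outside the Base64 alphabet, and decodes only as many characters as the header needs. The name decode_header and its n_bytes parameter are made up for this example, and URL-safe Base64 (using - and _) is deliberately not handled.

```python
import base64
import binascii

def decode_header(b64_text: str, n_bytes: int = 12) -> bytes:
    """Decode just enough Base64 characters to recover the first n_bytes bytes."""
    # Drop a data URL prefix such as "data:image/png;base64," if present.
    if b64_text.startswith("data:") and "," in b64_text:
        b64_text = b64_text.split(",", 1)[1]
    # Every 4 Base64 characters encode 3 bytes, so round up to whole groups.
    n_chars = -(-n_bytes // 3) * 4
    # Remove whitespace from a slightly longer slice (wrapped Base64 contains
    # line breaks), then keep at most n_chars characters.
    chunk = "".join(b64_text[: n_chars * 2].split())[:n_chars]
    # Trim to a multiple of 4 so that missing padding cannot cause an error.
    chunk = chunk[: len(chunk) - len(chunk) % 4]
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        return base64.b64decode(chunk, validate=True)[:n_bytes]
    except binascii.Error:
        return b""

# A minimal RIFF/WEBP header built by hand, to show the tag at offset 8
sample = b"RIFF" + (4).to_bytes(4, "little") + b"WEBP" + b"\x00" * 20
header = decode_header("data:image/webp;base64," + base64.b64encode(sample).decode())
print(header[:4], header[8:12])
```

With a 12-byte header the WEBP tag at bytes 8 to 11 is always included, so the RIFF-plus-WEBP check never needs more data than this.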

Option 2: try opening it with Pillow (simple and blunt)
Pillow identifies the format automatically, so no manual checks are needed.

from PIL import Image
import base64
import io

def detect_by_pillow(base64_string):
    # Strip a data URL prefix, if present
    if ',' in base64_string:
        base64_string = base64_string.split(',', 1)[1]
    try:
        img_data = base64.b64decode(base64_string)
        img = Image.open(io.BytesIO(img_data))
        return img.format.lower()  # 'png', 'jpeg', 'gif', 'webp'...
    except Exception:
        # Return None so an error message is never mistaken for a format name
        return None

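Pros: very little code, and it supports every format Pillow can identify. Cons: Pillow must be installed and is a heavy dependency, it is slower than Option 1, and because Image.open reads only the header, a truncated or corrupted image may still be reported as a valid format. Note also that Pillow reports JPEG as 'jpeg', not 'jpg'.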

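Option 3: infer it from the Base64 string itself (limited use)
Some Base64 strings arrive with a data URL prefix, such as data:image/png;base64,iVBORw0KGgo..., in which case the MIME type can be parsed directly.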

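def detect_from_data_url(data_url):
    # Only data URLs carry a MIME type, e.g. "data:image/png;base64,..."
    if data_url.startswith('data:') and ';' in data_url:
        mime = data_url.split(';')[0].split('/')[-1]
        return mime  # 'png', 'jpeg', 'svg+xml', ...
    return None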

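Limitation: in many cases the front end sends bare Base64 without the data:image/xxx;base64, prefix, and then this approach does not work. The prefix is also only what the sender declared; it is not derived from the bytes.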
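For a slightly stricter parse of the prefix, a regular expression can require the data:image/ scheme and the ;base64, marker before trusting the subtype. This is only a sketch: the function name subtype_from_data_url is made up here, and it handles just the Base64 image case, not the full data URL syntax.

```python
import re

# data:image/<subtype>[;param=value...];base64,<payload>
_DATA_URL = re.compile(
    r"^data:image/([A-Za-z0-9.+-]+)(?:;[^;,]+)*?;base64,",
    re.IGNORECASE,
)

def subtype_from_data_url(text: str):
    """Return the declared image subtype (e.g. 'png', 'svg+xml'), or None."""
    match = _DATA_URL.match(text)
    return match.group(1).lower() if match else None

print(subtype_from_data_url("data:image/png;base64,iVBORw0KGgo"))  # png
print(subtype_from_data_url("iVBORw0KGgo"))                        # None
```

Returning None for bare Base64 makes it easy to fall through to a Magic Bytes check.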

Option 4: have the front end send a separate format field (the most dependable in practice)
The front end reads the format from File.type:

const file = input.files[0];
console.log(file.type); // "image/png"

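The back end then receives JSON such as: {"image": "iVBOR...", "format": "png"}
Pros: no computation on the back end, and accurate as long as the client is honest and the file extension is right (browsers typically derive File.type from the extension, not from the content). Cons: it depends on the front end cooperating, and a fallback is still needed when the field is missing. Options 4 and 1 work well together: take the format the front end sends as a first hint, and check the Magic Bytes yourself whenever it is missing or cannot be trusted.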
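One way to combine the two options is to compare the declared format against the decoded header. The sketch below is an assumption about how that could look, not a fixed recipe: the function name claimed_type_matches and the EXPECTED_MAGIC table are invented for this example, it accepts either "image/png" or "png", and it covers only a few formats.

```python
import base64

# Hypothetical table: declared format -> acceptable Magic Bytes prefixes
EXPECTED_MAGIC = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "jpeg": (b"\xff\xd8\xff",),
    "jpg": (b"\xff\xd8\xff",),
    "gif": (b"GIF87a", b"GIF89a"),
    "bmp": (b"BM",),
}

def claimed_type_matches(claimed: str, b64_payload: str) -> bool:
    """True if the first decoded bytes agree with the format the client sent."""
    # Accept both a MIME type ("image/png") and a bare name ("png").
    key = claimed.lower().split("/")[-1]
    expected = EXPECTED_MAGIC.get(key)
    if expected is None:
        return False  # unknown format: reject rather than guess
    try:
        # 16 Base64 characters decode to the first 12 bytes.
        header = base64.b64decode(b64_payload[:16])
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return header.startswith(expected)

payload = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode()
print(claimed_type_matches("image/png", payload))   # True
print(claimed_type_matches("image/jpeg", payload))  # False
```

A mismatch usually means a renamed file or a client bug, so rejecting the upload or falling back to the detected format are both reasonable responses.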

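Option 5: brute-force every format (not recommended)
Try opening the data with Pillow as each format in turn. This is inefficient, and corrupted images may be misidentified.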

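from PIL import Image
import io

def detect_by_brute_force(data):
    # Pillow format names; the `formats` argument needs Pillow 8.0 or later
    for fmt in ['PNG', 'JPEG', 'GIF', 'WEBP', 'BMP']:
        try:
            Image.open(io.BytesIO(data), formats=[fmt]).verify()
            return fmt.lower()
        except Exception:
            continue
    return None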

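Pillow already identifies the format in a single Image.open call, so this loop only wastes computation. Not recommended.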

3. In practice: a complete back-end handler
It combines the fast Magic Bytes check with Pillow validation, and falls back to Pillow's own detection.

import base64
import io
from PIL import Image

def get_image_info(base64_string):
    """
    Determine the image format; returns (extension, format_name, pillow_image),
    or (None, None, error_message) if the data is not a valid image.
    """
    # 1. Strip a data URL prefix, if present
    if ',' in base64_string:
        base64_string = base64_string.split(',', 1)[1]
    # 2. Magic Bytes check (fastest)
    fmt_map = {
        b'\x89PNG\r\n\x1a\n': ('png', 'PNG'),
        b'\xff\xd8\xff': ('jpg', 'JPEG'),
        b'GIF8': ('gif', 'GIF'),
        b'RIFF': ('webp', 'WEBP'),
        b'BM': ('bmp', 'BMP'),
        b'\x00\x00\x01\x00': ('ico', 'ICO'),
    }
    try:
        # Decode once and reuse the bytes below
        img_data = base64.b64decode(base64_string)
    except Exception as e:
        return None, None, f"invalid image: {e}"
    header = img_data[:12]
    ext = name = None
    for magic, (m_ext, m_name) in fmt_map.items():
        if header.startswith(magic):
            # RIFF is a generic container; WebP also needs the WEBP tag
            if m_ext == 'webp' and b'WEBP' not in header:
                continue
            ext, name = m_ext, m_name
            break
    # 3. Validate with Pillow; it also acts as the fallback detector
    try:
        Image.open(io.BytesIO(img_data)).verify()
        # verify() leaves the image object unusable, so open a fresh one
        img = Image.open(io.BytesIO(img_data))
    except Exception as e:
        return None, None, f"invalid image: {e}"
    if ext is None:
        # Magic Bytes did not match: use the format Pillow identified
        ext, name = img.format.lower(), img.format
    return ext, name, img

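Example call: ext, name, img = get_image_info(b64_str), which returns the extension, the format name, and the image object (or None, None, and an error message on failure).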

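4. Comparison and key takeaways
| Option | Accuracy | Speed | Dependencies | Recommended use |
|--------|----------|-------|--------------|-----------------|
| Magic Bytes | Very high | Very fast | None | General first choice |
| Pillow | High | Fast | Pillow | When Pillow is already a dependency |
| Data URL prefix | High | Very fast | None | When the front end sends a data URL |
| Format field from the front end | Very high (if the client is trusted) | Very fast | None | First choice in practice, with a fallback |
| Brute force | Low | Slow | Pillow | Not recommended |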

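Key takeaways: Base64 does not alter the original binary data, so checking the first 12 bytes after decoding is enough to determine the format. PNG is the easiest to spot: it starts with iVBORw0KGgo. JPG comes next: it starts with /9j/. The most robust setup in practice is to have the front end send file.type and have the back end check the Magic Bytes as a fallback, giving two layers of protection.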

热心网友7 · posted 1 hour ago

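Very practical summary from the OP, especially Option 1 (checking the Magic Bytes after decoding): no dependencies and high accuracy, which makes it a good fit for lightweight services. I used to rely on Pillow as my fallback, but this made me realize that Magic Bytes are actually more dependable, particularly for Base64 coming from the front end, since detection no longer fails because Pillow is not installed or the image is corrupted. One question, though: for a WebP Base64 string, could `b'WEBP' in header` in Option 1 miss the tag because header only holds the first 12 bytes? As far as I know 'WEBP' sits at offset 8, so 12 bytes should cover it, right? I was also going to ask about stripping a `data:image/xxx;base64,` prefix first, but I see you already handle that, nice. Thanks for sharing!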
