Python asyncio + aiohttp in Practice: Concurrent Async HTTP Requests and Rate Limiting

脚本专家 · Posted 2 hours ago

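When calling APIs or scraping data in bulk, the synchronous requests library blocks on every response, so fetching dozens or hundreds of URLs one after another takes a long time. Python's asynchronous IO (asyncio) combined with aiohttp can issue HTTP requests concurrently: while one coroutine waits on the network, the event loop switches to another, which greatly reduces total wall-clock time. This tutorial starts from scratch and covers the core usage of asyncio and aiohttp, including concurrency control, rate limiting, and timeout configuration, and ends with a robust template that can be used directly in a crawler.

Environment: Python 3.7 or later (asyncio.run() was added in 3.7; the async/await keywords themselves date from 3.5).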

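I. Core Concepts
1. Two keywords
- async def: defines a coroutine function. Calling a coroutine function does not run its body immediately; it returns a coroutine object.
- await: suspends the current coroutine and hands control back to the event loop until the IO completes, then resumes. Reading a response body (text(), json(), read()) is an IO operation, so each of these calls must be awaited.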
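
To make the difference between calling and awaiting concrete, here is a minimal offline sketch. The function add is a made-up example, and asyncio.sleep(0) stands in for real IO.

```python
import asyncio
import inspect


async def add(a, b):
    # Stand-in for real IO: sleep(0) simply yields to the event loop once.
    await asyncio.sleep(0)
    return a + b


async def main():
    coro = add(1, 2)                      # nothing has run yet: this is a coroutine object
    is_coro = inspect.iscoroutine(coro)
    result = await coro                   # awaiting it actually executes the body
    return is_coro, result


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The call add(1, 2) alone produces no result; only the await (or scheduling it as a task) drives the coroutine to completion.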

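2. The event loop (EventLoop)
The event loop is the scheduler of an async program: it decides which coroutine runs next. In current Python versions, asyncio.run() is the standard way to start it, so there is no need to create a loop by hand.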
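
The scheduling can be observed in a small offline sketch. The names worker and log are hypothetical, and asyncio.sleep() plays the role of a network wait.

```python
import asyncio


async def worker(name, delay, log):
    log.append(f"{name} start")
    await asyncio.sleep(delay)    # suspended here; the loop runs other coroutines
    log.append(f"{name} done")


async def main():
    log = []
    # Both workers share one event loop and one thread.
    await asyncio.gather(worker("A", 0.2, log), worker("B", 0.1, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Worker B starts before A has finished and completes first, because A gives up control at its await.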

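3. The async HTTP library: aiohttp
aiohttp is built on the asyncio coroutine model and ships with connection pooling; it is a mainstream choice for async HTTP requests. Install it with: pip install aiohttp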

II. First Async Request: A Single GET
Synchronous version (requests):

import requests

def sync_request():
    resp = requests.get("https://httpbin.org/get")
    print(resp.status_code)
    print(resp.text)

sync_request()

Asynchronous version (aiohttp):

import asyncio
import aiohttp

async def async_request():
    # The session owns the connection pool; async with closes it on exit.
    async with aiohttp.ClientSession() as session:
        async with session.get("https://httpbin.org/get") as resp:
            print("Status code:", resp.status)
            text = await resp.text()  # reading the body is IO, so it needs await
            print(text[:200])

if __name__ == "__main__":
    asyncio.run(async_request())

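Common beginner pitfalls:
- Calling a coroutine function only creates a coroutine object. To run it, pass it to asyncio.run() at the top level, or await it from inside another coroutine.
- text(), json(), and read() on a response must be awaited.
- Use the async with context managers so the session and its connections are closed automatically and do not leak.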

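III. Real Concurrency: Requesting Many URLs at Once
A single request shows no advantage; batches are where async pays off. Create one task per request with asyncio.create_task(), then collect all the results with asyncio.gather().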

import asyncio
import aiohttp

url_list = [
    "https://httpbin.org/get",
    "https://httpbin.org/ip",
    "https://httpbin.org/headers",
]

async def fetch(session, url):
    # Per-request timeout: give up after 10 seconds in total.
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as resp:
        return await resp.json()

async def main():
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in url_list:
            # create_task() schedules the coroutine on the event loop.
            task = asyncio.create_task(fetch(session, url))
            tasks.append(task)
        # gather() waits for every task and returns results in input order.
        results = await asyncio.gather(*tasks)
    for res in results:
        print(res)

if __name__ == "__main__":
    asyncio.run(main())

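Result: the requests go out almost simultaneously, so the total time is roughly that of the slowest single request, whereas the serial version takes the sum of all request times.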
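
The timing claim can be checked without touching the network. In this sketch fake_fetch is a hypothetical stand-in for session.get(), and the delays are invented values that play the role of response times.

```python
import asyncio
import time


async def fake_fetch(url, delay):
    # Hypothetical stand-in for a real request: the sleep is the network wait.
    await asyncio.sleep(delay)
    return url


async def main():
    delays = {"a": 0.1, "b": 0.3, "c": 0.2}   # made-up response times in seconds
    start = time.perf_counter()
    tasks = [asyncio.create_task(fake_fetch(u, d)) for u, d in delays.items()]
    results = await asyncio.gather(*tasks)    # results keep the input order
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")
```

The elapsed time should land near the longest delay (0.3 s) rather than the sum (0.6 s), and the results come back in submission order even though "a" finishes first.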

IV. Common Advanced Usage
1. POST request (sending JSON)

async def post_json(session):
    data = {"username": "test", "password": "123456"}
    # json= serializes the dict and sends it as a JSON request body.
    async with session.post(
        url="https://httpbin.org/post",
        json=data,
    ) as resp:
        return await resp.json()

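2. Limiting concurrency (so you do not hammer the server and get your IP banned)
Unbounded concurrency easily triggers anti-scraping measures or overloads the server. Use asyncio.Semaphore to cap the number of requests in flight: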

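import asyncio

# Note: on Python 3.7 to 3.9, create the semaphore inside the running coroutine
# (e.g. in main()) instead; a module-level one can bind to the wrong loop.
sem = asyncio.Semaphore(5)  # at most 5 requests in flight

async def limited_fetch(session, url):
    async with sem:
        async with session.get(url) as resp:
            return resp.status  # a plain int attribute; it must not be awaited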
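
To check that a semaphore really caps concurrency without hitting a real server, here is a small offline sketch. The names limited_job, run_jobs, and state are hypothetical, and asyncio.sleep() simulates the request.

```python
import asyncio


async def limited_job(sem, state):
    async with sem:                       # at most `limit` coroutines get past here
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)         # simulated network wait
        state["active"] -= 1


async def run_jobs(total=20, limit=5):
    # Creating the semaphore inside the running loop avoids loop-binding
    # problems on Python versions before 3.10.
    sem = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(limited_job(sem, state) for _ in range(total)))
    return state["peak"]


if __name__ == "__main__":
    print("peak concurrency:", asyncio.run(run_jobs()))
```

Twenty jobs are submitted at once, yet the recorded peak never exceeds the limit of five.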

3. Configuring headers, timeouts, and the connection pool

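import aiohttp

timeout = aiohttp.ClientTimeout(total=15)  # session-wide timeout: 15 seconds in total
headers = {"User-Agent": "Mozilla/5.0"}

async def main():
    # Create the connector inside the coroutine so it binds to the running loop.
    connector = aiohttp.TCPConnector(limit=50)  # at most 50 pooled connections
    async with aiohttp.ClientSession(
        connector=connector,
        timeout=timeout,
        headers=headers,
    ) as session:
        pass  # issue requests with `session` here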

V. Async vs. Sync: A Performance Comparison
Serial synchronous code (requests):

import time
import requests

url = "https://httpbin.org/get"
start = time.time()
for _ in range(10):
    requests.get(url)  # each call blocks until its response arrives
print(f"Synchronous serial time: {time.time()-start:.2f}s")

Concurrent asynchronous code (aiohttp):

import time
import asyncio
import aiohttp

url = "https://httpbin.org/get"

async def fetch(session):
    async with session.get(url) as resp:
        await resp.text()  # read the body so the request fully completes

async def main():
    tasks = []
    async with aiohttp.ClientSession() as session:
        for _ in range(10):
            tasks.append(asyncio.create_task(fetch(session)))
        await asyncio.gather(*tasks)

start = time.time()
asyncio.run(main())
print(f"Asynchronous concurrent time: {time.time()-start:.2f}s")

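Test results: the 10 synchronous requests took roughly 5 to 8 seconds in total (depending on network latency), while the concurrent async version needed only about 0.5 to 1.5 seconds. Note that async only speeds up waiting on IO; it does nothing for CPU-bound work, which calls for multiprocessing instead.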
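
Why async does not help with blocking work can be seen in a small offline sketch. Here time.sleep() stands in for any call that never yields to the event loop (CPU-heavy code, or a synchronous library such as requests); the names blocking_job, cooperative_job, and timed are hypothetical.

```python
import asyncio
import time


async def blocking_job():
    time.sleep(0.1)            # blocks the whole event loop; nothing else can run


async def cooperative_job():
    await asyncio.sleep(0.1)   # yields to the loop while waiting


async def timed(job, n=3):
    start = time.perf_counter()
    await asyncio.gather(*(job() for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"blocking:    {asyncio.run(timed(blocking_job)):.2f}s")
    print(f"cooperative: {asyncio.run(timed(cooperative_job)):.2f}s")
```

Three blocking jobs run one after another and take about three times as long as one, while the three cooperative jobs overlap and finish in about the time of a single job.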

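VI. Common Beginner Problems
- RuntimeWarning: coroutine ... was never awaited: the coroutine was created but never scheduled. Run it with await or create_task (or asyncio.run() at the top level); calling the coroutine function by itself only creates the coroutine object.
- Nested event loop errors: these can show up in environments such as Jupyter Notebook, which already run an event loop, so asyncio.run() refuses to start a second one. In a plain local script, asyncio.run() works as is.
- Do not create a new ClientSession for every request: create one session and reuse it so that requests share the TCP connection pool, which performs better.
- Always catch exceptions: a network request can time out or fail, so wrap the body of fetch in try-except to keep one failure from crashing the whole program.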
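
Besides try-except inside fetch, asyncio.gather() accepts return_exceptions=True, which returns exceptions as ordinary results instead of raising the first one. Below is a minimal offline sketch of sorting such results into successes and failures; fake_fetch and fetch_all are hypothetical names, and the failure is simulated.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for a real request; URLs containing "bad" fail.
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ValueError(f"simulated failure for {url}")
    return f"ok:{url}"


async def fetch_all(urls):
    results = await asyncio.gather(
        *(fake_fetch(u) for u in urls), return_exceptions=True
    )
    ok, failed = {}, {}
    # gather() returns results in input order, so zip pairs each URL correctly.
    for url, res in zip(urls, results):
        if isinstance(res, BaseException):
            failed[url] = res      # keep the exception for logging or a retry
        else:
            ok[url] = res
    return ok, failed


if __name__ == "__main__":
    ok, failed = asyncio.run(fetch_all(["u1", "bad", "u2"]))
    print("ok:", ok)
    print("failed:", failed)
```

The failed dict can then be logged, skipped, or fed into another round of requests, while the successful results remain usable.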

VII. Complete Robust Template (usable directly in a crawler)

import asyncio
import aiohttp

MAX_CONCURRENT = 10
TIMEOUT = aiohttp.ClientTimeout(total=10)
URLS = [f"https://httpbin.org/get?id={i}" for i in range(20)]

async def request_one(session, sem, url):
    async with sem:  # cap the number of requests in flight
        try:
            async with session.get(url, timeout=TIMEOUT) as resp:
                if resp.status == 200:
                    return await resp.json()
                else:
                    return f"Unexpected status code: {resp.status}"
        except Exception as e:
            # Return the error instead of raising so one failure cannot abort gather().
            return f"Request failed: {url} | {e}"

async def run_all():
    tasks = []
    # Create loop-bound objects inside the coroutine; before Python 3.10 a
    # module-level Semaphore can bind to a different loop than asyncio.run() uses.
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    connector = aiohttp.TCPConnector(limit=MAX_CONCURRENT)
    async with aiohttp.ClientSession(connector=connector) as session:
        for url in URLS:
            task = asyncio.create_task(request_one(session, sem, url))
            tasks.append(task)
        results = await asyncio.gather(*tasks)
    for item in results:
        print(item)

if __name__ == "__main__":
    asyncio.run(run_all())
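
The template reports a failure once and moves on. One possible extension, not part of the template itself, is to retry transient failures with exponential backoff. The sketch below is library-agnostic and runs offline; with_retries and make_flaky are hypothetical names, and the default delays are arbitrary.

```python
import asyncio


async def with_retries(make_call, attempts=3, base_delay=0.5,
                       retry_on=(asyncio.TimeoutError, OSError)):
    """Await make_call() up to `attempts` times, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts:
                raise                      # out of attempts: surface the error
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


def make_flaky(fail_times):
    """Hypothetical test double: fails `fail_times` times, then succeeds."""
    state = {"calls": 0}

    async def call():
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise asyncio.TimeoutError("simulated timeout")
        return state["calls"]              # number of the call that succeeded

    return call


if __name__ == "__main__":
    print(asyncio.run(with_retries(make_flaky(2), base_delay=0.01)))
```

In a crawler, make_call would be a zero-argument function that returns a fresh request coroutine on each call, and retry_on would typically also include aiohttp.ClientError. The wrapped coroutine has to raise on failure rather than return an error string, otherwise there is nothing to retry on.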

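Conclusion
Async requests are an essential skill for Python crawlers and bulk API calls. To get started, remember three things:
1. Define coroutines with async def and put await in front of IO operations.
2. Manage connections with aiohttp.ClientSession.
3. Use gather with tasks for concurrency, and a semaphore to cap how many requests are in flight.

With the material in this post you can write high-concurrency network code and avoid the inefficiency of blocking synchronous calls. From here you can move on to async MySQL and Redis clients and build a fully asynchronous service.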

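热心网友5 · Posted 2 hours ago

This tutorial came at just the right time! I have been trying to speed up a batch of API calls; running them serially with requests, even a few dozen requests were a pain to wait for. Following your approach and combining `create_task` with `gather`, the speedup was about an order of magnitude. One small addition: in a real crawler, add exception handling, because a single timeout or network hiccup can make the whole `gather` call raise. You can use `asyncio.gather(*tasks, return_exceptions=True)` to collect the exceptions, or put a try-except inside the fetch function, so that a few failed requests do not interrupt the overall run. The `Semaphore` rate-limiting example at the end is very practical; used together with `aiohttp.TCPConnector(limit=50)` it controls both the concurrency and the connection pool, which helps with stability and with avoiding bans. Thanks for sharing!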

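热心网友5 · Posted 2 hours ago

Thanks for sharing, a very systematic and clear tutorial! I have used aiohttp for bulk scraping too, and it really is much faster. In practice, though, when the number of URLs is very large (say, thousands), submitting every task to gather at once can put pressure on memory, so I usually combine it with asyncio.Queue or manual batching to limit the number of tasks running at the same time. Also, do you have any recommended best practices for retries and exception handling, for example retrying automatically on a connection timeout or a non-200 status code? Looking forward to a follow-up on that. Thanks!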

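热心网友5 · Posted 2 hours ago

Thanks, the tutorial is very clear and explains everything from the basics through rate limiting and connection pooling; it solved the concurrency problem I recently ran into in my crawler. One question: in a batch of requests, if one URL times out or returns an error, `asyncio.gather` by default raises the exception and the whole program stops. Is there a more graceful way to handle the failed requests separately (for example, retry or skip them) while the successful ones still return their results? I have seen the `return_exceptions=True` parameter, but I am not sure how to deal with the returned exceptions in practice. Could you give some pointers?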
