Downloading large files in Python: which approach is faster?

Most of us reach for the requests library to download files, because it is so convenient to use.

Method 1
With the streaming code below, Python's memory usage stays roughly constant no matter how large the downloaded file is:
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # Note the stream=True argument: the body is fetched lazily, not all at once
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            # Write the response to disk 8 KiB at a time
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename
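The `url.split('/')[-1]` line assumes the URL ends in a plain file name. As a minimal sketch of a more defensive variant, the helper below drops the query string and fragment and falls back to a default name. `filename_from_url` and the default `download.bin` are hypothetical names chosen for illustration; they are not part of requests.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of url, ignoring query and fragment."""
    path = urlsplit(url).path                  # drops ?query and #fragment
    name = posixpath.basename(unquote(path))   # decode %20 etc., keep last segment
    return name or default                     # empty when the path ends in '/'
```

For a URL such as `https://example.com/files/big.iso?token=abc`, plain splitting would keep the `?token=abc` suffix in the file name, while this helper returns `big.iso`.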

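If you need to decode the chunks into text, keep two things in mind. Calling iter_content() without arguments uses chunk_size=1, which is very slow, and calling chunk.decode() on each chunk separately can fail when a multi-byte character is split across two chunks. An incremental decoder avoids both problems; the if check skips any empty chunks.
import codecs
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # The incremental decoder copes with characters split across chunks
    decoder = codecs.getincrementaldecoder('utf-8')()
    # Note the stream=True argument
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'w', encoding='utf-8') as f:
            for chunk in r.iter_content(chunk_size=8192):
                if chunk:
                    f.write(decoder.decode(chunk))
            # Flush the decoder; raises if the data ends mid-character
            f.write(decoder.decode(b'', final=True))
    return local_filename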
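iter_content [1] can also do the decoding itself: just pass decode_unicode=True. This only takes effect when the response encoding is known (r.encoding is not None); otherwise the chunks are still returned as bytes.
Note that the number of bytes returned by each iteration of iter_content is not necessarily exactly chunk_size. It can differ from the requested size and is expected to vary from one iteration to the next.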
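To illustrate why decoding each chunk on its own is fragile when chunk boundaries are unpredictable, here is a small sketch that needs no network access. The 4-byte chunking is an assumption made purely to force a split inside a character; real chunk boundaries depend on the server and the connection.

```python
import codecs

data = "下载大文件".encode("utf-8")   # 5 characters, 3 bytes each = 15 bytes
# 4-byte chunks deliberately cut through the middle of characters
chunks = [data[i:i + 4] for i in range(0, len(data), 4)]

# Naive per-chunk decoding fails on the first chunk that ends mid-character.
try:
    "".join(c.decode("utf-8") for c in chunks)
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False

# An incremental decoder buffers the incomplete bytes until the next chunk arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
text = "".join(decoder.decode(c) for c in chunks) + decoder.decode(b"", final=True)
```

Here `naive_ok` ends up `False`, while `text` holds the original string intact.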
Method 2
Use Response.raw [2] together with shutil.copyfileobj [3]:
import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            # Copy the raw response stream straight into the file
            shutil.copyfileobj(r.raw, f)

    return local_filename
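This streams the file to disk without using excessive memory, and the code is simpler.
Note: according to the documentation, Response.raw does not decode the content (for example a gzip or deflate Content-Encoding). If you need decoding, replace the r.raw.read method before copying (this requires import functools):
r.raw.read = functools.partial(r.raw.read, decode_content=True)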
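`shutil.copyfileobj` works with any file-like object that has a `read()` method, and it accepts an optional `length` argument that sets the buffer size for each read/write cycle. The sketch below uses in-memory `io.BytesIO` streams as stand-ins for `r.raw` and the output file, so it runs without a network connection; the 1 MiB buffer is an arbitrary choice for illustration, not a recommendation.

```python
import io
import shutil

src = io.BytesIO(b"x" * (3 * 1024 * 1024))   # stand-in for r.raw (3 MiB of data)
dst = io.BytesIO()                            # stand-in for the open output file

# length is the buffer size used for each read/write cycle
shutil.copyfileobj(src, dst, length=1024 * 1024)

copied = len(dst.getvalue())                  # number of bytes that reached dst
```

In a real download the same call shape would be `shutil.copyfileobj(r.raw, f, length=...)`.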
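Speed

Method 2 is faster. Where Method 1 reaches 2-3 MB/s, Method 2 can reach nearly 40 MB/s. The exact figures depend on the chunk size, the network, and the disk, so measure in your own environment.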
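To compare the two copy strategies on your own machine, a rough timing harness like the one below may help. It is only a sketch: it copies between in-memory streams, so it measures the local copy loop and not the network, and it will not reproduce the figures quoted above. `measure_mb_per_s` and `chunk_loop` are hypothetical helper names.

```python
import io
import shutil
import time


def measure_mb_per_s(copy_func, payload):
    """Run copy_func(src, dst) on in-memory streams and return MB/s."""
    src, dst = io.BytesIO(payload), io.BytesIO()
    start = time.perf_counter()
    copy_func(src, dst)
    elapsed = time.perf_counter() - start
    assert dst.getvalue() == payload          # the copy must be complete
    return len(payload) / 1e6 / max(elapsed, 1e-9)


def chunk_loop(src, dst, chunk_size=8192):
    # Same shape as the iter_content loop in Method 1
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(chunk)


payload = b"\0" * (8 * 1024 * 1024)           # 8 MiB of test data
loop_rate = measure_mb_per_s(chunk_loop, payload)
copy_rate = measure_mb_per_s(shutil.copyfileobj, payload)
print(f"chunk loop: {loop_rate:.0f} MB/s, copyfileobj: {copy_rate:.0f} MB/s")
```

For a realistic comparison, replace the in-memory source with an actual streamed response and time the two download functions end to end.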


Published 2024-11-30 09:17
Category: Python development
