File Operations and Data Processing: A Practical Python Guide

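File operations and data processing are core skills in data analysis, office automation, and web scraping. Python ships with a standard library, and has third-party packages, that handle most common file formats efficiently. This article covers file reading and writing, CSV/JSON/Excel data processing, regular expressions, and logging with exception handling. It closes with two practical goals: automating an Excel report and scraping data from a web page.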

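1. File I/O: the basics of working with files in Python

(1) Using the open() function

open() is Python's most basic file tool. It opens a file and returns a file object that you can read from or write to. A typical file operation has three steps: open the file, read or write its content, and close the file.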

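Example: reading a file

file_path = 'example.txt'
# Open the file; 'r' means read mode
file = open(file_path, 'r', encoding='utf-8')
# Read the entire file content into a string
content = file.read()
print(content)
# Close the file to release the underlying handle
file.close()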

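Example: writing a file

file_path = 'output.txt'
# Open the file; 'w' means write mode (an existing file is overwritten)
file = open(file_path, 'w', encoding='utf-8')
# Write the content
file.write("Hello, World!")
# Close the file so the data is flushed to disk
file.close()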
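The mode string decides what happens to existing content, which is easy to get wrong. The sketch below contrasts 'w' (truncate) with 'a' (append). The scratch path demo_path and the variable names are made up for this illustration and are not part of the original examples.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory, used only for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w' creates the file, or truncates it if it already exists.
f = open(demo_path, 'w', encoding='utf-8')
f.write('first line\n')
f.close()

# 'a' appends to the end of the file instead of overwriting it.
f = open(demo_path, 'a', encoding='utf-8')
f.write('second line\n')
f.close()

f = open(demo_path, 'r', encoding='utf-8')
after_append = f.read()
f.close()

# Opening with 'w' again discards everything written so far.
f = open(demo_path, 'w', encoding='utf-8')
f.write('replaced\n')
f.close()

f = open(demo_path, 'r', encoding='utf-8')
after_overwrite = f.read()
f.close()

print(repr(after_append))
print(repr(after_overwrite))
```

Every open() here needs a matching close(), which is exactly the bookkeeping that becomes error-prone in longer programs.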

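(2) Using the with context manager

To avoid forgetting to close a file, use the with statement. A file object is a context manager, so with closes it automatically when the block ends, even if an exception is raised inside the block.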

Example: reading a file

file_path = 'example.txt'
with open(file_path, 'r', encoding='utf-8') as file:
    content = file.read()
    print(content)

Example: writing a file

file_path = 'output.txt'
with open(file_path, 'w', encoding='utf-8') as file:
    file.write("Hello, World!")
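To make the automatic closing concrete, the following sketch checks file.closed after a with block and then spells out roughly the same behaviour by hand with try/finally. The scratch path and variable names are assumptions made for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(demo_path, 'w', encoding='utf-8') as file:
    file.write('line 1\nline 2\nline 3\n')

# The file object is closed as soon as the with block exits.
closed_after_with = file.closed

# Roughly what the with statement does for a file, written out by hand.
file = open(demo_path, 'r', encoding='utf-8')
try:
    # Iterating over a file object yields one line at a time,
    # which avoids loading a large file into memory at once.
    lines = [line.rstrip('\n') for line in file]
finally:
    file.close()

print(closed_after_with)
print(lines)
```

The with form is shorter and harder to get wrong, which is why the rest of this article uses it.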

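2. CSV/JSON/Excel data processing: working with structured data

(1) CSV files

CSV (comma-separated values) is a common plain-text format for tabular data. Python's built-in csv module reads and writes CSV files.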

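Example: reading a CSV file

import csv

file_path = 'data.csv'
# newline='' lets the csv module handle line endings itself
with open(file_path, 'r', newline='', encoding='utf-8') as file:
    reader = csv.reader(file)
    for row in reader:
        # Each row is a list of strings
        print(row)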

Example: writing a CSV file

import csv

file_path = 'output.csv'
data = [['Name', 'Age'], ['Alice', 25], ['Bob', 30]]
# newline='' prevents blank lines between rows on Windows
with open(file_path, 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerows(data)
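When a CSV file has a header row, it is often more convenient to work with dictionaries keyed by column name. The sketch below uses csv.DictWriter and csv.DictReader from the standard library. The scratch file and the Name/Age columns are assumptions carried over from the example above.

```python
import csv
import os
import tempfile

# Hypothetical scratch file, used only for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), 'people.csv')
rows = [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}]

with open(demo_path, 'w', newline='', encoding='utf-8') as file:
    writer = csv.DictWriter(file, fieldnames=['Name', 'Age'])
    writer.writeheader()      # writes the "Name,Age" header row
    writer.writerows(rows)

with open(demo_path, 'r', newline='', encoding='utf-8') as file:
    reader = csv.DictReader(file)  # uses the first row as the keys
    loaded = [dict(row) for row in reader]

# The csv module returns every field as a string, so convert explicitly.
ages = [int(row['Age']) for row in loaded]

print(loaded)
print(ages)
```

Note that the numbers written as integers come back as strings; the csv module does no type inference.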

(2) JSON data

JSON (JavaScript Object Notation) is a lightweight data-interchange format. Python's built-in json module converts between JSON text and Python objects such as dicts and lists.

Example: reading a JSON file

import json

file_path = 'data.json'
with open(file_path, 'r', encoding='utf-8') as file:
    # Parse the file content into Python objects
    data = json.load(file)
    print(data)

Example: writing a JSON file

import json

file_path = 'output.json'
data = {'name': 'Alice', 'age': 25}
with open(file_path, 'w', encoding='utf-8') as file:
    # ensure_ascii=False keeps non-ASCII characters readable; indent=4 pretty-prints
    json.dump(data, file, ensure_ascii=False, indent=4)
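The same conversions are available for strings through json.dumps and json.loads, which is handy for seeing what ensure_ascii actually changes and how malformed input is reported. This is a minimal sketch; the record contents are invented for the illustration.

```python
import json

# Hypothetical record containing non-ASCII text and several JSON types.
record = {'name': '张三', 'age': 25, 'tags': ['a', 'b'], 'active': True, 'note': None}

# By default non-ASCII characters are escaped as \uXXXX sequences.
escaped = json.dumps(record)

# ensure_ascii=False writes the characters as-is.
readable = json.dumps(record, ensure_ascii=False)

# Both forms decode back to the same Python object.
restored = json.loads(readable)

# Invalid JSON (here: a trailing comma) raises json.JSONDecodeError.
try:
    json.loads('{"name": "Alice",}')
    error_message = None
except json.JSONDecodeError as exc:
    error_message = str(exc)

print(escaped)
print(readable)
print(restored == record)
print(error_message)
```

True and None in Python map to true and null in JSON and back again, so the round trip preserves them.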

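(3) Excel data

Excel workbooks are a common format for tabular data, and the third-party pandas library can read and write them. Reading and writing .xlsx files this way requires an Excel engine such as openpyxl to be installed alongside pandas.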

Example: reading an Excel file

import pandas as pd

file_path = 'data.xlsx'
# Reads the first worksheet into a DataFrame by default
df = pd.read_excel(file_path)
print(df)

Example: writing an Excel file

import pandas as pd

file_path = 'output.xlsx'
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
# index=False omits the DataFrame's row index from the sheet
df.to_excel(file_path, index=False)

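3. Regular expressions: a powerful text-matching tool

A regular expression is a pattern language for describing and matching strings. Python's built-in re module provides regular-expression support.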

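Example: matching and extracting text

import re

text = "Contact us at support@example.com or feedback@example.com"
# Inside a character class, | is a literal pipe, so the TLD part is [A-Za-z], not [A-Z|a-z]
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(emails)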
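When a pattern is reused, or when you need parts of each match rather than the whole string, re.compile with named groups is usually clearer. The sketch below is one way to do that; the names EMAIL_RE, user, and domain are chosen for this illustration, and the pattern is a simplified email matcher, not a full validator.

```python
import re

text = "Contact us at support@example.com or feedback@example.com"

# Compile once and reuse; (?P<name>...) defines a named group.
EMAIL_RE = re.compile(
    r'(?P<user>[A-Za-z0-9._%+-]+)@(?P<domain>[A-Za-z0-9.-]+\.[A-Za-z]{2,})'
)

# finditer yields match objects, so each part can be read by name.
pairs = [(m.group('user'), m.group('domain')) for m in EMAIL_RE.finditer(text)]

# sub accepts a function that builds the replacement from each match.
masked = EMAIL_RE.sub(lambda m: '***@' + m.group('domain'), text)

print(pairs)   # [('support', 'example.com'), ('feedback', 'example.com')]
print(masked)  # Contact us at ***@example.com or ***@example.com
```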

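4. Logging and exception handling: keeping programs robust

(1) Exception handling

In Python, a try-except block catches an exception and lets the program handle it instead of crashing.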

Example: handling a file-read error

try:
    with open('nonexistent_file.txt', 'r', encoding='utf-8') as file:
        content = file.read()
except FileNotFoundError:
    # Raised by open() when the path does not exist
    print("File not found!")
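Reading a file can fail in more than one way, so it often helps to handle each case separately and to keep the success path in an else clause. The helper below is a sketch under that idea; read_text and the scratch files are hypothetical names introduced for this illustration.

```python
import os
import tempfile

def read_text(path):
    """Return the file's text, or None if it cannot be read as UTF-8."""
    try:
        with open(path, 'r', encoding='utf-8') as file:
            content = file.read()
    except FileNotFoundError:
        print(f"File not found: {path}")
        return None
    except PermissionError:
        print(f"No permission to read: {path}")
        return None
    except UnicodeDecodeError:
        print(f"Not valid UTF-8 text: {path}")
        return None
    else:
        # Runs only when the try block raised no exception.
        return content

# Hypothetical scratch files, used only for this demo.
demo_dir = tempfile.mkdtemp()
good_path = os.path.join(demo_dir, 'good.txt')
bad_path = os.path.join(demo_dir, 'bad.bin')
missing_path = os.path.join(demo_dir, 'missing.txt')

with open(good_path, 'w', encoding='utf-8') as f:
    f.write('hello')
with open(bad_path, 'wb') as f:
    f.write(b'\xff\xfe\xff')  # bytes that are not valid UTF-8

good_result = read_text(good_path)
bad_result = read_text(bad_path)
missing_result = read_text(missing_path)
```

Catching specific exception types, rather than a bare except, keeps unexpected bugs visible instead of silently swallowing them.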

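(2) Logging

Logging is an important part of program development: it lets you trace what a program did at run time and helps with debugging. Python's built-in logging module provides this functionality.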

Example: configuring logging

import logging

# Write DEBUG and higher to app.log; filemode='w' overwrites the file on each run
logging.basicConfig(level=logging.DEBUG, filename='app.log', filemode='w',
                    format='%(name)s - %(levelname)s - %(message)s')

logging.info('Program started')
try:
    with open('nonexistent_file.txt', 'r', encoding='utf-8') as file:
        content = file.read()
except FileNotFoundError:
    logging.error('File not found!')
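In larger programs it is common to use a named logger and to record the traceback along with the message. The sketch below does that with logging.getLogger and logger.exception. It logs to an in-memory stream so the output is easy to inspect; the logger name file_demo and the scratch path are assumptions for this illustration.

```python
import io
import logging
import os
import tempfile

# Log to an in-memory stream here; a FileHandler would work the same way.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter('%(name)s - %(levelname)s - %(message)s'))

logger = logging.getLogger('file_demo')  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep this demo's records out of the root logger

# A path inside a fresh temporary directory, so it cannot exist.
missing_path = os.path.join(tempfile.mkdtemp(), 'nonexistent_file.txt')

logger.info('Program started')
try:
    with open(missing_path, 'r', encoding='utf-8') as file:
        content = file.read()
except FileNotFoundError:
    # Logs at ERROR level and appends the current traceback.
    logger.exception('Could not read input file')

log_text = log_stream.getvalue()
print(log_text)
```

logger.exception is meant to be called from inside an except block; elsewhere, logger.error is the right call.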

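5. Practical goals: automating an Excel report and scraping web data

(1) Automating an Excel report

Suppose you have an Excel workbook with several worksheets and need to merge them and clean the data. The following example assumes the worksheets share the same columns: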

import pandas as pd

# Open the workbook
file_path = 'report.xlsx'
xls = pd.ExcelFile(file_path)

# Read every worksheet and merge them into one DataFrame
df_list = []
for sheet_name in xls.sheet_names:
    df = pd.read_excel(xls, sheet_name=sheet_name)
    df_list.append(df)
combined_df = pd.concat(df_list, ignore_index=True)

# Clean the data
combined_df.dropna(inplace=True)           # drop rows that contain any missing value
combined_df.drop_duplicates(inplace=True)  # drop fully duplicated rows

# Save the cleaned data
output_path = 'cleaned_report.xlsx'
combined_df.to_excel(output_path, index=False)
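To see what the cleaning steps actually remove, it can help to run them on a tiny in-memory table first. The sketch below uses two hand-made DataFrames as stand-ins for worksheets, so no Excel file is needed; the data and variable names are invented for this illustration.

```python
import pandas as pd

# Hypothetical stand-ins for two worksheets with the same columns.
sheet_a = pd.DataFrame({'Name': ['Alice', 'Bob', None], 'Age': [25, 30, 41]})
sheet_b = pd.DataFrame({'Name': ['Bob', 'Carol'], 'Age': [30, None]})

# Stack the rows; ignore_index=True renumbers them 0..n-1.
combined = pd.concat([sheet_a, sheet_b], ignore_index=True)

# dropna removes the row with no name and the row with no age;
# drop_duplicates then removes the second, identical "Bob" row.
cleaned = combined.dropna().drop_duplicates().reset_index(drop=True)

print(len(combined))
print(cleaned)
```

This version returns new DataFrames instead of using inplace=True, which makes it easy to compare the table before and after cleaning.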

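(2) Scraping web data and storing it in a structured format

Suppose you need to scrape data from a web page and store it as JSON. In the following example, the URL and the data-item class are placeholders; adapt them to the structure of the page you are working with: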

import json

import requests
from bs4 import BeautifulSoup

# Fetch the page
url = 'https://example.com/data'
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an exception on an HTTP error status
soup = BeautifulSoup(response.content, 'html.parser')

# Extract the data
data_list = []
for item in soup.find_all('div', class_='data-item'):
    title = item.find('h2').text
    description = item.find('p').text
    data_list.append({'title': title, 'description': description})

# Save as a JSON file
file_path = 'data.json'
with open(file_path, 'w', encoding='utf-8') as file:
    json.dump(data_list, file, ensure_ascii=False, indent=4)
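Real pages are rarely uniform: find() returns None when a tag is missing, and calling .text on None raises AttributeError. The sketch below shows one way to guard against that, parsing an inline HTML string so it runs without network access. The HTML snippet and the choice to skip untitled items are assumptions made for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
html = """
<div class="data-item"><h2>First</h2><p>Has a description.</p></div>
<div class="data-item"><h2>Second</h2></div>
<div class="other"><h2>Ignored</h2><p>Not a data item.</p></div>
"""

soup = BeautifulSoup(html, 'html.parser')

data_list = []
for item in soup.find_all('div', class_='data-item'):
    title_tag = item.find('h2')
    desc_tag = item.find('p')
    if title_tag is None:
        # Skip items without a title instead of crashing.
        continue
    data_list.append({
        'title': title_tag.get_text(strip=True),
        # Fall back to an empty string when the description is missing.
        'description': desc_tag.get_text(strip=True) if desc_tag else '',
    })

print(data_list)
```

Developing the extraction logic against a saved HTML sample like this also avoids sending repeated requests to the site while you experiment.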

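This article covered the essentials of file operations, CSV/JSON/Excel data processing, regular expressions, and logging with exception handling. The two practical goals showed how to automate an Excel report and how to scrape web data and store it in a structured format. These techniques should make everyday data-processing and automation tasks faster and more reliable.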


Published 2025-04-08 14:36
Category: Python Development
