Error-Handling Strategies for File Reading in Python

1. Error Detection

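File existence check: before reading, use os.path.exists to check whether the file exists. Because the file can be removed between the check and the open, the open call should still be wrapped in exception handling.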

import os

file_path = 'your_file_path'
if not os.path.exists(file_path):
    raise FileNotFoundError(f"File {file_path} does not exist")

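Permission check: catch PermissionError when opening the file. To test permissions before opening, use os.access, which checks whether the calling process can access the file in the given mode.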

import os

file_path = 'your_file_path'
if not os.access(file_path, os.R_OK):
    raise PermissionError(f"No permission to read file {file_path}")

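File type check: determine the file type from the extension or from the header bytes (magic number). For a text file, it is often enough to try opening it in text mode; for a binary file, the struct module can be used to inspect the header. For common file types, third-party libraries such as python-magic (which must be installed separately) can do the detection.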

import magic

file_path = 'your_file_path'
mime_type = magic.from_file(file_path, mime=True)
if mime_type not in ['text/plain', 'application/json']:  # assume only plain-text and JSON files are handled
    raise ValueError(f"Unsupported file type: {mime_type}")
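
As a dependency-free illustration of the header check mentioned above, the following sketch reads the first few bytes of a file and compares them with a small table of well-known signatures. The function name `sniff_signature` and the `KNOWN_SIGNATURES` table are assumptions made for this example and are not part of python-magic; it uses plain byte comparison rather than struct, and the table is far from exhaustive.

```python
# Illustrative signature table (hypothetical name); real detectors know many more formats.
KNOWN_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",
}


def sniff_signature(file_path):
    """Return a MIME type guessed from the file header, or None if unknown."""
    longest = max(len(signature) for signature in KNOWN_SIGNATURES)
    # Binary mode avoids any decoding, so this never raises UnicodeDecodeError.
    with open(file_path, "rb") as f:
        header = f.read(longest)
    for signature, mime_type in KNOWN_SIGNATURES.items():
        if header.startswith(signature):
            return mime_type
    return None
```

A caller could treat a None result as "unknown" and fall back to the extension or to python-magic.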

2. Exception Handling Strategies

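General exception capture: wrap the read in a try-except statement that catches the exceptions that can occur, and handle each exception type differently.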

try:
    with open('your_file_path', 'r') as f:
        data = f.read()
except FileNotFoundError as e:
    # Handle the missing-file error
    pass
except PermissionError as e:
    # Handle the insufficient-permission error
    pass
except UnicodeDecodeError as e:
    # Handle the text-decoding error
    pass
except Exception as e:
    # Handle any other unexpected error
    pass
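
One possible way to fill in the UnicodeDecodeError branch is to retry with a fallback encoding. The sketch below is only an illustration: the function name `read_text_with_fallback` and the default encoding list are assumptions, and the right encodings depend on where the files come from.

```python
def read_text_with_fallback(file_path, encodings=("utf-8", "latin-1")):
    """Try each encoding in order; return (text, encoding_used)."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            with open(file_path, "r", encoding=encoding) as f:
                return f.read(), encoding
        except UnicodeDecodeError as e:
            last_error = e  # remember the failure and try the next encoding
    # Every encoding failed: surface the last decoding error to the caller.
    raise last_error
```

Note that latin-1 can decode any byte sequence, so placing it last means the function always returns text, possibly with the wrong characters; a stricter caller would omit it.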

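Specific exception handling: handle each kind of read error in its own way. For example, on FileNotFoundError, log the file path and tell the user the file is missing; on PermissionError, prompt the user to check the permission settings.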

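import logging

try:
    with open('your_file_path', 'r') as f:
        data = f.read()
except FileNotFoundError as e:
    logging.error(f"File not found: {e.filename}")
    # Recovery could go here, e.g. reading from a backup path
except PermissionError as e:
    logging.error(f"Permission denied, cannot read file: {e.filename}")
    # Recovery could go here, e.g. retrying with elevated privileges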

3. Error Logging and Management

Using the logging module: Python's logging module provides flexible, configurable logging.

import logging

logging.basicConfig(
    level=logging.ERROR,
    format='%(asctime)s - %(levelname)s - %(message)s',
    filename='file_read_errors.log'
)

try:
    with open('your_file_path', 'r') as f:
        data = f.read()
except FileNotFoundError as e:
    logging.error(f"File not found: {e.filename}")
except PermissionError as e:
    logging.error(f"Permission denied, cannot read file: {e.filename}")

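Log levels: use the different log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL) to distinguish events by severity. During development, set the level to DEBUG to get more detail; in production, set it to ERROR or higher to reduce log volume.

Log rotation: in large projects, log files can grow very large. logging.handlers.RotatingFileHandler rotates the log once it reaches a size limit, which keeps any single file from growing without bound.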

import logging
from logging.handlers import RotatingFileHandler

logger = logging.getLogger(__name__)
logger.setLevel(logging.ERROR)

handler = RotatingFileHandler('file_read_errors.log', maxBytes=1024*1024, backupCount=5)
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)

logger.addHandler(handler)
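
To show how such a rotating logger might be used for read failures, here is a minimal sketch. The helper names `build_error_logger` and `read_or_log`, the logger name, and the choice to return None on failure are all assumptions for illustration.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_error_logger(log_path, name="file_reader"):
    """Return a logger that writes ERROR and above to a rotating file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.ERROR)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = RotatingFileHandler(
            log_path, maxBytes=1024 * 1024, backupCount=5, encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger


def read_or_log(logger, file_path):
    """Read a text file; on an OS-level failure, log it and return None."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return f.read()
    except OSError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Could not read %s", file_path)
        return None
```

Using logger.exception rather than logger.error keeps the traceback in the log, which tends to help when the same message can come from several call sites.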

4. Error Recovery Strategies

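Retry mechanism: for transient errors (for example, a read that fails because the file system is briefly busy), a retry mechanism can help. The tenacity library provides convenient retry support. As configured below, the decorator makes up to 3 attempts with a 1-second wait between them, and it retries on any exception, including ones such as FileNotFoundError that a retry is unlikely to fix.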

from tenacity import retry, stop_after_attempt, wait_fixed

@retry(stop=stop_after_attempt(3), wait=wait_fixed(1))
def read_file():
    with open('your_file_path', 'r') as f:
        return f.read()
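
For comparison, the same idea can be sketched with only the standard library, retrying transient OSError failures with exponential backoff while failing fast on errors that a retry will not fix. This is not tenacity's API: the function `read_with_retry` and its `opener` parameter (included so a different open function can be injected) are hypothetical names for this example.

```python
import time


def read_with_retry(file_path, attempts=3, base_delay=0.1, opener=open):
    """Retry transient OSError failures; re-raise permanent ones at once."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(file_path, "r", encoding="utf-8") as f:
                return f.read()
        except (FileNotFoundError, PermissionError):
            raise  # retrying will not help; let the caller decide
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Which errors count as transient is a judgment call that depends on the storage backend; the split used here is only one reasonable default.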

Reading from a backup file: if reading the primary file fails, try reading the data from a backup file.

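import logging

data = None  # stays None if neither file can be read
try:
    with open('primary_file_path', 'r') as f:
        data = f.read()
except FileNotFoundError:
    try:
        with open('backup_file_path', 'r') as f:
            data = f.read()
    except FileNotFoundError:
        logging.error("Neither the primary file nor the backup file was found")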
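
The nested try blocks can be flattened when there are several fallback locations. The sketch below is one way to do that, under the assumption that any OSError (not only a missing file) should trigger the next candidate; `read_first_available` is a hypothetical helper name.

```python
def read_first_available(candidate_paths):
    """Return (text, path) for the first candidate that can be read."""
    tried = []
    for path in candidate_paths:
        try:
            with open(path, "r", encoding="utf-8") as f:
                return f.read(), path
        except OSError:
            tried.append(path)  # remember the failure and move on
    raise OSError(f"None of the candidate files could be read: {tried}")
```

Returning the path alongside the text lets the caller log which source was actually used, for example to flag that data came from the backup.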

Skipping bad files: when processing multiple files, if one file fails to read, log the error, skip that file, and continue with the rest.

import logging

file_paths = ['file1.txt', 'file2.txt', 'file3.txt']
for file_path in file_paths:
    try:
        with open(file_path, 'r') as f:
            data = f.read()
            # Process the file data here
    except Exception as e:
        logging.error(f"Error reading file {file_path}: {e}")
        continue
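
When the caller needs to know afterwards which files were skipped, the loop can collect failures instead of only logging them. A minimal sketch, assuming a hypothetical helper `read_all` that returns two dictionaries keyed by path:

```python
def read_all(file_paths):
    """Read every file that can be read; return (contents, failures)."""
    contents = {}
    failures = {}
    for path in file_paths:
        try:
            with open(path, "r", encoding="utf-8") as f:
                contents[path] = f.read()
        except (OSError, UnicodeDecodeError) as e:
            failures[path] = e  # keep the exception so the caller can report it
    return contents, failures
```

Catching OSError and UnicodeDecodeError rather than Exception is a deliberate narrowing here, so that programming errors still surface instead of being recorded as bad files.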

5. Reliability and Performance Under High Concurrency

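Thread-safe logging: under high concurrency, multiple threads may log at the same time, so logging must be thread-safe. The logging module is thread-safe by default.

Resource isolation: give each concurrent task its own file-reading resources to avoid contention. For example, use a thread pool or process pool in which each worker thread or process handles an independent file-reading task.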

import concurrent.futures
import logging

def read_file_worker(file_path):
    try:
        with open(file_path, 'r') as f:
            return f.read()
    except Exception as e:
        logging.error(f"Error reading file {file_path}: {e}")
        return None

file_paths = ['file1.txt', 'file2.txt', 'file3.txt']
with concurrent.futures.ThreadPoolExecutor() as executor:
    results = list(executor.map(read_file_worker, file_paths))
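
A variant of the pool above lets exceptions travel back through the futures, so the caller can tell a failed file from an empty one. This is a sketch only; `read_text` and `read_many` are hypothetical names, and the default of 4 workers is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_text(path):
    """Read one file; any exception propagates through the future."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def read_many(file_paths, max_workers=4):
    """Read files concurrently; return (results, errors) keyed by path."""
    results = {}
    errors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_path = {executor.submit(read_text, p): p for p in file_paths}
        for future in as_completed(future_to_path):
            path = future_to_path[future]
            try:
                results[path] = future.result()  # re-raises the worker's exception
            except (OSError, UnicodeDecodeError) as e:
                errors[path] = e
    return results, errors
```

Because as_completed yields futures in completion order, results are keyed by path rather than returned as a positional list.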

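Limiting concurrency: use a semaphore (Semaphore) or a similar mechanism to cap the number of simultaneous file reads and keep system resources from being exhausted.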

import logging
import threading
import time

semaphore = threading.Semaphore(5)  # allow at most 5 concurrent operations

def read_file_with_semaphore(file_path):
    with semaphore:
        try:
            with open(file_path, 'r') as f:
                time.sleep(1)  # simulate a slow file read
                return f.read()
        except Exception as e:
            logging.error(f"Error reading file {file_path}: {e}")
            return None
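
To check that a semaphore really caps the number of reads in flight, one can wrap it in a small context manager that also records the peak concurrency. The class `ConcurrencyGauge` and the helpers `guarded_read` and `read_all_guarded` are hypothetical names for this sketch, and the short sleep merely stands in for slow I/O.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyGauge:
    """Semaphore-guarded section that records its peak concurrency."""

    def __init__(self, limit):
        self._semaphore = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __enter__(self):
        self._semaphore.acquire()
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self._active -= 1
        self._semaphore.release()
        return False  # never swallow exceptions from the guarded block


def guarded_read(gauge, path):
    with gauge:
        time.sleep(0.01)  # stand-in for slow I/O
        with open(path, "r", encoding="utf-8") as f:
            return f.read()


def read_all_guarded(file_paths, limit=5, workers=16):
    """Read with more worker threads than permits; return (contents, peak)."""
    gauge = ConcurrencyGauge(limit)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        contents = list(executor.map(lambda p: guarded_read(gauge, p), file_paths))
    return contents, gauge.peak
```

If the thread pool is the only source of concurrency, setting max_workers to the desired limit achieves the same cap without a semaphore; the semaphore becomes useful when several pools or ad hoc threads share one budget.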

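Consistent error handling: make sure every task applies the same error-handling policy under high concurrency, so that no task destabilizes the system through mishandled errors. One way to guarantee this is to wrap the error-handling logic in a function and call it from every concurrent task.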
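
One way to package the shared error-handling logic is a decorator applied to every worker function. The sketch below assumes a hypothetical decorator `handle_read_errors` that logs failures uniformly and returns a caller-chosen default; the log messages and the default value are illustrative.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def handle_read_errors(default=None):
    """Decorator: log read failures uniformly and return `default` instead."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(file_path, *args, **kwargs):
            try:
                return func(file_path, *args, **kwargs)
            except FileNotFoundError:
                logger.error("File not found: %s", file_path)
            except PermissionError:
                logger.error("Permission denied: %s", file_path)
            except (OSError, UnicodeDecodeError) as e:
                logger.error("Failed to read %s: %s", file_path, e)
            return default  # reached only when one of the handlers ran
        return wrapper
    return decorator


@handle_read_errors(default="")
def read_text(file_path):
    with open(file_path, "r", encoding="utf-8") as f:
        return f.read()
```

Any function whose first argument is the file path can be decorated the same way, so thread-pool workers and sequential code share one policy.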
