Interview question: recursively replacing specific text across a directory tree in Python, using text files and regular expressions

Implementation approach

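Walk the directory: use os.walk to traverse the nested directory structure and collect the paths of all text files (here, files ending in .txt).
Read the file content: for each file, read the raw bytes, guess the encoding with the chardet library, and decode the content with that encoding. The guess is a heuristic and can be wrong or missing, so decoding failures must be handled.
Match and replace: use re.sub from the re module to find every match of the given regular expression pattern and replace it.
Write the file: write the replaced content back to the original file, using the same encoding it was read with.
Log the operation: record every replacement in a separate log file.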
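The match-and-replace step can be tried in isolation before touching any files. The following is a minimal sketch using the same pattern as the core code; the helper name `rename` and the sample string are made up for illustration. It uses `re.subn`, which behaves like `re.sub` but also returns the number of substitutions, a value that is handy for logging.

```python
import re

# Same pattern as in the core code: "old_value_" followed by one or two digits.
PATTERN = re.compile(r'old_value_(\d{1,2})')


def rename(match: re.Match) -> str:
    # Reuse the captured digits in the replacement text.
    return f'new_value_{match.group(1)}'


# Hypothetical sample input; the last item has no digits and is left alone.
sample = 'a=old_value_1, b=old_value_42, c=old_value'

# subn returns the new string together with the number of substitutions made.
result, count = PATTERN.subn(rename, sample)
print(result)
print(count)
```

Passing a callable as the replacement keeps the captured group available as a normal Python value, which makes it easy to extend the replacement logic later (for example, zero-padding the number).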

Core code

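import os
import re

import chardet


def replace_pattern_in_files(directory):
    # Match "old_value_" followed by one or two digits and keep the digits.
    pattern = r'old_value_(\d{1,2})'
    replacement = lambda match: f'new_value_{match.group(1)}'
    log_path = 'replacement_log.txt'

    # The context manager closes the log even if an unexpected error occurs.
    with open(log_path, 'w', encoding='utf-8') as log_file:
        for root, dirs, files in os.walk(directory):
            for file in files:
                if not file.endswith('.txt'):
                    continue
                file_path = os.path.join(root, file)
                # Skip the log itself when it lives inside the scanned tree.
                if os.path.abspath(file_path) == os.path.abspath(log_path):
                    continue
                with open(file_path, 'rb') as f:
                    raw_data = f.read()
                # chardet only guesses; it returns None when it cannot decide.
                encoding = chardet.detect(raw_data)['encoding'] or 'utf-8'
                try:
                    # newline='' preserves the file's original line endings.
                    with open(file_path, 'r', encoding=encoding, newline='') as f:
                        content = f.read()
                    new_content = re.sub(pattern, replacement, content)
                    if new_content != content:
                        with open(file_path, 'w', encoding=encoding, newline='') as f:
                            f.write(new_content)
                        log_message = f'File: {file_path}, replaced old_value with new_value\n'
                        log_file.write(log_message)
                except (UnicodeDecodeError, LookupError):
                    # LookupError covers encoding names Python does not know.
                    log_message = f'Failed to decode file: {file_path}\n'
                    log_file.write(log_message)


if __name__ == "__main__":
    target_directory = '.'
    replace_pattern_in_files(target_directory)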

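The code above first defines the regular expression pattern to match and how the replacement string is built. It then walks every .txt file under the given directory with os.walk, uses chardet to guess each file's encoding, reads the content, and applies the substitution. If the content changed, it is written back to the file and the operation is logged. The log file records the path of each file that was modified together with the replacement performed, and it also records any file that could not be decoded.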
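Because the function rewrites files in place, it can be useful to preview what would change first. The following is a minimal sketch of such a dry run, not part of the original answer: the function name `preview_replacements` is hypothetical, and it assumes the files are UTF-8 instead of guessing the encoding with chardet, so that it runs with the standard library alone.

```python
import os
import re
import tempfile

PATTERN = re.compile(r'old_value_(\d{1,2})')


def preview_replacements(directory, encoding='utf-8'):
    """Return {file_path: match_count} for .txt files that would change."""
    report = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.endswith('.txt'):
                continue
            path = os.path.join(root, name)
            try:
                with open(path, 'r', encoding=encoding) as f:
                    content = f.read()
            except UnicodeDecodeError:
                # Assumption: files that are not valid in the given encoding
                # are simply skipped in this preview.
                continue
            count = len(PATTERN.findall(content))
            if count:
                report[path] = count
    return report


# Small self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    with open(os.path.join(tmp, 'sub', 'a.txt'), 'w', encoding='utf-8') as f:
        f.write('old_value_1 and old_value_22')
    with open(os.path.join(tmp, 'b.txt'), 'w', encoding='utf-8') as f:
        f.write('nothing to replace')
    demo_report = preview_replacements(tmp)
    # Show paths relative to the temporary directory for readability.
    demo_counts = {os.path.relpath(p, tmp): n for p, n in demo_report.items()}
    print(demo_counts)
```

Nothing is written to disk by the preview itself, so it can be run repeatedly against a real directory before calling the replacing function.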
