Interview question: Expert-level performance optimization of string deletion in Python

1. Reading the data

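Technique: read the large file line by line inside a with open block instead of loading the whole file into memory at once. For example:

with open('large_log_file.log', 'r', encoding='utf-8') as f:
    for line in f:
        pass  # process one line at a time here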

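Performance notes: pass the file's actual encoding explicitly so that the text is decoded correctly in a single pass, rather than relying on the platform default and having to re-decode later. For very large files, the mmap module is worth considering: it maps the file into memory so that its contents can be sliced and searched like a bytes object, which can improve read performance in some scenarios.
Avoiding memory exhaustion: because the file is read one line at a time, the whole file is never held in memory at once, which effectively prevents out-of-memory errors caused by an oversized file.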
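
The mmap approach mentioned above can be sketched as follows. This is a minimal illustration, not a benchmark: the helper name count_lines_with_prefix, the sample data, and the b'PREFIX' marker are all assumptions made for the example. Note that a mapped file exposes bytes, so the prefix is given as bytes as well, and that an empty file cannot be mapped.

```python
import mmap
import os
import tempfile


def count_lines_with_prefix(path, prefix):
    """Count lines in `path` that start with `prefix` (bytes), using mmap."""
    if os.path.getsize(path) == 0:
        return 0  # an empty file cannot be memory-mapped
    count = 0
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # readline() returns b'' at end of file, which stops the iteration
            for line in iter(mm.readline, b''):
                if line.startswith(prefix):
                    count += 1
    return count


# Small demonstration on a temporary file (hypothetical sample data)
with tempfile.NamedTemporaryFile('wb', suffix='.log', delete=False) as tmp:
    tmp.write(b'PREFIX one\nother line\nPREFIX two\n')
    demo_path = tmp.name

demo_count = count_lines_with_prefix(demo_path, b'PREFIX')
os.remove(demo_path)
```

Whether this is faster than plain line iteration depends on the workload and the operating system, so it is worth measuring on real data before adopting it.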

2. String deletion

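Technique: use the regular expression module re to match and delete. For example, to delete the substring that starts with a given prefix and runs through the first following space:

import re
pattern = re.compile(r'^specific_prefix.*? ')
line = "specific_prefixabc def specific_prefixghi"
new_line = pattern.sub('', line)  # new_line == "def specific_prefixghi"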

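Performance notes: precompile the regular expression, as re.compile does above, so that the same pattern object is reused for every line. If the pattern is simple, str.startswith combined with slicing, split, or similar methods is an alternative, since plain string operations are usually faster than regular expressions. The following example removes only the prefix itself; to also drop everything through the first space, as the regex above does, the remainder would additionally need a split or find:

prefix = "specific_prefix"
line = "specific_prefixabc def specific_prefixghi"
if line.startswith(prefix):
    new_line = line[len(prefix):]
else:
    new_line = line  # no prefix: keep the line unchanged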

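Avoiding memory exhaustion: operate on a single line at a time and let its memory be released once the line has been processed, so that intermediate data does not pile up.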
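
To make the comparison between the two approaches concrete, here is a hedged sketch of a pure-string function that mirrors the regex behaviour (remove the prefix and everything up to and including the first following space). The names PREFIX, strip_with_regex, and strip_with_str are hypothetical, and the equivalence is assumed only for single lines, where a newline can appear only at the very end.

```python
import re

PREFIX = 'PREFIX'  # hypothetical prefix, chosen for illustration only
# re.escape keeps any regex metacharacters in the prefix literal
pattern = re.compile('^' + re.escape(PREFIX) + '.*? ')


def strip_with_regex(line):
    """Remove the prefix through the first following space, using re."""
    return pattern.sub('', line)


def strip_with_str(line):
    """Same result for single lines, using only str methods."""
    if line.startswith(PREFIX):
        # Look for the first space at or after the end of the prefix
        idx = line.find(' ', len(PREFIX))
        if idx != -1:
            return line[idx + 1:]
    return line  # no prefix, or no space after it: leave the line unchanged


samples = [
    'PREFIXabc def PREFIXghi\n',
    'no prefix here\n',
    'PREFIXnospace\n',
]
regex_results = [strip_with_regex(s) for s in samples]
str_results = [strip_with_str(s) for s in samples]
```

Both functions can then be timed with the timeit module on representative log lines to check whether the string version is actually faster for a given pattern.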

3. Writing the results

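Technique: use with open to write the processed results to a new file. For example:

with open('output_file.log', 'w', encoding='utf-8') as out_f:
    with open('large_log_file.log', 'r', encoding='utf-8') as f:
        for line in f:
            # pattern is the compiled regex from section 2
            new_line = pattern.sub('', line)
            # line already ends with its own newline, so none is appended
            out_f.write(new_line)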

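Performance notes: the buffering parameter of open controls the size of the write buffer. With the default (buffering=-1), Python chooses a size based on the underlying device's block size and falls back to io.DEFAULT_BUFFER_SIZE. For very large outputs, a larger buffer can reduce the number of disk writes, for example with open('output_file.log', 'w', encoding='utf-8', buffering=65536) as out_f:.
Avoiding memory exhaustion: as before, write each result as soon as it has been processed, so that processed output does not accumulate in memory.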
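
The process-and-write-immediately idea can also be expressed with a generator, as in the sketch below. The function names stripped_lines and copy_stripped, the buffer_size parameter, and the sample data are assumptions made for illustration; 65536 is simply the buffer size used in the text above, not a tuned value.

```python
import os
import re
import tempfile


def stripped_lines(lines, pattern):
    """Lazily yield each line with the first match of `pattern` removed."""
    for line in lines:
        yield pattern.sub('', line, count=1)


def copy_stripped(src_path, dst_path, pattern, buffer_size=65536):
    """Stream src_path to dst_path, holding one line in memory at a time."""
    with open(src_path, 'r', encoding='utf-8') as src, \
         open(dst_path, 'w', encoding='utf-8', buffering=buffer_size) as dst:
        # writelines pulls items from the generator one by one
        dst.writelines(stripped_lines(src, pattern))


# Small demonstration in a temporary directory (hypothetical sample data)
with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, 'in.log')
    dst_path = os.path.join(tmp_dir, 'out.log')
    with open(src_path, 'w', encoding='utf-8') as f:
        f.write('PREFIXabc def\nplain line\n')
    copy_stripped(src_path, dst_path, re.compile(r'^PREFIX.*? '))
    with open(dst_path, 'r', encoding='utf-8') as f:
        demo_output = f.read()
```

Because the generator yields one line at a time, memory use stays bounded by the longest line plus the I/O buffers, regardless of file size.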

Complete code example

import re


def process_log_file(input_file, output_file, prefix):
    # Escape the prefix so that regex metacharacters in it are matched literally
    pattern = re.compile(rf'^{re.escape(prefix)}.*? ')
    with open(output_file, 'w', encoding='utf-8', buffering=65536) as out_f:
        with open(input_file, 'r', encoding='utf-8') as f:
            for line in f:
                new_line = pattern.sub('', line)
                # Each line keeps its own trailing newline, so none is appended
                out_f.write(new_line)


if __name__ == "__main__":
    input_file = 'large_log_file.log'
    output_file = 'output_file.log'
    prefix = "specific_prefix"
    process_log_file(input_file, output_file, prefix)
