Interview question: Choosing between multithreading and multiprocessing when processing a large, million-line file in Python

Use cases, advantages, and disadvantages of multithreading and multiprocessing

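Multithreading

Use cases: I/O-bound tasks such as reading files or making network requests. Switching between threads is cheap, and while one thread waits for an I/O operation to complete, another thread can run, so the CPU spends less time idle.
Advantages: threads share the memory and resources of their process, so communication between them is simple, and creating and destroying them costs relatively little.
Disadvantages: because of the Global Interpreter Lock (GIL) in CPython, only one thread can execute Python bytecode at any moment, so multithreading cannot exploit multiple cores for CPU-bound tasks.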
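
To make the I/O-bound case concrete, here is a minimal sketch in which `time.sleep` stands in for a blocking read or network call. The names `fake_io_task` and `run_concurrently` are hypothetical and exist only for this illustration; `concurrent.futures.ThreadPoolExecutor` is used here as a convenient standard-library wrapper around threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(task_id, delay=0.1):
    # Assumption: time.sleep stands in for blocking I/O. Like a real
    # blocking read, it releases the GIL while the thread waits.
    time.sleep(delay)
    return task_id


def run_concurrently(num_tasks=8, delay=0.1):
    """Run num_tasks simulated I/O waits on num_tasks threads.

    Returns the task results (in submission order) and the elapsed time.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_tasks) as executor:
        results = list(
            executor.map(lambda i: fake_io_task(i, delay), range(num_tasks))
        )
    elapsed = time.perf_counter() - start
    return results, elapsed
```

Because the waits overlap, the total time should be close to a single `delay` rather than `num_tasks * delay`. The same pattern would give no such gain if `fake_io_task` spent its time executing Python bytecode instead of waiting.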

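Multiprocessing

Use cases: CPU-bound tasks, where several processes can run on several cores in parallel and speed up computation. When processing a large file, multiprocessing is the better fit if handling each line involves heavy computation.
Advantages: each process has its own memory space and its own interpreter, so the GIL of one process does not block the others and all CPU cores can be used.
Disadvantages: inter-process communication is more complex, creating and destroying processes costs more, and processes consume more system resources.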
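
One concrete source of the communication cost is that `multiprocessing` sends task arguments and results between processes by pickling them. The sketch below uses only the standard `pickle` module to show what that implies; the helper names `can_cross_process_boundary` and `round_trip` are hypothetical and chosen for this illustration.

```python
import pickle


def can_cross_process_boundary(obj):
    """Return True if obj can be pickled.

    multiprocessing needs this for task arguments, results, and the
    worker function itself.
    """
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def round_trip(obj):
    # Roughly what happens to data handed to a worker process:
    # it is serialized, then rebuilt as an independent copy.
    return pickle.loads(pickle.dumps(obj))
```

A plain list of strings survives the round trip but comes back as a separate copy, so changes made in a worker are not visible to the parent. A lambda cannot be pickled at all, which is why worker functions passed to a process pool are normally defined at module level.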

Example: processing a file in parallel with the `multiprocessing` library

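import multiprocessing


def process_line(line):
    # Simple text transformation: convert the line to uppercase.
    return line.upper()


def process_file_with_multiprocessing(file_path):
    # Note: readlines() loads the whole file into memory at once.
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    # Pool() starts one worker per CPU core by default. map() blocks until
    # every line is processed and returns the results in input order, so it
    # is safe for the context manager to shut the pool down afterwards.
    with multiprocessing.Pool() as pool:
        results = pool.map(process_line, lines)
    with open('output.txt', 'w', encoding='utf-8') as f:
        f.writelines(results)


# The guard is required: on platforms that spawn worker processes, each
# worker re-imports this module and must not start a pool of its own.
if __name__ == '__main__':
    file_path = 'large_file.txt'
    process_file_with_multiprocessing(file_path)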
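
For a file too large to hold comfortably in memory, one possible variation is to read the lines in batches and hand each batch to the pool as a single task. The sketch below is an assumption-laden illustration, not part of the original answer: `read_batches`, `process_batch`, `process_file_streaming`, and the default `batch_size` are hypothetical names and values, while `Pool.imap` and `itertools.islice` are standard-library calls.

```python
import multiprocessing
from itertools import islice


def read_batches(line_iter, batch_size):
    """Yield lists of up to batch_size lines from an iterator of lines."""
    while True:
        batch = list(islice(line_iter, batch_size))
        if not batch:
            return
        yield batch


def process_batch(lines):
    # Same toy transformation as the example above, applied to a whole
    # batch so that each inter-process message carries many lines.
    return [line.upper() for line in lines]


def process_file_streaming(in_path, out_path, batch_size=10000):
    with open(in_path, 'r', encoding='utf-8') as src, \
            open(out_path, 'w', encoding='utf-8') as dst, \
            multiprocessing.Pool() as pool:
        # imap returns results lazily and in input order, so the output
        # can be written while later batches are still being processed.
        for result in pool.imap(process_batch, read_batches(src, batch_size)):
            dst.writelines(result)
```

As in the example above, `process_file_streaming` should be called from under an `if __name__ == '__main__':` guard. The pool may still read some batches ahead of the workers, so memory use is reduced rather than strictly bounded.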

Example: processing a file in parallel with the `threading` library

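import threading


def process_chunk(lines, results, index):
    # Simple text transformation: convert every line in the chunk to uppercase.
    # Each thread writes only to its own slot, so no lock is needed and the
    # original line order is preserved.
    results[index] = [line.upper() for line in lines]


def process_file_with_threading(file_path, num_threads=4):
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    # One thread per line would mean a million threads for a million-line
    # file, so split the lines into at most num_threads contiguous chunks.
    chunk_size = max(1, -(-len(lines) // num_threads))  # ceiling division
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    results = [None] * len(chunks)
    threads = []
    for index, chunk in enumerate(chunks):
        t = threading.Thread(target=process_chunk, args=(chunk, results, index))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    with open('output.txt', 'w', encoding='utf-8') as f:
        for chunk_result in results:
            f.writelines(chunk_result)


if __name__ == '__main__':
    file_path = 'large_file.txt'
    process_file_with_threading(file_path)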
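
As an alternative to managing `Thread` objects by hand, a fixed-size thread pool keeps the number of threads bounded and returns results in input order. This sketch swaps in `concurrent.futures.ThreadPoolExecutor`, which is a different standard-library module from `threading`; the names `to_upper`, `transform_lines`, and `process_file_with_thread_pool` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def to_upper(line):
    # Same toy transformation as the example above.
    return line.upper()


def transform_lines(lines, max_workers=4):
    """Apply to_upper to every line using at most max_workers threads.

    executor.map returns results in the order of the input lines.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(to_upper, lines))


def process_file_with_thread_pool(in_path, out_path, max_workers=4):
    with open(in_path, 'r', encoding='utf-8') as src:
        results = transform_lines(src, max_workers)
    with open(out_path, 'w', encoding='utf-8') as dst:
        dst.writelines(results)
```

Uppercasing a string is CPU-bound work, so under the GIL this version is not expected to run faster than a plain loop. The pattern pays off when the per-line work waits on I/O, such as a network lookup for each line.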

