Interview question: Using Python dictionaries for large-scale data processing and optimization

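Suppose you need to process a very large text file in which every line is a JSON record containing several fields, including 'user_id' and 'event_type'. Memory is limited, so the file cannot be loaded into memory all at once. Design a solution that uses Python dictionaries to process the file efficiently and count how many times each 'user_id' appears under each 'event_type'. Take performance into account, explain your design, and write the core code.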

Programming language: Python

Design approach

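Read the large text file line by line, so the whole file is never held in memory at once.
Use nested Python dictionaries to count how often each user_id appears under each event_type. The outer dictionary is keyed by event_type; each inner dictionary is keyed by user_id, and its values are occurrence counts. Memory use therefore grows with the number of distinct (event_type, user_id) pairs, not with the size of the file.
Open the file with a with open statement so that it is closed reliably, and parse each line with json.loads. The performance comes mainly from streaming the file in a single pass and from the dictionary's constant average-time lookups and updates.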
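
The nested-dictionary idea above can also be sketched with collections.defaultdict and collections.Counter from the standard library, which removes the explicit membership checks. This is only an illustrative sketch: the function name count_events and the sample lines are assumptions, not part of the original answer.

```python
import json
from collections import Counter, defaultdict


def count_events(lines):
    """Count user_id occurrences per event_type from an iterable of JSON lines."""
    # Outer key: event_type; inner Counter: user_id -> occurrence count
    counts = defaultdict(Counter)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(record, dict):
            continue  # skip valid JSON that is not an object
        event_type = record.get("event_type")
        user_id = record.get("user_id")
        if event_type is None or user_id is None:
            continue  # skip records missing either field
        counts[event_type][user_id] += 1
    return counts


# Hypothetical sample input; a real run would pass an open file object instead.
sample = [
    '{"user_id": "u1", "event_type": "click"}',
    '{"user_id": "u1", "event_type": "click"}',
    '{"user_id": "u2", "event_type": "view"}',
    'not json',
]
counts = count_events(sample)
```

Because count_events accepts any iterable of lines, an open file object can be passed directly and is still consumed one line at a time.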

Core code

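import json

# Nested dictionary: {event_type: {user_id: occurrence count}}
result = {}

# Iterating over the file object reads one line at a time.
# The file is assumed to be UTF-8 encoded.
with open('large_text_file.txt', 'r', encoding='utf-8') as file:
    for line in file:
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(data, dict):
            continue  # skip valid JSON that is not an object
        event_type = data.get('event_type')
        user_id = data.get('user_id')
        # Compare against None so that falsy IDs such as 0 are still counted
        if event_type is None or user_id is None:
            continue
        user_counts = result.setdefault(event_type, {})
        user_counts[user_id] = user_counts.get(user_id, 0) + 1

print(result)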
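
Printing the whole dictionary is rarely practical when there are many users. As a hedged sketch, the nested counts can be summarized per event type instead. The small result dictionary below is hypothetical sample data in the same shape as the one built above, and the names distinct_users and top_users are illustrative.

```python
from collections import Counter

# Hypothetical sample in the same shape as the nested result:
# {event_type: {user_id: occurrence count}}
result = {
    "click": {"u1": 3, "u2": 1},
    "view": {"u2": 2},
}

# Number of distinct user_ids seen under each event_type
distinct_users = {
    event_type: len(users) for event_type, users in result.items()
}

# The most frequent user_id (with its count) under each event_type
top_users = {
    event_type: Counter(users).most_common(1)
    for event_type, users in result.items()
}

for event_type in sorted(result):
    print(event_type, distinct_users[event_type], top_users[event_type])
```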
