面试题：Python列表性能优化与高级操作

思路

使用collections.Counter来统计每个元素出现的次数，Counter是一个专门用于计数的字典子类，效率较高。
从统计结果中获取出现次数最多的前10个元素，可以使用Counter.most_common方法，该方法会返回一个按出现次数从高到低排序的列表，取前10个元素即可。

代码实现

from collections import Counter


def find_top_10(large_list):
    counter = Counter(large_list)
    top_10 = counter.most_common(10)
    return top_10


# 示例用法
large_list = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4] * 100000
result = find_top_10(large_list)
print(result)

不同Python版本的影响及优化

Python 2.x：
- collections.Counter在Python 2.7中引入。如果在Python 2.6及以下版本，没有Counter，可以手动实现一个类似功能，使用普通字典来统计元素出现次数。例如：

def find_top_10_old(large_list):
    count_dict = {}
    for num in large_list:
        if num in count_dict:
            count_dict[num] += 1
        else:
            count_dict[num] = 1
    sorted_items = sorted(count_dict.items(), key=lambda x: x[1], reverse=True)
    return sorted_items[:10]


# 示例用法
large_list = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4] * 100000
result = find_top_10_old(large_list)
print(result)

Python 3.x：
- 从Python 3.5开始，字典保持插入顺序。Counter是基于字典的，这对于需要依赖插入顺序的场景（虽然这里不涉及）有影响。但整体Counter.most_common的功能在Python 3.x版本中表现稳定。
- 如果在内存非常紧张的情况下，可以考虑使用生成器来逐块处理large_list，而不是一次性将整个列表传入Counter。例如：

from collections import Counter


def read_large_list_in_chunks(file_path, chunk_size=10000):
    with open(file_path) as f:
        chunk = []
        for line in f:
            num = int(line.strip())
            chunk.append(num)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk


def find_top_10_from_file(file_path):
    counter = Counter()
    for chunk in read_large_list_in_chunks(file_path):
        counter.update(chunk)
    top_10 = counter.most_common(10)
    return top_10


# 假设large_list数据保存在文件中
file_path = 'large_list.txt'
result = find_top_10_from_file(file_path)
print(result)

这样可以减少内存占用，适用于无法一次性将整个列表读入内存的情况。

面试题：Python列表性能优化与高级操作

知识考点

面试题答案

思路

代码实现

不同Python版本的影响及优化