星途 Interview Question Bank

Interview question: Word frequency counting for text analysis in Python

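Given a text file, write Python code that counts how often each word occurs in the file and prints the words sorted by frequency from highest to lowest. Ignore case when processing the text, and strip common punctuation marks.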
Programming language: Python

Answer

import string
from collections import Counter


def count_word_frequency(file_path):
    """Return (word, count) pairs sorted by count, highest first."""
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()
    # Convert to lowercase so that counting ignores case
    text = text.lower()
    # Remove punctuation (string.punctuation covers ASCII punctuation only)
    translator = str.maketrans('', '', string.punctuation)
    text = text.translate(translator)
    # Split into words on whitespace
    words = text.split()
    # Count how often each word occurs
    word_counter = Counter(words)
    # Sort by frequency, from highest to lowest
    sorted_word_counter = sorted(word_counter.items(), key=lambda item: item[1], reverse=True)
    return sorted_word_counter


# Example call
file_path = 'your_file.txt'
result = count_word_frequency(file_path)
for word, count in result:
    print(f'{word}: {count}')
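
One point worth raising in an interview: deleting punctuation with `str.translate` joins the pieces of a word, so "didn't" becomes "didnt" and "well-known" becomes "wellknown". Also, `string.punctuation` contains only ASCII characters, so full-width or typographic punctuation is left in place. The sketch below is one possible variant, not part of the original answer. It extracts words with a regular expression instead of deleting punctuation. The names `WORD_PATTERN` and `count_words_in_text` are hypothetical, and the pattern assumes that a word is a run of ASCII letters or digits with optional internal apostrophes. It also breaks ties alphabetically so that the output order is deterministic.

```python
import re
from collections import Counter

# Hypothetical pattern: runs of lowercase ASCII letters/digits,
# optionally joined by internal apostrophes (e.g. "didn't").
WORD_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def count_words_in_text(text):
    """Return (word, count) pairs sorted by count descending, then by word."""
    # Lowercase first so that the pattern only needs to match a-z
    words = WORD_PATTERN.findall(text.lower())
    counts = Counter(words)
    # Negate the count to sort descending; the word itself breaks ties
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = "The cat saw the dog. The dog, startled, didn't move!"
for word, count in count_words_in_text(sample):
    print(f"{word}: {count}")
```

With this pattern a hyphenated term such as "well-known" is counted as two words, "well" and "known". Whether that is preferable to "wellknown" depends on the text being analyzed, so it is worth stating the choice explicitly.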