Interview question: Using StringTokenizer in Java for optimized multithreaded text processing

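You are given a very large, line-delimited text file in which each line is a key-value pair in a format like 'key:value'; it may contain hundreds of thousands of lines or more. In a multithreaded environment, use StringTokenizer to parse each line, extract the key and the value, and store the results in a thread-safe collection. Design and implement an efficient multithreaded Java solution, considering how to avoid thread-safety problems, how to optimize performance, and how to handle exceptional cases sensibly. Describe the overall design and write the key parts of the code.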

Answer

Design approach

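  1. Data structure: use a ConcurrentHashMap as the thread-safe collection that stores the parsed key-value pairs.
  2. Multithreading: use a thread pool to manage the threads, avoiding the overhead of repeatedly creating and destroying them.
  3. Task partitioning: split the large file by line, with each thread handling a portion of the lines.
  4. Exception handling: catch exceptions during parsing and log them, so that a single malformed line does not stop the whole run.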
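
Point 3 above can be read as "group lines into batches and hand each batch to the pool", which keeps the number of queued tasks far below the number of lines. The following is a minimal sketch of that reading, not a definitive implementation. The class name `BatchedLineParser`, the method `parseLines`, and the `batchSize` and `poolSize` parameters are hypothetical names introduced for illustration, and the lines are assumed to be already in memory to keep the example short; a real parser would fill each batch from a BufferedReader.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: parses "key:value" lines in batches on a fixed-size pool.
public class BatchedLineParser {

    public static Map<String, String> parseLines(List<String> lines, int batchSize, int poolSize) {
        Map<String, String> result = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int start = 0; start < lines.size(); start += batchSize) {
            final int from = start;
            final int to = Math.min(start + batchSize, lines.size());
            // subList is only read by the task; the source list is not modified meanwhile.
            final List<String> batch = lines.subList(from, to);
            pool.submit(() -> {
                for (int i = 0; i < batch.size(); i++) {
                    parseInto(batch.get(i), from + i + 1, result); // 1-based line number
                }
            });
        }
        pool.shutdown(); // no new tasks; already submitted batches still run
        try {
            // The one-minute limit is an arbitrary choice for this sketch.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return result;
    }

    // Each call creates its own StringTokenizer, so no tokenizer is shared between threads.
    private static void parseInto(String line, int lineNumber, Map<String, String> result) {
        StringTokenizer tokenizer = new StringTokenizer(line, ":");
        if (tokenizer.countTokens() != 2) {
            System.err.println("Skipping malformed line " + lineNumber + ": " + line);
            return;
        }
        String key = tokenizer.nextToken();
        String value = tokenizer.nextToken();
        result.put(key, value);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a:1", "b:2", "broken", "c:3");
        // Prints 3 on stdout; the malformed line is reported on stderr.
        System.out.println(parseLines(lines, 2, 2).size());
    }
}
```

A larger batch size means fewer task objects and less queue contention, at the cost of coarser load balancing; the best value depends on line length and core count and would need to be measured.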

Key code

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.StringTokenizer;

public class LargeFileParser {
    private static final int THREAD_POOL_SIZE = 10;
    private final String filePath;
    private final ConcurrentHashMap<String, String> resultMap = new ConcurrentHashMap<>();

    public LargeFileParser(String filePath) {
        this.filePath = filePath;
    }

    public void parseFile() {
        ExecutorService executorService = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
        try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
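            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                // A lambda may only capture effectively final locals, and 'line' is
                // reassigned on every iteration, so copy it before submitting the task.
                final String currentText = line;
                final int currentLine = ++lineNumber; // 1-based, for error messages
                executorService.submit(() -> {
                    try {
                        parseLine(currentText, currentLine);
                    } catch (Exception e) {
                        System.err.println("Error parsing line " + currentLine + ": " + e.getMessage());
                    }
                });
            }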
        } catch (IOException e) {
            System.err.println("Error reading file: " + e.getMessage());
        } finally {
            executorService.shutdown();
            try {
                if (!executorService.awaitTermination(60, TimeUnit.SECONDS)) {
                    executorService.shutdownNow();
                    if (!executorService.awaitTermination(60, TimeUnit.SECONDS)) {
                        System.err.println("Pool did not terminate");
                    }
                }
            } catch (InterruptedException ie) {
                executorService.shutdownNow();
                Thread.currentThread().interrupt();
            }
        }
    }

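    // Parses one "key:value" line. A new StringTokenizer is created per call,
    // so no tokenizer instance is ever shared between threads.
    private void parseLine(String line, int lineNumber) {
        StringTokenizer tokenizer = new StringTokenizer(line, ":");
        // Exactly two tokens are required, so empty lines and lines whose value
        // itself contains ':' are rejected as malformed.
        if (tokenizer.countTokens() != 2) {
            throw new IllegalArgumentException("Invalid line format at line " + lineNumber + ": " + line);
        }
        String key = tokenizer.nextToken();
        String value = tokenizer.nextToken();
        // If a key occurs more than once, the write that happens last wins.
        resultMap.put(key, value);
    }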

    public ConcurrentHashMap<String, String> getResultMap() {
        return resultMap;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java LargeFileParser <filePath>");
            return;
        }
        LargeFileParser parser = new LargeFileParser(args[0]);
        parser.parseFile();
        ConcurrentHashMap<String, String> result = parser.getResultMap();
        System.out.println("Parsed result size: " + result.size());
    }
}
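
The `countTokens() != 2` check in `parseLine` rejects any line whose value itself contains a colon, for example `url:http://example.com`. If such lines should be accepted, one possible variation (a sketch under that assumption, not part of the original answer) is to build the StringTokenizer with `returnDelims` set to `true`, so that each ':' is returned as its own token, and to treat only the first ':' as the separator. The class name `FirstColonSplitter` and the method `split` are hypothetical. Note that this sketch also accepts an empty value (`key:`), which the original check rejects.

```java
import java.util.StringTokenizer;

// Hypothetical helper: splits a line at the first ':' only, using StringTokenizer.
public class FirstColonSplitter {

    /** Returns {key, value}, or null when there is no non-empty key followed by ':'. */
    public static String[] split(String line) {
        // returnDelims = true: every ':' is returned as a separate one-character token.
        StringTokenizer tokenizer = new StringTokenizer(line, ":", true);
        if (!tokenizer.hasMoreTokens()) {
            return null; // empty line
        }
        String key = tokenizer.nextToken();
        if (key.equals(":") || !tokenizer.hasMoreTokens()) {
            return null; // line starts with ':' (empty key) or has no separator at all
        }
        tokenizer.nextToken(); // consume the first ':' separator
        StringBuilder value = new StringBuilder();
        while (tokenizer.hasMoreTokens()) {
            value.append(tokenizer.nextToken()); // later ':' tokens are kept in the value
        }
        return new String[] { key, value.toString() };
    }

    public static void main(String[] args) {
        String[] kv = split("url:http://example.com");
        System.out.println(kv[0] + " -> " + kv[1]); // prints "url -> http://example.com"
    }
}
```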