Interview question: How do you configure MapReduce in HBase to accept different data sources as input?

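Configuring the HBase input:

In the MapReduce job configuration, register the HBase table as an input source. This requires the HBase and Hadoop client dependencies on the classpath.
Call TableMapReduceUtil.initTableMapperJob to set up the Mapper side of the job for the HBase table. For example:

Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "HBase and Text File Join");
Scan scan = new Scan(); // full-table scan; restrict columns or row ranges here as needed
TableMapReduceUtil.initTableMapperJob(
    "your_table_name", // HBase table name
    scan, // scan that selects the rows and columns to read
    YourHBaseMapper.class, // custom HBase Mapper class
    Text.class, // Mapper output key type
    Text.class, // Mapper output value type
    job);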

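Configuring the text file input:
- initTableMapperJob makes TableInputFormat the input format for the whole job, so calling FileInputFormat.addInputPath on its own would not cause the text file to be read. After initTableMapperJob, register each source through MultipleInputs.addInputPath instead, which binds an input format and a Mapper to every input. TableInputFormat takes its table from the job configuration rather than from a path, so the path passed for the HBase input is only a placeholder (an assumption of this sketch, not a real location). For example:
```
Path inputPath = new Path("/path/to/your/text/file");
MultipleInputs.addInputPath(job, inputPath, TextInputFormat.class, YourTextMapper.class);
// Placeholder path: TableInputFormat reads the table configured by initTableMapperJob.
MultipleInputs.addInputPath(job, new Path("/hbase-input-placeholder"), TableInputFormat.class, YourHBaseMapper.class);
```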

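Custom Mapper classes:

Write one custom Mapper per input source. Because each source is handled by its own Mapper class, the map method does not have to inspect context.getInputSplit() to tell a TableSplit from a FileSplit.
For the HBase table, extend TableMapper, which receives the row key and the Result for each row. Prefix the emitted value with "hbase_" so that the Reducer can tell which source it came from. For example:

public class YourHBaseMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
        // Process one HBase row; skip rows that lack the cf:col cell
        byte[] cell = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
        if (cell == null) {
            return;
        }
        Text key = new Text(Bytes.toString(row.get()));
        // Tag the value with "hbase_" so the Reducer can identify its source
        Text val = new Text("hbase_" + Bytes.toString(cell));
        context.write(key, val);
    }
}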

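For the text file, extend Mapper with LongWritable byte offsets and Text lines as input, and split each line into a key and a value. For example:

public class YourTextMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Process one line of the text file, expected in the form key,value
        String[] parts = value.toString().split(",");
        if (parts.length < 2) {
            return; // skip malformed lines instead of failing on parts[1]
        }
        Text newKey = new Text(parts[0]);
        Text newVal = new Text(parts[1]);
        context.write(newKey, newVal);
    }
}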
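
The line-parsing step of the text mapper can be checked in isolation, without a cluster. The following is a minimal sketch in plain Java, assuming comma-separated lines whose first field is the join key. The class name `LineParser` and its `parse` method are hypothetical helpers for illustration and are not part of Hadoop or HBase. Unlike the mapper, this sketch splits on the first comma only, so commas inside the value are preserved.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: mirrors the split-and-validate step of a text mapper.
final class LineParser {
    private LineParser() {}

    // Splits "key,value" on the first comma only.
    // Returns Optional.empty() for null input, lines without a comma,
    // and lines whose key is blank.
    static Optional<Map.Entry<String, String>> parse(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String[] parts = line.split(",", 2);
        if (parts.length < 2 || parts[0].trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new SimpleEntry<>(parts[0].trim(), parts[1].trim()));
    }
}
```

A mapper built on such a helper would emit the entry's key and value when the Optional is present and skip the record otherwise.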

Joining the data in the Reducer:

Write a custom Reducer class that extends Reducer. All values that share a key arrive in the same reduce call, so the reduce method can join the HBase rows and the text records on that key. For example:

public class YourReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        List<String> hbaseData = new ArrayList<>();
        List<String> textData = new ArrayList<>();
        for (Text val : values) {
            // Use the "hbase_" tag to decide which source each value came from
            if (val.toString().startsWith("hbase_")) {
                hbaseData.add(val.toString());
            } else {
                textData.add(val.toString());
            }
        }
        // Join the two sides; this simple example emits every HBase/text pair
        for (String hbaseVal : hbaseData) {
            for (String textVal : textData) {
                context.write(key, new Text(hbaseVal + " joined with " + textVal));
            }
        }
    }
}
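
The join performed in the reduce step can be exercised without Hadoop. The following is a minimal sketch in plain Java, assuming the same "hbase_" tag convention as the Reducer. The class `ReduceSideJoin` and its `join` method are hypothetical stand-ins for one reduce call, not part of any library. Unlike the Reducer, the sketch strips the tag before emitting the joined string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one reduce call: joins the values that share a key.
final class ReduceSideJoin {
    static final String HBASE_TAG = "hbase_"; // tag convention assumed from the Reducer

    private ReduceSideJoin() {}

    static List<String> join(Iterable<String> values) {
        List<String> hbaseData = new ArrayList<>();
        List<String> textData = new ArrayList<>();
        // Separate the values by source, dropping the tag from HBase values.
        for (String v : values) {
            if (v.startsWith(HBASE_TAG)) {
                hbaseData.add(v.substring(HBASE_TAG.length()));
            } else {
                textData.add(v);
            }
        }
        // Emit the cross product of the two sides for this key.
        List<String> joined = new ArrayList<>();
        for (String h : hbaseData) {
            for (String t : textData) {
                joined.add(h + " joined with " + t);
            }
        }
        return joined;
    }
}
```

This has inner-join semantics: a key with values from only one source produces no output. Both sides are buffered in memory, so a key with very many values on either side may need a different strategy.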

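Setting the Reducer and submitting the job:

In the MapReduce job configuration, set the Reducer class, the output types, and an output directory, then submit the job. For example:

job.setReducerClass(YourReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileOutputFormat.setOutputPath(job, new Path("/path/to/output")); // hypothetical output directory; must not exist yet
System.exit(job.waitForCompletion(true) ? 0 : 1);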

With this configuration, a single MapReduce job reads from both an HBase table and a text file and joins the two data sets in its processing logic.
