Interview question: How to extract highlighted data with the top hits aggregation in Elasticsearch

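In Elasticsearch, suppose an index of article content already exists. Use the top hits aggregation to fetch the top 5 matching results for each article, with the matched keywords highlighted. Briefly describe the implementation steps and give the key code snippets.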

Answer

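  1. Implementation steps
    • Build the query: construct a query against the article content field, for example a match query on the keyword.
    • Group by article: top_hits only returns the top hits of the bucket it runs in, so nest it under a bucket aggregation (for example terms on a per-article identifier field) to get results per article.
    • Top hits aggregation: add top_hits with size set to 5 to return the 5 best-matching hits in each bucket.
    • Configure highlighting: set highlight on the article content field inside top_hits; a top-level highlight only applies to the regular hits.hits results.
  2. Key code snippet (using the Python Elasticsearch client as an example)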
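from elasticsearch import Elasticsearch

# Assumes a local, unsecured node; adjust the URL for your cluster.
es = Elasticsearch("http://localhost:9200")

query = {
    "size": 0,  # only the aggregation output is needed
    "query": {
        "match": {
            "article_content": "keyword"  # replace "keyword" with the actual search term
        }
    },
    "aggs": {
        "by_article": {
            # One bucket per article. "article_id" is an assumed keyword field
            # identifying the article; it is not named in the original question.
            "terms": {"field": "article_id"},
            "aggs": {
                "article_top_hits": {
                    "top_hits": {
                        "size": 5,
                        # Highlighting for top_hits results is configured here.
                        "highlight": {
                            "fields": {
                                "article_content": {}
                            }
                        }
                    }
                }
            }
        }
    }
}

# Newer client versions also accept the body keys as keyword arguments.
response = es.search(index="your_index_name", body=query)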

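In the code above:

  • The match query specifies the field to search, article_content, and the search term.
  • The highlight section enables highlighting on the article_content field; for hits returned by top_hits, highlighting has to be configured inside the top_hits aggregation, because a top-level highlight only affects hits.hits.
  • The top_hits aggregation returns the 5 highest-scoring hits of each bucket it is placed in; to get 5 hits per article it must be nested under a bucket aggregation such as terms on a per-article field. Replace your_index_name with the actual index name.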
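
The highlighted fragments still have to be pulled out of the response. The following is a minimal sketch of that step, assuming top_hits is nested under a terms aggregation. The aggregation name `by_article`, the helper `extract_highlights`, and the hand-written `sample_response` are all hypothetical and only illustrate the general shape of an aggregation response; they are not real cluster output.

```python
from typing import Any, Dict, List


def extract_highlights(
    response: Dict[str, Any],
    bucket_agg: str = "by_article",          # hypothetical terms aggregation name
    top_hits_agg: str = "article_top_hits",  # top_hits aggregation name
    field: str = "article_content",          # highlighted field
) -> Dict[str, List[str]]:
    """Map each bucket key to the highlighted fragments of its top hits."""
    result: Dict[str, List[str]] = {}
    buckets = response.get("aggregations", {}).get(bucket_agg, {}).get("buckets", [])
    for bucket in buckets:
        fragments: List[str] = []
        for hit in bucket[top_hits_agg]["hits"]["hits"]:
            # "highlight" is missing when a hit has nothing to highlight.
            fragments.extend(hit.get("highlight", {}).get(field, []))
        result[str(bucket["key"])] = fragments
    return result


# Hand-written sample shaped like a search response (not real output).
sample_response = {
    "aggregations": {
        "by_article": {
            "buckets": [
                {
                    "key": "a1",
                    "doc_count": 2,
                    "article_top_hits": {
                        "hits": {
                            "hits": [
                                {
                                    "_id": "1",
                                    "highlight": {
                                        "article_content": ["a <em>keyword</em> here"]
                                    },
                                },
                                {"_id": "2"},  # hit without a highlight section
                            ]
                        }
                    },
                }
            ]
        }
    }
}

print(extract_highlights(sample_response))
```

By default Elasticsearch wraps matched terms in `<em>` tags, which is why the sample fragment uses them.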

With other language clients the principle is the same and only the syntax differs. For example, in Java with the Elasticsearch Java High Level REST Client:

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
// 7.x package paths (assumed client version); in 6.x these two classes are under ...aggregations.metrics.tophits
import org.elasticsearch.search.aggregations.metrics.TopHits;
import org.elasticsearch.search.aggregations.metrics.TopHitsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;

import java.io.IOException;
import java.util.Map;

public class ElasticsearchExample {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the client even if the search fails
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            searchSourceBuilder.size(0); // only the aggregation output is needed
            searchSourceBuilder.query(QueryBuilders.matchQuery("article_content", "keyword"));

            // Highlighting for top_hits results is configured on the aggregation itself
            HighlightBuilder highlightBuilder = new HighlightBuilder().field("article_content");
            TopHitsAggregationBuilder topHitsAggregationBuilder = AggregationBuilders.topHits("article_top_hits")
                    .size(5)
                    .highlighter(highlightBuilder);

            // One bucket per article; "article_id" is an assumed keyword field, not named in the question
            searchSourceBuilder.aggregation(
                    AggregationBuilders.terms("by_article").field("article_id")
                            .subAggregation(topHitsAggregationBuilder));

            SearchRequest searchRequest = new SearchRequest("your_index_name");
            searchRequest.source(searchSourceBuilder);

            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

            // Process the results: walk each article bucket, then its top hits
            Terms byArticle = searchResponse.getAggregations().get("by_article");
            for (Terms.Bucket bucket : byArticle.getBuckets()) {
                TopHits topHits = bucket.getAggregations().get("article_top_hits");
                for (SearchHit hit : topHits.getHits().getHits()) {
                    Map<String, HighlightField> highlightFields = hit.getHighlightFields();
                    HighlightField highlightField = highlightFields.get("article_content");
                    // Process the highlighted fragments (highlightField is null if nothing was highlighted)
                }
            }
        }
    }
}

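The Java code follows the same pattern: matchQuery builds the query, HighlightBuilder configures highlighting, and the topHits aggregation returns the top 5 hits of each bucket it is placed in. Replace your_index_name with the actual index name and the search term placeholder with the actual term to match.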
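
Because every client ultimately sends the same JSON body, it can be convenient to build that body in one place. The sketch below is one possible way to do so; the helper `build_search_body`, the aggregation name `by_article`, and the grouping field `article_id` are hypothetical names chosen for illustration. It also shows the `pre_tags` and `post_tags` highlight options, which replace the default `<em>` wrapper.

```python
import json
from typing import Any, Dict


def build_search_body(
    keyword: str,
    content_field: str = "article_content",
    group_field: str = "article_id",  # hypothetical per-article keyword field
    hits_per_article: int = 5,
    pre_tag: str = "<em>",
    post_tag: str = "</em>",
) -> Dict[str, Any]:
    """Build a search body: match query plus per-article top_hits with highlighting."""
    return {
        "size": 0,  # skip the regular hits; only aggregation output is needed
        "query": {"match": {content_field: keyword}},
        "aggs": {
            "by_article": {
                "terms": {"field": group_field},
                "aggs": {
                    "article_top_hits": {
                        "top_hits": {
                            "size": hits_per_article,
                            "highlight": {
                                "pre_tags": [pre_tag],
                                "post_tags": [post_tag],
                                "fields": {content_field: {}},
                            },
                        }
                    }
                },
            }
        },
    }


body = build_search_body("keyword", pre_tag="<mark>", post_tag="</mark>")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed to the Python client's search call or serialized and sent from any other client.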