Answer to the interview question
Implementation approach
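- Time range filter: use an Elasticsearch date range query to select purchase records from the past month.
- Per-user aggregation: bucket the records by user ID and count the number of distinct products each user bought.
- Sort and take the top 10: order the buckets by distinct product count, descending, and keep the first 10 users.
- Compute the average: sum the distinct product counts of those 10 users and divide by the number of users.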
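The four steps above can be sketched in plain Python on an in-memory list, which may help when checking the Elasticsearch result by hand. This is only an illustration under assumptions: the function name average_top_user_variety, the sample records, and the 30-day cutoff are hypothetical and not part of the original answer.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def average_top_user_variety(purchases, since, top_n=10):
    """Average distinct-product count over the top_n users since `since`."""
    # Steps 1 and 2: keep recent purchases and collect distinct products per user.
    products_by_user = defaultdict(set)
    for purchase in purchases:
        if purchase["purchase_time"] >= since:
            products_by_user[purchase["user_id"]].add(purchase["product_id"])
    # Step 3: sort the per-user counts in descending order and keep the top_n.
    top_counts = sorted(
        (len(products) for products in products_by_user.values()), reverse=True
    )[:top_n]
    # Step 4: average the counts, guarding against an empty result.
    return sum(top_counts) / len(top_counts) if top_counts else 0.0


# Hypothetical sample data: u1 bought 3 distinct products, u2 bought 1 (twice),
# and u3's only purchase is older than the cutoff.
now = datetime(2024, 10, 31, tzinfo=timezone.utc)
since = now - timedelta(days=30)
purchases = [
    {"user_id": "u1", "product_id": "a", "purchase_time": now - timedelta(days=1)},
    {"user_id": "u1", "product_id": "b", "purchase_time": now - timedelta(days=2)},
    {"user_id": "u1", "product_id": "c", "purchase_time": now - timedelta(days=3)},
    {"user_id": "u2", "product_id": "a", "purchase_time": now - timedelta(days=4)},
    {"user_id": "u2", "product_id": "a", "purchase_time": now - timedelta(days=5)},
    {"user_id": "u3", "product_id": "d", "purchase_time": now - timedelta(days=90)},
]
result = average_top_user_variety(purchases, since, top_n=2)
print(result)
```

Using a set per user is what makes repeat purchases of the same product count only once, which is the role the cardinality aggregation plays on the Elasticsearch side.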
Elasticsearch aggregation query
Assume the index is named ecommerce_purchases and its documents have the following structure:
{
  "user_id": "12345",
  "product_id": "prod_123",
  "purchase_time": "2024-10-01T12:00:00Z"
}
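The aggregation query is as follows. As written, the range from "now-1M/M" to "now/M" covers the previous full calendar month; for a rolling one-month window, use "gte": "now-1M/d" with "lte": "now" instead.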
{
  "query": {
    "range": {
      "purchase_time": {
        "gte": "now-1M/M",
        "lt": "now/M"
      }
    }
  },
  "aggs": {
    "top_users": {
      "terms": {
        "field": "user_id",
        "size": 10,
        "order": {
          "unique_product_count": "desc"
        }
      },
      "aggs": {
        "unique_product_count": {
          "cardinality": {
            "field": "product_id"
          }
        }
      }
    }
  }
}
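The response contains the 10 users who bought the most distinct products, together with each user's distinct product count. This assumes user_id and product_id are mapped as keyword fields; with default dynamic mapping, use user_id.keyword and product_id.keyword. Note also that the cardinality aggregation returns an approximate count.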
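The range filter in the query above relies on Elasticsearch date math: "now-1M/M" subtracts one month and rounds down to the start of that month, and "now/M" with "lt" rounds down to the start of the current month. A small sketch that reproduces the same window with the standard library may make this concrete; the helper name previous_calendar_month is hypothetical, and the sketch works on whatever timezone the input datetime carries.

```python
from datetime import datetime, timezone


def previous_calendar_month(now):
    """Return (start, end) matching gte now-1M/M and lt now/M."""
    # now/M: round down to the first instant of the current month.
    end = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # now-1M/M: first instant of the month before that, wrapping over January.
    if end.month == 1:
        start = end.replace(year=end.year - 1, month=12)
    else:
        start = end.replace(month=end.month - 1)
    return start, end


start, end = previous_calendar_month(datetime(2024, 11, 15, 9, 30, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
```

A purchase is matched when start <= purchase_time < end, so purchases made in the current month are excluded by this form of the filter.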
The average can then be computed in the application layer, for example in Python:
from elasticsearch import Elasticsearch

# Connect to a local Elasticsearch node.
es = Elasticsearch("http://localhost:9200")

# Run the same query as above: top 10 users by distinct product count.
response = es.search(index="ecommerce_purchases", body={
    "query": {
        "range": {
            "purchase_time": {
                "gte": "now-1M/M",
                "lt": "now/M"
            }
        }
    },
    "aggs": {
        "top_users": {
            "terms": {
                "field": "user_id",
                "size": 10,
                "order": {
                    "unique_product_count": "desc"
                }
            },
            "aggs": {
                "unique_product_count": {
                    "cardinality": {
                        "field": "product_id"
                    }
                }
            }
        }
    }
})

# Sum the distinct product counts of the returned buckets.
buckets = response['aggregations']['top_users']['buckets']
total_count = 0
for bucket in buckets:
    total_count += bucket['unique_product_count']['value']

# Guard against an empty result to avoid division by zero.
average = total_count / len(buckets) if buckets else 0
print(f"Average: {average}")
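The averaging step can also be pulled into a small function so it can be exercised without a running cluster. The sketch below assumes the response has the shape produced by the query above (aggregations, then top_users, then buckets); the function name average_unique_products and the sample response are hypothetical.

```python
def average_unique_products(response):
    """Average the unique_product_count values of the top_users buckets."""
    # Missing keys are treated the same as an empty bucket list.
    buckets = response.get("aggregations", {}).get("top_users", {}).get("buckets", [])
    if not buckets:
        return 0.0
    total = sum(bucket["unique_product_count"]["value"] for bucket in buckets)
    return total / len(buckets)


# Hypothetical response fragment with two user buckets.
sample_response = {
    "aggregations": {
        "top_users": {
            "buckets": [
                {"key": "12345", "doc_count": 9, "unique_product_count": {"value": 5}},
                {"key": "67890", "doc_count": 4, "unique_product_count": {"value": 3}},
            ]
        }
    }
}
sample_average = average_unique_products(sample_response)
print(sample_average)
```

Dividing by the number of buckets actually returned, rather than a fixed 10, keeps the result correct when fewer than 10 users purchased anything in the window.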