02 ES Deep Pagination Problem
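ES pages through query results with the from and size parameters. When from is large (for example above 10,000), Elasticsearch has to skip over a large number of hits to reach the requested page; this is called deep pagination. By default, a request whose from + size exceeds the index setting index.max_result_window (10,000) is rejected.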
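ES data is stored across multiple shards, and each shard is essentially an independent Lucene index. A paginated query runs independently on every shard, and the per-shard results are then merged and sorted on the coordinating node. Because no shard knows in advance which of its hits will land on the requested page, each shard must return its top from + size hits, so the coordinating node sorts shards × (from + size) entries only to keep size of them. The cost therefore grows linearly with from.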
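To make the merge cost concrete, here is a toy model in plain Java. It is not Elasticsearch code: the class DeepPagingModel and its methods are hypothetical, scores are plain integers sorted in descending order, and each inner list stands in for one shard. Every shard returns its top from + size values, and the "coordinating node" sorts all of them to keep a single page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Toy model of from/size paging across shards (hypothetical, not Elasticsearch code).
// Scores are ints sorted in descending order, like _score.
class DeepPagingModel {
    /** Total number of entries the coordinating node has to sort for one page. */
    static long entriesToSort(int shards, int from, int size) {
        return (long) shards * (from + size);
    }

    /** Each shard returns its own top (from + size) hits; the coordinator merges them and keeps one page. */
    static List<Integer> page(List<List<Integer>> shards, int from, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shards) {
            merged.addAll(shard.stream()
                    .sorted(Collections.reverseOrder())
                    .limit(from + size) // a shard cannot know which of its hits fall on the requested page
                    .collect(Collectors.toList()));
        }
        merged.sort(Collections.reverseOrder()); // global sort on the coordinating node
        return merged.subList(Math.min(from, merged.size()), Math.min(from + size, merged.size()));
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = List.of(
                List.of(90, 70, 50, 30, 10),
                List.of(85, 65, 45, 25, 5),
                List.of(80, 60, 40, 20, 0));
        System.out.println(page(shards, 3, 3));           // [70, 65, 60]
        System.out.println(entriesToSort(5, 10_000, 10)); // 50050
    }
}
```

With 5 shards, from = 10,000 and size = 10, the coordinating node sorts 50,050 entries to return 10 hits, which is why large from values get expensive.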
There are two main ways to deal with deep pagination in ES:
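1. Use the scroll API
The scroll API keeps a search context alive for a period of time and lets you retrieve data from that context batch by batch. It suits cases where a large amount of data has to be pulled across many requests.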
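```java
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue; // org.elasticsearch.core.TimeValue in later 7.x clients
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.Scroll;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// `client` is assumed to be an already-initialized RestHighLevelClient
SearchRequest searchRequest = new SearchRequest("your_index");
// Keep the search context alive for 1 minute between scroll requests
Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
searchSourceBuilder.size(10); // hits per batch
searchRequest.source(searchSourceBuilder);
searchRequest.scroll(scroll);

// The initial search opens the scroll context and returns the first batch plus a scroll_id
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = searchResponse.getScrollId();
SearchHits hits = searchResponse.getHits();

while (hits.getHits().length > 0) {
    for (SearchHit hit : hits) {
        // process hit, e.g. hit.getSourceAsString()
    }
    // Fetch the next batch with the most recent scroll_id and renew the keep-alive
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(scroll);
    searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
    scrollId = searchResponse.getScrollId();
    hits = searchResponse.getHits();
}

// Release the scroll context once, after the loop (clearing it inside the loop breaks the next request)
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
```

Scroll has two drawbacks. It is not real-time: the scroll reads from a snapshot of the index taken at the initial search request, so writes made afterwards are not visible. And every open scroll context holds resources on the cluster, so it should be cleared as soon as it is no longer needed.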
2. Use the search_after parameter
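The search_after parameter produces a pagination-like effect, but instead of page numbers it uses the sort values of the last hit on the previous page as the starting point for the next page, so each shard only has to return size hits after that point. This is useful for paging ordered by time or ID. The sort should include a field that is unique per document as a tiebreaker; otherwise hits that share the same sort values can be skipped or repeated across pages.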
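```java
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;

// `client` is assumed to be an already-initialized RestHighLevelClient
SearchRequest searchRequest = new SearchRequest("your_index");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
searchSourceBuilder.size(10);
// In practice, add a second sort on a unique field so ties are broken deterministically
searchSourceBuilder.sort("your_sort_field", SortOrder.ASC);
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
SearchHits hits = searchResponse.getHits();
while (hits.getHits().length > 0) {
    for (SearchHit hit : hits) {
        // process hit
    }
    // The sort values of the last hit on this page become the search_after cursor for the next page
    Object[] lastSortValues = hits.getAt(hits.getHits().length - 1).getSortValues();
    searchSourceBuilder.searchAfter(lastSortValues);
    searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    hits = searchResponse.getHits();
}
```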
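The cursor mechanics of search_after can be illustrated without a cluster. The following is an in-memory sketch, not the Elasticsearch client: SearchAfterModel, Doc and the (timestamp, id) sort are hypothetical stand-ins, with id acting as the unique tiebreaker. Each page contains only documents that sort strictly after the last hit of the previous page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory model of search_after (hypothetical, not the Elasticsearch client).
// Sort = (timestamp, id); id is unique, so it acts as the tiebreaker.
class SearchAfterModel {
    static final class Doc {
        final long timestamp;
        final String id;

        Doc(long timestamp, String id) {
            this.timestamp = timestamp;
            this.id = id;
        }
    }

    static final Comparator<Doc> SORT =
            Comparator.comparingLong((Doc d) -> d.timestamp).thenComparing(d -> d.id);

    /** Up to `size` docs that sort strictly after `after`; a null cursor means the first page. */
    static List<Doc> page(List<Doc> docs, Doc after, int size) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(SORT);
        List<Doc> result = new ArrayList<>();
        for (Doc d : sorted) {
            if (after != null && SORT.compare(d, after) <= 0) {
                continue; // at or before the cursor: already returned on an earlier page
            }
            if (result.size() == size) {
                break;
            }
            result.add(d);
        }
        return result;
    }

    /** Pages through all docs, using the last hit of each page as the next cursor. */
    static List<String> readAll(List<Doc> docs, int size) {
        List<String> ids = new ArrayList<>();
        List<Doc> hits = page(docs, null, size);
        while (!hits.isEmpty()) {
            for (Doc d : hits) {
                ids.add(d.id);
            }
            hits = page(docs, hits.get(hits.size() - 1), size);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc(2, "d"), new Doc(1, "b"), new Doc(3, "e"), new Doc(1, "a"), new Doc(1, "c"));
        System.out.println(readAll(docs, 2)); // [a, b, c, d, e]
    }
}
```

With page size 2, the pages are (a, b), (c, d), (e). If the cursor held only the timestamp, the second page would start after timestamp 1 and c would be skipped, which is why the tiebreaker matters.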
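When reads and writes happen concurrently, search_after can produce broken or duplicated pages, because each request searches the latest state of the index rather than a fixed view. It should be combined with a point in time (PIT) so that every page is read from the same snapshot.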
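One of these failure modes can be reproduced with a toy model in plain Java (not Elasticsearch code; PitModel and its methods are hypothetical). The "index" is a map from document id to a unique sort value. If a document's sort value changes between two page requests, paging over the live map returns it a second time; reading every page from a copy taken before the first request, which is the idea behind a PIT, does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not Elasticsearch code): ids map to unique sort values,
// and pages use a search_after-style cursor on that value.
class PitModel {
    /** Up to `size` (id, sortValue) pairs with sortValue strictly greater than `after`, ascending. */
    static List<Map.Entry<String, Integer>> page(Map<String, Integer> index, int after, int size) {
        List<Map.Entry<String, Integer>> hits = new ArrayList<>();
        index.entrySet().stream()
                .filter(e -> e.getValue() > after)
                .sorted(Map.Entry.comparingByValue())
                .limit(size)
                .forEach(e -> hits.add(Map.entry(e.getKey(), e.getValue()))); // copy, so later writes don't alter returned hits
        return hits;
    }

    /**
     * Reads every page. `writeBetweenPages` runs once after the first page to simulate a concurrent update.
     * With useSnapshot = true, all pages come from a copy taken before the first request.
     */
    static List<String> readAll(Map<String, Integer> live, boolean useSnapshot, int size, Runnable writeBetweenPages) {
        Map<String, Integer> source = useSnapshot ? new HashMap<>(live) : live;
        List<String> seen = new ArrayList<>();
        int after = Integer.MIN_VALUE;
        boolean firstPage = true;
        while (true) {
            List<Map.Entry<String, Integer>> hits = page(source, after, size);
            if (hits.isEmpty()) {
                return seen;
            }
            hits.forEach(e -> seen.add(e.getKey()));
            after = hits.get(hits.size() - 1).getValue(); // cursor = sort value of the last hit
            if (firstPage) {
                writeBetweenPages.run();
                firstPage = false;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> live = new HashMap<>(Map.of("a", 1, "b", 2, "c", 3, "d", 4));
        System.out.println(readAll(live, false, 2, () -> live.put("a", 5)));  // [a, b, c, d, a]
        Map<String, Integer> live2 = new HashMap<>(Map.of("a", 1, "b", 2, "c", 3, "d", 4));
        System.out.println(readAll(live2, true, 2, () -> live2.put("a", 5))); // [a, b, c, d]
    }
}
```

Starting from a=1, b=2, c=3, d=4 with page size 2 and moving a to 5 after the first page, the live read returns a twice, while the snapshot read returns each document once.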
For details, see the blog post "ElasticSearch中的深度分页问题" (Deep Pagination in ElasticSearch).