背景
了解分词过程
概述
lucene的查询过程:
(String query , String field ) -> Query
整个过程是将字符串"how old"
切割成一个个Term Query
最后会构造成一棵语法树:
should:[how,old]
背景
lucene 的分词是一个基本的话题,主要是利用:incrementToken
这个抽象方法以及继承AttributeSource
这个类
public abstract class TokenStream extends AttributeSource implements Closeable {
public abstract boolean incrementToken() throws IOException;
}
lucene boolean clause
lucene 的bolean 子句有四种:
- MUST
- FILTER
- SHOULD
- MUST_NOT
堆栈
<init>:202, TermQuery (org.apache.lucene.search)
newTermQuery:640, QueryBuilder (org.apache.lucene.util)
add:408, QueryBuilder (org.apache.lucene.util)
analyzeMultiBoolean:427, QueryBuilder (org.apache.lucene.util)
createFieldQuery:364, QueryBuilder (org.apache.lucene.util)
createFieldQuery:257, QueryBuilder (org.apache.lucene.util)
newFieldQuery:468, QueryParserBase (org.apache.lucene.queryparser.classic)
getFieldQuery:457, QueryParserBase (org.apache.lucene.queryparser.classic)
MultiTerm:680, QueryParser (org.apache.lucene.queryparser.classic)
Query:233, QueryParser (org.apache.lucene.queryparser.classic)
TopLevelQuery:223, QueryParser (org.apache.lucene.queryparser.classic)
parse:136, QueryParserBase (org.apache.lucene.queryparser.classic)
testParse:20, ParseTest (com.dinosaur.lucene.demo)
排序算分
BlockMaxMaxscoreScorer
的matches
会将所有的分词算出来,然后计算分数总和
score:250, BM25Similarity$BM25Scorer (org.apache.lucene.search.similarities)
score:60, LeafSimScorer (org.apache.lucene.search)
score:75, TermScorer (org.apache.lucene.search)
matches:240, BlockMaxMaxscoreScorer$2 (org.apache.lucene.search)
doNext:85, TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator (org.apache.lucene.search)
advance:78, TwoPhaseIterator$TwoPhaseIteratorAsDocIdSetIterator (org.apache.lucene.search)
score:232, BooleanWeight$2 (org.apache.lucene.search)
score:38, BulkScorer (org.apache.lucene.search)
search:776, IndexSearcher (org.apache.lucene.search)
search:694, IndexSearcher (org.apache.lucene.search)
search:688, IndexSearcher (org.apache.lucene.search)
searchAfter:523, IndexSearcher (org.apache.lucene.search)
search:538, IndexSearcher (org.apache.lucene.search)
doPagingSearch:161, SearchFiles (com.dinosaur.lucene.skiptest)
testSearch:131, SearchFiles (com.dinosaur.lucene.skiptest)