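Background
I need to build the Lucene source code and understand how it works.
Build
Lucene pins the JDK version it builds with, so I had to switch to JDK 17; my local JDK was 18.
Clone the code
## clone the code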
git clone https://github.com/apache/lucene.git
### change into the directory
cd lucene
### build
./gradlew
## if you reach GitHub through a proxy, pass it to Gradle as well; the downloads will be faster
## specify the proxy host and port
./gradlew -DsocksProxyHost=192.168.1.102 -DsocksProxyPort=1081
Run and test
### package the demo
./gradlew lucene:demo:jar
### run the demo
java -cp /home/ubuntu/lucene-9.1.0/lucene/demo/build/classes/java/main:/home/ubuntu/lucene-9.1.0/lucene/core/build/classes/java/main/ org.apache.lucene.demo.IndexFiles -
The operating system is Ubuntu.
The commands for switching to JDK 17 are as follows:
### install JDK 17
sudo apt install openjdk-17-jdk
# Configure Java (switch java)
sudo update-alternatives --config java
# Configure Java Compiler (switch javac)
sudo update-alternatives --config javac
### check the version after switching; java is now 17
java --version
openjdk 17.0.3 2022-04-19
OpenJDK Runtime Environment (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1)
OpenJDK 64-Bit Server VM (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1, mixed mode, sharing)
Errors encountered
gradle-wrapper.jar could not be downloaded, so I fetched it manually and skipped the certificate check:
wget --no-check-certificate https://raw.githubusercontent.com/gradle/gradle/v7.3.3/gradle/wrapper/gradle-wrapper.jar
Then put it under {$luceneGitDir}/gradle/wrapper/, where luceneGitDir is the Lucene directory created by git clone.
Related code
IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
iwc.setUseCompoundFile(false); // write separate index files instead of one compound file
Writing the header
The corresponding jdb debugging session
main[1] stop in org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter:136
Deferring breakpoint org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter:136.
It will be set after the class is loaded.
main[1] cont
> Set deferred breakpoint org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter:136
Breakpoint hit: "thread=main", org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init>(), line=136 bci=180
136 CodecUtil.writeIndexHeader(
main[1] list
132
133 fieldsStream =
134 directory.createOutput(
135 IndexFileNames.segmentFileName(segment, segmentSuffix, FIELDS_EXTENSION), context);
136 => CodecUtil.writeIndexHeader(
137 fieldsStream, formatName, VERSION_CURRENT, si.getId(), segmentSuffix);
138 assert CodecUtil.indexHeaderLength(formatName, segmentSuffix)
139 == fieldsStream.getFilePointer();
140
141 indexWriter =
main[1] print formatName
formatName = "Lucene90StoredFieldsFastData"
The corresponding stack trace
[1] org.apache.lucene.store.OutputStreamIndexOutput.writeByte (OutputStreamIndexOutput.java:54)
[2] org.apache.lucene.codecs.CodecUtil.writeBEInt (CodecUtil.java:653)
[3] org.apache.lucene.codecs.CodecUtil.writeHeader (CodecUtil.java:82)
[4] org.apache.lucene.codecs.CodecUtil.writeIndexHeader (CodecUtil.java:125)
[5] org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init> (Lucene90CompressingStoredFieldsWriter.java:128)
[6] org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsWriter (Lucene90CompressingStoredFieldsFormat.java:140)
[7] org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat.fieldsWriter (Lucene90StoredFieldsFormat.java:154)
[8] org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter (StoredFieldsConsumer.java:49)
[9] org.apache.lucene.index.StoredFieldsConsumer.startDocument (StoredFieldsConsumer.java:56)
[10] org.apache.lucene.index.IndexingChain.startStoredFields (IndexingChain.java:556)
[11] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:587)
[12] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
[13] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
[14] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
[15] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
[16] org.apache.lucene.index.IndexWriter.addDocument (IndexWriter.java:1,469)
[17] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:271)
[18] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
[19] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
[20] java.nio.file.Files.walkFileTree (Files.java:2,725)
[21] java.nio.file.Files.walkFileTree (Files.java:2,797)
[22] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
[23] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)
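The bottom of this stack (writeIndexHeader, writeHeader, writeBEInt, writeByte) boils down to emitting a few fixed fields one byte at a time. Below is a minimal sketch of that idea using only the JDK. It is not Lucene's code: the class IndexHeaderSketch and its helper names are made up for illustration, the one-byte string length assumes an ASCII name shorter than 128 bytes, and the trailing suffix-length byte is my assumption about the 00 byte at offset 0x35 of the fdt dump shown later in this post.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: reproduces the byte layout seen in the hexdump, not Lucene's classes.
final class IndexHeaderSketch {
    static final int CODEC_MAGIC = 0x3fd76c17;

    // Write a 32-bit int most-significant byte first, one byte per call.
    static void writeBEInt(ByteArrayOutputStream out, int v) {
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }

    // Assumption: ASCII text shorter than 128 bytes, so the length fits in a single byte.
    static void writeShortString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        if (bytes.length > 127) {
            throw new IllegalArgumentException("string too long for this sketch");
        }
        out.write(bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // magic, codec name, version, 16-byte id, then (assumed) suffix length and suffix.
    static byte[] indexHeader(String codec, int version, byte[] id, String suffix) {
        if (id.length != 16) {
            throw new IllegalArgumentException("id must be 16 bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBEInt(out, CODEC_MAGIC);
        writeShortString(out, codec);
        writeBEInt(out, version);
        out.write(id, 0, id.length);
        writeShortString(out, suffix);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] id = new byte[16]; // placeholder id, all zeros
        byte[] header = indexHeader("Lucene90StoredFieldsFastData", 1, id, "");
        StringBuilder hex = new StringBuilder();
        for (byte b : header) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(hex.toString().trim());
    }
}
```

With the codec name from the jdb session, the first five bytes come out as 3f d7 6c 17 1c, which is how the real file starts.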
Inverted index
main[1] where
[1] org.apache.lucene.index.TermsHashPerField.initStreamSlices (TermsHashPerField.java:150)
[2] org.apache.lucene.index.TermsHashPerField.add (TermsHashPerField.java:198)
[3] org.apache.lucene.index.IndexingChain$PerField.invert (IndexingChain.java:1,224)
[4] org.apache.lucene.index.IndexingChain.processField (IndexingChain.java:729)
[5] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:620)
[6] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
[7] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
[8] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
[9] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
[10] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:277)
[11] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
[12] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
[13] java.nio.file.Files.walkFileTree (Files.java:2,725)
[14] java.nio.file.Files.walkFileTree (Files.java:2,797)
[15] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
[16] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)
Writing the content
main[1] where
[1] org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.writeField (Lucene90CompressingStoredFieldsWriter.java:276)
[2] org.apache.lucene.index.StoredFieldsConsumer.writeField (StoredFieldsConsumer.java:65)
[3] org.apache.lucene.index.IndexingChain.processField (IndexingChain.java:749)
[4] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:620)
[5] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
[6] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
[7] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
[8] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
[9] org.apache.lucene.index.IndexWriter.addDocument (IndexWriter.java:1,469)
[10] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:271)
[11] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
[12] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
[13] java.nio.file.Files.walkFileTree (Files.java:2,725)
[14] java.nio.file.Files.walkFileTree (Files.java:2,797)
[15] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
[16] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)
Inspecting the fdt file
hexdump -C _0.fdt
00000000 3f d7 6c 17 1c 4c 75 63 65 6e 65 39 30 53 74 6f |?.l..Lucene90Sto|
00000010 72 65 64 46 69 65 6c 64 73 46 61 73 74 44 61 74 |redFieldsFastDat|
00000020 61 00 00 00 01 85 88 12 2b 0c 73 6b 95 30 38 76 |a.......+.sk.08v|
00000030 c9 0a 2a 52 29 00 00 0a 00 01 00 1c 02 06 03 07 |..*R)...........|
00000040 07 07 07 07 07 07 07 07 20 00 1a 60 2f 68 6f 6d |........ ..`/hom|
00000050 65 2f 60 75 62 75 6e 74 75 60 2f 64 6f 63 2f 6d |e/`ubuntu`/doc/m|
00000060 60 6f 6e 67 6f 2e 74 60 78 74 00 1a 2f 68 60 6f |`ongo.t`xt../h`o|
00000070 6d 65 2f 75 62 60 75 6e 74 75 2f 64 60 6f 63 2f |me/ub`untu/d`oc/|
00000080 68 65 6c 60 6c 6f 2e 74 78 74 c0 28 93 e8 00 00 |hel`lo.txt.(....|
00000090 00 00 00 00 00 00 c8 75 0a 41 |.......u.A|
0000009a
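fdt description
Now let's analyze the fdt format. A range such as [1-4] means the first through the fourth byte (counting from 1).
[1-4] The first four bytes are the big-endian magic number CODEC_MAGIC = 0x3fd76c17.
[5-33] The fifth byte is the string length: 0x1c in hex, which is 28 in decimal. Bytes [6-33] are the string itself, the 28-character Lucene90StoredFieldsFastData.
[34-37] After the string comes the version, a big-endian 32-bit integer with the value 1 (the VERSION_CURRENT constant passed to writeIndexHeader).
[38-53] A 16-byte unique id that identifies this file (si.getId() in the jdb listing).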
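To double-check the byte ranges, here is a small sketch that decodes the first 53 bytes of the dump with nothing but the JDK. The class FdtHeaderParser and its helpers are hypothetical names of mine, not Lucene APIs, and the parser assumes the string length fits in one byte, which holds for this file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: decodes the header fields described in this post.
final class FdtHeaderParser {
    static final int CODEC_MAGIC = 0x3fd76c17;

    // First 53 bytes copied from the hexdump of _0.fdt.
    static final String DUMP =
        "3f d7 6c 17 1c 4c 75 63 65 6e 65 39 30 53 74 6f "
      + "72 65 64 46 69 65 6c 64 73 46 61 73 74 44 61 74 "
      + "61 00 00 00 01 85 88 12 2b 0c 73 6b 95 30 38 76 "
      + "c9 0a 2a 52 29";

    static final class Header {
        final String codec;
        final int version;
        final String idHex;

        Header(String codec, int version, String idHex) {
            this.codec = codec;
            this.version = version;
            this.idHex = idHex;
        }
    }

    // Turn "3f d7 ..." into raw bytes.
    static byte[] fromHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    static Header parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        int magic = buf.getInt();               // bytes [1-4]
        if (magic != CODEC_MAGIC) {
            throw new IllegalArgumentException("bad magic: " + Integer.toHexString(magic));
        }
        int len = buf.get() & 0xff;             // byte [5], assumed single-byte length
        byte[] name = new byte[len];
        buf.get(name);                          // bytes [6-33]
        int version = buf.getInt();             // bytes [34-37]
        byte[] id = new byte[16];
        buf.get(id);                            // bytes [38-53]
        StringBuilder idHex = new StringBuilder();
        for (byte b : id) {
            idHex.append(String.format("%02x", b & 0xff));
        }
        return new Header(new String(name, StandardCharsets.UTF_8), version, idHex.toString());
    }

    public static void main(String[] args) {
        Header h = parse(fromHex(DUMP));
        System.out.println(h.codec + " version=" + h.version + " id=" + h.idHex);
    }
}
```

The magic check doubles as a sanity test that the dump was copied correctly.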
Buffer pools
TermsHashPerField holds three buffer pools: intPool, bytePool, and termBytePool.
TermsHashPerField(
int streamCount,
IntBlockPool intPool,
ByteBlockPool bytePool,
ByteBlockPool termBytePool,
Counter bytesUsed,
TermsHashPerField nextPerField,
String fieldName,
IndexOptions indexOptions) {
this.intPool = intPool;
this.bytePool = bytePool;
this.streamCount = streamCount;
this.fieldName = fieldName;
this.nextPerField = nextPerField;
assert indexOptions != IndexOptions.NONE;
this.indexOptions = indexOptions;
PostingsBytesStartArray byteStarts = new PostingsBytesStartArray(this, bytesUsed);
bytesHash = new BytesRefHash(termBytePool, HASH_INIT_SIZE, byteStarts);
}
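The constructor hands termBytePool to a BytesRefHash, so term bytes live in a shared pool and each distinct term gets an id. The toy model below only illustrates that idea and is far simpler than the real classes: TermPoolSketch and all of its fields are invented for this post. The return convention (a new id for an unseen term, -(id + 1) for a term that already exists) follows my understanding of BytesRefHash.add, so treat it as an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy model only: Lucene's BytesRefHash and ByteBlockPool are much more elaborate.
final class TermPoolSketch {
    private byte[] pool = new byte[16];  // shared storage for the bytes of every term
    private int used = 0;                // number of pool bytes in use
    private int[] starts = new int[4];   // starts[id] = offset of term id in the pool
    private int[] lengths = new int[4];  // lengths[id] = byte length of term id
    private int count = 0;               // number of distinct terms
    private final Map<String, Integer> ids = new HashMap<>();

    // Returns the new id for an unseen term, or -(id + 1) if the term is already stored.
    int add(String term) {
        Integer existing = ids.get(term);
        if (existing != null) {
            return -(existing + 1);
        }
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        while (used + bytes.length > pool.length) {
            pool = Arrays.copyOf(pool, pool.length * 2); // grow the byte pool
        }
        if (count == starts.length) {
            starts = Arrays.copyOf(starts, count * 2);
            lengths = Arrays.copyOf(lengths, count * 2);
        }
        System.arraycopy(bytes, 0, pool, used, bytes.length);
        starts[count] = used;
        lengths[count] = bytes.length;
        used += bytes.length;
        ids.put(term, count);
        return count++;
    }

    // Read a term back out of the shared pool by id.
    String get(int id) {
        return new String(pool, starts[id], lengths[id], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        TermPoolSketch terms = new TermPoolSketch();
        System.out.println(terms.add("hello"));
        System.out.println(terms.add("world"));
        System.out.println(terms.add("hello"));
        System.out.println(terms.get(1));
    }
}
```

Storing every term in one growing byte array instead of one object per term is the point of the sketch; the HashMap is only a stand-in for the hash table.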
Creating a term
main[1] where
[1] org.apache.lucene.util.BytesRefHash.add (BytesRefHash.java:247)
[2] org.apache.lucene.index.TermsHashPerField.add (TermsHashPerField.java:193)
[3] org.apache.lucene.index.IndexingChain$PerField.invert (IndexingChain.java:1,224)
[4] org.apache.lucene.index.IndexingChain.processField (IndexingChain.java:729)
[5] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:620)
[6] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
[7] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
[8] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
[9] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
[10] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:277)
[11] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
[12] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
[13] java.nio.file.Files.walkFileTree (Files.java:2,725)
[14] java.nio.file.Files.walkFileTree (Files.java:2,797)
[15] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
[16] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)
Search query
main[1] where
[1] org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.seekExact (SegmentTermsEnum.java:476)
[2] org.apache.lucene.index.TermStates.loadTermsEnum (TermStates.java:117)
[3] org.apache.lucene.index.TermStates.build (TermStates.java:102)
[4] org.apache.lucene.search.TermQuery.createWeight (TermQuery.java:227)
[5] org.apache.lucene.search.IndexSearcher.createWeight (IndexSearcher.java:885)
[6] org.apache.lucene.search.IndexSearcher.search (IndexSearcher.java:686)
[7] org.apache.lucene.search.IndexSearcher.searchAfter (IndexSearcher.java:532)
[8] org.apache.lucene.search.IndexSearcher.search (IndexSearcher.java:542)
[9] org.apache.lucene.demo.SearchFiles.doPagingSearch (SearchFiles.java:180)
[10] org.apache.lucene.demo.SearchFiles.main (SearchFiles.java:150)