
Building and installing Lucene

Posted on:June 19, 2022 at 08:49 PM

Background

I needed to build Lucene and get familiar with its source code.

Build

Lucene pins the JDK version its build requires, so I had to switch to JDK 17; my local default was JDK 18.

Clone the source

## Clone the source
git clone https://github.com/apache/lucene.git

### Change into the directory
cd lucene

### Build
./gradlew

## If you reach the network through a proxy, pass it to Gradle; downloads will be faster
## Specify the proxy host and port
./gradlew -DsocksProxyHost=192.168.1.102 -DsocksProxyPort=1081

Run and test

### Package the demo
./gradlew lucene:demo:jar

### Run the demo
java -cp  /home/ubuntu/lucene-9.1.0/lucene/demo/build/classes/java/main:/home/ubuntu/lucene-9.1.0/lucene/core/build/classes/java/main/  org.apache.lucene.demo.IndexFiles -

My operating system is Ubuntu; the commands to switch to JDK 17 are:

### Install JDK 17
sudo apt install openjdk-17-jdk
# Configure Java (switch the java binary)
sudo update-alternatives --config java

# Configure Java Compiler (switch the javac binary)
sudo update-alternatives --config javac


### Check the version after switching; java is now 17
java --version
openjdk 17.0.3 2022-04-19
OpenJDK Runtime Environment (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1)
OpenJDK 64-Bit Server VM (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1, mixed mode, sharing)

Errors encountered

gradle-wrapper.jar would not download, so I fetched it manually and skipped certificate verification:

wget --no-check-certificate  https://raw.githubusercontent.com/gradle/gradle/v7.3.3/gradle/wrapper/gradle-wrapper.jar

Then put it under {$luceneGitDir}/gradle/wrapper/, where luceneGitDir is the directory you cloned Lucene into.

Related code

      IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
      iwc.setUseCompoundFile(false);  // write multiple separate files instead of one compound file

Writing the header

The corresponding jdb debugging session

main[1] stop in  org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter:136
Deferring breakpoint org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter:136.
It will be set after the class is loaded.
main[1] cont
> Set deferred breakpoint org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter:136

Breakpoint hit: "thread=main", org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init>(), line=136 bci=180
136          CodecUtil.writeIndexHeader(

main[1] list
132    
133          fieldsStream =
134              directory.createOutput(
135                  IndexFileNames.segmentFileName(segment, segmentSuffix, FIELDS_EXTENSION), context);
136 =>       CodecUtil.writeIndexHeader(
137              fieldsStream, formatName, VERSION_CURRENT, si.getId(), segmentSuffix);
138          assert CodecUtil.indexHeaderLength(formatName, segmentSuffix)
139              == fieldsStream.getFilePointer();
140    
141          indexWriter =
main[1] print formatName
 formatName = "Lucene90StoredFieldsFastData"

Corresponding stack trace

  [1] org.apache.lucene.store.OutputStreamIndexOutput.writeByte (OutputStreamIndexOutput.java:54)
  [2] org.apache.lucene.codecs.CodecUtil.writeBEInt (CodecUtil.java:653)
  [3] org.apache.lucene.codecs.CodecUtil.writeHeader (CodecUtil.java:82)
  [4] org.apache.lucene.codecs.CodecUtil.writeIndexHeader (CodecUtil.java:125)
  [5] org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init> (Lucene90CompressingStoredFieldsWriter.java:128)
  [6] org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsWriter (Lucene90CompressingStoredFieldsFormat.java:140)
  [7] org.apache.lucene.codecs.lucene90.Lucene90StoredFieldsFormat.fieldsWriter (Lucene90StoredFieldsFormat.java:154)
  [8] org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter (StoredFieldsConsumer.java:49)
  [9] org.apache.lucene.index.StoredFieldsConsumer.startDocument (StoredFieldsConsumer.java:56)
  [10] org.apache.lucene.index.IndexingChain.startStoredFields (IndexingChain.java:556)
  [11] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:587)
  [12] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
  [13] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
  [14] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
  [15] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
  [16] org.apache.lucene.index.IndexWriter.addDocument (IndexWriter.java:1,469)
  [17] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:271)
  [18] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
  [19] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
  [20] java.nio.file.Files.walkFileTree (Files.java:2,725)
  [21] java.nio.file.Files.walkFileTree (Files.java:2,797)
  [22] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
  [23] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)
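The top frames of this trace (writeIndexHeader, writeHeader, writeBEInt) can be mimicked with the JDK alone. What follows is a minimal sketch, not Lucene's code: the class name IndexHeaderSketch and its method are made up for illustration, the field order reflects my reading of CodecUtil, and the codec-name length is assumed to fit in a single byte (shorter than 128).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the header-writing steps; not part of Lucene.
class IndexHeaderSketch {
  static final int CODEC_MAGIC = 0x3fd76c17;

  // Order: magic (big-endian int), codec name (length byte + UTF-8 bytes),
  // version (big-endian int), 16-byte id, suffix (length byte + UTF-8 bytes).
  static byte[] writeIndexHeader(String codec, int version, byte[] id, String suffix)
      throws IOException {
    byte[] name = codec.getBytes(StandardCharsets.UTF_8);
    byte[] suffixBytes = suffix.getBytes(StandardCharsets.UTF_8);
    if (name.length > 127 || suffixBytes.length > 127 || id.length != 16) {
      throw new IllegalArgumentException("sketch only handles short names and 16-byte ids");
    }
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes); // writeInt emits big-endian
    out.writeInt(CODEC_MAGIC);
    out.writeByte(name.length); // assumption: length < 128, so one byte is enough
    out.write(name);
    out.writeInt(version);
    out.write(id);
    out.writeByte(suffixBytes.length);
    out.write(suffixBytes);
    out.flush();
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] id = new byte[16]; // placeholder; a real segment id is random
    byte[] header = writeIndexHeader("Lucene90StoredFieldsFastData", 1, id, "");
    StringBuilder hex = new StringBuilder();
    for (byte b : header) {
      hex.append(String.format("%02x ", b & 0xff));
    }
    System.out.println(hex.toString().trim());
    System.out.println("length = " + header.length);
  }
}
```

Because each byte passes through writeByte one at a time, a breakpoint on OutputStreamIndexOutput.writeByte is a convenient place to watch the header being produced.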

Inverted index

main[1] where
  [1] org.apache.lucene.index.TermsHashPerField.initStreamSlices (TermsHashPerField.java:150)
  [2] org.apache.lucene.index.TermsHashPerField.add (TermsHashPerField.java:198)
  [3] org.apache.lucene.index.IndexingChain$PerField.invert (IndexingChain.java:1,224)
  [4] org.apache.lucene.index.IndexingChain.processField (IndexingChain.java:729)
  [5] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:620)
  [6] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
  [7] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
  [8] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
  [9] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
  [10] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:277)
  [11] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
  [12] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
  [13] java.nio.file.Files.walkFileTree (Files.java:2,725)
  [14] java.nio.file.Files.walkFileTree (Files.java:2,797)
  [15] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
  [16] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)

Writing the content

main[1] where
  [1] org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.writeField (Lucene90CompressingStoredFieldsWriter.java:276)
  [2] org.apache.lucene.index.StoredFieldsConsumer.writeField (StoredFieldsConsumer.java:65)
  [3] org.apache.lucene.index.IndexingChain.processField (IndexingChain.java:749)
  [4] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:620)
  [5] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
  [6] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
  [7] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
  [8] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
  [9] org.apache.lucene.index.IndexWriter.addDocument (IndexWriter.java:1,469)
  [10] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:271)
  [11] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
  [12] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
  [13] java.nio.file.Files.walkFileTree (Files.java:2,725)
  [14] java.nio.file.Files.walkFileTree (Files.java:2,797)
  [15] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
  [16] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)

Inspecting the fdt file

hexdump -C _0.fdt
00000000  3f d7 6c 17 1c 4c 75 63  65 6e 65 39 30 53 74 6f  |?.l..Lucene90Sto|
00000010  72 65 64 46 69 65 6c 64  73 46 61 73 74 44 61 74  |redFieldsFastDat|
00000020  61 00 00 00 01 85 88 12  2b 0c 73 6b 95 30 38 76  |a.......+.sk.08v|
00000030  c9 0a 2a 52 29 00 00 0a  00 01 00 1c 02 06 03 07  |..*R)...........|
00000040  07 07 07 07 07 07 07 07  20 00 1a 60 2f 68 6f 6d  |........ ..`/hom|
00000050  65 2f 60 75 62 75 6e 74  75 60 2f 64 6f 63 2f 6d  |e/`ubuntu`/doc/m|
00000060  60 6f 6e 67 6f 2e 74 60  78 74 00 1a 2f 68 60 6f  |`ongo.t`xt../h`o|
00000070  6d 65 2f 75 62 60 75 6e  74 75 2f 64 60 6f 63 2f  |me/ub`untu/d`oc/|
00000080  68 65 6c 60 6c 6f 2e 74  78 74 c0 28 93 e8 00 00  |hel`lo.txt.(....|
00000090  00 00 00 00 00 00 c8 75  0a 41                    |.......u.A|
0000009a

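fdt format description

Now let's walk through the fdt format. [1-4] denotes the first through the fourth byte.

[1-4] The first four bytes are the big-endian magic number CODEC_MAGIC = 0x3fd76c17.
[5-33] Byte 5 holds the string length, 0x1c in hex, which is 28 in decimal. Bytes [6-33] are the string itself, the 28-character Lucene90StoredFieldsFastData.
[34-37] After the string comes the version, a hard-coded big-endian 1.
[38-53] A 16-byte unique ID that identifies this file (si.getId() in the code above).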
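The byte ranges above can be checked with a few lines of plain Java. This is a minimal sketch that decodes the first 53 bytes copied from the hexdump; the class FdtHeaderReader and its Header holder are hypothetical names for illustration, and the name length is assumed to fit in one byte.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for decoding the header bytes shown in the hexdump; not part of Lucene.
class FdtHeaderReader {
  // Bytes 1-53 of _0.fdt, copied from the hexdump.
  static final String HEX =
      "3fd76c171c4c7563656e65393053746f"
          + "7265644669656c647346617374446174"
          + "61000000018588122b0c736b95303876"
          + "c90a2a5229";

  static class Header {
    int magic;
    String codec;
    int version;
    String idHex;
  }

  static byte[] fromHex(String hex) {
    byte[] out = new byte[hex.length() / 2];
    for (int i = 0; i < out.length; i++) {
      out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
    }
    return out;
  }

  static Header parse(byte[] data) {
    ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
    Header h = new Header();
    h.magic = buf.getInt();                 // bytes 1-4
    int nameLength = buf.get() & 0xff;      // byte 5 (assumed to be a one-byte length)
    byte[] name = new byte[nameLength];
    buf.get(name);                          // bytes 6-33
    h.codec = new String(name, StandardCharsets.UTF_8);
    h.version = buf.getInt();               // bytes 34-37
    StringBuilder id = new StringBuilder();
    for (int i = 0; i < 16; i++) {          // bytes 38-53
      id.append(String.format("%02x", buf.get() & 0xff));
    }
    h.idHex = id.toString();
    return h;
  }

  public static void main(String[] args) {
    Header h = parse(fromHex(HEX));
    System.out.println("magic   = 0x" + Integer.toHexString(h.magic));
    System.out.println("codec   = " + h.codec);
    System.out.println("version = " + h.version);
    System.out.println("id      = " + h.idHex);
  }
}
```

To run it against a real index, replace fromHex(HEX) with the leading bytes of your own _0.fdt; the id will differ per segment.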

Buffer pools

TermsHashPerField holds three buffer pools: intPool, bytePool, and termBytePool.

  TermsHashPerField(
      int streamCount,
      IntBlockPool intPool,
      ByteBlockPool bytePool,
      ByteBlockPool termBytePool,
      Counter bytesUsed,
      TermsHashPerField nextPerField,
      String fieldName,
      IndexOptions indexOptions) {
    this.intPool = intPool;
    this.bytePool = bytePool;
    this.streamCount = streamCount;
    this.fieldName = fieldName;
    this.nextPerField = nextPerField;
    assert indexOptions != IndexOptions.NONE;
    this.indexOptions = indexOptions;
    PostingsBytesStartArray byteStarts = new PostingsBytesStartArray(this, bytesUsed);
    bytesHash = new BytesRefHash(termBytePool, HASH_INIT_SIZE, byteStarts);
  }
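In the constructor, termBytePool is handed to a BytesRefHash, which suggests that term bytes are stored once in a shared buffer and then referred to by small integer ids. The toy below sketches that idea only. TermPoolSketch is a made-up class, a HashMap stands in for Lucene's own hash table, and a single growable array stands in for the block pool. The return convention (a new id, or -(id+1) for a term already seen) follows my understanding of BytesRefHash.add.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy illustration of "term bytes in a shared pool, addressed by int id"; not Lucene's code.
class TermPoolSketch {
  private byte[] termBytes = new byte[16];  // stand-in for termBytePool
  private int bytesUsed = 0;
  private int[] termStarts = new int[4];    // term id -> offset into termBytes
  private int termCount = 0;
  private final Map<String, Integer> ids = new HashMap<>(); // stand-in for the hash table

  // Returns the new id, or -(id + 1) if the term was already present.
  int add(String term) {
    Integer existing = ids.get(term);
    if (existing != null) {
      return -(existing + 1);
    }
    byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
    if (bytes.length > 127) {
      throw new IllegalArgumentException("sketch only supports terms under 128 bytes");
    }
    while (bytesUsed + 1 + bytes.length > termBytes.length) {
      termBytes = Arrays.copyOf(termBytes, termBytes.length * 2);
    }
    if (termCount == termStarts.length) {
      termStarts = Arrays.copyOf(termStarts, termStarts.length * 2);
    }
    int id = termCount++;
    termStarts[id] = bytesUsed;
    termBytes[bytesUsed++] = (byte) bytes.length; // length prefix, then the term bytes
    System.arraycopy(bytes, 0, termBytes, bytesUsed, bytes.length);
    bytesUsed += bytes.length;
    ids.put(term, id);
    return id;
  }

  // Reads a term back from the shared buffer by id.
  String get(int id) {
    int start = termStarts[id];
    int length = termBytes[start];
    return new String(termBytes, start + 1, length, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    TermPoolSketch pool = new TermPoolSketch();
    System.out.println(pool.add("hello"));
    System.out.println(pool.add("world"));
    System.out.println(pool.add("hello")); // already present, so a negative value
    System.out.println(pool.get(1));
  }
}
```

Keeping every term in one buffer avoids allocating an object per term, which is a plausible reason for the pooled design.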

Generating terms

main[1] where
  [1] org.apache.lucene.util.BytesRefHash.add (BytesRefHash.java:247)
  [2] org.apache.lucene.index.TermsHashPerField.add (TermsHashPerField.java:193)
  [3] org.apache.lucene.index.IndexingChain$PerField.invert (IndexingChain.java:1,224)
  [4] org.apache.lucene.index.IndexingChain.processField (IndexingChain.java:729)
  [5] org.apache.lucene.index.IndexingChain.processDocument (IndexingChain.java:620)
  [6] org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments (DocumentsWriterPerThread.java:241)
  [7] org.apache.lucene.index.DocumentsWriter.updateDocuments (DocumentsWriter.java:432)
  [8] org.apache.lucene.index.IndexWriter.updateDocuments (IndexWriter.java:1,531)
  [9] org.apache.lucene.index.IndexWriter.updateDocument (IndexWriter.java:1,816)
  [10] org.apache.lucene.demo.IndexFiles.indexDoc (IndexFiles.java:277)
  [11] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:212)
  [12] org.apache.lucene.demo.IndexFiles$1.visitFile (IndexFiles.java:208)
  [13] java.nio.file.Files.walkFileTree (Files.java:2,725)
  [14] java.nio.file.Files.walkFileTree (Files.java:2,797)
  [15] org.apache.lucene.demo.IndexFiles.indexDocs (IndexFiles.java:206)
  [16] org.apache.lucene.demo.IndexFiles.main (IndexFiles.java:157)

Search queries

main[1] where
  [1] org.apache.lucene.codecs.lucene90.blocktree.SegmentTermsEnum.seekExact (SegmentTermsEnum.java:476)
  [2] org.apache.lucene.index.TermStates.loadTermsEnum (TermStates.java:117)
  [3] org.apache.lucene.index.TermStates.build (TermStates.java:102)
  [4] org.apache.lucene.search.TermQuery.createWeight (TermQuery.java:227)
  [5] org.apache.lucene.search.IndexSearcher.createWeight (IndexSearcher.java:885)
  [6] org.apache.lucene.search.IndexSearcher.search (IndexSearcher.java:686)
  [7] org.apache.lucene.search.IndexSearcher.searchAfter (IndexSearcher.java:532)
  [8] org.apache.lucene.search.IndexSearcher.search (IndexSearcher.java:542)
  [9] org.apache.lucene.demo.SearchFiles.doPagingSearch (SearchFiles.java:180)
  [10] org.apache.lucene.demo.SearchFiles.main (SearchFiles.java:150)
