Lucene: Introduction and Application

genedna, published 2019-08-19 11:46

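Abstract: Create an Analyzer to perform lexical analysis and language processing on the query string. Call IndexSearcher's search method on the query syntax tree to obtain the results. The code uses the IKAnalyzer tokenizer, a third-party tokenizer that extends Lucene's Analyzer class and is designed for Chinese text.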


GitHub repository: https://github.com/yizuoliang...

I. Introduction to Full-Text Search

1. Data structures

Structured data:

Data with a fixed format or a bounded length.

Examples: database records, metadata, and so on.

Unstructured data:

Data with variable length or no fixed format.

Examples: text, images, video, charts, and so on.

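2. Searching data

Sequential scan

Scan from the first file to the last, reading each file from beginning to end, until every file has been scanned.

Full-text search

Extract part of the information from the unstructured data, reorganize it, and build an index so that the data gains some structure. Searching that structured index is then much faster than scanning the raw data.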

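3. Full-text search

Take the Xinhua Dictionary as an example. Its pinyin table and radical lookup table act as the dictionary's index: by looking up the index we can jump straight to the entry for a character. Without an index we would have to page through the dictionary from the first page.

This process of building an index first and then searching the index is called full-text search.

The core of full-text search

Creating the index: extracting information from all structured and unstructured data and building an index from it.

Searching the index: taking the user's query, searching the index that was built, and returning the results.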

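4. Inverted index

An inverted index is an indexing method and a concrete storage form of the term-document matrix. It is commonly used in full-text search to store the mapping from a term to the documents in which it occurs. With an inverted index, the list of documents containing a given term can be retrieved quickly.

An inverted index consists of two main parts: the term dictionary and the postings lists (inverted lists).

Indexing example

The contents of three documents are:

    1. PHP was the most popular language in the past.

    2. Java is the most popular language now.

    3. Python will be a popular language in the future.

Creating the inverted index

1. Use a tokenizer to split each document into a sequence of terms, so that each document becomes a stream of terms;

2. Assign a unique term id to each distinct term and record the term it stands for;

3. At the same time, record which documents each term appears in, forming the postings list. Each term then points to a linked list of documents (Document).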
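
The three creation steps can be sketched in plain Java. This is only an illustration under stated assumptions: the class name InvertedIndexSketch and its methods are hypothetical, the tokenizer simply splits on non-alphanumeric characters, the term itself is used as the dictionary key instead of a numeric term id, and postings are kept in in-memory lists rather than in Lucene's actual index format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: an in-memory term -> document-id map, not Lucene's on-disk format. */
final class InvertedIndexSketch {

    /** Lowercases and splits on anything that is not a letter or digit (stand-in for an analyzer). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Maps each term to the ascending list of 1-based ids of the documents containing it. */
    static Map<String, List<Integer>> build(List<String> docs) {
        Map<String, List<Integer>> index = new TreeMap<>();
        for (int i = 0; i < docs.size(); i++) {
            int docId = i + 1;
            for (String term : tokenize(docs.get(i))) {
                List<Integer> postings = index.computeIfAbsent(term, k -> new ArrayList<>());
                // Ids are visited in ascending order, so comparing with the tail prevents duplicates.
                if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                    postings.add(docId);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "PHP was the most popular language in the past.",
                "Java is the most popular language now.",
                "Python will be a popular language in the future.");
        build(docs).forEach((term, postings) -> System.out.println(term + " -> " + postings));
    }
}
```

Running main prints one line per term; for instance the term "popular" maps to documents 1, 2, and 3, while "now" maps only to document 2.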

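Querying the inverted index

Suppose the user searches for "now popular":

1. Tokenize the user input into "now" and "popular";

2. Fetch the list of documents containing the term "now";

3. Fetch the list of documents containing the term "popular";

4. Merge the two lists to find the documents that contain "now" or "popular".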
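
The merge in step 4 can be sketched as a linear walk over two ascending postings lists. The class name PostingMergeSketch is hypothetical, and the postings in main are the ones the three example documents would yield for "now" and "popular". The union corresponds to the OR query described in the steps; an intersection is shown alongside it for comparison with an AND query.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: merging two ascending postings lists in a single pass. */
final class PostingMergeSketch {

    /** Documents containing either term (OR). */
    static List<Integer> union(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i);
            int y = b.get(j);
            if (x == y) {
                out.add(x);
                i++;
                j++;
            } else if (x < y) {
                out.add(x);
                i++;
            } else {
                out.add(y);
                j++;
            }
        }
        // One of the lists is exhausted; append whatever remains of the other.
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }

    /** Documents containing both terms (AND). */
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i);
            int y = b.get(j);
            if (x == y) {
                out.add(x);
                i++;
                j++;
            } else if (x < y) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> now = List.of(2);            // postings for "now"
        List<Integer> popular = List.of(1, 2, 3);  // postings for "popular"
        System.out.println("now OR popular  = " + union(now, popular));
        System.out.println("now AND popular = " + intersect(now, popular));
    }
}
```

Because both lists are sorted by document id, each merge touches every posting at most once.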

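How the inverted index works

In practice the structure of an inverted index is not as simple as described above; the index can record more information. The postings list of a term stores not only document ids but also the term frequency. Term frequency is an important input when ranking search results, and the scoring described later relies on it.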
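
A postings list that also records term frequency can be sketched as a map from term to (document id, count) pairs. The class name TermFrequencySketch and the sample documents are hypothetical, and whitespace splitting stands in for a real analyzer.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: postings that store how often each term occurs in each document. */
final class TermFrequencySketch {

    /** Returns term -> (1-based document id -> number of occurrences in that document). */
    static Map<String, Map<Integer, Integer>> build(List<String> docs) {
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        for (int i = 0; i < docs.size(); i++) {
            int docId = i + 1;
            for (String term : docs.get(i).toLowerCase(Locale.ROOT).split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                // merge() starts the count at 1 or adds 1 to the existing count.
                index.computeIfAbsent(term, k -> new TreeMap<>()).merge(docId, 1, Integer::sum);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("led washer led", "washer dryer");
        build(docs).forEach((term, postings) -> System.out.println(term + " -> " + postings));
    }
}
```

With these two sample documents, "led" occurs twice in document 1 and not at all in document 2, which is exactly the kind of signal a ranking function can use.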

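[Figure: index and search flow diagram]

II. Getting Started with Lucene

• Lucene is an open-source library for full-text indexing and search, supported and provided by the Apache Software Foundation;

• It is a Java-based full-text search toolkit. Lucene is not a ready-made search engine product, but it can be used to build one;

• Official site: http://lucene.apache.org/ .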

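1. Overall structure of Lucene

As the overall Lucene architecture diagram shows:

1. The Lucene library provides APIs for creating an index and for searching an index.

2. The application only needs to collect the document data and create the index, then query the index with the user's input and return the results.

2. Basic Lucene concepts

Index: similar to a table in a database, but with no schema constraints. Documents in it can be added and modified, and the content of each document can be defined freely.

Document: similar to a row in a database. An Index contains many Documents.

Field: a Document consists of one or more Fields. Tokenization is applied per Field.

Term and Term Dictionary: a Term is the smallest unit of indexing and search in Lucene. A Field consists of one or more Terms, which are produced by running the Field through an Analyzer (tokenization). The Term Dictionary is the basic index used to look up Terms by condition.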

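3. How Lucene creates an index

Lucene creates an index as follows:

1. Create an IndexWriter to write the index files. It takes several parameters: INDEX_DIR is the location where the index files are stored, and the Analyzer performs lexical analysis and language processing on the documents.

2. Create a Document representing the item to index and add Fields to it. Different kinds of information are represented by different Fields.

3. Call IndexWriter's addDocument method to write the document into the index directory.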

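4. How Lucene searches

The search process is as follows:

1. An IndexReader reads the index from disk into memory; INDEX_DIR is the location where the index files are stored.

2. Create an IndexSearcher to run the search.

3. Create an Analyzer to perform lexical analysis and language processing on the query string.

4. Create a QueryParser to parse the syntax of the query string.

5. Call QueryParser's parse method, which builds the query syntax tree and stores it in a Query.

6. Call IndexSearcher's search method with the Query to obtain the top-scoring results (collected by a TopScoreDocCollector; the code in this article reads them from the returned TopDocs).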

III. Lucene Example 1

1. Example 1 code

Add the Lucene jars (and the IKAnalyzer jar) to the project.

import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class LuceneTest {

    public static void main(String[] args) throws Exception {
        // 1. Prepare the Chinese analyzer
        IKAnalyzer analyzer = new IKAnalyzer();
        // 2. Create the index (product names stay in Chinese because they exercise the Chinese analyzer)
        List<String> productNames = new ArrayList<>();
        productNames.add("小天鹅TG100-1420WDXG");
        productNames.add("小天鹅TB80-easy60W 洗漂脱时间自由可调，京东微联智能APP手机控制");
        productNames.add("小天鹅TG90-1411DG 洗涤容量:9kg 脱水容量:9kg 显示屏:LED数码屏显示");
        productNames.add("小天鹅TP75-V602 流线蝶形波轮，超强喷淋漂洗");
        productNames.add("小天鹅TG100V20WDG 大件洗，无旋钮外观，智能WiFi");
        productNames.add("小天鹅TD80-1411DG 洗涤容量:8kg 脱水容量:8kg 显示屏:LED数码屏显示");
        productNames.add("海尔XQB90-BZ828 洗涤容量:9kg 脱水容量:9kg 显示屏:LED数码屏显示");
        productNames.add("海尔G100818HBG 极简智控面板，V6蒸汽烘干，深层洁净");
        productNames.add("海尔G100678HB14SU1 洗涤容量:10kg 脱水容量:10kg 显示屏:LED数码屏显");
        productNames.add("海尔XQB80-KM12688 智能自由洗，超净洗");
        productNames.add("海尔EG8014HB39GU1 手机智能，一键免熨烫，空气净化洗");
        productNames.add("海尔G100818BG 琥珀金机身，深层洁净，轻柔雪纺洗");
        productNames.add("海尔G100728BX12G 安全磁锁，健康下排水");
        productNames.add("西门子XQG80-WD12G4C01W 衣干即停，热风除菌，低噪音");
        productNames.add("西门子XQG80-WD12G4681W 智能烘干，变速节能，无刷电机");
        productNames.add("西门子XQG100-WM14U568LW 洗涤容量:10kg 脱水容量:10kg 显示屏:LED");
        productNames.add("西门子XQG80-WM10N1C80W 除菌、洗涤分离，防过敏程序");
        productNames.add("西门子XQG100-WM14U561HW 洗涤容量:10kg 脱水容量:10kg 显示屏:LED");
        productNames.add("西门子XQG80-WM12L2E88W 洗涤容量:8kg 脱水容量:8kg 显示屏:LED触摸");
        Directory index = createIndex(analyzer, productNames);

        // 3. Build the query
        String keyword = "西门子 LED";
        Query query = new QueryParser("name", analyzer).parse(keyword);
        // 4. Search
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        int numberPerPage = 1000;
        System.out.printf("Total documents: %d%n", productNames.size());
        System.out.printf("Query keyword: \"%s\"%n", keyword);
        ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;
        // 5. Show the results
        showSearchResults(searcher, hits, query, analyzer);
        // 6. Close the reader
        reader.close();
    }

    private static void showSearchResults(IndexSearcher searcher, ScoreDoc[] hits, Query query, IKAnalyzer analyzer)
            throws Exception {
        System.out.println("Found " + hits.length + " hits.");
        System.out.println("No.\tScore\tResult");

        // The two arguments are the tags placed before and after each matched term;
        // pass HTML tags here to make the matches visible in a page.
        SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("", "");
        Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));

        for (int i = 0; i < hits.length; ++i) {
            ScoreDoc scoreDoc = hits[i];
            int docId = scoreDoc.doc;
            Document d = searcher.doc(docId);
            List<IndexableField> fields = d.getFields();
            System.out.print((i + 1));
            System.out.print("\t" + scoreDoc.score);
            for (IndexableField f : fields) {
                String value = d.get(f.name());
                TokenStream tokenStream = analyzer.tokenStream(f.name(), new StringReader(value));
                // getBestFragment returns null when the field contains no matching term
                String fieldContent = highlighter.getBestFragment(tokenStream, value);
                System.out.print("\t" + (fieldContent != null ? fieldContent : value));
            }
            System.out.println();
        }
    }


    private static Directory createIndex(IKAnalyzer analyzer, List<String> products) throws IOException {
        // Keep the index in memory
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter writer = new IndexWriter(index, config);
        for (String name : products) {
            addDoc(writer, name);
        }
        writer.close();
        return index;
    }

    /**
     * Add one document to the index
     * @param w the index writer
     * @param name the product name to index
     * @throws IOException
     */
    private static void addDoc(IndexWriter w, String name) throws IOException {
        // Create a document with a single stored, tokenized "name" field
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        w.addDocument(doc);
    }
}

2. Code walkthrough

Creating the index

private static Directory createIndex(IKAnalyzer analyzer, List<String> products) throws IOException {
    // Keep the index in memory
    Directory index = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter writer = new IndexWriter(index, config);
    for (String name : products) {
        addDoc(writer, name);
    }
    writer.close();
    return index;
}

private static void addDoc(IndexWriter w, String name) throws IOException {
    // Create a document with a single stored, tokenized "name" field
    Document doc = new Document();
    doc.add(new TextField("name", name, Field.Store.YES));
    w.addDocument(doc);
}

The code above stores each entry of the List in a Document, tokenizes it with the analyzer, and builds an index that is kept in memory. The IndexWriter object is what writes the index.

Querying the index

// 3. Build the query
String keyword = "西门子 LED";
Query query = new QueryParser("name", analyzer).parse(keyword);
// 4. Search
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
int numberPerPage = 1000;
System.out.printf("Total documents: %d%n", productNames.size());
System.out.printf("Query keyword: \"%s\"%n", keyword);
ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;
// 5. Show the results
showSearchResults(searcher, hits, query, analyzer);
// 6. Close the reader
reader.close();

The code above performs the query: it builds the Query object from the keyword, opens the index, creates an IndexSearcher, passes it the Query, and then parses and prints the results.

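Analyzers

Both indexing and querying need an analyzer. In Lucene, tokenization is done by the Analyzer class. Analyzer is an abstract class, and the concrete tokenization rules are implemented by its subclasses; different languages need different analyzers. Lucene's default StandardAnalyzer does not segment Chinese into words (it emits Chinese text one character at a time), so it is a poor fit for Chinese.

The code uses IKAnalyzer, a third-party analyzer that extends Lucene's Analyzer class and is designed for Chinese text.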

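Scoring

The results returned in the example include a relevance score column. Results with higher scores are ranked first, and results near the top are the better matches.

Scoring formula: (formula figure omitted)

Lucene implements this scoring algorithm, and query results are sorted by score.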
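
The formula itself is not shown in the text. As an assumption about what it referred to, Lucene's classic TF-IDF scoring is usually summarized as score(q, d) = coord(q, d) · queryNorm(q) · Σ over terms t in q of tf(t in d) · idf(t)² · boost(t) · norm(t, d). Newer Lucene versions default to BM25 instead, so the scores you observe may come from a different formula. The sketch below keeps only the tf and idf factors, using the shapes tf = sqrt(freq) and idf = 1 + ln((docCount + 1) / (docFreq + 1)); the class name TfIdfSketch and all numbers in main are hypothetical.

```java
/** Illustrative only: the tf and idf factors of a classic TF-IDF score, nothing more. */
final class TfIdfSketch {

    /** Term-frequency factor: grows with the number of occurrences, but sub-linearly. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Inverse document frequency: terms that occur in fewer documents weigh more. */
    static double idf(int docFreq, int docCount) {
        return 1.0 + Math.log((docCount + 1.0) / (docFreq + 1.0));
    }

    /**
     * Sums tf * idf^2 over the query terms. termFreqs[i] is how often query term i occurs in
     * the document, docFreqs[i] is how many documents contain it. The coord, queryNorm, boost,
     * and length-norm factors of the full formula are deliberately left out.
     */
    static double score(int[] termFreqs, int[] docFreqs, int docCount) {
        if (termFreqs.length != docFreqs.length) {
            throw new IllegalArgumentException("one docFreq is required per query term");
        }
        double sum = 0.0;
        for (int i = 0; i < termFreqs.length; i++) {
            double weight = idf(docFreqs[i], docCount);
            sum += tf(termFreqs[i]) * weight * weight;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical statistics: 10 documents, a rare term (in 1 document) and a common one (in 5).
        System.out.println("rare + common: " + score(new int[] {1, 1}, new int[] {1, 5}, 10));
        System.out.println("common only:   " + score(new int[] {0, 1}, new int[] {1, 5}, 10));
    }
}
```

The point of the sketch is the ordering it produces: a document that matches a rare term scores higher than one that matches only common terms.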

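Highlighting

// The two arguments are the tags placed before and after each matched term
SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("", "");
Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));

When the results are rendered in an HTML page, with suitable tags passed to SimpleHTMLFormatter, the matched keywords appear marked in red. The org.apache.lucene.search.highlight package of the Lucene library provides the classes for highlighting search keywords, so that keywords occurring in the returned results can be marked.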
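
What a highlighter does can be sketched without Lucene: find each query term in the text and wrap it in a pre-tag and a post-tag. This is not how Lucene's Highlighter is implemented (it works on the analyzer's token stream and scores fragments); the class name HighlightSketch, the plain case-insensitive substring matching, and the sample tags are assumptions made for the illustration.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: wraps every occurrence of the given terms in a pre-tag and a post-tag. */
final class HighlightSketch {

    static String highlight(String text, List<String> terms, String preTag, String postTag) {
        // Build one alternation pattern so the text is scanned once and inserted tags are never re-matched.
        StringBuilder alternation = new StringBuilder();
        for (String term : terms) {
            if (term.isEmpty()) {
                continue;
            }
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(term));
        }
        if (alternation.length() == 0) {
            return text;
        }
        Matcher matcher = Pattern
                .compile(alternation.toString(), Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(text);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            // quoteReplacement keeps '$' and '\' in the tags or the match from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(preTag + matcher.group() + postTag));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = highlight("Siemens LED display", List.of("siemens", "led"),
                "<span style=\"color:red\">", "</span>");
        System.out.println(html);
    }
}
```

The red span in main is just one possible choice of tags; any pair of strings works.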

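IV. Lucene Example 2

1. About the example

1. Import 140,000 detailed product records into a MySQL database;

2. Create an index with the Lucene library;

3. Query the index with Lucene with paging, return the matching records, and record the query time;

4. Connect to MySQL over JDBC, run a LIKE query with paging over the products, return the matching records, and record the query time;

5. Compare MySQL's fuzzy (LIKE) query with Lucene's full-text search.

2. Example 2 code

Add the Lucene jars and the MySQL driver jar, create the product table in the database, and insert the data.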

/**
 * Product bean
 * @author yizl
 *
 */
public class Product {
    /**
     * Product id
     */
    private int id;
    /**
     * Product name
     */
    private String name;
    /**
     * Product category
     */
    private String category;
    /**
     * Product price
     */
    private float price;
    /**
     * Place of origin
     */
    private String place;
    /**
     * Product barcode
     */
    private String code;

    // ...... (getters and setters)

}


import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class TestLucene {

    private static ProductDao dao = new ProductDao();

    public static void main(String[] args) throws Exception {
        // 1. Prepare the Chinese analyzer
        IKAnalyzer analyzer = new IKAnalyzer();
        // 2. Build the index
        Directory index = createIndex(analyzer);
        // 3. Read keywords from the console
        Scanner s = new Scanner(System.in);

        while (true) {
            System.out.print("Enter a search keyword: ");
            String keyword = s.nextLine();
            System.out.println("Current keyword: " + keyword);
            long start = System.currentTimeMillis();
            // Query the name field
            Query query = new QueryParser("name", analyzer).parse(keyword);
            // 4. Search
            IndexReader reader = DirectoryReader.open(index);
            IndexSearcher searcher = new IndexSearcher(reader);
            ScoreDoc[] hits = pageSearch(query, searcher, 1, 10);
            // 5. Show the results
            showSearchResults(searcher, hits, query, analyzer);
            // 6. Close the reader
            reader.close();
            System.out.println("Lucene index query took " + (System.currentTimeMillis() - start) + " ms");

            System.out.println("-----------------------separator-------------------------------");
            // 7. Run the same search as a LIKE query against the database
            selectProductOfName(keyword);
        }
    }

    /**
     * Query MySQL by product name
     */
    private static void selectProductOfName(String str) {
        long start = System.currentTimeMillis();
        ResultBean<List<Product>> resultBean = dao.selectProductOfName(str, 1, 10);
        PageBean pageBean = resultBean.getPageBean();
        List<Product> products = resultBean.getData();
        System.out.println("Total rows found\t:" + pageBean.getTotal());
        System.out.println("Page " + pageBean.getPageNow() + ", " + pageBean.getPageSize() + " records per page");
        System.out.println("No.\tResult");
        for (int i = 0; i < products.size(); i++) {
            Product product = products.get(i);
            System.out.print((i + 1));
            System.out.print("\t" + product.getId());
            System.out.print("\t" + product.getName());
            System.out.print("\t" + product.getPrice());
            System.out.print("\t" + product.getPlace());
            System.out.print("\t" + product.getCode());
            System.out.println();
        }

        System.out.println("MySQL query took " + (System.currentTimeMillis() - start) + " ms");
    }

    /**
     * Print the hits that were found
     *
     * @param searcher searcher used to load the stored documents
     * @param hits hits of the current page
     * @param query the executed query
     * @param analyzer the analyzer (unused here, kept for symmetry with example 1)
     * @throws Exception
     */
    private static void showSearchResults(IndexSearcher searcher, ScoreDoc[] hits, Query query, IKAnalyzer analyzer)
            throws Exception {
        System.out.println("No.\tScore\tResult");
        for (int i = 0; i < hits.length; ++i) {
            ScoreDoc scoreDoc = hits[i];
            int docId = scoreDoc.doc;
            Document d = searcher.doc(docId);
            List<IndexableField> fields = d.getFields();
            System.out.print((i + 1));
            System.out.print("\t" + scoreDoc.score);
            for (IndexableField f : fields) {
                System.out.print("\t" + d.get(f.name()));
            }
            System.out.println();
        }
    }

    /**
     * Paged search
     *
     * @param query the query to run
     * @param searcher the searcher to run it on
     * @param pageNow
     *            current page number (1-based)
     * @param pageSize
     *            number of records per page
     * @return the hits of the requested page
     * @throws IOException
     */
    private static ScoreDoc[] pageSearch(Query query, IndexSearcher searcher, int pageNow, int pageSize)
            throws IOException {
        TopDocs topDocs = searcher.search(query, pageNow * pageSize);
        System.out.println("Total hits\t" + topDocs.totalHits);
        System.out.println("Page " + pageNow + ", " + pageSize + " records per page");
        ScoreDoc[] allScores = topDocs.scoreDocs;

        // Clamp both bounds: the last page may hold fewer than pageSize hits
        int start = Math.min((pageNow - 1) * pageSize, allScores.length);
        int end = Math.min(pageNow * pageSize, allScores.length);
        return Arrays.copyOfRange(allScores, start, end);
    }

    /**
     * Create the index and keep it in memory
     *
     * @param analyzer the analyzer used for tokenization
     * @return the in-memory index
     * @throws IOException
     */
    private static Directory createIndex(IKAnalyzer analyzer) throws IOException {
        long start = System.currentTimeMillis();
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter writer = new IndexWriter(index, config);
        List<Product> products = dao.selectAllProduct();
        int total = products.size();
        int count = 0;
        int per = 0;
        int oldPer = 0;
        for (Product p : products) {
            addDoc(writer, p);
            count++;
            per = count * 100 / total;
            // Print progress only when the percentage changes
            if (per != oldPer) {
                oldPer = per;
                System.out.printf("Indexing: %d records in total, progress %d%% %n", total, per);
            }

        }
        System.out.println("Index creation took " + (System.currentTimeMillis() - start) + " ms");
        writer.close();
        return index;
    }

    /**
     * Add the product's fields to the Lucene index
     *
     * @param w the index writer
     * @param p the product to index
     * @throws IOException
     */
    private static void addDoc(IndexWriter w, Product p) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("id", String.valueOf(p.getId()), Field.Store.YES));
        doc.add(new TextField("name", p.getName(), Field.Store.YES));
        doc.add(new TextField("category", p.getCategory(), Field.Store.YES));
        doc.add(new TextField("price", String.valueOf(p.getPrice()), Field.Store.YES));
        doc.add(new TextField("place", p.getPlace(), Field.Store.YES));
        doc.add(new TextField("code", p.getCode(), Field.Store.YES));
        w.addDocument(doc);
    }
}



import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class ProductDao {

    private static String url = "jdbc:mysql://localhost:3306/lucene?useUnicode=true&characterEncoding=utf8";
    private static String user = "root";
    private static String password = "root";

    public static Connection getConnection() throws ClassNotFoundException, SQLException {
        // Load the MySQL driver and open a connection
        Class.forName("com.mysql.jdbc.Driver");
        return DriverManager.getConnection(url, user, password);
    }

    /**
     * Insert products in batches
     * @param pList products to insert
     */
    public void insertProduct(List<Product> pList) {
        String sql = "INSERT INTO `product` (`id`, `name`, "
                + "`category`, `price`, `place`, `code`) VALUES (?, ?, ?, ?, ?, ?)";
        // Bound parameters avoid the quoting problems of concatenating values into the SQL text
        try (Connection conn = getConnection();
                PreparedStatement pstmt = conn.prepareStatement(sql)) {
            int count = 0;
            for (Product product : pList) {
                pstmt.setInt(1, product.getId());
                pstmt.setString(2, product.getName());
                pstmt.setString(3, product.getCategory());
                pstmt.setFloat(4, product.getPrice());
                pstmt.setString(5, product.getPlace());
                pstmt.setString(6, product.getCode());
                pstmt.addBatch();
                count++;
                // Too much data cannot be sent in one go, so flush every 20000 rows
                if (count % 20000 == 0) {
                    pstmt.executeBatch();
                }
            }
            // Flush the rows left over from the last partial batch
            pstmt.executeBatch();
            System.out.println("Submitted " + count + " rows");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Delete all products
     */
    public void deleteAllProduct() {
        try (Connection conn = getConnection();
                Statement stmt = conn.createStatement()) {
            int count = stmt.executeUpdate("delete from product");
            System.out.println("Affected " + count + " rows");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Select all products
     */
    public List<Product> selectAllProduct() {
        List<Product> pList = new ArrayList<>();
        try (Connection conn = getConnection();
                Statement stmt = conn.createStatement();
                ResultSet rs = stmt.executeQuery("select * from product")) {
            while (rs.next()) {
                pList.add(readProduct(rs));
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return pList;
    }

    /**
     * Fuzzy-match products by name
     * @param strName text to look for in the product name
     * @param pageNow current page number (1-based)
     * @param pageSize number of records per page
     * @return the requested page together with the total match count
     */
    public ResultBean<List<Product>> selectProductOfName(String strName, int pageNow, int pageSize) {
        ResultBean<List<Product>> resultBean = new ResultBean<>();
        PageBean pageBean = new PageBean();
        pageBean.setPageNow(pageNow);
        pageBean.setPageSize(pageSize);
        List<Product> pList = new ArrayList<>();
        String pattern = "%" + strName + "%";
        // pageNow and pageSize are ints, so concatenating them into LIMIT is safe
        String sql = "SELECT id,name,category,place,price,code FROM product"
                + " where name like ? limit " + (pageNow - 1) * pageSize + "," + pageSize;
        String selectCount = "SELECT count(1) c FROM product where name like ?";
        try (Connection conn = getConnection()) {
            // Page of matching rows
            try (PreparedStatement pstmt = conn.prepareStatement(sql)) {
                pstmt.setString(1, pattern);
                try (ResultSet rs = pstmt.executeQuery()) {
                    while (rs.next()) {
                        pList.add(readProduct(rs));
                    }
                }
            }
            // Total number of matching rows
            int count = 0;
            try (PreparedStatement pstmt = conn.prepareStatement(selectCount)) {
                pstmt.setString(1, pattern);
                try (ResultSet rs = pstmt.executeQuery()) {
                    if (rs.next()) {
                        count = rs.getInt("c");
                    }
                }
            }
            pageBean.setTotal(count);
            resultBean.setPageBean(pageBean);
            resultBean.setData(pList);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return resultBean;
    }

    /**
     * Map the current row of the result set to a Product
     */
    private static Product readProduct(ResultSet rs) throws SQLException {
        Product product = new Product();
        product.setId(rs.getInt("id"));
        product.setName(rs.getString("name"));
        product.setCategory(rs.getString("category"));
        product.setPlace(rs.getString("place"));
        product.setPrice(rs.getFloat("price"));
        product.setCode(rs.getString("code"));
        return product;
    }
}

/**
 * Result bean
 * @author yizl
 *
 * @param <T> type of the returned data
 */
public class ResultBean<T> {

    /**
     * Paging information
     */
    private PageBean pageBean;

    /**
     * Status code
     */
    private Integer code;
    /**
     * Message
     */
    private String msg;
    /**
     * Returned data
     */
    private T data;

    // getters and setters omitted
}

/**
 * Paging bean
 * @author yizl
 *
 */
public class PageBean {
    /**
     * Current page number
     */
    private Integer pageNow;

    /**
     * Records per page
     */
    private Integer pageSize;
    /**
     * Total number of records
     */
    private Integer total;

    // getters and setters omitted
}

4. Paged queries in Lucene

private static ScoreDoc[] pageSearch(Query query, IndexSearcher searcher, int pageNow, int pageSize)
        throws IOException {
    TopDocs topDocs = searcher.search(query, pageNow * pageSize);
    System.out.println("Total hits\t" + topDocs.totalHits);
    System.out.println("Page " + pageNow + ", " + pageSize + " records per page");
    ScoreDoc[] allScores = topDocs.scoreDocs;

    // Clamp both bounds: the last page may hold fewer than pageSize hits
    int start = Math.min((pageNow - 1) * pageSize, allScores.length);
    int end = Math.min(pageNow * pageSize, allScores.length);
    return Arrays.copyOfRange(allScores, start, end);
}

This approach first fetches the top pageNow × pageSize hits and then slices the requested page out of them in memory. The advantage is that queries are fast; the drawback is that memory use grows with the page number.
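
Slicing a page out of a hit list needs both bounds clamped to the number of hits actually returned, because the last page is usually shorter than pageSize. The following stdlib-only sketch shows the arithmetic on a plain list; the class name PageSliceSketch is hypothetical and the list stands in for the ScoreDoc array.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: cut one page out of an already ranked hit list. */
final class PageSliceSketch {

    /** Returns the hits of the 1-based page pageNow, or an empty list when the page is past the end. */
    static <T> List<T> page(List<T> hits, int pageNow, int pageSize) {
        if (pageNow < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNow and pageSize must be at least 1");
        }
        // Compute in long so that a large page number cannot overflow int.
        long first = (long) (pageNow - 1) * pageSize;
        int start = (int) Math.min(first, hits.size());
        int end = (int) Math.min(first + pageSize, hits.size());
        return new ArrayList<>(hits.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> hits = List.of(1, 2, 3, 4, 5);
        System.out.println("page 2 of size 2: " + page(hits, 2, 2));
        System.out.println("page 3 of size 2: " + page(hits, 3, 2));
        System.out.println("page 4 of size 2: " + page(hits, 4, 2));
    }
}
```

For deep paging, Lucene also offers IndexSearcher.searchAfter, which continues after the last hit of the previous page instead of collecting the top pageNow × pageSize hits again.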

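5. Comparing the results

1. For 140,000 records, building the Lucene index took 11678 ms. Building the index is fairly slow, but it only has to be done once and every later query can reuse it;

2. In terms of query time, Lucene queries took roughly 10 ms, while the MySQL queries took more than 150 ms. The speedup is substantial and becomes more pronounced as the data volume grows;

3. In terms of accuracy, both approaches usually return results for a single word. For a combination of words, MySQL may fail to match, while Lucene still finds results and ranks the best matches first, which makes it more accurate.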

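6. Lucene index vs. MySQL database

Aspect	Lucene full-text search	MySQL database
Index	Builds an inverted index over the source data, so queries are fast	For LIKE queries with a leading wildcard the database's conventional index does not help; the query falls back to a full table scan and is slow
Matching	Term-based matching: keywords are split through the language analysis interface, giving a high match rate	Fuzzy (substring) matching; related phrases may not match
Ranking	Has a relevance scoring algorithm; better matches score higher and rank first	No relevance scoring; rows are returned in no particular order
Keyword marking	Provides a highlighting API that can mark keywords in the results	No ready-made API; you have to write your own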

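V. Summary

We first looked at full-text search, whose advantages include fast search over unstructured data. The inverted index is currently the most commonly used full-text search method, and its core is how the index is created and queried. For implementing index creation and querying, the Apache Software Foundation provides Java programmers with the open-source Lucene library, which offers APIs for both. That is why we study Lucene.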


When reposting, please cite the address of this article: http://specialneedsforspecialkids.com/yun/77833.html
