Search Engine: Implementing a Full-Text Search Algorithm (Based on Lucene)

zlyBear, published 2019-08-16 10:59 / 3531 reads

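Abstract: When I built Quzhuanpan (去转盘网), I published the code for its non-full-text search; interested readers can find it on my blog. If you have questions, you can join the group; if the group is full, please look up the latest group on Quzhuanpan. Thank you for reading.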

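When I built Quzhuanpan (去转盘网), I published the code for its non-full-text search; interested readers can find it on my blog. This article covers how to implement full-text search. I spent a long time designing my new project, Guandian (观点), which places much higher demands on full-text search, so I put considerable additional time into studying it. You can try it out first: click here to search. Without further ado, here is the code: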

public Map<String, Object> articleSearchAlgorithms(SearchCondition condition, IndexSearcher searcher) throws ParseException, IOException {

            Map<String, Object> map = new HashMap<String, Object>();
            String[] filedsList = condition.getFiledsList();
            String keyWord = condition.getKeyWord();
            int currentPage = condition.getCurrentPage();
            int pageSize = condition.getPageSize();
            String sortField = condition.getSortField();
            // SortField's third argument is "reverse", so the descending flag is passed through as-is.
            boolean reverse = condition.isDESC();
            String sDate = condition.getsDate();
            String eDate = condition.geteDate();
            String classify = condition.getClassify();

            // Escape special characters in the keyword
            keyWord = escapeExprSpecialWord(keyWord);

            BooleanQuery q1 = new BooleanQuery(); // filters: type and date range
            BooleanQuery q2 = new BooleanQuery(); // keyword matching
            BooleanQuery booleanQuery = new BooleanQuery(); // top-level boolean query

            if (classify != null && (classify.equals("guanzhi") || classify.equals("opinion") || classify.equals("write"))) {
                String typeId = "1"; // default: remarks ("write")
                if (classify.equals("guanzhi")) {
                    typeId = "2";
                }
                if (classify.equals("opinion")) {
                    typeId = "3";
                }
                Query termQuery = new TermQuery(new Term("typeId", typeId));
                q1.add(termQuery, BooleanClause.Occur.MUST);
            }

            if (sDate != null && eDate != null) { // the range filter is applied only when both bounds are given
                Query rangeQuery = new TermRangeQuery("writingTime", new BytesRef(sDate), new BytesRef(eDate), true, true);
                q1.add(rangeQuery, BooleanClause.Occur.MUST);
            }

            Sort sort = new Sort(); // sort order
            sort.setSort(SortField.FIELD_SCORE); // default: by relevance score
            if (sortField != null) {
                sort.setSort(new SortField(sortField, SortField.Type.STRING, reverse));
            }
            
            int start = (currentPage - 1) * pageSize; // offset of the first hit on the requested page
            int hm = start + pageSize;                // number of hits to collect to cover that page

            TopFieldCollector res = TopFieldCollector.create(sort, hm, false, false, false, false);

            // Exact match
            Term t0 = new Term(filedsList[1], keyWord);
            TermQuery termQuery = new TermQuery(t0); // the first of two high-precision queries
            q2.add(termQuery, BooleanClause.Occur.SHOULD);

            // Prefix match
            Term t1 = new Term(filedsList[1], keyWord);
            PrefixQuery prefixQuery = new PrefixQuery(t1);
            q2.add(prefixQuery, BooleanClause.Occur.SHOULD);
            
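            // Phrase / similarity matching, suited to analyzed (tokenized) fields.
            // NOTE: the body of this loop was lost when the page was scraped (the text between
            // "<" and ">" was stripped); only "for(int i=0;i" and the trailing "0){" survive.
            // The loop bound and the q1 condition below are reconstructed from context (compare
            // the parallel q2 check that follows) and are assumptions.
            for (int i = 0; i < filedsList.length; i++) {
                // ... original loop body not recoverable ...
            }
            if (q1 != null && q1.toString().length() > 0) {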
                booleanQuery.add(q1,BooleanClause.Occur.MUST);
            }
            if(q2!=null && q2.toString().length()>0){
                 booleanQuery.add(q2,BooleanClause.Occur.MUST);
            }
            
            searcher.search(booleanQuery, res);
            long amount = res.getTotalHits(); 
            TopDocs tds = res.topDocs(start, pageSize);
            map.put("amount",amount);
            map.put("tds",tds);
            map.put("query",booleanQuery);
            return map;
    }
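
The listing calls escapeExprSpecialWord(keyWord), but the helper itself is not shown in the post. Below is a minimal sketch of what such a helper might look like, assuming its job is to backslash-escape the characters that Lucene's classic query syntax treats as operators. The class name KeywordEscaper and the exact character set are assumptions for illustration, not the author's code.

```java
class KeywordEscaper {

    // Characters with special meaning in Lucene's classic query syntax.
    // Assumption: the original helper handled a similar set.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Returns the keyword with every special character prefixed by a backslash. */
    static String escape(String keyWord) {
        if (keyWord == null || keyWord.isEmpty()) {
            return keyWord;
        }
        StringBuilder sb = new StringBuilder(keyWord.length() + 8);
        for (int i = 0; i < keyWord.length(); i++) {
            char c = keyWord.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix the operator character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("C++ (lucene)"));
    }
}
```

Escaping of this kind only matters for text handed to a query parser. TermQuery and PrefixQuery take the term literally, so an escaped keyword passed to them would be matched with the backslashes included.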

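Note: the search conditions in the code above (SearchCondition) reflect the specific requirements of Guandian. Adapt them to your own search conditions; a single class cannot fit every reader.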
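
Since SearchCondition is not shown, here is a minimal sketch of a trimmed-down holder with only the paging fields, together with the window arithmetic used in the listing (start = (currentPage - 1) * pageSize, and start + pageSize hits to collect). The class name SimpleSearchCondition, the two derived getters, and the clamping of invalid input are assumptions added for illustration.

```java
/** Hypothetical, trimmed-down stand-in for the SearchCondition used above. */
class SimpleSearchCondition {
    private final String keyWord;
    private final int currentPage; // 1-based page number
    private final int pageSize;

    SimpleSearchCondition(String keyWord, int currentPage, int pageSize) {
        this.keyWord = keyWord;
        // Assumption: clamp invalid input instead of failing.
        this.currentPage = Math.max(1, currentPage);
        this.pageSize = Math.max(1, pageSize);
    }

    String getKeyWord() { return keyWord; }
    int getCurrentPage() { return currentPage; }
    int getPageSize() { return pageSize; }

    /** Index of the first hit on the requested page: (currentPage - 1) * pageSize. */
    int getStart() { return (currentPage - 1) * pageSize; }

    /** Number of hits the collector must gather to cover the page: start + pageSize. */
    int getHitsToCollect() { return getStart() + pageSize; }

    public static void main(String[] args) {
        SimpleSearchCondition c = new SimpleSearchCondition("lucene", 3, 10);
        System.out.println(c.getStart() + " " + c.getHitsToCollect()); // prints "20 30"
    }
}
```

For page 3 with 10 results per page, the collector gathers the top 30 hits and the page is the slice starting at offset 20. The cost of a request therefore grows with the page number, which is worth keeping in mind for very deep paging.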

public Map<String, Object> searchArticle(SearchCondition condition) throws Exception {

        Map<String, Object> map = new HashMap<String, Object>();
        List<Write> list = new ArrayList<Write>();
        
         DirectoryReader reader=condition.getReader();
         String URL=condition.getURL();
         boolean isHighligth=condition.isHighlight();
         String keyWord=condition.getKeyWord();
         IndexSearcher searcher=getSearcher(reader,URL);
        
        try{
            Map output=articleSearchAlgorithms(condition,searcher);
            if(output==null){
                map.put("amount",0L);
                map.put("source",null);
                return map;
            }
            
            map.put("amount", output.get("amount"));
            TopDocs tds = (TopDocs) output.get("tds");
            ScoreDoc[] sd = tds.scoreDocs;
            Query query =(Query) output.get("query");
            
            for (int i = 0; i < sd.length; i++) {
                
                Document doc = searcher.doc(sd[i].doc);

                String id = doc.get("id");
                /* ---------- start: fields that need post-processing are grouped here ---------- */
                String temp=doc.get("title");
                String title =temp; // not highlighted by default
                if(isHighligth){
                    // highlight the article title
                    Highlighter highlighterTitle = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
                    highlighterTitle.setTextFragmenter(new SimpleFragmenter(40)); // fragment length in characters
                    TokenStream ts = analyzer.tokenStream("title", new StringReader(temp));
                    title= highlighterTitle.getBestFragment(ts,temp); 
                    if(title==null){
                        // Workaround for the highlighter returning null when it finds no fragment.
                        // NOTE: the markup originally wrapped around keyWord was stripped when the page
                        // was scraped, so the replacement below is a no-op as printed.
                        title=temp.replace(keyWord,""+keyWord+"");
                    }
                }
                
                String temp1=HtmlEnDecode.htmlEncode(doc.get("content"));
                String content=temp1; // HTML-escaped with our own wrapper method
                
                if(isHighligth){
                    // highlight the content
                    Highlighter highlighterContent = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
                    highlighterContent.setTextFragmenter(new SimpleFragmenter(Constant.HIGHLIGHT_CONTENT_LENGTH)); // fragment length in characters
                    //temp1=StringEscapeUtils.escapeHtml(temp1); // disabled: it escapes Chinese characters, which breaks highlighting
                    TokenStream ts1 = analyzer.tokenStream("content", new StringReader(temp1));
                    content = highlighterContent.getBestFragment(ts1,temp1);
                    
                    if(content==null){
                        // Workaround for the highlighter returning null when it finds no fragment.
                        // NOTE: the markup originally wrapped around keyWord was stripped when the page
                        // was scraped, so the replacement below is a no-op as printed.
                        content=temp1.replace(keyWord,""+keyWord+"");
                        
                        // Only this case needs manual truncation; otherwise the highlighter truncates the text itself.
                        content=subContent(content); // truncate
                        content=HtmlEnDecode.htmldecode(content); // HTML-decode
                        content=SubStringHTML.sub(content,Constant.HIGHLIGHT_CONTENT_LENGTH);
                    }
                }
                /* ---------- frequently changing data is grouped here ---------- */
                
                Write write=writeDao.getArticle(Long.parseLong(id));
                if(write!=null){
                    write.setTitle(title);
                    write.setContent(content);
                    
                    Date writingTime=write.getWritingTime();
                    String timeGap=DateUtil.dateGap(writingTime);//timeGap
                    write.setTimeGap(timeGap);
                    
                    list.add(write);
                }
            }
            
        }catch(Exception e){
            e.printStackTrace();
        }
        map.put("source",list);
        return map;
    }
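
In the fallback branches of the listing, the markup that wrapped keyWord appears to have been lost from the post, so the replace call leaves the text unchanged. Below is a minimal sketch of such a fallback, assuming the intent was to wrap each literal occurrence of the keyword in a highlight tag. The `<em>` tag, the class name FallbackHighlighter, and the escaping helper are all assumptions; the author's simpleHTMLFormatter and HtmlEnDecode are not shown.

```java
class FallbackHighlighter {

    /** Escapes the characters that are significant in HTML text. */
    static String htmlEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps every literal occurrence of keyWord in <em>...</em>, escaping all other text. */
    static String highlight(String text, String keyWord) {
        if (text == null) {
            return null;
        }
        if (keyWord == null || keyWord.isEmpty()) {
            return htmlEscape(text);
        }
        StringBuilder out = new StringBuilder();
        int from = 0;
        int hit;
        // Search the raw text, so matches never land inside an escaped entity.
        while ((hit = text.indexOf(keyWord, from)) >= 0) {
            out.append(htmlEscape(text.substring(from, hit)));
            out.append("<em>").append(htmlEscape(keyWord)).append("</em>");
            from = hit + keyWord.length();
        }
        out.append(htmlEscape(text.substring(from)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Lucene <b> search", "search"));
    }
}
```

Matching against the raw text and escaping each piece separately avoids a subtle problem with escape-then-replace: a keyword such as "amp" would otherwise match inside the entity produced for "&".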

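Note: the above is the concrete search code. Different applications have different requirements, so wrap your own objects, query your own database, and so on as your needs dictate. The code is given in full, with nothing held back, and it works.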
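
One detail to check when adapting the date filter in the first listing: TermRangeQuery compares terms as raw bytes, so sDate and eDate only bound a chronological range if writingTime is indexed in a fixed-width format with the most significant unit first. The post does not say which format is used; the sketch below assumes yyyyMMddHHmmss and shows that plain string comparison then agrees with time order. The class name SortableDate is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class SortableDate {
    // Assumed index format: fixed width, year first, so string order equals time order.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static String toTerm(LocalDateTime t) {
        return t.format(FMT);
    }

    /** Inclusive range check on the string form, mirroring an inclusive term range. */
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String lower = toTerm(LocalDateTime.of(2019, 8, 1, 0, 0, 0));
        String upper = toTerm(LocalDateTime.of(2019, 8, 31, 23, 59, 59));
        String doc = toTerm(LocalDateTime.of(2019, 8, 16, 10, 59, 0));
        System.out.println(doc + " in range: " + inRange(doc, lower, upper));
    }
}
```

A format such as d/M/yyyy would not have this property, because "9/8/2019" sorts after "16/8/2019" as a string.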

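If you have any questions, you can join the QQ group: 284205104. If the group is full, please visit Quzhuanpan to find the latest group and join that one. Thank you for reading.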


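Copyright of this article belongs to the author. Please do not repost without permission. If this article violates any rules, you can contact the administrator to have it removed.

When reposting, please cite the address of this article: http://specialneedsforspecialkids.com/yun/70851.html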
