Hibernate provides full-text indexing, and it works very well. Here is a brief introduction to how to use it.
1. Add the dependencies to pom.xml
    <dependency>
        <groupId>org.hibernate</groupId>
        <artifactId>hibernate-search-orm</artifactId>
        <version>${hibernate-search.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-smartcn</artifactId>
        <version>${lucene.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-queryparser</artifactId>
        <version>${lucene.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-phonetic</artifactId>
        <version>${lucene.version}</version>
    </dependency>
Configure the storage path of the search index in the Hibernate settings
    <bean id="sessionFactory"
          class="org.springframework.orm.hibernate4.LocalSessionFactoryBean"
          destroy-method="destroy">
        <property name="dataSource" ref="poolingDataSource" />
        <property name="configLocation">
            <value>classpath:hibernate.cfg.xml</value>
        </property>
        <property name="hibernateProperties">
            <props>
                <prop key="hibernate.dialect">${hibernate.dialect}</prop>
                <!-- Booleans can be easily used in expressions by declaring HQL query substitutions in Hibernate configuration -->
                <prop key="hibernate.query.substitutions">true 'Y', false 'N'</prop>
                <!-- http://ehcache.org/documentation/integrations/hibernate -->
                <!-- http://www.tutorialspoint.com/hibernate/hibernate_caching.htm -->
                <prop key="hibernate.cache.use_second_level_cache">true</prop>
                <!-- org.hibernate.cache.ehcache.EhCacheRegionFactory -->
                <prop key="hibernate.cache.region.factory_class">org.hibernate.cache.ehcache.EhCacheRegionFactory</prop>
                <!-- Hibernate only caches single persistent objects obtained through load().
                     To cache result sets obtained through findall(), list(), Iterator(),
                     createCriteria(), createQuery() and similar methods, set
                     hibernate.cache.use_query_cache to true -->
                <prop key="hibernate.cache.use_query_cache">true</prop>
                <prop key="net.sf.ehcache.configurationResourceName">ehcache-hibernate.xml</prop>
                <!-- Hibernate Search index directory -->
                <prop key="hibernate.search.default.indexBase">indexes/</prop>
            </props>
        </property>
    </bean>
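Add the @Indexed annotation to every class that should be searchable, then add the @Field annotation to each field in the class that can be searched. Enum fields usually do not need an Analyzer for lexical analysis, while other fields do. If you do not need projection (returning only some of the fields), the actual data does not have to be stored in the index. @AnalyzerDef can be used to define different analyzers together with their corresponding special-word filters.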
    @Indexed
    @AnalyzerDef(
        name = "enTopicAnalyzer",
        charFilters = {
            @CharFilterDef(factory = HTMLStripCharFilterFactory.class)
        },
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = StandardFilterFactory.class),
            @TokenFilterDef(factory = StopFilterFactory.class),
            @TokenFilterDef(factory = PhoneticFilterFactory.class, params = {
                @Parameter(name = "encoder", value = "DoubleMetaphone")
            }),
            @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
                @Parameter(name = "language", value = "English")
            })
        }
    )
    public class Topic {
        ......
        @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
        @Analyzer(definition = "enTopicAnalyzer")
        private String title;
        ......
        @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
        @Analyzer(definition = "enTopicAnalyzer")
        private String content;
        ......
        @Enumerated(EnumType.STRING)
        @Field(index = Index.YES, analyze = Analyze.NO, store = Store.NO,
               bridge = @FieldBridge(impl = EnumBridge.class))
        private TopicStatus status;
        ...
    }
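To make the idea of an analyzer chain more concrete, here is a minimal sketch that uses only the JDK. It is a conceptual stand-in, not Lucene's implementation: the class name SimpleAnalyzerSketch, its methods and the tiny stop-word list are all made up for illustration, and the phonetic and stemming steps of the definition above are left out. It only shows the order of the stages: a char filter cleans the raw text, a tokenizer splits it, and token filters normalize or drop tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual stand-in for an analyzer chain; this is NOT how Lucene implements it.
class SimpleAnalyzerSketch {

    // Hypothetical stop-word list, chosen only for this illustration.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of", "and"));

    // Char filter stage: replace anything that looks like an HTML tag with a space.
    static String stripHtml(String input) {
        return input.replaceAll("<[^>]*>", " ");
    }

    // Tokenizer stage plus two token filters (lower-casing and stop-word removal).
    static List<String> analyze(String input) {
        List<String> tokens = new ArrayList<>();
        for (String raw : stripHtml(input).split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("<p>The <b>Index</b> of a Topic</p>"));
    }
}
```

For the sample input in main, the tags are stripped, "The", "of" and "a" are dropped as stop words, and the remaining tokens are "index" and "topic". The same chain has to run on the query text as well, otherwise query terms and indexed terms would not line up.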
Build the index for existing data programmatically
    ApplicationContext context = new ClassPathXmlApplicationContext("spring-resources.xml");
    SessionFactory sessionFactory = (SessionFactory) context.getBean("sessionFactory");
    Session sess = sessionFactory.openSession();
    FullTextSession fullTextSession = Search.getFullTextSession(sess);
    try {
        fullTextSession.createIndexer().startAndWait();
    } catch (InterruptedException e) {
        LOG.error(e.getMessage(), e);
    } finally {
        fullTextSession.close();
    }
    ((AbstractApplicationContext) context).close();
Create a FullTextSession for querying and fetch the results that match the query conditions
    FullTextSession fullTextSession = Search.getFullTextSession(getSession());
    QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
            .buildQueryBuilder().forEntity(Show.class).get();
    org.apache.lucene.search.Query luceneQuery = null;
    luceneQuery = queryBuilder.keyword()
            // .wildcard()
            .onFields("title", "content").matching(query.getKeyword())
            // .matching("*" + query.getKeyword() + "*")
            .createQuery();
    FullTextQuery hibernateQuery = fullTextSession.createFullTextQuery(
            luceneQuery, Show.class);
    return hibernateQuery.list();
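Notes:
1. During one round of testing I modified a value object and added a new indexed field, but forgot to rebuild the index. The unit tests passed, yet the production environment failed.
2. The search is not very powerful yet. For example, searching for 测 may not return results that contain 测试.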
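The second note follows from how a term index works: a keyword query only finds terms exactly as they were written to the index. The sketch below illustrates this with a hypothetical in-memory inverted index built from JDK collections only (TinyInvertedIndex and its methods are invented for this example and are not part of Hibernate Search or Lucene). It assumes the analyzer emitted the two-character word 测试 as a single term, written in the code as the Unicode escapes \u6d4b\u8bd5.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory inverted index: term -> ids of the documents containing it.
class TinyInvertedIndex {

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Record every term of a document under that document's id.
    void add(int docId, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Exact term lookup: no prefix, substring or wildcard matching is attempted.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        // Assumption: the analyzer produced the two-character word as ONE term.
        index.add(1, Arrays.asList("\u6d4b\u8bd5"));

        System.out.println(index.search("\u6d4b"));        // single character: no hit
        System.out.println(index.search("\u6d4b\u8bd5"));  // whole word: document 1
    }
}
```

Under that assumption the single-character query finds nothing, because the index holds no term equal to it. A wildcard query such as the commented-out .wildcard() call in the query code is one way to work around this, at the cost of slower lookups.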
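Chinese lexical analysis
Hibernate Search uses Lucene under the hood, so any Chinese word segmenter that works with Lucene can also be used by Hibernate Search for Chinese lexical analysis. Commonly used analyzers include paoding, IKAnalyzer, mmseg4j and others. For details, see the word segmentation analysis and the more recent analysis. The default analyzer in Hibernate Search is org.apache.lucene.analysis.standard.StandardAnalyzer, which splits Chinese text character by character and clearly does not meet our needs.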
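The difference between character-by-character splitting and word-level segmentation can be sketched with the JDK alone. The example below compares single-character tokens with forward maximum matching against a small dictionary. Forward maximum matching is used here only because it is the simplest dictionary-based technique to show; it is not necessarily how any of the analyzers named above work internally. The class name, the three-word dictionary and the sample sentence (全文搜索测试, written as Unicode escapes) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative comparison of two segmentation strategies; not taken from Lucene.
class ChineseSegmentationSketch {

    // Character-by-character splitting (assumes all characters are in the BMP).
    static List<String> unigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            out.add(String.valueOf(text.charAt(i)));
        }
        return out;
    }

    // Forward maximum matching: at each position take the longest dictionary word,
    // falling back to a single character when no dictionary word fits.
    static List<String> forwardMaxMatch(String text, Set<String> dict, int maxLen) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLen);
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary with three two-character words.
        Set<String> dict = new HashSet<>(Arrays.asList(
                "\u5168\u6587", "\u641c\u7d22", "\u6d4b\u8bd5"));
        String text = "\u5168\u6587\u641c\u7d22\u6d4b\u8bd5";

        System.out.println(unigrams(text).size());                 // six one-character tokens
        System.out.println(forwardMaxMatch(text, dict, 2).size()); // three word tokens
    }
}
```

With single-character tokens, a query for any one character matches every document that contains it, which hurts precision. Word-level tokens are more precise but make the result depend on the dictionary or model behind the segmenter.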
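This section shows how to configure Chinese word segmentation in Hibernate, using the Chinese analyzer that ships with Lucene, SmartChineseAnalyzer. It can be applied in three ways: set the analyzer in the Hibernate configuration file, define it on each class that should be searchable, or configure it per field. The first two approaches are described here.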
Hibernate configuration:
    <property name="hibernate.search.analyzer">org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer</property>
Configuring the Chinese analyzer on the searchable class:
    @Indexed
    @Analyzer(impl = SmartChineseAnalyzer.class)
The corresponding dependency must also be added in Maven:
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-smartcn</artifactId>
        <version>${lucene.version}</version>
    </dependency>
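Multi-condition queries
Hibernate Search can combine several conditions into a single query. Here is a brief look at one way of doing that in practice.
With only a single condition, the query is simple:
    luceneQuery = queryBuilder.keyword().onFields("title", "content").matching(query.getKeyword()).createQuery();
When several conditions must all hold (AND), use a must junction; when any one of several conditions is enough (OR), use a should junction. Here is a must junction example: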
    // must true
    MustJunction term = queryBuilder.bool().must(queryBuilder.keyword()
            .onFields("title", "content")
            .matching(query.getKeyword()).createQuery());
    // must false
    term.must(queryBuilder.keyword()
            .onField("status")
            .matching(query.getExcludeStatus()).createQuery()).not();
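The way must, must-not and should clauses combine can be sketched with plain JDK predicates. The class below is hypothetical (BoolJunctionSketch is not a Hibernate Search type), and the rule it applies to should clauses, namely that they only decide the match when there are no must clauses, is a simplification based on my understanding of Lucene's default boolean query behaviour rather than something stated in this post.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of a boolean junction; it only mirrors the matching logic.
class BoolJunctionSketch<T> {

    private final List<Predicate<T>> must = new ArrayList<>();
    private final List<Predicate<T>> mustNot = new ArrayList<>();
    private final List<Predicate<T>> should = new ArrayList<>();

    BoolJunctionSketch<T> must(Predicate<T> p) { must.add(p); return this; }
    BoolJunctionSketch<T> mustNot(Predicate<T> p) { mustNot.add(p); return this; }
    BoolJunctionSketch<T> should(Predicate<T> p) { should.add(p); return this; }

    boolean matches(T doc) {
        for (Predicate<T> p : must) {
            if (!p.test(doc)) return false;   // every must clause has to match
        }
        for (Predicate<T> p : mustNot) {
            if (p.test(doc)) return false;    // no must-not clause may match
        }
        if (must.isEmpty() && !should.isEmpty()) {
            for (Predicate<T> p : should) {
                if (p.test(doc)) return true; // without must clauses, one should is required
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("title", "hibernate search");
        doc.put("status", "DELETED");

        BoolJunctionSketch<Map<String, String>> query =
                new BoolJunctionSketch<Map<String, String>>()
                        .must(d -> d.get("title").contains("hibernate"))
                        .mustNot(d -> "DELETED".equals(d.get("status")));

        System.out.println(query.matches(doc)); // title matches, but the status is excluded
    }
}
```

In the sketch, mustNot plays the role that must(...).not() plays in the query DSL above: the keyword clause selects candidates and the negated status clause removes the excluded ones.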
Complete example:
    private FullTextQuery findByKeywordQuery(TopicQuery query) {
        FullTextSession fullTextSession = Search.getFullTextSession(getSession());
        QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
                .buildQueryBuilder().forEntity(Topic.class).get();
        org.apache.lucene.search.Query luceneQuery = null;
        if (null == query.getStatus() && null == query.getUsername()
                && null == query.getExcludeStatus()) {
            luceneQuery = queryBuilder.keyword()
                    // .wildcard()
                    .onFields("title", "content").matching(query.getKeyword())
                    // .matching("*" + query.getKeyword() + "*")
                    .createQuery();
            if (LOG.isDebugEnabled()) {
                LOG.debug("create clean keyword search query: " + luceneQuery.toString());
            }
        } else {
            MustJunction term = queryBuilder.bool().must(queryBuilder.keyword()
                    .onFields("title", "content")
                    .matching(query.getKeyword()).createQuery());
            if (null != query.getStatus()) {
                term.must(queryBuilder.keyword()
                        // .wildcard()
                        .onField("status")
                        .matching(query.getStatus()).createQuery());
            }
            if (null != query.getExcludeStatus()) {
                term.must(queryBuilder.keyword()
                        .onField("status")
                        .matching(query.getExcludeStatus()).createQuery()).not();
            }
            if (null != query.getUsername()) {
                term.must(queryBuilder.keyword()
                        // .wildcard()
                        .onField("owner.username")
                        .ignoreFieldBridge()
                        .matching(query.getUsername()).createQuery());
            }
            luceneQuery = term.createQuery();
            if (LOG.isDebugEnabled()) {
                LOG.debug("create complicated keyword search query: " + luceneQuery.toString());
            }
        }
        // BooleanQuery
        FullTextQuery hibernateQuery = fullTextSession.createFullTextQuery(
                luceneQuery, Topic.class);
        return hibernateQuery;
    }