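When merging Sphinx indexes, we usually need to consider two techniques: "filtering" and "removing stale entries". When indexes are merged with the indexer command's --merge option, each technique needs an extra parameter, as described below: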

Filtering

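Merging two existing indexes can be more efficient than re-indexing all the data, and sometimes it is required (for example, in a "main + delta" partitioning scheme you should merge the main and delta indexes rather than simply re-index the data behind the main index). indexer therefore provides an option for this. Merging is usually faster than re-indexing, but on large indexes it is still not instant. Essentially, both indexes being merged are read once, and the merged result is written to disk once. For example, merging a 100 GB index with a 1 GB index results in 202 GB of I/O (which is most likely still less than re-indexing would need).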
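The 202 GB figure can be reproduced with a small calculation. This is only a back-of-the-envelope sketch: the function name merge_io_gb is made up for illustration, and it assumes the merged index is as large as the two inputs combined.

```shell
#!/bin/sh
# Estimated I/O in GB for a merge: both inputs are read once,
# and the merged result (assumed to be the sum of both sizes) is written once.
merge_io_gb() {
    echo $(( ($1 + $2) * 2 ))
}

merge_io_gb 100 1   # prints 202
```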

The basic command syntax is:

indexer --merge DSTINDEX SRCINDEX [--rotate]

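The contents of SRCINDEX are merged into DSTINDEX, so only DSTINDEX is changed. If DSTINDEX is already being served by searchd, the --rotate parameter is required. The originally intended usage pattern is to merge a small set of updates from SRCINDEX into DSTINDEX. Accordingly, when attributes are merged and a duplicate document ID is found, the attribute values from SRCINDEX take precedence (they overwrite the values in DSTINDEX). Note, however, that "old" keywords are not removed automatically. For example, if DSTINDEX has a keyword "old" associated with document 123, and SRCINDEX has a keyword "new" associated with the same document, then after the merge document 123 can be found with either keyword. To handle this case you can supply an explicit condition that removes documents from DSTINDEX; the relevant switch is --merge-dst-range: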
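The "old"/"new" keyword behaviour can be pictured with a toy model. The sketch below is not how Sphinx stores or merges data; the posting lists and the helper names merge_postings and docs_for are invented purely to show why both keywords still match after a plain merge.

```shell
#!/bin/sh
# Toy posting lists, one "keyword docid" pair per line (made-up data).
dst_postings='old 123'
src_postings='new 123'

# A plain merge keeps the postings from both sides; nothing removes the stale one.
merge_postings() {
    printf '%s\n%s\n' "$1" "$2" | sort -u
}

# List the document IDs recorded for a keyword.
docs_for() {
    printf '%s\n' "$1" | awk -v kw="$2" '$1 == kw { print $2 }'
}

merged=$(merge_postings "$dst_postings" "$src_postings")
docs_for "$merged" old   # prints 123
docs_for "$merged" new   # prints 123
```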

indexer --merge main delta --merge-dst-range deleted 0 0

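This switch lets you apply filters to the destination index during the merge. There can be several filters, and only documents that satisfy all of them appear in the final merged index. In the example above, the filter passes only documents whose "deleted" attribute is 0 and drops every record flagged as deleted (the attribute can be set, for instance, by calling UpdateAttributes()).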
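The "all filters must match" rule can be imitated on a few rows of text. This is a rough illustration only: the rows, the "level" column and the helper keep_in_range are hypothetical, and the range is treated as inclusive at both ends, matching the 0 0 example above.

```shell
#!/bin/sh
# Made-up rows: "docid deleted level".
rows='1 0 3
2 1 3
3 0 9'

# Keep rows whose column <col> lies within [min, max], both ends inclusive.
keep_in_range() {
    awk -v col="$1" -v min="$2" -v max="$3" '$col >= min && $col <= max'
}

# One filter: only rows with deleted = 0 survive (docs 1 and 3).
printf '%s\n' "$rows" | keep_in_range 2 0 0

# Two filters chained: a row must pass both, so only doc 1 is left.
printf '%s\n' "$rows" | keep_in_range 2 0 0 | keep_in_range 3 1 5
```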

Removing stale entries (forced update)

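Scenario: suppose the old keyword "去钓鱼" ("go fishing") is associated with the forum post "周末的活动" ("Weekend activities"), so searching for "去钓鱼" finds that post. Later the original poster changes the keyword part of the post to "去河边钓鱼" ("go fishing by the river"). If you build the delta index bbsattend with Sphinx's indexer and then run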

indexer --merge bbs bbsattend --rotate

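to merge it into the main index bbs, searching for "去河边钓鱼" returns nothing, while searching for "去钓鱼" returns the post that now reads "去河边钓鱼".
Fix: add the --merge-killlists option: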

indexer --merge bbs bbsattend --rotate --merge-killlists

Reference scripts

mkdir /usr/local/sphinx/scripts
 
 
----0. Build all main indexes from scratch
 
#!/bin/bash
#ocpyang@126.com
#main_index_update.sh
/usr/local/sphinx/bin/indexer src2  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
/usr/local/sphinx/bin/indexer src3  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
/usr/local/sphinx/bin/indexer src4  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
/usr/local/sphinx/bin/indexer src5  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
 
 
 
----1. Delta indexes
#!/bin/bash
#ocpyang@126.com
#delta_index_update.sh
/usr/local/sphinx/bin/indexer src2_delta  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
/usr/local/sphinx/bin/indexer src3_delta  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
/usr/local/sphinx/bin/indexer src4_delta  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
/usr/local/sphinx/bin/indexer src5_delta  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
 
 
---2. Merge indexes
 
#!/bin/bash
#ocpyang@126.com
#merge_daily_index.sh
# merge "main + delta" indexes
 
##1. index about tblpnr
/usr/local/sphinx/bin/indexer --merge src2 src2_delta -c /usr/local/sphinx/etc/sphinx.conf --rotate >> /usr/local/sphinx/var/log/index_merge.log 2>&1
 
if [ "$?" -eq 0 ]; then
            /usr/local/mysql/bin/mysql -h127.0.0.1 -uroot -ppassword  -e "REPLACE INTO jinri.sph_counter SELECT 2, MAX(id),max(update_time) FROM jinripnr.tblpnr"
fi
 
 
##2. index about tblticketno
/usr/local/sphinx/bin/indexer --merge src3 src3_delta -c /usr/local/sphinx/etc/sphinx.conf --rotate >> /usr/local/sphinx/var/log/index_merge.log 2>&1
 
if [ "$?" -eq 0 ]; then
        /usr/local/mysql/bin/mysql -h127.0.0.1 -uroot -ppassword -e "REPLACE INTO jinri.sph_counter SELECT 3, MAX(id),max(update_time) FROM jinritickno.tblticketno"
fi
 
 
##3. index about tblpassengername
/usr/local/sphinx/bin/indexer --merge src4 src4_delta -c /usr/local/sphinx/etc/sphinx.conf --rotate >> /usr/local/sphinx/var/log/index_merge.log 2>&1
 
if [ "$?" -eq 0 ]; then
        /usr/local/mysql/bin/mysql -h127.0.0.1 -uroot -ppassword -e "REPLACE INTO jinri.sph_counter SELECT 4, MAX(id),max(update_time) FROM jinripname.tblpassengername"
fi
 
 
 
##4. index about tblorderno
/usr/local/sphinx/bin/indexer --merge src5 src5_delta -c /usr/local/sphinx/etc/sphinx.conf --rotate  >> /usr/local/sphinx/var/log/index_merge.log 2>&1
 
if [ "$?" -eq 0 ]; then
        /usr/local/mysql/bin/mysql -h127.0.0.1 -uroot -ppassword -e "REPLACE INTO jinri.sph_counter SELECT 5, MAX(id),max(update_time) FROM jinriorderno.tblorderno"
fi
 
#### Rebuild the delta indexes again
#delta_index_update.sh
/usr/local/sphinx/bin/indexer src2_delta  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
/usr/local/sphinx/bin/indexer src3_delta  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
/usr/local/sphinx/bin/indexer src4_delta  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
/usr/local/sphinx/bin/indexer src5_delta  -c /usr/local/sphinx/etc/sphinx.conf  --rotate > /dev/null 2>&1
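The four near-identical merge steps above could also be written as a loop. The following is a minimal sketch rather than a drop-in replacement: the DRY_RUN switch and the helper names run and merge_and_rebuild are assumptions added here, the sph_counter update is left out, and unlike the script above it rebuilds a delta only when its merge succeeded.

```shell
#!/bin/bash
# Paths copied from the scripts above; DRY_RUN is a hypothetical switch for this sketch.
INDEXER=/usr/local/sphinx/bin/indexer
CONF=/usr/local/sphinx/etc/sphinx.conf
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Merge <index>_delta into <index>, then rebuild the delta if the merge succeeded.
merge_and_rebuild() {
    run "$INDEXER" --merge "$1" "${1}_delta" -c "$CONF" --rotate || return 1
    # The sph_counter update from the original script would go here.
    run "$INDEXER" "${1}_delta" -c "$CONF" --rotate
}

for idx in src2 src3 src4 src5; do
    merge_and_rebuild "$idx" || echo "merge failed for $idx" >&2
done
```

With DRY_RUN left at 1 the script only prints the indexer commands it would run, which makes it easy to check the index names before setting DRY_RUN=0.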
 
 
 
 
 
 
# crontab -l
 
# crontab -e
 
*/5 * * * *  /usr/local/sphinx/scripts/delta_index_update.sh
0 2 * * *    /usr/local/sphinx/scripts/merge_daily_index.sh
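With this schedule both jobs fire at 02:00, so the delta rebuild and the daily merge may touch the same indexes at the same time. One possible guard, sketched below, is a simple mkdir-based lock shared by both scripts. The lock path and the helper name with_lock are assumptions for illustration, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical lock directory; any path writable by the cron user would do.
LOCK_DIR="${LOCK_DIR:-/tmp/sphinx_index.lock}"

# Run the given command only if no other indexing job holds the lock.
# mkdir is atomic, so only one caller can create the directory.
with_lock() {
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$LOCK_DIR"
        return "$status"
    fi
    echo "another indexing job is running, skipping" >&2
    return 1
}
```

Each cron script could then wrap its indexer calls, for example with_lock /usr/local/sphinx/bin/indexer src2_delta -c /usr/local/sphinx/etc/sphinx.conf --rotate, so that a run which finds the lock taken is skipped instead of overlapping.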
 
 
---Merge syntax
/usr/local/sphinx/bin/indexer --merge src3 src3_delta -c /usr/local/sphinx/etc/sphinx.conf \
--rotate --merge-dst-range deleted 0 0  --merge-killlists
 
 
--View the cron job execution log
tail -f /var/log/cron

My scripts

Scheduled tasks


1 */1 * * *   sh /usr/local/coreseek/script_index.sh
*/10 * * * *  sh /usr/local/coreseek/script_delta.sh

script_delta.sh

Delta script

# cat  script_delta.sh
#!/bin/bash

 /usr/local/coreseek/bin/indexer   wk_article_delta --rotate >>/usr/local/coreseek/var/log/delta.log
 /usr/local/coreseek/bin/indexer   wk_courses_delta --rotate >>/usr/local/coreseek/var/log/delta.log
 /usr/local/coreseek/bin/indexer   wk_wenda_post_delta  --rotate >>/usr/local/coreseek/var/log/delta.log

script_index.sh

# cat script_index.sh
#!/bin/bash

/usr/local/coreseek/bin/indexer wk_courses --rotate >>/usr/local/coreseek/var/log/index.log
/usr/local/coreseek/bin/indexer wk_article --rotate >>/usr/local/coreseek/var/log/index.log
/usr/local/coreseek/bin/indexer wk_wenda_post --rotate >>/usr/local/coreseek/var/log/index.log

script_merge.sh

# cat script_merge.sh
#!/bin/bash


/usr/local/coreseek/bin/indexer  --merge wk_article wk_article_delta --rotate --merge-dst-range deleted 0 0  --merge-killlists >>/usr/local/coreseek/var/log/index.log

/usr/local/coreseek/bin/indexer --merge wk_courses  wk_courses_delta --rotate --merge-dst-range deleted 0 0  --merge-killlists >>/usr/local/coreseek/var/log/index.log


/usr/local/coreseek/bin/indexer  --merge wk_wenda_post wk_wenda_post_delta  --rotate --merge-dst-range deleted 0 0  --merge-killlists >>/usr/local/coreseek/var/log/index.log
Last modification:January 5, 2020