Elasticsearch
This site uses Elasticsearch for its search functionality under the hood.
Elasticsearch | |
---|---|
Summary | |
Description: | This site uses Elasticsearch for an amazing search experience! |
More | |
Notes: | The What, Why and Advantages of Elasticsearch |
Test: | Search for something in "files" to confirm that PDF index results are returned instead of just articles. For example, search for 'ssh-agent' in the File namespace. |
About
Elasticsearch is a distributed, RESTful search engine built for the cloud. See https://www.elastic.co/about. I'd like to recommend the intro video [1], but you have to submit an email address to view it (joe@example.com probably works).
Community
There is a Discourse forum at https://discuss.elastic.co/
Features
See mw:Help:CirrusSearch for help on how to best use the search functionality (including regex searches).
Search Tips
This wiki supports the search features that Elasticsearch provides. For example, say you want to search for 'Ansible', but you know there is an Ansible page and you don't want to be taken directly to it; you would rather see all the search results for 'Ansible'. Just prefix your search term with the ~ character (for example, ~Ansible). When you press Enter, you go to the search results page with a full listing of results.
Video
- Building Elasticsearch: From Idea to {code} to Adoption. The back side of a napkin, a pen, and a few beverages are often the ingredients that yield good ideas. Elasticsearch had a different origin. It started with a need for a simple search box for a collection of recipes. Shay Banon, creator of Elasticsearch and CTO at Elastic, shares the history behind pushing the code for his first open source project that led to the creation of Elasticsearch and its rapid adoption by users worldwide. -- RISE | August 2015
Troubleshooting
Note that I had to un-comment the network.bind_host setting and set it to 0.0.0.0 on an older setup (Version: 1.7.3, Build: NA/NA, JVM: 1.8.0_171 on Ubuntu 16.04). Also make sure that your firewall allows ports 9200-9400. If there is no log output, you can run the startup shell script directly to see what's wrong (and read the source for options): /usr/share/elasticsearch/bin/elasticsearch
Usage: /usr/share/elasticsearch/bin/elasticsearch [-vdh] [-p pidfile] [-D prop] [-X prop]
Start elasticsearch.
    -d            daemonize (run in background)
    -p pidfile    write PID to <pidfile>
    -h
    --help        print command line options
    -v            print elasticsearch version, then exit
    -D prop       set JAVA system property
    -X prop       set non-standard JAVA system property
    --prop=val
    --prop val    set elasticsearch property (i.e. -Des.<prop>=<val>)
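To confirm which bind address is actually active, a small helper can list the un-commented network settings in the config file. This is only a sketch: it assumes the setting in question is network.bind_host or network.host (the names used in the stock 1.x config file) and that the config lives at /etc/elasticsearch/elasticsearch.yml; the function name es_bind_settings is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print active (un-commented) network host settings
# from an Elasticsearch config file.
# $1 is the config path; the default below is an assumption about your install.
es_bind_settings() {
    conf="${1:-/etc/elasticsearch/elasticsearch.yml}"
    # Commented lines start with '#', so they do not match this pattern.
    grep -E '^[[:space:]]*network\.(bind_host|host)[[:space:]]*:' "$conf"
}
```

Calling es_bind_settings with no argument reads the default path. If it prints nothing, every network setting is still commented out and Elasticsearch is using its default bind address.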
Elasticsearch for MediaWiki
To improve the out-of-the-box search experience with MediaWiki, install mw:Extension:CirrusSearch. CirrusSearch is just a connector to the Elasticsearch engine, so to use it you must first install Elasticsearch itself (you can use the yum or apt repositories for that).
Wikitech gives some information about how WMF uses Elasticsearch at https://wikitech.wikimedia.org/wiki/Search
This system has three components: Elastica, CirrusSearch, and Elasticsearch.
- Elastica
- Elastica is a MediaWiki extension that provides the library to interface with Elasticsearch. It wraps the Elastica library. It has no configuration.
- CirrusSearch
- CirrusSearch is a MediaWiki extension that provides search support backed by Elasticsearch.
- Elasticsearch
- Elasticsearch is a Java application, so you need Java installed as well.
As all these pieces continue to be developed and released, take care to match the right versions together when you compose your full setup.
Where is my Elasticsearch?
Maybe you installed Elasticsearch but have no idea where it resides on your system. Try this:
curl -XGET "http://localhost:9200/_nodes/settings?pretty=true"
Other direct commands
curl 'localhost:9200/_tasks?pretty'
curl 'localhost:9200/_cat/nodes?v'
curl 'localhost:9200/_nodes?pretty'
curl 'localhost:9200/_nodes/settings?pretty=true'
curl 'localhost:9200/_cat/health?v'
curl 'localhost:9200/_cluster/health?pretty=true'
curl 'localhost:9200/_cluster/state?pretty'
The configuration for Elasticsearch is normally held in two files: /etc/elasticsearch/elasticsearch.yml and /etc/elasticsearch/logging.yml (from 5.x onward the logging configuration is in log4j2.properties instead).
Starting / Stopping
Elasticsearch is usually run as a service, so you start and stop it with your init system's own tools: service under SysV init, or systemctl under systemd.
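As a rough sketch of the difference, the helper below prints the command you would run on the current host rather than running it. It assumes the service is named elasticsearch (as the deb and rpm packages name it) and uses the presence of /run/systemd/system as the test for systemd; the function name es_service_cmd is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the command that would start, stop, restart,
# or query Elasticsearch on this host. Nothing is executed.
es_service_cmd() {
    action="$1"
    case "$action" in
        start|stop|restart|status) ;;
        *) echo "usage: es_service_cmd start|stop|restart|status" >&2
           return 2 ;;
    esac
    # /run/systemd/system exists only when systemd is the running init system.
    if [ -d /run/systemd/system ]; then
        echo "sudo systemctl $action elasticsearch.service"
    else
        echo "sudo service elasticsearch $action"
    fi
}
```

For example, es_service_cmd restart prints either the systemctl form or the service form, which you can then copy and run.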
Upgrading
Old versions of Meza run the legacy REL1_27 release of MediaWiki, while the "beta" Meza runs REL1_28. Our goal should be to run "stable" REL1_29 as soon as possible.
MediaWiki | Elasticsearch | Elastica | CirrusSearch | Cluster Restart? | Reindex? |
---|---|---|---|---|---|
REL1_27 | 1.x | REL1_27 | REL1_27 | n/a | n/a |
REL1_28 | 2.x | REL1_28 | REL1_28 | restart | yes [2] |
REL1_29 | 5.3+ | REL1_29 | REL1_29 | restart | yes [3] |
REL1_30 | 6.x | REL1_30 | REL1_30 | no [4] | Depends [5] |
With version 6.0 of Elasticsearch already released, we should upgrade MediaWiki to REL1_29 immediately, since it is compatible with Elasticsearch 6.x. The table above lists Elasticsearch 6.x against MediaWiki REL1_30, but in practice Elasticsearch 6.x can be used starting with REL1_29.
Elastic Co. provides an Ansible role to manage your installation (including a 2.x branch for older setups). Their guide to upgrading covers the nitty-gritty.
Reindexing
The most basic form of the reindex API just copies documents from one index into another. You might reindex to change the name of a field. Usually though, you are reindexing because you are forced to during a major version upgrade.
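A minimal sketch of that basic form follows. The index names wiki_old and wiki_new are placeholders, and the function name reindex_body is made up for illustration. The _reindex endpoint needs Elasticsearch 2.3 or later, and the destination index should normally be created with the desired mappings before copying.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for a basic _reindex request.
# $1 = source index, $2 = destination index (placeholder names, not from this wiki).
reindex_body() {
    printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}\n' "$1" "$2"
}

# Usage against a local node (not run here):
#   reindex_body wiki_old wiki_new |
#     curl -XPOST 'http://localhost:9200/_reindex?pretty' \
#          -H 'Content-Type: application/json' --data-binary @-
```

Building the body in a function keeps the quoting in one place; the Content-Type header is required by Elasticsearch 6.x and harmless on earlier versions.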
A plugin is available to assist with the upgrade tasks.
You can also reindex from a remote cluster, which lets you upgrade without downtime: once the new cluster is ready, you just switch to it with minimal disruption. [6]
If you are reindexing an existing Meza installation, you can run sudo meza maint rebuild monolith --tags search-index
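Reindexing from a remote cluster uses the same _reindex endpoint on the new cluster, with a remote block naming the old one. The sketch below only builds the request body; the host and index names are placeholders and the function name remote_reindex_body is made up for illustration. The new cluster must also list the old host under reindex.remote.whitelist in its elasticsearch.yml before it will accept the request.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON body for a reindex-from-remote request.
# $1 = URL of the old cluster, $2 = source index there, $3 = destination index here.
remote_reindex_body() {
    printf '{"source":{"remote":{"host":"%s"},"index":"%s"},"dest":{"index":"%s"}}\n' \
        "$1" "$2" "$3"
}

# Usage, run against the NEW cluster (not run here; names are placeholders):
#   remote_reindex_body http://oldhost:9200 wiki_demo_content wiki_demo_content |
#     curl -XPOST 'http://localhost:9200/_reindex?pretty' \
#          -H 'Content-Type: application/json' --data-binary @-
```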
Installation
Here's a quick example of how we got all the parts installed on an Ubuntu server.
# is the curl extension to PHP installed?
php -i |grep -C2 curl
# no curl?
sudo apt-get install php5-curl
# work from the MediaWiki extensions directory (the git clones below land here)
pushd extensions
java -version
# no java
sudo apt-get install default-jre
# need the jdk
sudo apt-get install default-jdk
# add JAVA_HOME to /etc/environment
# (JAVA_HOME must name the JVM directory, not the java binary)
sudo update-alternatives --config java
echo 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' |sudo tee -a /etc/environment
source /etc/environment
echo $JAVA_HOME
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list
#### don't install from the 2.x repository: 2.1.1 is too new for this setup
#### sudo apt-get update && sudo apt-get install elasticsearch
#### instead, download the 1.7.x package and install that
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.4.deb
sudo dpkg -i elasticsearch-1.7.4.deb
echo PATH=$PATH:/usr/share/elasticsearch/bin/ | sudo tee -a /etc/environment
source /etc/environment
which elasticsearch
sudo service elasticsearch start
# check with curl (see below)
# using SysV init
sudo update-rc.d elasticsearch defaults 95 10
git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/CirrusSearch.git
git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Elastica.git
cd Elastica
composer install
# load Special:Version to check
sudo -u www-data php /var/www/freephile.com/www/w/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php
sudo -u www-data php /var/www/freephile.com/www/w/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip
sudo -u www-data php /var/www/freephile.com/www/w/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse
Checking if elasticsearch is running
curl http://localhost:9200/
{
  "name" : "Carmella Unuscione",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.1.1",
    "build_hash" : "40e2c53a6b6c2972b3d13846e450e66f4375bd71",
    "build_timestamp" : "2015-12-15T13:05:55Z",
    "build_snapshot" : false,
    "lucene_version" : "5.3.1"
  },
  "tagline" : "You Know, for Search"
}
// second time around with the older version installed
{
  "status" : 200,
  "name" : "Richard Rider",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.4",
    "build_hash" : "0d3159b9fc8bc8e367c5c40c09c2a57c0032b32e",
    "build_timestamp" : "2015-12-15T11:25:18Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
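When matching versions across Elasticsearch, Elastica, and CirrusSearch, it can help to pull just the version number out of that response. The helper below is a sketch that uses sed rather than a real JSON parser, so it relies on the response having a "number" field shaped as shown above; the function name es_version is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read the root-endpoint JSON on stdin and print
# the value of its "number" field (the Elasticsearch version).
es_version() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage against a local node (not run here):
#   curl -s http://localhost:9200/ | es_version
```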
Resources
- See mw:Help:CirrusSearch for help on how to best use the search functionality (including regex searches).
- https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/CirrusSearch.php
- https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/README
- https://wikitech.wikimedia.org/wiki/Search
Problems
We recently ran a 'rebuild-all' script to update the Elasticsearch indexes:
[centos@ip-10-0-50-189 .deploy-meza]$ time sudo ./elastic-rebuild-all.sh demo
Rebuilding index for demo
Output log: /opt/data-meza/logs/search-index/demo.2017-09-07_13:15:01.log
elastic-build-index completed for "demo" at 2017-09-07_13:15:03
real 0m1.653s
user 0m1.327s
sys 0m0.199s
[centos@ip-10-0-50-189 .deploy-meza]$ tail /opt/data-meza/logs/search-index/demo.2017-09-07_13:15:01.log
Inferring index identifier...error
Looks like the index has more than one identifier.
You should delete all but the one of them currently active. Here is the list: wiki_demo_content,wiki_demo_content_first
Notice: Undefined index: HTTP_HOST in /opt/htdocs/mediawiki/LocalSettings.php on line 219
[ wiki_demo] index(es) do not exist. Did you forget to run updateSearchIndexConfig?
Notice: Undefined index: HTTP_HOST in /opt/htdocs/mediawiki/LocalSettings.php on line 219
[ wiki_demo] index(es) do not exist. Did you forget to run updateSearchIndexConfig?
******* Elastic Search build index complete! *******
Emphasis added
SOLVED
You can delete the unwanted index like this with curl:
curl -XDELETE "http://localhost:9200/wiki_cod_content"
See more about deleting wikis and all indexes at https://github.com/freephile/meza/blob/6658c795a4b5e5b1a5afcb05c62cf0bcc2d0203b/src/scripts/delete.wikis.sh
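Before deleting anything, it helps to list the indexes that belong to one wiki so you can see the duplicates. The sketch below filters a list of index names by prefix; the prefix wiki_demo comes from the log above, while the function name wiki_indices is made up for illustration. It assumes index names follow the prefix_suffix pattern seen in that log.

```shell
#!/bin/sh
# Hypothetical helper: from a list of index names on stdin (one per line),
# keep only those starting with the given prefix followed by "_".
wiki_indices() {
    awk -v p="${1}_" 'index($1, p) == 1 { print $1 }'
}

# Usage against a local node (not run here):
#   curl -s 'http://localhost:9200/_cat/indices?h=index' | wiki_indices wiki_demo
# Then delete an unwanted one by name, as shown above:
#   curl -XDELETE "http://localhost:9200/<index_name>"
```

The h=index parameter asks the cat API for the index-name column only. Checking the list first reduces the risk of deleting the index that is currently active.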
References
- [1] https://www.elastic.co/webinars/getting-started-elasticsearch
- [2] For more information about upgrading from 1.x to 2.4, see Upgrading Elasticsearch in the Elasticsearch 2.4 Reference.
- [3] For more information about upgrading from 2.4 to 5.6, see Upgrading Elasticsearch in the Elasticsearch 5.6 Reference.
- [4] Elasticsearch 6.x supports rolling upgrades from Elasticsearch 5.6. Upgrading from earlier versions requires a full cluster restart. See https://www.elastic.co/guide/en/elasticsearch/reference/6.0/restart-upgrade.html
- [5] Elasticsearch can read indices created in the previous major version. Older indices must be reindexed or deleted. Elasticsearch will fail to start if incompatible indices are present.
- [6] https://www.elastic.co/guide/en/elasticsearch/reference/6.0/reindex-upgrade.html