Elasticsearch: Difference between revisions
{{Feature
|image=Elasticsearch_logo.svg
|imgdesc=Search
|title=
}}
{{#set:feature description = This site uses Elasticsearch for an amazing search experience! }}
{{#set:feature notes = [https://medium.com/@AIMDekTech/what-is-elasticsearch-why-elasticsearch-advantages-of-elasticsearch-47b81b549f4d The What, Why and Advantages of Elasticsearch] }}
{{#set:feature tests = Search for something in "files" which indicates that PDF index results are returned instead of just articles. E.g. [https://wiki.freephile.org/wiki/index.php?title=Special:Search&profile=advanced&profile=advanced&fulltext=Search&search=ssh-agent&ns6=1 Search for 'ssh-agent' in the File namespace] }}
{{#set:feature examples = }}
This site uses Elasticsearch for its search functionality under the hood.
==About==
Elasticsearch is a distributed RESTful search engine built for the cloud. See https://www.elastic.co/about for details. I'd like to recommend the intro video <ref>https://www.elastic.co/webinars/getting-started-elasticsearch</ref>, but you have to submit an email address to view it (joe@example.com probably works).
==Community==
There is a Discourse forum at https://discuss.elastic.co/
==Features==
See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches).
===Search Tips===
This wiki supports Elasticsearch search features. For example, say you want to search for 'Ansible', but you know that there is an Ansible page and you don't want to be taken directly to it. Instead, you want to see all the [[Special:Search|search results for 'Ansible']]. Just prefix your search term with the <code>~</code> character. When you press Enter, you'll go to the search results page with a full listing of results.
| Line 37: | Line 37: | ||
{{Messagebox
| type = success
| text = Content as well as all files uploaded into the system are indexed. For example, [{{fullurl:Special:Search|search=fai|fulltext=Search|profile=all}} a search for "FAI"] lists both the [[Cloning]] article and the [[:File:Fai poster a4.pdf|PDF file]]. The file is listed not only because of its file name, but also because of its (indexed) content. [{{fullurl:Special:Search|search=ed%20roman|fulltext=Search|profile=all}} A search for "Ed Roman"] will bring up the Enterprise Java Beans Design Patterns PDF file ([{{fullurl:File:Ejbdesignpatterns.pdf|page=13}} see p. 13, where Ed Roman is mentioned]).
}}
| Line 52: | Line 52: | ||
}} | }} | ||
==Video==
*[https://vimeo.com/136326424 Building Elasticsearch: From Idea to {code} to Adoption] The back side of a napkin, a pen, and a few beverages are often the ingredients that yield good ideas. Elasticsearch had a different origin. It started with a need for a simple search box for a collection of recipes. '''Shay Banon''', creator of Elasticsearch and CTO at Elastic, shares the history behind pushing the code for his first open source project that led to the creation of Elasticsearch and its rapid adoption by users worldwide. -- ''RISE | August 2015''
==Troubleshooting==
Make sure that you use the official packages from Elasticsearch, and NOT the Ubuntu packages. See below for the installation guide. Note that on an older setup (Version: 1.7.3, Build: NA/NA, JVM: 1.8.0_171 on Ubuntu 16.04) I had to un-comment the bind.host setting and set it to 0.0.0.0. Also make sure that your firewall allows ports 9200-9400. If there's no log output, you can run the startup shell script directly to see what's wrong (and read its source for options): <code>/usr/share/elasticsearch/bin/elasticsearch</code>
| Line 74: | Line 75: | ||
For version 1.7 on Ubuntu 16.04, although the system uses systemd, a SysV init script (<code>/etc/init.d/elasticsearch</code>) still controls Elasticsearch. I think this means that you can't get the logging info into the journal. According to the [https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html install guide], on a systemd-managed install you should be able to edit the <code>elasticsearch.service</code> file and take out the <code>--quiet</code> option to make it log to the journal. Once that is done, you can run <code>journalctl --unit elasticsearch</code> to quickly see the info being logged.
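As a minimal sketch of that check, the helper below reports whether a unit file still passes <code>--quiet</code>. The function name <code>has_quiet_flag</code>, the <code>ES_UNIT_FILE</code> variable and the default unit path are assumptions made for illustration; <code>systemctl cat elasticsearch</code> shows the file your system actually loads.

```shell
#!/usr/bin/env bash
# Sketch: detect whether a systemd unit file still passes --quiet to Elasticsearch.
# has_quiet_flag is a hypothetical helper name, not part of any package.
has_quiet_flag() {
    # Match --quiet on any line that is not commented out.
    grep -Eq '^[^#]*--quiet' "$1"
}

# Assumed default location; yours may differ (see: systemctl cat elasticsearch).
unit_file="${ES_UNIT_FILE:-/usr/lib/systemd/system/elasticsearch.service}"

if [ -r "$unit_file" ]; then
    if has_quiet_flag "$unit_file"; then
        echo "--quiet is set in $unit_file: startup output is not sent to the journal"
    else
        echo "--quiet is not set: try 'journalctl --unit elasticsearch'"
    fi
fi
```

If the script reports that <code>--quiet</code> is set, remove the option from the unit file and run <code>systemctl daemon-reload</code> before restarting the service.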
===Out of memory===
If Elasticsearch has died, then when you try to start it with <code>systemctl start elasticsearch</code> you may see an error like '''Not enough space''', which in this case is actually a memory error. If you look at disk space, there is no problem.

 OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000080000000, 2147483648, 0) failed; error='Not enough space' (errno=12)

Elasticsearch will write an error log to <code>/var/log/elasticsearch/hs_err*</code> where the PID is appended, e.g. <code>_pid4061142.log</code>. That file has valuable info that tells you specifically what's wrong and suggests ways to fix it. In my case, I only had to stop both Kibana and Elasticsearch, then start Elasticsearch first to let it claim its memory, followed by starting Kibana.

If you are running out of memory, there's a good chance it is because your virtual host is not configured with any [[Swap]] partition.
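A quick way to confirm both points is sketched below: it reads the configured swap size and lists any JVM crash logs. The function names are made up for this example, and the script assumes a Linux host where <code>/proc/meminfo</code> exists.

```shell
#!/usr/bin/env bash
# Sketch: check for swap and for JVM crash logs after an out-of-memory failure.
# swap_total_kb and latest_crash_logs are hypothetical helper names.

# Print the configured swap size in kB from a meminfo-style file.
swap_total_kb() {
    awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# List hs_err crash logs in the given directory, newest first.
latest_crash_logs() {
    ls -t "${1:-/var/log/elasticsearch}"/hs_err* 2>/dev/null
}

if [ -r /proc/meminfo ]; then
    swap_kb="$(swap_total_kb)"
    if [ "${swap_kb:-0}" -eq 0 ]; then
        echo "No swap is configured on this host"
    else
        echo "Swap configured: ${swap_kb} kB"
    fi
fi
```

Run <code>latest_crash_logs</code> afterwards and open the first file it prints; that is the most recent crash report.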
==Production Configuration== | |||
Elasticsearch looks for a configuration file to include, and uses a search path to find it. You can specify the include on the command line, through an environment variable, or just make sure that your file is found in the search path. (My default was found at /usr/share/elasticsearch/bin/elasticsearch.in.sh)
<syntaxhighlight lang="bash">
# If an include wasn't specified in the environment, then search for one...
if [ "x$ES_INCLUDE" = "x" ]; then
| Line 95: | Line 104: | ||
    . "$ES_INCLUDE"
fi
</syntaxhighlight>
==Elasticsearch for MediaWiki==
To improve the out-of-the-box search experience with MediaWiki, you should install the [[mw:Extension:CirrusSearch]] extension. CirrusSearch is just a connector to the Elasticsearch engine. Thus, to use CirrusSearch, first install the [[Elasticsearch]] system (you can use the <code>yum</code> or <code>apt</code> repositories for that).
| Line 104: | Line 113: | ||
This system has three components: Elastica, CirrusSearch, and Elasticsearch.
;Elastica :Elastica is a MediaWiki extension that provides the library to interface with Elasticsearch. It wraps the [https://github.com/ruflin/Elastica Elastica] library. It has no configuration.
;CirrusSearch :CirrusSearch is a MediaWiki extension that provides search support backed by Elasticsearch.
;Elasticsearch :Elasticsearch is a Java application, so you need [[Java]] installed as well. As all these pieces continue to be developed and released, you must match compatible versions of each when composing your full setup.
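When matching versions, it helps to know exactly which Elasticsearch release is running. The root endpoint of a node returns JSON that includes a <code>version.number</code> field; the sketch below pulls that value out with <code>sed</code>. The helper names <code>es_version</code> and <code>es_major</code> are invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: extract the Elasticsearch version from the root endpoint's JSON.
# es_version and es_major are hypothetical helper names.

# Read JSON on stdin and print the first "number" value found.
es_version() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Print the major version (the part before the first dot).
es_major() {
    echo "${1%%.*}"
}

# Usage against a running node (not executed here):
#   version="$(curl -s http://localhost:9200/ | es_version)"
#   echo "Elasticsearch $version (major $(es_major "$version"))"
```

Compare the major version it reports with the row for your MediaWiki release in the Version Dependencies table.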
==Elasticsearch for QualityBox==
<syntaxhighlight lang="haproxy">
# disallow PUT and DELETE methods through the web
# administrators will need to use local curl commands to bypass the load-balancer
| Line 130: | Line 140: | ||
server es1 127.0.0.1:9200 weight 1 check inter 1000 rise 5 fall 1
</syntaxhighlight>
You can add multi-wiki search using [https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/master/CirrusSearch.php $wgCirrusSearchEnableCrossProjectSearch], $wgCirrusSearchWikiToNameMap and $wgCirrusSearchInterwikiSources:
| Line 137: | Line 147: | ||
<syntaxhighlight lang="php">
if ( $wikiId !== 'commons' ) {
    $wgCirrusSearchEnableCrossProjectSearch = true;
| Line 149: | Line 159: | ||
</syntaxhighlight>
==Where is my Elasticsearch?==
Maybe you installed Elasticsearch, but have no idea where it resides on your system. Try this:
<syntaxhighlight lang="bash">
curl -XGET "http://localhost:9200/_nodes/settings?pretty=true"
</syntaxhighlight>
Other direct commands:
<syntaxhighlight lang="bash">
curl 'localhost:9200/_tasks?pretty'
curl 'localhost:9200/_cat/nodes?pretty'
| Line 167: | Line 177: | ||
curl 'localhost:9200/_cat/indices?v'
</syntaxhighlight>
The configuration for Elasticsearch is normally held in two files: <code>/etc/elasticsearch/elasticsearch.yml</code> and <code>/etc/elasticsearch/logging.yml</code>.
==Starting / Stopping==
Elasticsearch is (usually) run as a service, so you start and stop it like any other service, using whichever of SysV init or systemd your system runs.
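A small sketch of that choice follows. It uses the conventional test for a systemd-booted host (the <code>/run/systemd/system</code> directory exists) and prints the matching command instead of running it. The helper name <code>service_command</code> is an assumption, and the service name <code>elasticsearch</code> is the one used by the official packages.

```shell
#!/usr/bin/env bash
# Sketch: print the right start/stop command for this host's init system.
# service_command is a hypothetical helper name.
service_command() {
    action="$1"                              # start | stop | restart | status
    systemd_dir="${2:-/run/systemd/system}"  # present only when booted with systemd
    if [ -d "$systemd_dir" ]; then
        echo "systemctl $action elasticsearch"
    else
        echo "service elasticsearch $action"
    fi
}

# Show the commands without executing them.
service_command start
service_command stop
```

Prefix the printed command with <code>sudo</code> when you are not root.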
==Upgrading==
QualityBox 34 runs MediaWiki 1.34.x and Elasticsearch 6.x.

The best way to upgrade to QB 34 from QB 32.x (ES 5.x) is to [https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/HEAD/README blow away the index and start over].
* DO NOT follow the [https://www.elastic.co/guide/en/elastic-stack/6.8/upgrading-elastic-stack.html upgrade instructions]. In testing, the Migration Assistant did not work for 5.6.
* DO NOT prepare a 32.x host by [https://www.elastic.co/guide/en/elasticsearch/reference/5.6/installing-xpack-es.html installing X-Pack] and preparing the indexes using the Migration Assistant. [https://github.com/elastic/elasticsearch/issues/30085#issuecomment-685286211 It does not work!] Skipping it also avoids the hassle of registering for a (free) limited license, plus the hassle of installing and upgrading Kibana just to get the Migration Assistant.
* [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/breaking_60_indices_changes.html Breaking changes in 6.x] are taken care of by the Elastica extension.
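As I read the CirrusSearch README, starting over comes down to three maintenance scripts run in order: recreate the index configuration with <code>--startOver</code>, then run the two <code>forceSearchIndex.php</code> passes shown in the Installation section. The sketch below only prints those commands (a dry run). The helper name <code>rebuild_commands</code> and the placeholder install path are assumptions, and script file names can differ between CirrusSearch releases, so check the <code>maintenance</code> directory of your copy first.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the commands for rebuilding the CirrusSearch index from scratch.
# rebuild_commands is a hypothetical helper; the path below is a placeholder.
rebuild_commands() {
    maint="$1/extensions/CirrusSearch/maintenance"
    # 1. Drop and recreate the index with the current configuration.
    printf 'php %s/updateSearchIndexConfig.php --startOver\n' "$maint"
    # 2. Index page content, skipping link counts.
    printf 'php %s/forceSearchIndex.php --skipLinks --indexOnSkip\n' "$maint"
    # 3. Second pass: fill in link data without re-parsing pages.
    printf 'php %s/forceSearchIndex.php --skipParse\n' "$maint"
}

rebuild_commands "${MW_INSTALL_PATH:-/path/to/mediawiki}"
```

Review the printed commands, then run them as the web server user (for example with <code>sudo -u www-data</code>).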
{| class="wikitable"
|+Version Dependencies
|-
!MediaWiki!!Elasticsearch!!Elastica!!CirrusSearch!!Cluster Restart?!!Reindex?
|-
|REL1_27||1.x||REL1_27||REL1_27||n/a||n/a
|-
|REL1_28||2.x||REL1_28||REL1_28||restart||yes <ref>For more information about upgrading from 1.x to 2.4, see [https://www.elastic.co/guide/en/elasticsearch/reference/2.4/setup-upgrade.html Upgrading Elasticsearch] in the Elasticsearch 2.4 Reference.</ref>
|-
|REL1_29||5.3.x or 5.4.x||REL1_29||REL1_29||restart||yes <ref>For more information about upgrading from 2.4 to 5.6, see [https://www.elastic.co/guide/en/elasticsearch/reference/5.6/setup-upgrade.html Upgrading Elasticsearch] in the Elasticsearch 5.6 Reference.</ref>
|-
|REL1_30||5.3.x or 5.4.x||REL1_30||REL1_30||no <ref>Elasticsearch 6.x supports rolling upgrades from Elasticsearch 5.6. Upgrading from earlier versions requires a full cluster restart. See https://www.elastic.co/guide/en/elasticsearch/reference/6.0/restart-upgrade.html</ref>||Depends <ref>Elasticsearch can read indices created in the '''previous major version'''. Older indices must be reindexed or deleted. Elasticsearch will fail to start if incompatible indices are present.</ref>
|-
|REL1_31 and REL1_32||''5.5.x or 5.6.x''|| || || ||
|-
|REL1_33, REL1_34 and REL1_35||6.5.x (6.5.4 recommended)|| || || ||
|}
Elastic Co. provides an [https://github.com/elastic/ansible-elasticsearch/tree/master Ansible role to manage your installation] (including a [https://github.com/elastic/ansible-elasticsearch/tree/2.x 2.x branch] for older setups). Their [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/setup-upgrade.html guide to upgrading] covers the nitty-gritty.
==Reindexing==
The most basic form of the reindex API just copies documents from one index into another. You might [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/docs-reindex.html reindex] to change the name of a field. Usually though, you are reindexing because you are forced to during a major version upgrade.
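For that most basic form, the request body names only a source index and a destination index. The sketch below builds the body and shows, in a comment, how it would be posted to the <code>_reindex</code> endpoint. The helper name <code>reindex_body</code> and the index names <code>old_index</code> and <code>new_index</code> are placeholders for this example.

```shell
#!/usr/bin/env bash
# Sketch: build the JSON body for a basic _reindex request.
# reindex_body is a hypothetical helper; the index names are placeholders.
reindex_body() {
    printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}\n' "$1" "$2"
}

# Print the body so it can be inspected before sending.
reindex_body old_index new_index

# To send it to a local node (not executed here):
#   curl -X POST "localhost:9200/_reindex?pretty" \
#        -H 'Content-Type: application/json' \
#        -d "$(reindex_body old_index new_index)"
```

The destination index should already exist with the mappings you want; otherwise Elasticsearch creates it with dynamically inferred mappings.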
| Line 209: | Line 236: | ||
==Monitoring==
It's suggested to use '''[[Kibana]]''' as a monitoring and management interface to Elasticsearch.

You'll need to set <code>xpack.monitoring.collection.enabled</code> to <code>true</code> for Elasticsearch to do self-monitoring. If that is enabled in your elasticsearch.yml, you can navigate in the left panel of Kibana to "Stack Monitoring" under the "Management" section at the bottom. That approach, however, is called 'internal monitoring'. It has long been recommended to use a distinct 'external' service called '''Metricbeat''' for monitoring the health of your Elasticsearch cluster.
[[File:Kibana-self-monitoring.png|thumb]]
Although you can turn on monitoring in the Kibana web interface, you can also use curl on the command line to inspect and set cluster settings.
<syntaxhighlight lang="bash">
# Show the current cluster settings
curl -X GET "localhost:9200/_cluster/settings?pretty"
# Persistently enable monitoring data collection
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "xpack.monitoring.collection.enabled": true
  }
}
'
</syntaxhighlight>
==Installation==
Here's a quick example of how we got all the parts installed on an [[Ubuntu]] server.
<syntaxhighlight lang="bash">
# is the curl extension to PHP installed?
php -i | grep -C2 curl
| Line 269: | Line 297: | ||
sudo -u www-data php /var/www/freephile.com/www/w/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip
sudo -u www-data php /var/www/freephile.com/www/w/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse
</syntaxhighlight>
Checking if Elasticsearch is running:
<syntaxhighlight lang="bash">
curl http://localhost:9200/
</syntaxhighlight>
< | <syntaxhighlight lang="javascript"> | ||
{ | { | ||
"name" : "Carmella Unuscione", | "name" : "Carmella Unuscione", | ||
| Line 302: | Line 330: | ||
"tagline" : "You Know, for Search" | "tagline" : "You Know, for Search" | ||
} | } | ||
</ | </syntaxhighlight> | ||
==Resources== | |||
*See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches).
*https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/CirrusSearch.php
*https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/README
*https://wikitech.wikimedia.org/wiki/Search
==Problems==
We recently ran a 'rebuild-all' script to update Elasticsearch indexes:
<pre>
| Line 338: | Line 367: | ||
===SOLVED=== | ===SOLVED=== | ||
You can delete the unwanted index like this with [[curl]]:
<syntaxhighlight lang="bash">
curl -XDELETE "http://localhost:9200/wiki_cod_content"
</syntaxhighlight>
See more about deleting wikis and all indexes at https://github.com/freephile/meza/blob/6658c795a4b5e5b1a5afcb05c62cf0bcc2d0203b/src/scripts/delete.wikis.sh