Difference between revisions of "Elasticsearch"
m (→Video: replace badly encoded character) |
|||
(34 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | This site uses Elasticsearch for its search functionality under the hood. | ||
{{Feature | {{Feature | ||
− | | | + | |explains= Search |
− | | | + | |description= This site uses Elasticsearch for the best possible search experience [[File:System-search.svg|link=Search|thumb|128px]] |
− | | | + | |notes= |
+ | |tests= | ||
+ | |examples= | ||
}} | }} | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | ==About== | + | == About == |
Elasticsearch is a distributed RESTful search engine built for the cloud. See https://www.elastic.co/about. I'd like to recommend the intro video <ref>https://www.elastic.co/webinars/getting-started-elasticsearch</ref>, but you have to submit an email address to view it (joe@example.com probably works). | Elasticsearch is a distributed RESTful search engine built for the cloud. See https://www.elastic.co/about. I'd like to recommend the intro video <ref>https://www.elastic.co/webinars/getting-started-elasticsearch</ref>, but you have to submit an email address to view it (joe@example.com probably works). | ||
− | + | == Features == | |
− | |||
− | |||
− | ==Features== | ||
See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches). | See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches). | ||
− | ===Search Tips=== | + | === Search Tips === |
− | This wiki supports Elasticsearch features. So, for example, let's say you want to search for 'Ansible' in the wiki, but you know that there is an Ansible page, and you don't want to be taken directly to that page. Instead, you prefer to actually see all the | + | This wiki supports Elasticsearch features. So, for example, let's say you want to search for 'Ansible' in the wiki, but you know that there is an Ansible page, and you don't want to be taken directly to that page. Instead, you prefer to actually see all the search results for 'Ansible'. Just prefix your search term with the <code>~</code> character. Now when you press enter, you'll go to the search results page with a full listing of results. |
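The <code>~</code> prefix can also be used when building a search link by hand. The following is a minimal sketch, not a definitive recipe: <code>WIKI_INDEX</code> and the function name are hypothetical, and the example host must be replaced with your own wiki's <code>index.php</code> address.

```shell
# Hypothetical base URL (an assumption); point this at your wiki's index.php.
WIKI_INDEX="${WIKI_INDEX:-https://wiki.example.org/index.php}"

# Print a Special:Search URL whose term is prefixed with "~", so the
# wiki shows the results listing instead of jumping to a matching page.
search_results_url() {
    term=$1
    # "~" needs no escaping in a URL; spaces are encoded as %20.
    encoded=$(printf '%s' "$term" | sed 's/ /%20/g')
    printf '%s?title=Special:Search&search=~%s\n' "$WIKI_INDEX" "$encoded"
}

search_results_url "Ansible"
```

Only spaces are encoded here; a term containing other reserved characters would need full URL encoding.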
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
{{Messagebox | {{Messagebox | ||
Line 33: | Line 22: | ||
| text = Different indexes are created for the entire contents of the wiki, and each index is weighted differently. For example, "lead-in" text is the wikitext between the top of the page and the first heading. Words found here are deemed more relevant to a user's search query than the same words found in the body text of an article. So, in this wiki, [{{fullurl:Special:Search|search=yaml|fulltext=Search}} searching for the word "YAML"] puts the [[Ansible]] article ahead of the [[Eclipse]] article in search results. | | text = Different indexes are created for the entire contents of the wiki, and each index is weighted differently. For example, "lead-in" text is the wikitext between the top of the page and the first heading. Words found here are deemed more relevant to a user's search query than the same words found in the body text of an article. So, in this wiki, [{{fullurl:Special:Search|search=yaml|fulltext=Search}} searching for the word "YAML"] puts the [[Ansible]] article ahead of the [[Eclipse]] article in search results. | ||
}} | }} | ||
− | |||
{{Messagebox | {{Messagebox | ||
| type = success | | type = success | ||
− | | text = Content as well as all files uploaded into the system are indexed. For example, [{{fullurl:Special:Search|search=fai|fulltext=Search|profile=all}} a search for "FAI"] lists both the [[Cloning]] article as well as the [[:File:Fai poster a4.pdf|PDF file]] And the file is not listed only because of the file name, but also because of the (indexed) file content. [{{fullurl: | + | | text = Content as well as all files uploaded into the system are indexed. For example, [{{fullurl:Special:Search|search=fai|fulltext=Search|profile=all}} a search for "FAI"] lists both the [[Cloning]] article and the [[:File:Fai poster a4.pdf|PDF file]]. The file is listed not only because of its file name, but also because of its (indexed) content. [{{fullurl:Special:Search|search=ed%20roman|fulltext=Search|profile=all}} A search for "Ed Roman"] will bring up the Enterprise Java Beans Design Patterns PDF file ([{{fullurl:File:Ejbdesignpatterns.pdf|page=13}} see p. 13, where Ed Roman is mentioned]).
}} | }} | ||
− | + | == Video == | |
− | + | * [https://vimeo.com/136326424 Building Elasticsearch: From Idea to {code} to Adoption] The back side of a napkin, a pen, and a few beverages are often the ingredients that yield good ideas. Elasticsearch had a different origin. It started with a need for a simple search box for a collection of recipes. '''Shay Banon''', creator of Elasticsearch and CTO at Elastic, shares the history behind pushing the code for his first open source project that led to the creation of Elasticsearch and rapid adoption by users worldwide. -- ''RISE | August 2015'' | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | ==Video== | ||
− | |||
− | *[https://vimeo.com/136326424 Building Elasticsearch: From Idea to {code} to Adoption] The back side of a napkin, a pen, and a few beverages are often the ingredients that yield good ideas. Elasticsearch had a different origin. It started with a need for a simple search box for a collection of recipes. '''Shay Banon''', creator of Elasticsearch and CTO at Elastic, shares the history behind pushing the code for his first open source project that led to the creation of Elasticsearch and | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | ==Elasticsearch for MediaWiki== | + | == Elasticsearch for MediaWiki == |
To improve the out-of-the-box search experience with MediaWiki, you should install the [[mw:Extension:CirrusSearch]]. CirrusSearch is just a connector to the Elasticsearch engine. Thus, to use CirrusSearch, first install the [[Elasticsearch]] system (you can use <code>yum</code> or <code>apt</code> repositories for that). | To improve the out-of-the-box search experience with MediaWiki, you should install the [[mw:Extension:CirrusSearch]]. CirrusSearch is just a connector to the Elasticsearch engine. Thus, to use CirrusSearch, first install the [[Elasticsearch]] system (you can use <code>yum</code> or <code>apt</code> repositories for that). | ||
Line 104: | Line 36: | ||
This system has three components: Elastica, CirrusSearch, and Elasticsearch. | This system has three components: Elastica, CirrusSearch, and Elasticsearch. | ||
+ | ; Elastica : Elastica is a MediaWiki extension that provides the library to interface with Elasticsearch. It wraps the [https://github.com/ruflin/Elastica Elastica] library. It has no configuration. | ||
+ | ; CirrusSearch : CirrusSearch is a MediaWiki extension that provides search support backed by Elasticsearch. | ||
+ | ; Elasticsearch : Elasticsearch is a Java application, so you need [[Java]] installed as well. As all these pieces continue to be developed and released, make sure the versions you combine are compatible with one another. | ||
− | + | == Where is my Elasticsearch? == | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | ==Where is my Elasticsearch?== | ||
Maybe you installed Elasticsearch but have no idea where it resides on your system. Try this: | Maybe you installed Elasticsearch but have no idea where it resides on your system. Try this: | ||
<source lang="bash"> | <source lang="bash"> | ||
− | curl | + | curl "localhost:9200/_nodes/settings?pretty=true" |
</source> | </source> | ||
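The node settings response can be long. As a rough sketch, the output can be filtered down to the path entries. The function name and the <code>ES_URL</code> variable are illustrative assumptions, and the key names matched here (<code>home</code>, <code>conf</code>, <code>data</code>, <code>logs</code>) may differ between Elasticsearch versions.

```shell
# Assumed default address; override ES_URL if your node listens elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

# Show only the lines of the node settings that look like path entries.
es_paths() {
    curl -s "$ES_URL/_nodes/settings?pretty=true" |
        grep -E '"(home|conf|data|logs)"[[:space:]]*:'
}

# Usage, with a node running:
#   es_paths
```

If nothing is printed, fall back to reading the full response from the original command.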
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | == Starting / Stopping == | |
− | |||
− | |||
− | |||
− | |||
− | ==Starting / Stopping== | ||
Elasticsearch is (usually) run as a service, so you start and stop it the same way as any other service, depending on whether your system uses SysV init or systemd. | Elasticsearch is (usually) run as a service, so you start and stop it the same way as any other service, depending on whether your system uses SysV init or systemd. | ||
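A minimal sketch of picking the right command follows. It assumes the service is registered under the name <code>elasticsearch</code>, which may not match your installation, and it only prints the command so that you can review it before running it with root privileges. The function name is hypothetical.

```shell
# Print (do not run) the command that starts, stops, or restarts the
# service. The service name "elasticsearch" is an assumption.
es_service_cmd() {
    action=$1   # start | stop | restart | status
    if [ -d /run/systemd/system ]; then
        # This directory exists only when systemd is the running init system.
        echo "systemctl $action elasticsearch"
    else
        # Otherwise assume a SysV-style init script.
        echo "service elasticsearch $action"
    fi
}

es_service_cmd start
```

Run the printed command with <code>sudo</code> (or as root) to actually change the service state.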
− | ==Upgrading== | + | == Upgrading == |
− | + | Old versions of Meza run the legacy REL1_27 release of MediaWiki, while the "beta" Meza runs REL1_28. Our goal should be to run "stable" REL1_29 as soon as possible. | |
− | |||
− | |||
− | |||
− | |||
− | |||
{| class="wikitable" | {| class="wikitable" | ||
− | |+Version Dependencies | + | |+ Version Dependencies |
|- | |- | ||
− | !MediaWiki!!ElasticSearch!!Elastica!!CirrusSearch!!Cluster Restart?!!Reindex? | + | ! MediaWiki !! ElasticSearch !! Elastica !! CirrusSearch !! Cluster Restart? !! Reindex? |
|- | |- | ||
− | |REL1_27||1.x||REL1_27||REL1_27||n/a||n/a | + | | REL1_27 || 1.x || REL1_27 || REL1_27 || n/a || n/a |
|- | |- | ||
− | |REL1_28||2.x||REL1_28||REL1_28||restart||yes <ref>For more information about upgrading from 1.x to 2.4, see [https://www.elastic.co/guide/en/elasticsearch/reference/2.4/setup-upgrade.html Upgrading Elasticsearch] in the Elasticsearch 2.4 Reference.</ref> | + | | REL1_28 || 2.x || REL1_28 || REL1_28 || restart || yes <ref>For more information about upgrading from 1.x to 2.4, see [https://www.elastic.co/guide/en/elasticsearch/reference/2.4/setup-upgrade.html Upgrading Elasticsearch] in the Elasticsearch 2.4 Reference.</ref> |
|- | |- | ||
− | |REL1_29||5.3 | + | | REL1_29 || 5.3+ || REL1_29 || REL1_29 || restart || yes <ref>For more information about upgrading from 2.4 to 5.6, see [https://www.elastic.co/guide/en/elasticsearch/reference/5.6/setup-upgrade.html Upgrading Elasticsearch] in the Elasticsearch 5.6 Reference.</ref> |
|- | |- | ||
− | |REL1_30|| | + | | REL1_30 || 6.x || REL1_30 || REL1_30 || no <ref>Elasticsearch 6.x supports rolling upgrades from Elasticsearch 5.6. Upgrading from earlier versions requires a full cluster restart. See https://www.elastic.co/guide/en/elasticsearch/reference/6.0/restart-upgrade.html</ref> || Depends <ref>Elasticsearch can read indices created in the '''previous major version'''. Older indices must be reindexed or deleted. Elasticsearch will fail to start if incompatible indices are present.</ref>
− | + | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
|} | |} | ||
+ | With version 6.0 of Elasticsearch already released, we should upgrade MediaWiki to REL1_29, which is compatible with Elasticsearch 6.x. The table above lists Elasticsearch 6.x as compatible with MediaWiki REL1_30, but in practice Elasticsearch 6.x can be used starting with MediaWiki REL1_29. | ||
Elastic Co. provides an [https://github.com/elastic/ansible-elasticsearch/tree/master ansible role to manage your installation] (including a [https://github.com/elastic/ansible-elasticsearch/tree/2.x 2.x branch] for older setups). Their [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/setup-upgrade.html guide to upgrading] covers the nitty gritty. | Elastic Co. provides an [https://github.com/elastic/ansible-elasticsearch/tree/master ansible role to manage your installation] (including a [https://github.com/elastic/ansible-elasticsearch/tree/2.x 2.x branch] for older setups). Their [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/setup-upgrade.html guide to upgrading] covers the nitty gritty. | ||
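The version dependencies table can also be written as a small lookup for use in upgrade scripts. This is only a sketch that mirrors the table rows as listed; the function name is hypothetical.

```shell
# Map a MediaWiki release branch to the Elasticsearch series listed
# for it in the Version Dependencies table.
es_series_for_mw() {
    case "$1" in
        REL1_27) echo "1.x" ;;
        REL1_28) echo "2.x" ;;
        REL1_29) echo "5.3+" ;;
        REL1_30) echo "6.x" ;;
        *)
            # Branch not covered by the table.
            echo "unknown"
            return 1
            ;;
    esac
}

es_series_for_mw REL1_29
```

The lookup returns a non-zero status for branches the table does not cover, so a script can stop instead of guessing.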
− | ==Reindexing== | + | == Reindexing == |
The most basic form of the reindex API just copies documents from one index into another. You might [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/docs-reindex.html reindex] to change the name of a field. Usually though, you are reindexing because you are forced to during a major version upgrade. | The most basic form of the reindex API just copies documents from one index into another. You might [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/docs-reindex.html reindex] to change the name of a field. Usually though, you are reindexing because you are forced to during a major version upgrade. | ||
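As an illustration of that most basic form, the request below copies every document from one index into another. It is a sketch under stated assumptions: the index names in the usage line are placeholders, <code>ES_URL</code> and the function names are hypothetical, and the <code>Content-Type</code> header is included because Elasticsearch 6.x expects it on requests with a body.

```shell
# Assumed default address; override ES_URL if your node listens elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body for a basic reindex: copy from $1 into $2.
reindex_body() {
    printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}\n' "$1" "$2"
}

# Send the request to the _reindex endpoint.
reindex() {
    curl -s -XPOST "$ES_URL/_reindex?pretty" \
        -H 'Content-Type: application/json' \
        -d "$(reindex_body "$1" "$2")"
}

# Usage, with a node running (placeholder index names):
#   reindex old_index new_index
```

The destination index should be created with the desired settings and mappings beforehand, since reindexing does not copy them from the source index.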
Line 224: | Line 81: | ||
<ref>https://www.elastic.co/guide/en/elasticsearch/reference/6.0/reindex-upgrade.html</ref> | <ref>https://www.elastic.co/guide/en/elasticsearch/reference/6.0/reindex-upgrade.html</ref> | ||
− | + | == Installation == | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | ==Installation== | ||
Here's a quick example of how we got all the parts installed on an [[Ubuntu]] server. | Here's a quick example of how we got all the parts installed on an [[Ubuntu]] server. | ||
<source lang="bash"> | <source lang="bash"> | ||
Line 321: | Line 156: | ||
</source> | </source> | ||
− | ==Resources== | + | == Resources == |
+ | * See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches). | ||
+ | * https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/CirrusSearch.php | ||
+ | * https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FCirrusSearch.git/HEAD/README | ||
+ | * https://wikitech.wikimedia.org/wiki/Search | ||
− | + | == Problems == | |
− | |||
− | |||
− | |||
− | |||
− | ==Problems== | ||
We recently ran a 'rebuild-all' script to update the Elasticsearch indexes: | We recently ran a 'rebuild-all' script to update the Elasticsearch indexes: | ||
<pre> | <pre> | ||
Line 356: | Line 190: | ||
===SOLVED=== | ===SOLVED=== | ||
You can delete the unwanted index like this with [[curl]]: | You can delete the unwanted index like this with [[curl]]: | ||
− | <source lang="bash | + | <source lang="bash"> | |
curl -XDELETE "http://localhost:9200/wiki_cod_content" | curl -XDELETE "http://localhost:9200/wiki_cod_content" | ||
</source> | </source> | ||
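Because a delete cannot be undone, it can help to wrap the call in a small guard. The sketch below is an assumption-laden example rather than part of the original fix: the function name and <code>ES_URL</code> are hypothetical, and it simply refuses empty names, <code>_all</code>, wildcards, and comma-separated lists before calling the same <code>DELETE</code> endpoint.

```shell
# Assumed default address; override ES_URL if your node listens elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

# Delete exactly one explicitly named index.
es_delete_index() {
    index=$1
    case "$index" in
        ''|_all|*'*'*|*,*)
            # Empty names, _all, wildcards, and lists could hit several indexes.
            echo "refusing to delete '$index': name one index explicitly" >&2
            return 1
            ;;
    esac
    curl -s -XDELETE "$ES_URL/$index"
}

# Usage, with a node running:
#   curl -s "$ES_URL/_cat/indices?v"     # list indexes first
#   es_delete_index wiki_cod_content
```

Listing the indexes first makes it easier to confirm the exact name of the unwanted one.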
Line 365: | Line 199: | ||
[[Category:Search]] | [[Category:Search]] | ||
− | |||
− | |||
− |