Elasticsearch: Difference between revisions

From Freephile Wiki
(Link fixes and enhancements)
(link to Kibana article)
 
(5 intermediate revisions by the same user not shown)
Line 5: Line 5:
}}
}}
{{#set:feature description = This site uses Elasticsearch for an amazing search experience! }}
{{#set:feature description = This site uses Elasticsearch for an amazing search experience! }}
{{#set:feature notes = }}
{{#set:feature notes = [https://medium.com/@AIMDekTech/what-is-elasticsearch-why-elasticsearch-advantages-of-elasticsearch-47b81b549f4d The What, Why and Advantages of Elasticsearch] }}
{{#set:feature tests = Search for something in "files" which indicates that PDF index results are returned instead of just articles. E.g.[{{fullurl:Special:Search|profile=advanced|fulltext=Search|search=ssh-agent|ns6=1}}Search for 'ssh-agent' in the File namespace] }}
{{#set:feature tests = Search for something in "files" which indicates that PDF index results are returned instead of just articles. E.g.[https://wiki.freephile.org/wiki/index.php?title=Special:Search&profile=advanced&profile=advanced&fulltext=Search&search=ssh-agent&ns6=1 Search for 'ssh-agent' in the File namespace] }}
{{#set:feature examples =  }}
{{#set:feature examples =  }}
This site uses Elasticsearch for it's search functionality under the hood.
This site uses Elasticsearch for it's search functionality under the hood.




 
==About==
== About ==
Elasticsearch is a distributed RESTful search engine built for the cloud.  See https://www.elastic.co/about  I'd like to recommend the intro video <ref>https://www.elastic.co/webinars/getting-started-elasticsearch</ref> but you have to submit an email to view it. (joe@example.com probably works)
Elasticsearch is a distributed RESTful search engine built for the cloud.  See https://www.elastic.co/about  I'd like to recommend the intro video <ref>https://www.elastic.co/webinars/getting-started-elasticsearch</ref> but you have to submit an email to view it. (joe@example.com probably works)


== Community ==
==Community==
There is a Discourse forum at https://discuss.elastic.co/
There is a Discourse forum at https://discuss.elastic.co/


== Features ==
==Features==
See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches).
See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches).


=== Search Tips ===
===Search Tips===
This wiki supports Elasticsearch features.  So, for example, let's say you want to search for 'Ansible' in the wiki, but you know that there is an Ansible page, and you don't want to be taken directly to that page.  Instead, you prefer to actually see all the [Special:Search search results for 'Ansible'].  Just prefix your search term with the <code>~</code> character.  Now when you press enter, you'll go to the search results page with a full listing of results.
This wiki supports Elasticsearch features.  So, for example, let's say you want to search for 'Ansible' in the wiki, but you know that there is an Ansible page, and you don't want to be taken directly to that page.  Instead, you prefer to actually see all the [Special:Search search results for 'Ansible'].  Just prefix your search term with the <code>~</code> character.  Now when you press enter, you'll go to the search results page with a full listing of results.


Line 52: Line 51:
}}
}}


== Video ==
==Video==
* [https://vimeo.com/136326424 Building Elasticsearch: From Idea to {code} to Adoption] The back side of a napkin, a pen, and a few beverages are often the ingredients that yield good ideas. Elasticsearch had a different origin. It started with a need for a simple search box for a collection of recipes. '''Shay Banon''', creator of Elasticsearch and CTO at Elastic, shares the history behind pushing the code for his first open source project that led to the creation of Elasticsearch and it's rapid adoption by users worldwide. -- ''RISE | August 2015''


== Troubleshooting ==
*[https://vimeo.com/136326424 Building Elasticsearch: From Idea to {code} to Adoption] The back side of a napkin, a pen, and a few beverages are often the ingredients that yield good ideas. Elasticsearch had a different origin. It started with a need for a simple search box for a collection of recipes. '''Shay Banon''', creator of Elasticsearch and CTO at Elastic, shares the history behind pushing the code for his first open source project that led to the creation of Elasticsearch and it's rapid adoption by users worldwide. -- ''RISE | August 2015''
 
==Troubleshooting==
Make sure that you use the official packages from Elasticsearch, and NOT the Ubuntu packages. See below for the installation guide. Note that I had to actually un-comment and specify the bind.host as 0.0.0.0 on an older setup (Version: 1.7.3, Build: NA/NA, JVM: 1.8.0_171 on Ubuntu 16.04).  Plus, make sure that your firewall is allowing the ports 9200-9400. You can run the startup shell script directly to see what's wrong if there's no log output (and read the source for options): <code>/usr/share/elasticsearch/bin/elasticsearch</code>
Make sure that you use the official packages from Elasticsearch, and NOT the Ubuntu packages. See below for the installation guide. Note that I had to actually un-comment and specify the bind.host as 0.0.0.0 on an older setup (Version: 1.7.3, Build: NA/NA, JVM: 1.8.0_171 on Ubuntu 16.04).  Plus, make sure that your firewall is allowing the ports 9200-9400. You can run the startup shell script directly to see what's wrong if there's no log output (and read the source for options): <code>/usr/share/elasticsearch/bin/elasticsearch</code>


Line 74: Line 74:
For version 1.7 on Ubuntu 16.04, although the system uses SystemD, there is still a SysV init script that controls elasticsearch. I think this means that you can't get the logging info into the journalctl system... See /etc/init.d/elasticsearch According to the [https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html install guide], you should be able to edit the elasticsearch.service file, and take out the --quiet option to make it log to the journal.  When that is enabled, you can do <code>journalctl --unit elasticsearch</code> to quickly see the info being logged.
For version 1.7 on Ubuntu 16.04, although the system uses SystemD, there is still a SysV init script that controls elasticsearch. I think this means that you can't get the logging info into the journalctl system... See /etc/init.d/elasticsearch According to the [https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html install guide], you should be able to edit the elasticsearch.service file, and take out the --quiet option to make it log to the journal.  When that is enabled, you can do <code>journalctl --unit elasticsearch</code> to quickly see the info being logged.


== Production Configuration ==
==Production Configuration==
ElasticSearch looks for a configuration file to include, and uses a search path for that include. You can specify it on the command-line; through an environment variable; or just make sure that your file is found in the search path. (My default was found at /usr/share/elasticsearch/bin/elasticsearch.in.sh)
ElasticSearch looks for a configuration file to include, and uses a search path for that include. You can specify it on the command-line; through an environment variable; or just make sure that your file is found in the search path. (My default was found at /usr/share/elasticsearch/bin/elasticsearch.in.sh)
<source lang="bash">
<source lang="bash">
Line 98: Line 98:




== Elasticsearch for MediaWiki ==
==Elasticsearch for MediaWiki==
To improve the out-of-the-box search experience with MediaWiki, you should install the [[mw:Extension:CirrusSearch]].  CirrusSearch is just a connector to the Elasticsearch engine.  Thus, to use CirrusSearch, first install the [[Elasticsearch]] system (you can use <code>yum</code> or <code>apt</code> repositories for that).   
To improve the out-of-the-box search experience with MediaWiki, you should install the [[mw:Extension:CirrusSearch]].  CirrusSearch is just a connector to the Elasticsearch engine.  Thus, to use CirrusSearch, first install the [[Elasticsearch]] system (you can use <code>yum</code> or <code>apt</code> repositories for that).   


Line 104: Line 104:


This system has three components: Elastica, CirrusSearch, and Elasticsearch.
This system has three components: Elastica, CirrusSearch, and Elasticsearch.
; Elastica : Elastica is a MediaWiki extension that provides the library to interface with Elasticsearch. It wraps the [https://github.com/ruflin/Elastica Elastica] library. It has no configuration.
; CirrusSearch : CirrusSearch is a MediaWiki extension that provides search support backed by Elasticsearch.
; Elasticsearch : is a Java application, so you need [[Java]] installed as well. As all these pieces continue to be developed and released, you must be sure to take heed of the requirements for matching the right versions together to compose your full setup.


== Elasticsearch for QualityBox ==
;Elastica :Elastica is a MediaWiki extension that provides the library to interface with Elasticsearch. It wraps the [https://github.com/ruflin/Elastica Elastica] library. It has no configuration.
;CirrusSearch :CirrusSearch is a MediaWiki extension that provides search support backed by Elasticsearch.
;Elasticsearch :is a Java application, so you need [[Java]] installed as well. As all these pieces continue to be developed and released, you must be sure to take heed of the requirements for matching the right versions together to compose your full setup.
 
==Elasticsearch for QualityBox==


<source lang="haproxy">
<source lang="haproxy">
Line 151: Line 152:
</source>
</source>


== Where is my Elasticsearch? ==
==Where is my Elasticsearch?==
Maybe you installed elasticsearch, but have no idea where it resides on your system.  Try this:
Maybe you installed elasticsearch, but have no idea where it resides on your system.  Try this:
<source lang="bash">
<source lang="bash">
Line 171: Line 172:
The configuration for Elasticsearch is normally held in two files: <code>/etc/elasticsearch/elasticsearch.yml</code> and <code>/etc/elasticsearch/logging.yml</code>
The configuration for Elasticsearch is normally held in two files: <code>/etc/elasticsearch/elasticsearch.yml</code> and <code>/etc/elasticsearch/logging.yml</code>


== Starting / Stopping ==
==Starting / Stopping==
Elasticsearch is (usually) run as a service, so you can start and stop it the way you would depending on whether you run SysV init or SystemD
Elasticsearch is (usually) run as a service, so you can start and stop it the way you would depending on whether you run SysV init or SystemD


== Upgrading ==
==Upgrading==
Old versions of Meza run the legacy REL1_27 release of MediaWiki, while the "beta" Meza runs REL1_28. Our goal should be to run "stable" REL1_29 asap.
QualityBox 34 runs MediaWiki 1.34.x and Elasticsearch 6.x


The best way to upgrade to QB 34 from QB 32.x (ES 5.x) is to [https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/HEAD/README blow away the index and start over].
* DO NOT follow the [https://www.elastic.co/guide/en/elastic-stack/6.8/upgrading-elastic-stack.html upgrade instructions]. Migration Assistant does not work for 5.6 in testing.
* DO NOT prepare a 32.x host by [https://www.elastic.co/guide/en/elasticsearch/reference/5.6/installing-xpack-es.html installing X-Pack] and preparing the indexes using the Migration Assistant. [https://github.com/elastic/elasticsearch/issues/30085#issuecomment-685286211 It does not work!]. And you'll avoid the hassle of registering for a (free) limited license. Plus the hassle of installing and upgrading Kibana just to get the Migration Assistant.
* [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/breaking_60_indices_changes.html Breaking changes in 6.x] are taken care of by the Elastica Extension


{| class="wikitable"
{| class="wikitable"
|+ Version Dependencies
|+Version Dependencies
|-
!MediaWiki!!ElasticSearch!!Elastica!!CirrusSearch!!Cluster Restart?!!Reindex?
|-
|REL1_27||1.x||REL1_27||REL1_27||n/a||n/a
|-
|-
! MediaWiki !! ElasticSearch !! Elastica !! CirrusSearch !! Cluster Restart? !! Reindex?
|REL1_28||2.x||REL1_28||REL1_28||restart||yes <ref>For more information about upgrading from 1.x to 2.4, see [https://www.elastic.co/guide/en/elasticsearch/reference/2.4/setup-upgrade.html Upgrading Elasticsearch] in the Elasticsearch 2.4 Reference.</ref>
|-
|-
| REL1_27 || 1.x || REL1_27 || REL1_27 || n/a || n/a
|REL1_29||5.3.x or 5.4.x||REL1_29||REL1_29||restart||yes <ref>For more information about upgrading from 2.4 to 5.6, see [https://www.elastic.co/guide/en/elasticsearch/reference/5.6/setup-upgrade.html Upgrading Elasticsearch] in the Elasticsearch 5.6 Reference.</ref>
|-
|-
| REL1_28 || 2.x || REL1_28 || REL1_28 || restart || yes <ref>For more information about upgrading from 1.x to 2.4, see [https://www.elastic.co/guide/en/elasticsearch/reference/2.4/setup-upgrade.html Upgrading Elasticsearch] in the Elasticsearch 2.4 Reference.</ref>
|REL1_30||5.3.x or 5.4.x||REL1_30||REL1_30||no <ref>Elasticsearch 6.x support rolling upgrades from Elasticsearch 5.6 Upgrading from earlier versions requires a full cluster restart. See https://www.elastic.co/guide/en/elasticsearch/reference/6.0/restart-upgrade.html</ref>||Depends <ref>Elasticsearch can read indices created in the '''previous major version'''. Older indices must be reindexed or deleted.  Elasticsearch will fail to start if incompatible indices are present.</ref>
|-
|-
| REL1_29 || 5.3+ || REL1_29 || REL1_29 || restart || yes <ref>For more information about upgrading from 2.4 to 5.6, see [https://www.elastic.co/guide/en/elasticsearch/reference/5.6/setup-upgrade.html Upgrading Elasticsearch] in the Elasticsearch 5.6 Reference.</ref>
|REL 1_31 and 1_32
|''5.5.x or 5.6.x''
|
|
|
|
|-
|-
| REL1_30 || 6.x || REL1_30 || REL1_30 || no <ref>Elasticsearch 6.x support rolling upgrades from Elasticsearch 5.6 Upgrading from earlier versions requires a full cluster restart. See https://www.elastic.co/guide/en/elasticsearch/reference/6.0/restart-upgrade.html</ref> || Depends <ref>Elasticsearch can read indices created in the '''previous major version'''. Older indices must be reindexed or deleted.  Elasticsearch will fail to start if incompatible indices are present.</ref>
|REL 1_33 and !_34 and 1_35
 
|6.5.x (6.5.4 rec)
|
|
|
|
|}
|}


With version 6.0 of ElasticSearch already released, we should immediately upgrade MediaWiki to REL1_29 which is compatible with ElasticSearch 6.x. In the table above es6.x is listed as compatible with mwREL1_30, but in reality we can use es6.x starting in mwREL1_29.


Elastic Co. provides an [https://github.com/elastic/ansible-elasticsearch/tree/master ansible role to manage your installation] (including a [https://github.com/elastic/ansible-elasticsearch/tree/2.x 2.x branch] for older setups).  Their [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/setup-upgrade.html guide to upgrading] covers the nitty gritty.
Elastic Co. provides an [https://github.com/elastic/ansible-elasticsearch/tree/master ansible role to manage your installation] (including a [https://github.com/elastic/ansible-elasticsearch/tree/2.x 2.x branch] for older setups).  Their [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/setup-upgrade.html guide to upgrading] covers the nitty gritty.


== Reindexing ==
==Reindexing==


The most basic form of the reindex API just copies documents from one index into another.  You might [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/docs-reindex.html reindex] to change the name of a field.  Usually though, you are reindexing because you are forced to during a major version upgrade.
The most basic form of the reindex API just copies documents from one index into another.  You might [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/docs-reindex.html reindex] to change the name of a field.  Usually though, you are reindexing because you are forced to during a major version upgrade.
Line 209: Line 227:




== Monitoring ==
==Monitoring==
With the upgrade to Elasticsearch 5.x and 6.x, plugins are deprecated. It's suggested to use Kibana as a monitoring and management interface to Elasticsearch.
With the upgrade to Elasticsearch 5.x and 6.x, plugins are deprecated. It's suggested to use [[Kibana]] as a monitoring and management interface to Elasticsearch.
<img src="http://meta.qualitybox.us:20000/api/v1/badge.svg?chart=elasticsearch_local.cluster_health_status&alarm=elasticsearch_last_collected&refresh=auto" />
<img src="http://meta.qualitybox.us:20000/api/v1/badge.svg?chart=elasticsearch_local.cluster_health_status&alarm=elasticsearch_last_collected&refresh=auto" />


Line 220: Line 238:
</html>
</html>


* Monitoring the Elastic Stack https://www.elastic.co/guide/en/elastic-stack-overview/6.4/xpack-monitoring.html
*Monitoring the Elastic Stack https://www.elastic.co/guide/en/elastic-stack-overview/6.4/xpack-monitoring.html
* Monitoring Settings https://www.elastic.co/guide/en/elasticsearch/reference/6.4/monitoring-settings.html
*Monitoring Settings https://www.elastic.co/guide/en/elasticsearch/reference/6.4/monitoring-settings.html
* X-Pack monitoring https://www.elastic.co/guide/en/kibana/current/xpack-monitoring.html
*X-Pack monitoring https://www.elastic.co/guide/en/kibana/current/xpack-monitoring.html
* Configuring Monitoring in Kibana https://www.elastic.co/guide/en/kibana/6.4/monitoring-xpack-kibana.html
*Configuring Monitoring in Kibana https://www.elastic.co/guide/en/kibana/6.4/monitoring-xpack-kibana.html
* Monitoring with [https://github.com/lmenezes/cerebro Cerebro]
*Monitoring with [https://github.com/lmenezes/cerebro Cerebro]
* You can still use [https://mobz.github.io/elasticsearch-head/ elasticsearch-head] if you use an SSH reverse tunnel to access the remote server.
*You can still use [https://mobz.github.io/elasticsearch-head/ elasticsearch-head] if you use an SSH reverse tunnel to access the remote server.




 
==Installation==
== Installation ==
Here's a quick example of how we got all the parts installed on an [[Ubuntu]] server.
Here's a quick example of how we got all the parts installed on an [[Ubuntu]] server.
<source lang="bash">
<source lang="bash">
Line 304: Line 321:
</source>
</source>


== Resources ==
==Resources==
* See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches).
* https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/CirrusSearch.php
* https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/README
* https://wikitech.wikimedia.org/wiki/Search


== Problems ==
*See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches).
*https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/CirrusSearch.php
*https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/README
*https://wikitech.wikimedia.org/wiki/Search
 
==Problems==
We recently ran a 'rebuild-all' script to update Elasticsearch indexes
We recently ran a 'rebuild-all' script to update Elasticsearch indexes
<pre>
<pre>
Line 338: Line 356:
===SOLVED===
===SOLVED===
You can delete the unwanted index like this with [[curl]]:
You can delete the unwanted index like this with [[curl]]:
<source lang="bash>
<source lang="bash">
curl -XDELETE "http://localhost:9200/wiki_cod_content"
curl -XDELETE "http://localhost:9200/wiki_cod_content"
</source>
</source>

Latest revision as of 22:07, 17 December 2024

Elasticsearch
Search
Image shows: Search
Summary
Description: This site uses Elasticsearch for an amazing search experience!
More
Notes: The What, Why and Advantages of Elasticsearch
Test: Search for something in "files" to confirm that PDF index results are returned instead of just articles, e.g. search for 'ssh-agent' in the File namespace.



This site uses Elasticsearch for its search functionality under the hood.


About

Elasticsearch is a distributed RESTful search engine built for the cloud. See https://www.elastic.co/about. I'd like to recommend the intro video [1], but you have to submit an email address to view it (joe@example.com probably works).

Community

There is a Discourse forum at https://discuss.elastic.co/

Features

See mw:Help:CirrusSearch for help on how to best use the search functionality (including regex searches).

Search Tips

This wiki supports Elasticsearch features. For example, say you want to search for 'Ansible' in the wiki, but you know that there is an Ansible page and you don't want to be taken directly to it. Instead, you want to see all the search results for 'Ansible'. Just prefix your search term with the ~ character. Now when you press enter, you'll go to the search results page with a full listing of results.

Use the special morelike: prefix to find pages similar to a given page, for example morelike:Elasticsearch


Use the cirrusdump action to see the document as Elasticsearch sees it [1]. This is especially useful to test whether (new) documents are being indexed. Additional MediaWiki API methods, such as cirrus-config-dump, are listed at mw:Extension:CirrusSearch
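
The two tips above can be tried from a shell by building the URLs by hand. The following is only a sketch: the base URL is a placeholder, the helper names are made up for illustration, and the page title or search term is not URL-encoded, so it only suits simple titles.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that build the URLs described above.
# WIKI_BASE is an assumed placeholder; point it at your own wiki's index.php.
WIKI_BASE="${WIKI_BASE:-https://wiki.example.org/wiki/index.php}"

# URL that dumps a page as CirrusSearch/Elasticsearch sees it.
cirrusdump_url() {
    printf '%s?title=%s&action=cirrusdump\n' "$WIKI_BASE" "$1"
}

# URL for a full-text search on Special:Search.
search_url() {
    printf '%s?title=Special:Search&fulltext=Search&search=%s\n' "$WIKI_BASE" "$1"
}

cirrusdump_url "Elasticsearch"
search_url "morelike:Elasticsearch"
```

Paste either URL into a browser, or fetch it with curl, to inspect the result.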

Different indexes are created for the entire contents of the wiki, and each index is weighted differently. For example, "lead-in" text is the wikitext between the top of the page and the first heading. Words found there are deemed more relevant to a user's search query than the same words found in the body text of an article. So, in this wiki, searching for the word "YAML" puts the Ansible article ahead of the Eclipse article in the search results.

Content as well as all files uploaded into the system are indexed. For example, a search for "FAI" lists both the Cloning article and the PDF file. The file is listed not only because of its file name, but also because of its (indexed) content. A search for "Ed Roman" will bring up the Enterprise Java Beans Design Patterns PDF file (see p. 13, where Ed Roman is mentioned).

Elasticsearch performs poorly when the JVM starts swapping, so you should ensure that it never swaps.

Set this property to true to lock the memory: bootstrap.mlockall: true (in /etc/elasticsearch/elasticsearch.yml). From Elasticsearch 5.0 on, the setting is named bootstrap.memory_lock.

Make sure that the ES_MIN_MEM and ES_MAX_MEM environment variables are set to the same value, and that the machine has enough memory to allocate for Elasticsearch, leaving enough memory for the operating system itself.

You should also make sure that the Elasticsearch process is allowed to lock the memory, e.g. by using ulimit -l unlimited.
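
To confirm that the memory lock actually took effect, the nodes info API reports an mlockall flag for each node. A minimal sketch, assuming Elasticsearch listens on localhost:9200; the function name is made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: report whether the nodes info JSON on stdin shows memory locking on.
# Intended use (assumes a node on localhost:9200):
#   curl -s 'localhost:9200/_nodes/process?pretty' | mlockall_enabled
mlockall_enabled() {
    if grep -Eq '"mlockall"[[:space:]]*:[[:space:]]*true'; then
        echo "memory is locked"
    else
        echo "memory is NOT locked"
        return 1
    fi
}
```

If it reports that memory is not locked, re-check the ulimit and the setting in elasticsearch.yml.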

Video

  • Building Elasticsearch: From Idea to {code} to Adoption The back side of a napkin, a pen, and a few beverages are often the ingredients that yield good ideas. Elasticsearch had a different origin. It started with a need for a simple search box for a collection of recipes. Shay Banon, creator of Elasticsearch and CTO at Elastic, shares the history behind pushing the code for his first open source project that led to the creation of Elasticsearch and its rapid adoption by users worldwide. -- RISE | August 2015

Troubleshooting

Make sure that you use the official packages from Elasticsearch, and NOT the Ubuntu packages. See below for the installation guide. Note that I had to un-comment the bind.host setting and set it to 0.0.0.0 on an older setup (Version: 1.7.3, Build: NA/NA, JVM: 1.8.0_171 on Ubuntu 16.04). Also make sure that your firewall allows ports 9200-9400. If there's no log output, you can run the startup shell script directly to see what's wrong (and read its source for options): /usr/share/elasticsearch/bin/elasticsearch

Usage: /usr/share/elasticsearch/bin/elasticsearch [-vdh] [-p pidfile] [-D prop] [-X prop]
Start elasticsearch.
    -d            daemonize (run in background)
    -p pidfile    write PID to <pidfile>
    -h
    --help        print command line options
    -v            print elasticsearch version, then exit
    -D prop       set JAVA system property
    -X prop       set non-standard JAVA system property
   --prop=val
   --prop val     set elasticsearch property (i.e. -Des.<prop>=<val>)

For version 1.7 on Ubuntu 16.04, although the system uses systemd, there is still a SysV init script that controls Elasticsearch (see /etc/init.d/elasticsearch). I think this means that you can't get the logging info into the journal. According to the install guide, you should be able to edit the elasticsearch.service file and take out the --quiet option to make it log to the journal. When that is enabled, you can run journalctl --unit elasticsearch to quickly see the info being logged.

Production Configuration

Elasticsearch looks for a configuration file to include, and uses a search path for that include. You can specify it on the command line, through an environment variable, or just make sure that your file is found in the search path. (My default was found at /usr/share/elasticsearch/bin/elasticsearch.in.sh)

# If an include wasn't specified in the environment, then search for one...
if [ "x$ES_INCLUDE" = "x" ]; then
    # Locations (in order) to use when searching for an include file.
    for include in /usr/share/elasticsearch/elasticsearch.in.sh \
                   /usr/local/share/elasticsearch/elasticsearch.in.sh \
                   /opt/elasticsearch/elasticsearch.in.sh \
                   ~/.elasticsearch.in.sh \
                   $ES_HOME/bin/elasticsearch.in.sh \
                   "`dirname "$0"`"/elasticsearch.in.sh; do
        if [ -r "$include" ]; then
            . "$include"
            break
        fi
    done
# ...otherwise, source the specified include.
elif [ -r "$ES_INCLUDE" ]; then
    . "$ES_INCLUDE"
fi
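
To preview which include file would win on a given host, the same "first readable file wins" lookup can be reproduced outside the startup script. This is a sketch only: the helper name is made up, and it checks just the fixed locations, not the $ES_HOME or script-relative entries.

```shell
#!/usr/bin/env bash
# Sketch: print the first readable file among the arguments, mirroring the
# search loop in the startup script. first_readable is a hypothetical name.
first_readable() {
    local candidate
    for candidate in "$@"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Check the fixed locations from the startup script, in the same order.
first_readable /usr/share/elasticsearch/elasticsearch.in.sh \
               /usr/local/share/elasticsearch/elasticsearch.in.sh \
               /opt/elasticsearch/elasticsearch.in.sh \
               "$HOME/.elasticsearch.in.sh" \
    || echo "no include file found in the fixed locations"
```

Setting ES_INCLUDE in the environment before starting Elasticsearch bypasses the search entirely, as the elif branch of the script shows.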


Elasticsearch for MediaWiki

To improve the out-of-the-box search experience with MediaWiki, you should install the mw:Extension:CirrusSearch. CirrusSearch is just a connector to the Elasticsearch engine. Thus, to use CirrusSearch, first install the Elasticsearch system (you can use yum or apt repositories for that).

Wikitech gives some information about how WMF uses Elasticsearch at https://wikitech.wikimedia.org/wiki/Search

This system has three components: Elastica, CirrusSearch, and Elasticsearch.

Elastica
Elastica is a MediaWiki extension that provides the library to interface with Elasticsearch. It wraps the Elastica library. It has no configuration.
CirrusSearch
CirrusSearch is a MediaWiki extension that provides search support backed by Elasticsearch.
Elasticsearch
Elasticsearch is a Java application, so you need Java installed as well. As all these pieces continue to be developed and released, be sure to heed the version requirements of each component so that the versions in your full setup match.

Elasticsearch for QualityBox

# disallow PUT and DELETE methods through the web 
# administrators will need to use local curl commands to bypass the load-balancer 
# in the event that you want to delete indexes etc. 
frontend elastic 
        bind *:9201 
        mode http 
        acl is_delete method DELETE 
        http-request deny if is_delete 
        acl is_put method PUT 
        http-request deny if is_put 
        default_backend elastic 
 
backend elastic 
        mode http 
        option forwardfor 
        balance source 
        option httpclose 
        server es1 127.0.0.1:9200 weight 1 check inter 1000 rise 5 fall 1

You can add multi-wiki search like so using $wgCirrusSearchEnableCrossProjectSearch, $wgCirrusSearchWikiToNameMap and $wgCirrusSearchInterwikiSources:



if ( $wikiId !== 'commons' ) {
	$wgCirrusSearchEnableCrossProjectSearch = true;
	$wgCirrusSearchWikiToNameMap = [
		'commons' => 'wiki_commons',
	];
	$wgCirrusSearchInterwikiSources = [
		'commons' => 'wiki_commons_content_first',
	];
}

Where is my Elasticsearch?

Maybe you installed elasticsearch, but have no idea where it resides on your system. Try this:

curl -XGET "http://localhost:9200/_nodes/settings?pretty=true"

Other direct commands

curl 'localhost:9200/_tasks?pretty'
curl 'localhost:9200/_cat/nodes?pretty'
curl 'localhost:9200/_nodes?pretty'
curl 'localhost:9200/_nodes/settings?pretty=true'
curl 'localhost:9200/_cat/health?pretty'
curl 'localhost:9200/_cluster/health?pretty=true'
curl 'localhost:9200/_cluster/state?pretty'

curl 'localhost:9200/_cat/indices?v'

The configuration for Elasticsearch is normally held in two files: /etc/elasticsearch/elasticsearch.yml and /etc/elasticsearch/logging.yml
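
The node settings output can be long. A small filter can narrow it down to the filesystem locations. This sketch assumes the request adds the flat_settings=true option, so that each setting appears on its own line as a dotted key. The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: keep only the path.* settings from flattened node settings JSON.
# Intended use (assumes a node on localhost:9200):
#   curl -s 'localhost:9200/_nodes/settings?pretty&flat_settings=true' | es_paths
es_paths() {
    grep -E '"path\.[a-z_]+"[[:space:]]*:' \
        | sed -E 's/^[[:space:]]+//; s/,$//'
}
```

Which path.* keys appear depends on the Elasticsearch version and on what is set in elasticsearch.yml.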

Starting / Stopping

Elasticsearch is (usually) run as a service, so you start and stop it with the service manager of your init system, either SysV init or systemd.
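
As a rough sketch, the helper below prints (but does not run) the matching command for the current host. It assumes the service is named elasticsearch, as in the installation notes on this page, and it detects systemd by the presence of /run/systemd/system. The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the service command that fits this host's init system.
es_service_cmd() {
    local action="$1"   # start, stop, restart or status
    if [ -d /run/systemd/system ]; then
        # systemd is the running init system
        echo "sudo systemctl $action elasticsearch"
    else
        # otherwise fall back to the SysV-style service wrapper
        echo "sudo service elasticsearch $action"
    fi
}

es_service_cmd start
es_service_cmd status
```

Copy the printed command, or wrap the call in eval once you trust the output.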

Upgrading

QualityBox 34 runs MediaWiki 1.34.x and Elasticsearch 6.x

The best way to upgrade to QB 34 from QB 32.x (ES 5.x) is to blow away the index and start over.

  • DO NOT follow the upgrade instructions. In testing, the Migration Assistant did not work for 5.6.
  • DO NOT prepare a 32.x host by installing X-Pack and preparing the indexes using the Migration Assistant. It does not work! Skipping it also spares you the hassle of registering for a (free) limited license, and of installing and upgrading Kibana just to get the Migration Assistant.
  • Breaking changes in 6.x are taken care of by the Elastica Extension
{| class="wikitable"
|+Version Dependencies
|-
!MediaWiki!!Elasticsearch!!Elastica!!CirrusSearch!!Cluster Restart?!!Reindex?
|-
|REL1_27||1.x||REL1_27||REL1_27||n/a||n/a
|-
|REL1_28||2.x||REL1_28||REL1_28||restart||yes [2]
|-
|REL1_29||5.3.x or 5.4.x||REL1_29||REL1_29||restart||yes [3]
|-
|REL1_30||5.3.x or 5.4.x||REL1_30||REL1_30||no [4]||Depends [5]
|-
|REL1_31 and REL1_32||5.5.x or 5.6.x|| || || ||
|-
|REL1_33, REL1_34 and REL1_35||6.5.x (6.5.4 recommended)|| || || ||
|}
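
For scripted installs, the table above can be turned into a simple lookup. This is a sketch that only restates the table rows. The function name is made up, and it reads the "!_34" entry as REL1_34.

```shell
#!/usr/bin/env bash
# Sketch: map a MediaWiki release branch to the Elasticsearch version
# listed in the Version Dependencies table.
es_for_mediawiki() {
    case "$1" in
        REL1_27)                 echo "1.x" ;;
        REL1_28)                 echo "2.x" ;;
        REL1_29|REL1_30)         echo "5.3.x or 5.4.x" ;;
        REL1_31|REL1_32)         echo "5.5.x or 5.6.x" ;;
        REL1_33|REL1_34|REL1_35) echo "6.5.x (6.5.4 recommended)" ;;
        *) echo "unknown release: $1" >&2; return 1 ;;
    esac
}

es_for_mediawiki REL1_34
```

Releases that are not in the table return a non-zero status, so a provisioning script can stop early.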


Elastic Co. provides an ansible role to manage your installation (including a 2.x branch for older setups). Their guide to upgrading covers the nitty gritty.

Reindexing

The most basic form of the reindex API just copies documents from one index into another. You might reindex to change the name of a field. Usually though, you are reindexing because you are forced to during a major version upgrade.

A plugin is available to assist with the upgrade tasks.

Also, you can reindex from a remote cluster, which lets you upgrade without downtime: once the new cluster is ready, you just switch to it with minimal disruption. [6]

If you are re-indexing your existing Meza installation, you can run sudo meza maint rebuild monolith --tags search-index
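
The basic copy form of the reindex API mentioned above takes a small JSON body that names a source and a destination index. A minimal sketch follows. The index names are placeholders, the function name is made up, and the destination index is assumed to exist already with the mappings you want.

```shell
#!/usr/bin/env bash
# Sketch: build the JSON body for a basic _reindex request.
reindex_body() {
    printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}\n' "$1" "$2"
}

# Placeholder index names; substitute your own.
reindex_body wiki_content_old wiki_content_new

# To submit it (Elasticsearch 6.x requires the Content-Type header):
#   curl -XPOST 'localhost:9200/_reindex?pretty' \
#        -H 'Content-Type: application/json' \
#        -d "$(reindex_body wiki_content_old wiki_content_new)"
```

For a MediaWiki index it is usually simpler to let the CirrusSearch maintenance scripts rebuild the index than to copy documents by hand.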


Monitoring

With the upgrade to Elasticsearch 5.x and 6.x, plugins are deprecated. It's suggested to use Kibana as a monitoring and management interface to Elasticsearch.



Installation

Here's a quick example of how we got all the parts installed on an Ubuntu server.

# is the curl extension to PHP installed?
php -i |grep -C2 curl
# no curl?
sudo apt-get install php5-curl
pushd extensions
java -version
# no java
sudo apt-get install default-jre
# need the jdk
sudo apt-get install default-jdk
# add JAVA_HOME to /etc/environment
sudo update-alternatives --config java
# JAVA_HOME must point at the JVM directory, not at the java binary
echo 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' |sudo tee -a /etc/environment
source /etc/environment
echo $JAVA_HOME
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list
#### don't do this because 2.1.1 is too new
#### sudo apt-get update && sudo apt-get install elasticsearch
#### get the 1.7.x version and install that
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.7.4.deb
sudo dpkg -i elasticsearch-1.7.4.deb
echo PATH=$PATH:/usr/share/elasticsearch/bin/ | sudo tee -a /etc/environment
source /etc/environment
which elasticsearch
sudo service elasticsearch start
# check with curl (see below)
# using SysV init
sudo update-rc.d elasticsearch defaults 95 10
git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/CirrusSearch.git
git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Elastica.git
cd Elastica
composer install
cd ..
# load Special:Version to check that both extensions are registered
# create the index configuration, then build the index in two passes
sudo -u www-data php /var/www/freephile.com/www/w/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php
sudo -u www-data php /var/www/freephile.com/www/w/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip
sudo -u www-data php /var/www/freephile.com/www/w/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse

Checking if elasticsearch is running

curl http://localhost:9200/
{
  "name" : "Carmella Unuscione",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.1.1",
    "build_hash" : "40e2c53a6b6c2972b3d13846e450e66f4375bd71",
    "build_timestamp" : "2015-12-15T13:05:55Z",
    "build_snapshot" : false,
    "lucene_version" : "5.3.1"
  },
  "tagline" : "You Know, for Search"
}
// second time around with the older version installed
{
  "status" : 200,
  "name" : "Richard Rider",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.7.4",
    "build_hash" : "0d3159b9fc8bc8e367c5c40c09c2a57c0032b32e",
    "build_timestamp" : "2015-12-15T11:25:18Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
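Since the whole point of the second attempt was to land on 1.7.x rather than 2.1.1, it can help to check the version in a script instead of by eye. This is only a sketch: the function name is made up, and it parses the response with grep and cut rather than a real JSON parser.

```shell
# Extract the version number from the JSON returned by the root endpoint.
# $1 = the JSON text
es_version() {
  printf '%s' "$1" | grep -o '"number" *: *"[0-9.]*"' | cut -d'"' -f4
}

# Example (not run here): refuse to continue unless a 1.7.x node answers
# case "$(es_version "$(curl -s http://localhost:9200/)")" in
#   1.7.*) echo "version ok" ;;
#   *)     echo "unexpected version" >&2; exit 1 ;;
# esac
```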

Resources

Problems

We recently ran a 'rebuild-all' script to update the Elasticsearch indexes. The script reported that it completed, but the log showed that the index had not been rebuilt:

[centos@ip-10-0-50-189 .deploy-meza]$ time sudo ./elastic-rebuild-all.sh demo
Rebuilding index for demo
  Output log:
    /opt/data-meza/logs/search-index/demo.2017-09-07_13:15:01.log
elastic-build-index completed for "demo" at 2017-09-07_13:15:03

real    0m1.653s
user    0m1.327s
sys     0m0.199s
[centos@ip-10-0-50-189 .deploy-meza]$ tail /opt/data-meza/logs/search-index/demo.2017-09-07_13:15:01.log
        Inferring index identifier...error
Looks like the index has more than one identifier. 

You should delete all but the one of them currently active. Here is the list: wiki_demo_content,wiki_demo_content_first


Notice: Undefined index: HTTP_HOST in /opt/htdocs/mediawiki/LocalSettings.php on line 219
[          wiki_demo] index(es) do not exist. Did you forget to run updateSearchIndexConfig?

Notice: Undefined index: HTTP_HOST in /opt/htdocs/mediawiki/LocalSettings.php on line 219
[          wiki_demo] index(es) do not exist. Did you forget to run updateSearchIndexConfig?
******* Elastic Search build index complete! *******

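The key message in the log is that the index has more than one identifier, and all but the active one should be deleted.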

SOLVED

You can delete the unwanted index like this with curl:

curl -XDELETE "http://localhost:9200/wiki_cod_content"
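When the error lists several identifiers, a small helper can work out which ones to delete once you know which is active. The sketch below is hypothetical: the function name is made up, and it assumes you have already worked out the active index yourself (for example by looking at the output of curl "http://localhost:9200/_aliases").

```shell
# Print every index from a comma-separated list except the one to keep.
# $1 = comma-separated index names, as printed in the error message
# $2 = the active index, which must not be deleted
stale_indices() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -v -x -F -- "$2"
}

# Example (not run here): delete everything except the active index
# for i in $(stale_indices "wiki_demo_content,wiki_demo_content_first" "wiki_demo_content_first"); do
#   curl -XDELETE "http://localhost:9200/$i"
# done
```

Deleting an index is not reversible, so it is worth echoing the list before piping it into curl.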

See more about deleting wikis and all indexes at https://github.com/freephile/meza/blob/6658c795a4b5e5b1a5afcb05c62cf0bcc2d0203b/src/scripts/delete.wikis.sh

References

  1. https://www.elastic.co/webinars/getting-started-elasticsearch
  2. For more information about upgrading from 1.x to 2.4, see Upgrading Elasticsearch in the Elasticsearch 2.4 Reference.
  3. For more information about upgrading from 2.4 to 5.6, see Upgrading Elasticsearch in the Elasticsearch 5.6 Reference.
  4. Elasticsearch 6.x supports rolling upgrades from Elasticsearch 5.6. Upgrading from earlier versions requires a full cluster restart. See https://www.elastic.co/guide/en/elasticsearch/reference/6.0/restart-upgrade.html
  5. Elasticsearch can read indices created in the previous major version. Older indices must be reindexed or deleted. Elasticsearch will fail to start if incompatible indices are present.
  6. https://www.elastic.co/guide/en/elasticsearch/reference/6.0/reindex-upgrade.html