13,520 bytes added ,  12:16, 18 May 2020
add notes
This site uses Elasticsearch for its search functionality under the hood.
{{Feature
|explainsimage= Search|description= This site uses Elasticsearch for the best possible search experience [[File:System-searchElasticsearch_logo.svg|linkimgdesc=Search|thumb|64px]]|notes=|tests=|examplestitle=
}}
{{#set:feature description = This site uses Elasticsearch for an amazing search experience! }}
{{#set:feature notes = [https://medium.com/@AIMDekTech/what-is-elasticsearch-why-elasticsearch-advantages-of-elasticsearch-47b81b549f4d The What, Why and Advantages of Elasticsearch] }}
{{#set:feature tests = Search for something in "files" to verify that indexed PDF content is returned, not just articles. E.g. [https://wiki.freephile.org/wiki/index.php?title=Special:Search&profile=advanced&fulltext=Search&search=ssh-agent&ns6=1 Search for 'ssh-agent' in the File namespace] }}
{{#set:feature examples = }}
 
== About ==
Elasticsearch is a distributed RESTful search engine built for the cloud. See https://www.elastic.co/about

I'd like to recommend the intro video<ref>https://www.elastic.co/webinars/getting-started-elasticsearch</ref>, but you have to submit an email address to view it. (joe@example.com probably works.)

== Community ==
There is a Discourse forum at https://discuss.elastic.co/

== Features ==
Features include:
* Distributed and Highly Available Search Engine
** Each index is fully sharded with a configurable number of shards.
** Each shard can have one or more replicas.
** Read / Search operations are performed on any of the replica shards.
* Multi Tenant with Multi Types
** Support for more than one index.
** Support for more than one type per index.
** Index level configuration (number of shards, index storage, ...).
* Various set of APIs
** HTTP RESTful API
** Native Java API
** All APIs perform automatic node operation rerouting.
* Document oriented
** No need for upfront schema definition.
** Schema can be defined per type for customization of the indexing process.
* Reliable, Asynchronous Write Behind for long term persistency.
* (Near) Real Time Search.
* Built on top of Lucene
** Each shard is a fully functional Lucene index.
** All the power of Lucene easily exposed through simple configuration / plugins.
* Per operation consistency
** Single document level operations are atomic, consistent, isolated and durable.
* Open Source under the Apache License, version 2 ("ALv2")

See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches).

=== Search Tips ===
This wiki supports Elasticsearch features. For example, say you want to search for 'Ansible' in the wiki. You know that there is an Ansible page, but you don't want to be taken directly to it; instead, you want to see all the [{{fullurl:Special:Search|search=Ansible|fulltext=Search}} search results for 'Ansible']. Just prefix your search term with the <code>~</code> character. Now when you press Enter, you'll go to the search results page with a full listing of results.

Use the <code>morelike:</code> special prefix to find pages similar to a given page, e.g. [https://wiki.freephile.org/wiki/api.php?action=query&list=search&srsearch=morelike:Elasticsearch morelike:Elasticsearch].

Use the <code>cirrusdump</code> action to see the document as Elasticsearch sees it [{{fullurl:{{PAGENAMEE}}|action=cirrusdump}}]. This is especially useful to test whether (new) documents are being indexed. Additional MediaWiki API methods like <code>cirrus-config-dump</code> are listed at [[mw:Extension:CirrusSearch#API]].

{{Messagebox
| type = success
| text = Different indexes are created for the entire contents of the wiki, and each index is weighted differently. For example, "lead-in" text is the wikitext between the top of the page and the first heading. Words found there are deemed more relevant to a user's search query than the same words found in the body text of an article. So, in this wiki, [{{fullurl:Special:Search|search=yaml|fulltext=Search}} searching for the word "YAML"] puts the [[Ansible]] article ahead of the [[Eclipse]] article in search results.
}}

{{Messagebox
| type = success
| text = Content as well as all files uploaded into the system are indexed. For example, [{{fullurl:Special:Search|search=fai|fulltext=Search|profile=all}} a search for "FAI"] lists both the [[Cloning]] article and the [[:File:Fai poster a4.pdf|PDF file]]. The file is listed not only because of its file name, but also because of its (indexed) content. [{{fullurl:Special:Search|search=ed%20roman|fulltext=Search|profile=all}} A search for "Ed Roman"] will bring up the Enterprise Java Beans Design Patterns PDF file ([{{fullurl:File:Ejbdesignpatterns.pdf|page=13}} see p. 13, where Ed Roman is mentioned]).
}}

{{Messagebox
| type = warning
| text = Elasticsearch performs poorly when the JVM starts swapping: you should ensure that it ''never'' swaps. Set this property to true to lock the memory: <code>bootstrap.mlockall: true</code> (in /etc/elasticsearch/elasticsearch.yml). Make sure that the <code>ES_MIN_MEM</code> and <code>ES_MAX_MEM</code> environment variables are set to the same value, and that the machine has enough memory to allocate to Elasticsearch while leaving enough for the operating system itself. You should also make sure that the Elasticsearch process is allowed to lock the memory, e.g. by using <code>ulimit -l unlimited</code>.
}}
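The memory advice in the warning above can be checked from the shell before the service starts. What follows is a minimal sketch, not part of Elasticsearch: the function names <code>memlock_is_unlimited</code>, <code>heap_bounds_match</code> and <code>check_es_memory</code> are hypothetical, and it assumes the 1.x-era <code>ES_MIN_MEM</code>/<code>ES_MAX_MEM</code> variables named in the warning. (From 5.0 on, heap sizes are set in <code>jvm.options</code> and <code>bootstrap.mlockall</code> is called <code>bootstrap.memory_lock</code>.)

<source lang="bash">
#!/usr/bin/env bash
# Hypothetical pre-flight check for the anti-swapping advice above.
# Assumes Elasticsearch 1.x, where the heap bounds come from ES_MIN_MEM / ES_MAX_MEM.

# Succeeds only if the given locked-memory limit (as printed by `ulimit -l`) is unlimited.
memlock_is_unlimited() {
  [ "$1" = "unlimited" ]
}

# Succeeds only if both heap bounds are non-empty and identical, e.g. "2g" and "2g".
heap_bounds_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Prints a warning for each problem found; returns non-zero if any check failed.
check_es_memory() {
  local status=0
  local limit
  limit="$(ulimit -l)"
  if ! memlock_is_unlimited "$limit"; then
    echo "WARN: locked-memory limit is '$limit'; consider 'ulimit -l unlimited'" >&2
    status=1
  fi
  if ! heap_bounds_match "${ES_MIN_MEM:-}" "${ES_MAX_MEM:-}"; then
    echo "WARN: ES_MIN_MEM='${ES_MIN_MEM:-}' and ES_MAX_MEM='${ES_MAX_MEM:-}' should be set to the same value" >&2
    status=1
  fi
  return "$status"
}
</source>

Call <code>check_es_memory</code> from the same shell or init script that launches Elasticsearch, because <code>ulimit</code> limits apply per process. Once the node is up, <code>curl 'localhost:9200/_nodes/process?pretty'</code> should show <code>"mlockall" : true</code> if the memory was actually locked.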
== Video ==
* [https://vimeo.com/136326424 Building Elasticsearch: From Idea to {code} to Adoption] The back side of a napkin, a pen, and a few beverages are often the ingredients that yield good ideas. Elasticsearch had a different origin. It started with a need for a simple search box for a collection of recipes. '''Shay Banon''', creator of Elasticsearch and CTO at Elastic, shares the history behind pushing the code for his first open source project that led to the creation of Elasticsearch and its rapid adoption by users worldwide. -- ''RISE | August 2015''

== Troubleshooting ==
Make sure that you use the official Elasticsearch packages, and NOT the Ubuntu packages. See below for the installation guide.

Note that on an older setup (Version: 1.7.3, Build: NA/NA, JVM: 1.8.0_171 on Ubuntu 16.04) I had to un-comment the bind host setting (<code>network.bind_host</code>) and set it to 0.0.0.0. Also make sure that your firewall allows ports 9200-9400.

If there's no log output, you can run the startup shell script directly to see what's wrong (and read the source for options): <code>/usr/share/elasticsearch/bin/elasticsearch</code>
<pre>
Usage: /usr/share/elasticsearch/bin/elasticsearch [-vdh] [-p pidfile] [-D prop] [-X prop]
Start elasticsearch.
    -d            daemonize (run in background)
    -p pidfile    write PID to <pidfile>
    -h
    --help        print command line options
    -v            print elasticsearch version, then exit
    -D prop       set JAVA system property
    -X prop       set non-standard JAVA system property
   --prop=val
   --prop val     set elasticsearch property (i.e. -Des.<prop>=<val>)
</pre>

For version 1.7 on Ubuntu 16.04, although the system uses systemd, there is still a SysV init script that controls Elasticsearch. I think this means that you can't get the logging info into the journalctl system...
See <code>/etc/init.d/elasticsearch</code>. According to the [https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html install guide], you should be able to edit the <code>elasticsearch.service</code> file and take out the <code>--quiet</code> option to make it log to the journal. Once that is done, you can run <code>journalctl --unit elasticsearch</code> to quickly see what is being logged.

== Production Configuration ==
Elasticsearch looks for a configuration file to include, and uses a search path to find it. You can specify the file on the command line or through the <code>ES_INCLUDE</code> environment variable, or just make sure that your file is found in the search path. (My default was found at /usr/share/elasticsearch/bin/elasticsearch.in.sh.)
<source lang="bash">
# If an include wasn't specified in the environment, then search for one...
if [ "x$ES_INCLUDE" = "x" ]; then
    # Locations (in order) to use when searching for an include file.
    for include in /usr/share/elasticsearch/elasticsearch.in.sh \
                   /usr/local/share/elasticsearch/elasticsearch.in.sh \
                   /opt/elasticsearch/elasticsearch.in.sh \
                   ~/.elasticsearch.in.sh \
                   $ES_HOME/bin/elasticsearch.in.sh \
                   "`dirname "$0"`"/elasticsearch.in.sh; do
        # The first readable candidate wins.
        if [ -r "$include" ]; then
            . "$include"
            break
        fi
    done
# ...otherwise, source the specified include.
elif [ -r "$ES_INCLUDE" ]; then
    . "$ES_INCLUDE"
fi
</source>
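To see which include file a 1.x install will actually pick up, you can replay the same search order without sourcing anything. This is a hypothetical helper (<code>find_es_include</code> is not part of Elasticsearch); it takes the startup script's directory as an argument because, outside that script, <code>$0</code> no longer points there.

<source lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: report which include file the 1.x startup script would source,
# mirroring the search order shown above without sourcing anything.
# $1 is the directory of the startup script (the real script uses `dirname "$0"`).
find_es_include() {
  local script_dir="${1:-/usr/share/elasticsearch/bin}"
  if [ -n "${ES_INCLUDE:-}" ]; then
    # An explicit ES_INCLUDE wins, but only if it is readable.
    if [ -r "$ES_INCLUDE" ]; then
      echo "$ES_INCLUDE"
      return 0
    fi
    return 1
  fi
  local include
  for include in /usr/share/elasticsearch/elasticsearch.in.sh \
                 /usr/local/share/elasticsearch/elasticsearch.in.sh \
                 /opt/elasticsearch/elasticsearch.in.sh \
                 ~/.elasticsearch.in.sh \
                 "${ES_HOME:-}/bin/elasticsearch.in.sh" \
                 "$script_dir/elasticsearch.in.sh"; do
    # Same rule as the startup script: the first readable candidate wins.
    if [ -r "$include" ]; then
      echo "$include"
      return 0
    fi
  done
  return 1
}
</source>

For example, <code>find_es_include /usr/share/elasticsearch/bin</code> prints the path that would be sourced, or returns non-zero when nothing readable is found.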
== Elasticsearch for MediaWiki ==
To improve the out-of-the-box search experience with MediaWiki, you should install [[mw:Extension:CirrusSearch]]. CirrusSearch is just a connector to the Elasticsearch engine, so to use CirrusSearch you must first install the [[Elasticsearch]] system (you can use the [https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-repositories.html <code>yum</code> or <code>apt</code> repositories] for that). Wikitech gives some information about how WMF uses Elasticsearch at https://wikitech.wikimedia.org/wiki/Search
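For the <code>apt</code> route, Elastic's instructions come down to adding their signing key and a source line, then installing the package. The sketch below assumes Elasticsearch 5.x or later, whose packages are served from <code>artifacts.elastic.co</code>; older releases used a different repository layout, so check Elastic's repository guide for your version. The function name <code>elastic_apt_source_line</code> is hypothetical.

<source lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: build the APT source line for Elastic's official package repository.
# Assumes Elasticsearch 5.x or later; earlier releases used a different repository URL.
elastic_apt_source_line() {
  local major="$1"
  case "$major" in
    ''|*[!0-9]*) echo "major version must be a number" >&2; return 1 ;;
  esac
  if [ "$major" -lt 5 ]; then
    echo "this sketch only covers 5.x and later" >&2
    return 1
  fi
  echo "deb https://artifacts.elastic.co/packages/${major}.x/apt stable main"
}

# Example use for 6.x (run with sudo rights):
#   wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
#   sudo apt-get install apt-transport-https
#   elastic_apt_source_line 6 | sudo tee /etc/apt/sources.list.d/elastic-6.x.list
#   sudo apt-get update && sudo apt-get install elasticsearch
</source>

Pin the major version deliberately: it has to match what your CirrusSearch release supports (see the version table under Upgrading).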
This system has three components: Elastica, CirrusSearch, and Elasticsearch.
; Elastica : Elastica is a MediaWiki extension that provides the library to interface with Elasticsearch. It wraps the [https://github.com/ruflin/Elastica Elastica] library. It has no configuration.
; CirrusSearch : CirrusSearch is a MediaWiki extension that provides search support backed by Elasticsearch.
; Elasticsearch : Elasticsearch is a Java application, so you need [[Java]] installed as well.

As all these pieces continue to be developed and released, you must take care to match the right versions together to compose your full setup.

== Elasticsearch for QualityBox ==
<source lang="haproxy">
# disallow PUT and DELETE methods through the web
# administrators will need to use local curl commands to bypass the load-balancer
# in the event that you want to delete indexes etc.
frontend elastic
    bind *:9201
    mode http
    acl is_delete method DELETE
    http-request deny if is_delete
    acl is_put method PUT
    http-request deny if is_put
    default_backend elastic

# a single local Elasticsearch node behind the proxy
backend elastic
    mode http
    option forwardfor
    balance source
    option httpclose
    server es1 127.0.0.1:9200 weight 1 check inter 1000 rise 5 fall 1
</source>

You can add multi-wiki search using [https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/master/CirrusSearch.php $wgCirrusSearchEnableCrossProjectSearch], <code>$wgCirrusSearchWikiToNameMap</code> and <code>$wgCirrusSearchInterwikiSources</code>:
<source lang="php">
// On every wiki other than Commons, enable cross-project search against the Commons index.
if ( $wikiId !== 'commons' ) {
    $wgCirrusSearchEnableCrossProjectSearch = true;
    $wgCirrusSearchWikiToNameMap = [
        'commons' => 'wiki_commons',
    ];
    $wgCirrusSearchInterwikiSources = [
        'commons' => 'wiki_commons_content_first',
    ];
}
</source>

== Where is my Elasticsearch? ==
Maybe you installed Elasticsearch, but have no idea where it resides on your system.
Try this:
<source lang="bash">
curl -XGET "http://localhost:9200/_nodes/settings?pretty=true"
</source>
Other direct commands:
<source lang="bash">
curl 'localhost:9200/_tasks?pretty'                 # tasks currently running
curl 'localhost:9200/_cat/nodes?pretty'             # one summary line per node
curl 'localhost:9200/_nodes?pretty'                 # detailed node information
curl 'localhost:9200/_nodes/settings?pretty=true'   # node settings, including the path.* locations
curl 'localhost:9200/_cat/health?pretty'            # one-line cluster health
curl 'localhost:9200/_cluster/health?pretty=true'   # cluster health as JSON
curl 'localhost:9200/_cluster/state?pretty'         # full cluster state (can be large)
curl 'localhost:9200/_cat/indices?v'                # all indices, with column headers
</source>
The configuration for Elasticsearch is normally held in two files: <code>/etc/elasticsearch/elasticsearch.yml</code> and <code>/etc/elasticsearch/logging.yml</code> (from 5.x on, logging is configured in <code>log4j2.properties</code> instead).

== Starting / Stopping ==
Elasticsearch is (usually) run as a service, so you start and stop it like any other service, depending on whether you run SysV init (<code>service elasticsearch start</code> / <code>stop</code>) or systemd (<code>systemctl start elasticsearch</code> / <code>stop</code>).

== Upgrading ==
Old versions of Meza run the legacy REL1_27 release of MediaWiki, while the "beta" Meza runs REL1_28. Our goal should be to run the "stable" REL1_29 as soon as possible.

{| class="wikitable"
|+ Version Dependencies
|-
! MediaWiki !! Elasticsearch !! Elastica !! CirrusSearch !! Cluster Restart? !! Reindex?
|-
| REL1_27 || 1.x || REL1_27 || REL1_27 || n/a || n/a
|-
| REL1_28 || 2.x || REL1_28 || REL1_28 || restart || yes <ref>For more information about upgrading from 1.x to 2.4, see [https://www.elastic.co/guide/en/elasticsearch/reference/2.4/setup-upgrade.html Upgrading Elasticsearch] in the Elasticsearch 2.4 Reference.</ref>
|-
| REL1_29 || 5.3+ || REL1_29 || REL1_29 || restart || yes <ref>For more information about upgrading from 2.4 to 5.6, see [https://www.elastic.co/guide/en/elasticsearch/reference/5.6/setup-upgrade.html Upgrading Elasticsearch] in the Elasticsearch 5.6 Reference.</ref>
|-
| REL1_30 || 6.x || REL1_30 || REL1_30 || no <ref>Elasticsearch 6.x supports rolling upgrades from Elasticsearch 5.6. Upgrading from earlier versions requires a full cluster restart. See https://www.elastic.co/guide/en/elasticsearch/reference/6.0/restart-upgrade.html</ref> || Depends <ref>Elasticsearch can read indices created in the '''previous major version'''. Older indices must be reindexed or deleted. Elasticsearch will fail to start if incompatible indices are present.</ref>
|}

With version 6.0 of Elasticsearch already released, we should upgrade MediaWiki to REL1_29 right away, since it is compatible with Elasticsearch 6.x: although the table above lists es6.x as compatible with mwREL1_30, in practice we can use es6.x starting with mwREL1_29.

Elastic Co. provides an [https://github.com/elastic/ansible-elasticsearch/tree/master Ansible role to manage your installation] (including a [https://github.com/elastic/ansible-elasticsearch/tree/2.x 2.x branch] for older setups). Their [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/setup-upgrade.html guide to upgrading] covers the nitty-gritty.

== Reindexing ==
The most basic form of the reindex API just copies documents from one index into another. You might [https://www.elastic.co/guide/en/elasticsearch/reference/6.0/docs-reindex.html reindex] to change the name of a field. Usually, though, you are reindexing because a major version upgrade forces you to. To assist with the upgrade process, there is a plugin that helps with these tasks. You can also reindex from a remote cluster, so that you can upgrade without downtime: once the new cluster is ready, you switch to it with minimal disruption.<ref>https://www.elastic.co/guide/en/elasticsearch/reference/6.0/reindex-upgrade.html</ref>

If you are re-indexing your existing Meza installation, you can run <code>sudo meza maint rebuild monolith --tags search-index</code>

== Monitoring ==
With the upgrade to Elasticsearch 5.x and 6.x, site plugins are no longer supported.
It's suggested to use Kibana as a monitoring and management interface to Elasticsearch.

<html><embed src="http://meta.qualitybox.us:20000/api/v1/badge.svg?chart=elasticsearch_local.cluster_health_status&alarm=elasticsearch_last_collected&refresh=auto" type="image/svg+xml" height="20"/></html>

* Monitoring the Elastic Stack https://www.elastic.co/guide/en/elastic-stack-overview/6.4/xpack-monitoring.html
* Monitoring Settings https://www.elastic.co/guide/en/elasticsearch/reference/6.4/monitoring-settings.html
* X-Pack monitoring https://www.elastic.co/guide/en/kibana/current/xpack-monitoring.html
* Configuring Monitoring in Kibana https://www.elastic.co/guide/en/kibana/6.4/monitoring-xpack-kibana.html
* Monitoring with [https://github.com/lmenezes/cerebro Cerebro]
* You can still use [https://mobz.github.io/elasticsearch-head/ elasticsearch-head] if you use an SSH tunnel (a local port forward, <code>ssh -L</code>) to reach the remote server.
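Without Kibana, the same green/yellow/red cluster status can be pulled straight from the <code>_cluster/health</code> endpoint, e.g. for a cron job or a Nagios-style check. This is a minimal sketch with hypothetical function names (<code>es_health_status</code>, <code>es_health_exit_code</code>). It parses the JSON with <code>sed</code> to avoid extra dependencies, which relies on the single top-level <code>"status"</code> key that <code>_cluster/health</code> returns; <code>jq -r .status</code> is the sturdier choice if it is installed.

<source lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: extract the "status" field (green / yellow / red) from the JSON
# returned by _cluster/health. Works on both compact and ?pretty output.
es_health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Maps a status to a Nagios-style exit code:
# 0 = green (OK), 1 = yellow (WARNING), 2 = red (CRITICAL), 3 = anything else (UNKNOWN).
es_health_exit_code() {
  case "$1" in
    green)  return 0 ;;
    yellow) return 1 ;;
    red)    return 2 ;;
    *)      return 3 ;;
  esac
}
</source>

Typical use: <code>status="$(curl -s 'localhost:9200/_cluster/health' | es_health_status)"; es_health_exit_code "$status"</code>. An unreachable node yields an empty status and therefore exit code 3.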
== Installation ==
Check whether Elasticsearch is running:
<source lang="bash">
curl -X GET http://localhost:9200/
</source>
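The response to that request includes <code>version.number</code>, which is what you need when matching Elasticsearch against MediaWiki, Elastica and CirrusSearch in the Version Dependencies table. A small hypothetical helper can extract it (<code>es_version_number</code> and <code>es_major_version</code> are illustrative names); it assumes the only <code>"number"</code> key in the root endpoint's response is <code>version.number</code>.

<source lang="bash">
#!/usr/bin/env bash
# Hypothetical helper: extract version.number (e.g. "6.8.0") from the JSON that
# `curl http://localhost:9200/` returns. Assumes "number" appears once, under "version".
es_version_number() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Prints just the major version, e.g. "6" for "6.8.0"; fails if no version was found.
es_major_version() {
  local number
  number="$(es_version_number)"
  [ -n "$number" ] || return 1
  echo "${number%%.*}"
}
</source>

For example, <code>curl -s http://localhost:9200/ | es_major_version</code> prints the major version, which you can compare with the Elasticsearch column of the table.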
== Resources ==
* See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches).
* https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/CirrusSearch.php
* https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/README
* https://wikitech.wikimedia.org/wiki/Search
 
== Problems ==
We recently ran a 'rebuild-all' script to update the Elasticsearch indexes, with this result:
<pre>
[centos@ip-10-0-50-189 .deploy-meza]$ time sudo ./elastic-rebuild-all.sh demo
Rebuilding index for demo
Output log:
/opt/data-meza/logs/search-index/demo.2017-09-07_13:15:01.log
elastic-build-index completed for "demo" at 2017-09-07_13:15:03
 
real 0m1.653s
user 0m1.327s
sys 0m0.199s
[centos@ip-10-0-50-189 .deploy-meza]$ tail /opt/data-meza/logs/search-index/demo.2017-09-07_13:15:01.log
Inferring index identifier...error
Looks like the index has more than one identifier. </pre><span style="background-color:#c1f749;">You should delete all
but the one of them currently active. Here is the list: wiki_demo_content,wiki_demo_content_first</span><pre>
 
Notice: Undefined index: HTTP_HOST in /opt/htdocs/mediawiki/LocalSettings.php on line 219
[ wiki_demo] index(es) do not exist. Did you forget to run updateSearchIndexConfig?
 
Notice: Undefined index: HTTP_HOST in /opt/htdocs/mediawiki/LocalSettings.php on line 219
[ wiki_demo] index(es) do not exist. Did you forget to run updateSearchIndexConfig?
******* Elastic Search build index complete! *******
</pre>
''Emphasis added''
 
===SOLVED===
You can delete the unwanted index with [[curl]] (substitute the name of the stale index reported in your own log):
<source lang="bash">
curl -XDELETE "http://localhost:9200/wiki_cod_content"
</source>
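When the log lists several indexes, it helps to generate the delete commands rather than type them, and to review them before running anything. This is a hypothetical dry-run helper (<code>print_stale_index_deletes</code> is not part of Meza or CirrusSearch): it reads index names on stdin and prints a <code>curl -XDELETE</code> line for every index except the one you name, assuming Elasticsearch listens on localhost:9200 as elsewhere on this page. Deciding which index is "currently active" is up to you; <code>curl 'localhost:9200/_cat/aliases?v'</code> shows which concrete index each alias points to.

<source lang="bash">
#!/usr/bin/env bash
# Hypothetical dry-run helper: read index names on stdin (one per line) and print a
# curl DELETE command for every index other than the one to keep. Nothing is deleted.
print_stale_index_deletes() {
  local keep="$1"
  local index
  [ -n "$keep" ] || { echo "usage: print_stale_index_deletes INDEX_TO_KEEP" >&2; return 1; }
  # The `|| [ -n ... ]` also handles a final line without a trailing newline.
  while IFS= read -r index || [ -n "$index" ]; do
    [ -n "$index" ] || continue
    [ "$index" = "$keep" ] && continue
    printf 'curl -XDELETE "http://localhost:9200/%s"\n' "$index"
  done
}
</source>

For example, <code>curl -s 'localhost:9200/_cat/indices/wiki_demo_*?h=index' | print_stale_index_deletes wiki_demo_content_first</code> lists the commands that would remove everything except <code>wiki_demo_content_first</code> (the kept name here is only an example).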
 
See more about deleting wikis and all indexes at https://github.com/freephile/meza/blob/6658c795a4b5e5b1a5afcb05c62cf0bcc2d0203b/src/scripts/delete.wikis.sh
 
{{References}}
[[Category:Search]]
[[Category:Elasticsearch]]
[[Category:Wiki]]
[[Category:QualityBox]]
