Difference between revisions of "Elasticsearch"

From Freephile Wiki
(describes a couple of major feature points of Elasticsearch)
Line 10: Line 10:
  
 
== About ==
Elasticsearch is a distributed RESTful search engine built for the cloud. Features include:
+
Elasticsearch is a distributed RESTful search engine built for the cloud. See https://www.elastic.co/about
  
* Distributed and Highly Available Search Engine.
+
== Features ==
** Each index is fully sharded with a configurable number of shards.
+
See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches).
** Each shard can have one or more replicas.
+
 
** Read / Search operations are performed on any of the replica shards.
+
# Different indexes are created for the entire contents of the wiki, and each index is weighted differently. For example, "lead-in" text is the wikitext between the top of the page and the first heading. Words found there are deemed more relevant to a user's search query than the same words found in the body text of an article. So, in this wiki, [{{fullurl:Special:Search|search=yaml|fulltext=Search}} searching for the word "YAML"] puts the [[Ansible]] article ahead of the [[Eclipse]] article in search results.
* Multi Tenant with Multi Types.
+
# Page content and all files uploaded into the system are indexed. For example, [{{fullurl:Special:Search|search=fai|fulltext=Search|profile=all}} a search for "FAI"] lists both the [[Cloning]] article and the [[:File:Fai poster a4.pdf|PDF file]]. The file is listed not only because of its file name, but also because of its (indexed) content. [{{fullurl:Special:Search|search=ed%20roman|fulltext=Search|profile=all}} A search for "Ed Roman"] will bring up the Enterprise Java Beans Design Patterns PDF file ([{{fullurl:File:Ejbdesignpatterns.pdf|page=13}} see p. 13, where Ed Roman is mentioned]).
** Support for more than one index.
 
** Support for more than one type per index.
 
** Index level configuration (number of shards, index storage, ...).
 
* Various sets of APIs
 
** HTTP RESTful API
 
** Native Java API.
 
** All APIs perform automatic node operation rerouting.
 
* Document oriented
 
** No need for upfront schema definition.
 
** Schema can be defined per type for customization of the indexing process.
 
* Reliable, Asynchronous Write Behind for long term persistence.
 
* (Near) Real Time Search.
 
* Built on top of Lucene
 
** Each shard is a fully functional Lucene index
 
** All the power of Lucene easily exposed through simple configuration / plugins.
 
* Per operation consistency
 
** Single document level operations are atomic, consistent, isolated and durable.
 
* Open Source under the Apache License, version 2 ("ALv2")
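
The shard and replica settings in the feature list above are index-level configuration, set when an index is created through the HTTP RESTful API. The following is a minimal sketch, not taken from this wiki's setup: the index name <code>wiki_content</code> and the node address <code>http://localhost:9200</code> are assumptions for illustration, as are the helper function names. It only builds the request and does not contact a server.

```python
import json
from urllib.request import Request


def build_create_index_request(base_url, index_name, shards, replicas):
    """Build (but do not send) a PUT request that creates an index.

    The body uses the index-level settings number_of_shards and
    number_of_replicas. base_url and index_name are illustrative values.
    """
    if shards < 1 or replicas < 0:
        raise ValueError("shards must be >= 1 and replicas must be >= 0")
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return Request(
        url=f"{base_url.rstrip('/')}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def total_shard_copies(shards, replicas):
    """Primary shards plus all of their replica copies."""
    return shards * (1 + replicas)


# Hypothetical node address and index name, for illustration only.
request = build_create_index_request("http://localhost:9200", "wiki_content", 5, 1)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
print(total_shard_copies(5, 1))
```

To actually create the index, the request object could be passed to <code>urllib.request.urlopen</code> against a running node. With one replica per shard, every primary shard has a second copy that can also serve reads and searches, which is what the "Read / Search operations" item describes.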
 
  
 
== Video ==
  
* https://www.elastic.co/about
 
  
 
== Elasticsearch for MediaWiki ==
Line 124: Line 105:
  
 
== Resources ==
* See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches).  In a nutshell: the search index is faster, and more complete.
+
* See [[mw:Help:CirrusSearch]] for help on how to best use the search functionality (including regex searches).
 
* https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/CirrusSearch.php
 
* https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FCirrusSearch.git/HEAD/README

Revision as of 00:07, 31 January 2016