Difference between revisions of "Search"
(fix image) |
(interim draft on search extension) |
||
(6 intermediate revisions by the same user not shown) | |||
Line 7: | Line 7: | ||
=== Semantic Web Search === | === Semantic Web Search === | ||
− | [[Image: | + | [[Image:Logo.png|thumb]] |
# http://swoogle.umbc.edu Semantic Web Search | # http://swoogle.umbc.edu Semantic Web Search | ||
=== Searching for Multimedia === | === Searching for Multimedia === | ||
− | + | When searching for freely licensed graphics, it is hard to beat the huge collection at Wikimedia Commons. Use the search engine on toolserver.org to find the images or other media you're looking for: http://toolserver.org/~tangotango/mayflower/ Any image found there can be used under the terms of the Creative Commons license listed, meaning it can be used here or on your website. | |
− | + | === Native (Application) Search === | |
+ | Applications such as this wiki (which runs on MediaWiki) and content management systems (e.g. Drupal) know their own content best. So, if you are looking for something within one of those applications, use the application's own search facilities to get the best results. | ||
+ | # [[mw:Search]] helps you learn and understand the search capabilities of this system | ||
+ | # Note that the simplest enhancement you can make to a small-scale installation is to tune the MySQL full-text stopword list and minimum word length. However, the built-in search capability of MediaWiki is limited; see the section below on CirrusSearch. | ||
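As a rough sketch of the MySQL tuning mentioned above, the fragment below shows the two server variables involved. The values are illustrative assumptions, not recommendations from this page, and the location of the configuration file varies by distribution. After changing them, MySQL has to be restarted and the existing FULLTEXT index rebuilt before the new settings affect search results.

```ini
# Fragment for the MySQL server configuration file (e.g. my.cnf); values are examples only
[mysqld]
# Index words of 3 or more characters (the MySQL default is 4)
ft_min_word_len = 3
# An empty value disables the built-in stopword list
ft_stopword_file = ''
```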
− | + | Note that this wiki and the content management systems also provide an 'OpenSearch' implementation that lets you use your browser's search toolbar to search these applications directly. | |
− | === | + | The MediaWiki system now includes an Ajax 'suggest' feature that offers matches while you type in the search box. This setting is a User Preference Option (UPO) that you control in your user settings. |
− | + | ||
+ | If you're a sysop for a MediaWiki site, you probably want to install the [[mw:Extension:Replace_Text]] extension so that you can search and replace strings in your content. | ||
+ | |||
+ | ==== Cirrus Search ==== | ||
+ | To get a really good search experience with MediaWiki, install the [[mw:Extension:CirrusSearch]] extension. To do so, first install the [[ElasticSearch]] system (you can use [https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-repositories.html the repositories for that]). ElasticSearch is a Java application, so you need Java installed as well. CirrusSearch also depends on the Elastica extension, which is cloned alongside it in the commands below. | ||
+ | |||
+ | <source lang="bash"> | ||
+ | # is the curl extension to PHP installed? | ||
+ | php -i |grep -C2 curl | ||
+ | # no curl? | ||
+ | sudo apt-get install php5-curl | ||
+ | pushd extensions | ||
+ | java -version | ||
+ | # no java | ||
+ | sudo apt-get install default-jre | ||
+ | # need the jdk | ||
+ | sudo apt-get install default-jdk | ||
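+ | # list the installed Java alternatives to find the install path | ||
+ | sudo update-alternatives --config java | ||
+ | # add JAVA_HOME to /etc/environment (it must name the Java home directory, not the java binary) | ||
+ | echo 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' |sudo tee -a /etc/environment | ||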
+ | source /etc/environment | ||
+ | echo $JAVA_HOME | ||
+ | wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add - | ||
+ | echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list | ||
+ | sudo apt-get update && sudo apt-get install elasticsearch | ||
+ | echo PATH=$PATH:/usr/share/elasticsearch/bin/ | sudo tee -a /etc/environment | ||
+ | source /etc/environment | ||
+ | which elasticsearch | ||
+ | sudo service elasticsearch start | ||
+ | # check with curl (see below) | ||
+ | # using SysV init | ||
+ | sudo update-rc.d elasticsearch defaults 95 10 | ||
+ | git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/CirrusSearch.git | ||
+ | git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Elastica.git | ||
+ | cd Elastica | ||
+ | composer install | ||
+ | # load Special:Version to check | ||
+ | </source> | ||
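Cloning the extensions does not yet switch MediaWiki over to CirrusSearch. The sketch below outlines the remaining steps as described in the CirrusSearch README of this era, assuming Elasticsearch is already responding on localhost:9200. The function names `cirrus_create_index` and `cirrus_fill_index` and the path `/var/www/wiki` are hypothetical, chosen only for illustration; check the README shipped with your checkout before relying on the exact options.

```shell
# LocalSettings.php additions (PHP, shown here as comments):
#   require_once "$IP/extensions/Elastica/Elastica.php";
#   require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";
#   $wgDisableSearchUpdate = true;   # remove this line after cirrus_create_index
#   $wgSearchType = 'CirrusSearch';  # add this line only after cirrus_fill_index completes

# Create the Elasticsearch index and mappings for the wiki rooted at $1.
cirrus_create_index() {
  mw="${1:?usage: cirrus_create_index /path/to/mediawiki}"
  php "$mw/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php"
}

# Fill the index in two passes: page text first, then link data.
cirrus_fill_index() {
  mw="${1:?usage: cirrus_fill_index /path/to/mediawiki}"
  maint="$mw/extensions/CirrusSearch/maintenance"
  php "$maint/forceSearchIndex.php" --skipLinks --indexOnSkip || return 1
  php "$maint/forceSearchIndex.php" --skipParse
}

# Example (hypothetical install path):
#   cirrus_create_index /var/www/wiki
#   ... remove $wgDisableSearchUpdate from LocalSettings.php ...
#   cirrus_fill_index /var/www/wiki
```

Splitting the work into two functions leaves room for the LocalSettings.php edit that belongs between them.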
− | + | Check that Elasticsearch is running: | |
+ | <source lang="bash"> | ||
+ | curl -X GET http://localhost:9200/ | ||
+ | </source> | ||
+ | <source lang="javascript"> | ||
+ | { | ||
+ | "name" : "Carmella Unuscione", | ||
+ | "cluster_name" : "elasticsearch", | ||
+ | "version" : { | ||
+ | "number" : "2.1.1", | ||
+ | "build_hash" : "40e2c53a6b6c2972b3d13846e450e66f4375bd71", | ||
+ | "build_timestamp" : "2015-12-15T13:05:55Z", | ||
+ | "build_snapshot" : false, | ||
+ | "lucene_version" : "5.3.1" | ||
+ | }, | ||
+ | "tagline" : "You Know, for Search" | ||
+ | } | ||
+ | </source> | ||
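If you want to script this check, for example to confirm which Elasticsearch version answered before installing a matching CirrusSearch branch, a small helper can pull the version number out of the response shown above. This is only a sketch: the function name `es_version` is made up for this example, and the sed expression assumes the `"number"` field looks like it does in the output above. A real JSON parser would be more robust.

```shell
# Read the Elasticsearch root response on stdin and print the "number" field.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Demonstration on a canned response, so it runs without a live server.
sample='{ "version" : { "number" : "2.1.1" } }'
printf '%s\n' "$sample" | es_version   # prints 2.1.1

# Against a live server (assumes the default port used above):
#   curl -s http://localhost:9200/ | es_version
```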
− | |||
== General == | == General == | ||
− | |||
Google offers a service called the [http://www.google.com/coop/cse/ Google Custom Search Engine]. The Google CSE is much like the 'normal' Google, but is configured to include only domains that you want. Additionally, the domains can be grouped into 'realms' that can be used to assist the user to find content according to functional area. | Google offers a service called the [http://www.google.com/coop/cse/ Google Custom Search Engine]. The Google CSE is much like the 'normal' Google, but is configured to include only domains that you want. Additionally, the domains can be grouped into 'realms' that can be used to assist the user to find content according to functional area. | ||
Line 35: | Line 90: | ||
# The index will not allow custom data formats or indexes that you create... it's Google's algorithms for better or for worse. | # The index will not allow custom data formats or indexes that you create... it's Google's algorithms for better or for worse. | ||
− | To meet these needs, use a product like [[mnoGoSearch]] | + | To meet these needs, use a product like [[mnoGoSearch]], [[mw:Apache_Solr]] or [[mw:Nutch]], which you are free to install and configure to suit your requirements. |
− | See [[wp:Category: | + | See [[wp:Category:Internet_search_engines]] for a list of search engine solutions. |
== Editors == | == Editors == | ||
Line 46: | Line 101: | ||
=== Search your code. Can you 'grok' it? === | === Search your code. Can you 'grok' it? === | ||
[[File:Opengrok-analysis.png|right]] | [[File:Opengrok-analysis.png|right]] | ||
− | LXR The [http://lxr.linux.no/ Linux Cross Reference] is probably the first widely used web-based code cross-reference tool. Along came [http://opengrok.github.io/OpenGrok/ OpenGrok] which started out as a project at Sun (which was bought by Oracle) and now the project lives on its own in the open. OpenGrok is '''lightening fast''' and is actively maintained as an open source project on GitHub. By the way, the underlying search is powered by SOLR. Meanwhile, [http://kohsuke.org/ Kohsuke Kawaguchi] the magic man behind Jenkins ( | + | LXR, the [http://lxr.linux.no/ Linux Cross Reference], is probably the first widely used web-based code cross-reference tool. Along came [http://opengrok.github.io/OpenGrok/ OpenGrok], which started out as a project at Sun (later bought by Oracle) and now lives on its own in the open. OpenGrok is '''lightning fast''' and is actively maintained as an open source project on GitHub. By the way, the underlying search is powered by Apache Lucene. Meanwhile, [http://kohsuke.org/ Kohsuke Kawaguchi], the magic man behind Jenkins (Hudson), also wrote [http://sorcerer.jenkins-ci.org/ Sorcerer], which understands semantics in Java. Sadly, the Sorcerer code hasn't been touched in 4 years and doesn't seem to be an active project, but for Java codebases it's probably still a good option. |
=== Browser extensions / Web Apps === | === Browser extensions / Web Apps === |