7,930 bytes added, 00:26, 1 June 2018
replace bad characters with simple apostrophe
== Introduction ==
In this paper we will look at the methods and architecture for serving a MediaWiki instance that is performant, scalable, and redundant. We'll also touch on the related operations that complete the picture of an organization's IT infrastructure. On the implementation side, we will be using the traditional GNU/Linux Free Software components: Linux, Apache, MySQL, PHP (LAMP), Squid/Varnish, LVS, Memcached, Nginx, etc. You won't need this information to run a single small wiki, but you will need it if you aspire to provide a large-scale, performant, enterprise wiki.<ref>Mark Bergsma presented 'Wikimedia architecture' in 2008. http://www.haute-disponibilite.net/wp-content/uploads/2008/06/wikimedia-architecture.pdf Although the material is now dated, it provides a clear example of running an architecture that supports 3 Gbit/s of data traffic and 30,000 HTTP requests/s on 350 commodity servers managed by 6 people.</ref> One goal of this paper is to update the information at [[mw:Manual:MediaWiki architecture]].
{|align="center" style="max-width:400px;"
| [[File:DevOpsInfrastructure.svg]]
|-
| Your infrastructure is dependent on Development Operations, Quality Assurance, Release Engineering and Product Deployment; plus Configuration Management, Monitoring and Control
|}
 
 
== Overview ==
Caching everywhere. Most application data is cached in Memcached, a distributed object cache.
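As a rough illustration of what "cached in an object cache" means in practice, here is a minimal cache-aside sketch. It is not MediaWiki's implementation: <code>ObjectCache</code> is an in-memory stand-in for a Memcached client, and <code>render_page</code>, the key format and the TTL are hypothetical names chosen for this example.

```python
import time


class ObjectCache:
    """In-memory stand-in for a Memcached client (get/set with a TTL)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries behave like cache misses.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def render_page(title):
    """Hypothetical expensive operation, e.g. parsing wikitext into HTML."""
    render_page.calls += 1
    return f"<h1>{title}</h1>"


render_page.calls = 0


def get_page_html(cache, title, ttl_seconds=300):
    """Cache-aside read: try the cache first, fall back to rendering."""
    key = f"page-html:{title}"
    html = cache.get(key)
    if html is None:
        html = render_page(title)
        cache.set(key, html, ttl_seconds)
    return html
```

The second request for the same title is answered from the cache, so the expensive render runs only once until the entry expires.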
[[File:MediaWiki_Architecture.svg|thumb|right|333px|'''MediaWiki Architecture'''. Click for large view.]]
== Analytics ==
== Reference Architecture ==
 
 
Amazon provides us with an example of a reference architecture<ref>https://aws.amazon.com/architecture/
# Auto Scaling to grow and shrink #5
# Database backend with synchronous replication to standby
 
 
 
 
In a standard web application architecture, the incoming user traffic is distributed through load balancers to a number of application servers that run independent instances. These application servers access shared storage, a shared database and a shared cache. This architecture scales well up to a six-figure number of users. The application servers are easy to scale because they run independently of one another, so doubling the number of servers roughly doubles the capacity of that tier. But scalability limitations can be found in the shared components: the load balancers, the database, the storage and the cache.<ref>Nextcloud introduces an architecture called 'Global Scale' that is designed to scale to hundreds of millions of users. https://nextcloud.com/blog/nextcloud-announces-global-scale-architecture-as-part-of-nextcloud-12/
</ref>

This reference architecture is also incomplete because it does not address how you must integrate the architecture into your operations. You must have a means to deploy the software onto the system. It is equally important that you configure, monitor and control the infrastructure so that you can adjust it over time. Even if the infrastructure 'automatically' adjusts (fail-over, scale-up, scale-down), you need to be able to monitor these systems and know how they are performing. If the software is at all developed or deployed internally, then you must also integrate the Development, Software Quality Assurance / Testing, and Release Management disciplines. We can take this even further and ask how the architecture enables you to migrate to another geographic location ([https://wikitech.wikimedia.org/wiki/Switch_Datacenter switch data center]) or fail over in a catastrophe.
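To make the load-balancing and monitoring points concrete, here is a small sketch of a round-robin balancer that skips application servers a monitor has marked as down. It is only an illustration under simple assumptions: the class, its methods and the server names are hypothetical, and real balancers such as LVS work at the network level rather than in application code.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backends in turn, skipping any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # Called by a (hypothetical) health check when a server fails.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")
```

Adding an application server is just one more entry in the list, which is why that tier scales easily. The balancer itself remains a shared component that needs its own redundancy.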
== Wikimedia Foundation ==
Looking at the Wikimedia Foundation's usage and implementation of technology gives us great insight into how to grow and scale to be a top ten Internet site, using commodity hardware plus free and open source infrastructure components. At VarnishCon 2016, Emanuele Rocca presented on running Wikipedia.org and detailed their operations engineering.<ref>https://upload.wikimedia.org/wikipedia/commons/d/d4/WMF_Traffic_Varnishcon_2016.pdf
</ref> Aside from the architecture above, here is another representation of their Web request flow [https://upload.wikimedia.org/wikipedia/commons/5/51/Wikipedia_webrequest_flow_2015-10.png from October 2015] and [https://upload.wikimedia.org/wikipedia/commons/d/d8/Wikimedia-servers-2010-12-28.svg architecture from 2010].
[[File:Wikipedia_webrequest_flow_2015-10.png|thumb|right|500px|click for larger view]]
[[File:Wikimedia-servers-2010-12-28.svg|thumb|right|500px|click for larger view]]
== Data Persistence ==
It's the job of the persistence layer to store the data of your application. For MediaWiki, there is the old [https://github.com/wikimedia/cdb Constant Database] (CDB) wrapper around PHP's [https://secure.php.net/manual/en/book.dba.php native DBA functions], which provides a flat file store like the [https://en.wikipedia.org/wiki/Berkeley_DB Berkeley DB] style databases. This is now replaced by simple PHP arrays which are file included, which allows the HHVM opcode cache to precompile and cache these data structures. In MediaWiki, it is used for the [https://www.mediawiki.org/wiki/Interwiki_cache interwiki cache] and the localization cache. Having the WMF interwiki list may not be important to your wiki, but having a performant cache for the interwiki links contained in ''your'' wiki farm<ref>e.g. see https://freephile.org/w/api.php?action=query&amp;meta=siteinfo&amp;siprop=interwikimap</ref> is probably important.
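The following sketch shows how the interwiki map query mentioned in the footnote can be built, and how a prefix table derived from it could resolve links with a plain in-memory lookup. The endpoint <code>wiki.example.org</code> and the sample response are made up for illustration, and the helper names are hypothetical; only the <code>action=query</code>, <code>meta=siteinfo</code> and <code>siprop=interwikimap</code> parameters come from the text above.

```python
from urllib.parse import quote, urlencode


def interwikimap_url(api_endpoint):
    """Build the siteinfo query that lists a wiki's interwiki prefixes."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "interwikimap",
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"


def build_prefix_table(api_response):
    """Turn the decoded JSON response into a prefix -> URL pattern dict."""
    return {
        entry["prefix"]: entry["url"]
        for entry in api_response["query"]["interwikimap"]
    }


def resolve_interwiki(prefix_table, link):
    """Resolve 'prefix:Title' to a URL, or None if the prefix is unknown."""
    prefix, sep, title = link.partition(":")
    pattern = prefix_table.get(prefix.lower()) if sep else None
    if pattern is None:
        return None
    # "$1" in the pattern is the placeholder for the page title.
    return pattern.replace("$1", quote(title.replace(" ", "_"), safe="/:"))
```

Once the table is in memory, each lookup is a dictionary access, which is the same idea as including a precompiled array instead of querying a database for every link.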
Persistent data is stored in the following ways:
The core database is a separate database per wiki (''not'' a separate server!). Replication uses one master with many replicated slaves: reads go to the slaves and write operations go to the master.
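The read/write split can be sketched as a small router. This is a simplified illustration, not MediaWiki's load balancer: the class name, the list of write verbs and the server names are assumptions, and a production router would also have to account for replication lag (a read issued right after a write may need to go to the master).

```python
import random


class ReplicatedDatabase:
    """Sends writes to the master and spreads reads across the replicas."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, master, replicas, rng=None):
        self.master = master
        self.replicas = list(replicas)
        self._rng = rng or random.Random()

    def pick_server(self, sql):
        # The first keyword of the statement decides where it is routed.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS or not self.replicas:
            return self.master
        return self._rng.choice(self.replicas)
```

Because reads usually far outnumber writes on a wiki, adding replicas increases read capacity while the single master remains the limit for writes.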
 
=== REST API and RESTBase ===
The '''[[mw:REST_API|MediaWiki REST API]]''' reached official v1 status in April 2017. The REST API gives you a way to access the content of your wiki. '''[[mw:RESTBase|RESTBase]]''' is a storing proxy for the API, so that read requests can be served faster, with lower latency and less resource usage. The first use case for the REST API was speeding up [[Visual Editor]].
{|align="center" style="max-width:400px;"
| [[File:Cassandra_logo.svg|thumb|right|Apache Cassandra NoSQL database|link=https://en.wikipedia.org/wiki/Apache_Cassandra]]
|-
| RESTBase uses Apache Cassandra for backend storage.
|}
{|align="center" style="max-width:400px;"
| [[File:Restbase request flow.svg]]
|-
| RESTBase stores this content for easy retrieval, much like a cache. Because of the explicit, predictable URI structure, cache hit rates are approximately 95%, which is fantastic for high-volume, low-latency, fast user experiences.
|}
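The idea of a storing proxy keyed by a predictable URI can be sketched in a few lines. This is a toy model rather than RESTBase itself: it keeps content in a dictionary instead of Cassandra, and the class, the <code>page_html_uri</code> helper and the path layout are assumptions made for the example.

```python
class StoringProxy:
    """Serves stored content by URI and asks the backend only on a miss."""

    def __init__(self, backend):
        self._backend = backend  # callable: uri -> content
        self._stored = {}
        self.hits = 0
        self.misses = 0

    def get(self, uri):
        if uri in self._stored:
            self.hits += 1
            return self._stored[uri]
        self.misses += 1
        content = self._backend(uri)
        self._stored[uri] = content
        return content

    def invalidate(self, uri):
        # Called when the page changes, so the next read is regenerated.
        self._stored.pop(uri, None)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def page_html_uri(title):
    """Predictable URI for a page's HTML; this path layout is an assumption."""
    return f"/page/html/{title}"
```

If twenty readers request the same page between edits, only the first request reaches the backend and the other nineteen are served from storage, which gives a 95% hit rate in this toy case.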
 
See the [https://blog.wikimedia.org/2017/04/06/wikimedia-rest-api/ announcement of the REST API] on the WMF blog.
 
== Component Optimization ==
Require email for sign-up
Instant Commons is a great feature if you want the millions of photos found at [https://commons.wikimedia.org/wiki/Main_Page Wikimedia Commons] available at your fingertips. If you don't, then you might still have your own collection of files/assets that you want to make available across your wiki farm. In that case, you want to use MediaWiki's federated file system which includes advanced configurations for [https://www.mediawiki.org/wiki/Manual:$wgLBFactoryConf Load Balanced] or local File repos<ref>[https://www.mediawiki.org/wiki/Manual:$wgForeignFileRepos $wgForeignFileRepos]
</ref>.
Wiki best practices for cutting down spam. If your server is busy with spam bots, then real users won't enjoy themselves.
=== Security considerations ===
You can't deploy an enterprise architecture without following both general security best practices and the specific security practices relevant to your application. For example, make sure that users with the editinterface permission are trusted admins.<ref>https://www.mediawiki.org/wiki/Manual:Security
</ref>
</ref>
[[File:Scap-diagram.png|thumb|right|347px|click for larger view]]
== Containers ==
In order to scale, you not only need configuration management (etckeeper, git) and orchestration tools (Chef, Ansible, Puppet), but you also need a way to package base or complete systems for easy reproducibility. The Docker or VirtualBox technologies allow you to do just that. So, for example, when you want to deploy a MariaDB Galera clustered solution for your database tier, you can use [https://github.com/instantlinux Rich Braun]'s Docker image instantlinux/mariadb-galera.<ref>https://hub.docker.com/r/instantlinux/mariadb-galera/
</ref>
== Other Best Practices ==
Use naming conventions in your infrastructure so that you know what's what.<ref>https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions
</ref>
{{References}}
__NOTOC__
__NOEDITSECTION__
[[Category:Mediawiki]]
[[Category:Wiki]]
[[Category:System Architecture]]