MediaWiki:Tech: Difference between revisions

Revision as of 22:45, 28 January 2013

This Tech Portal is to serve as a clearinghouse for information related to technical issues on Transformers Wiki.

This page's purpose is to hold information. Any tech issues, questions or initiatives should be discussed on the main Community Portal pages where everyone can weigh in. This page isn't an attempt to split out discussion, merely to archive its results.

Information

Basic Maintenance

  • Linode's given us a lot of new tricks. The big thing to keep in mind is that the new wiki depends upon two servers which are backed up nightly:
    • The Database Server: This is running MySQL, and if we lose it, everyone's hard work goes down the toilet.
    • The Application Server: This is running Apache, ProFTPd, and Postfix. If we lose it, we lose the mediawiki installation, all customizations (MonacoBook, GoBoxes, etc.), and even if we revive the wiki, it won't look the same.
  • There is, however, a lovely middle layer: The Caching Servers. Each of these has a nigh-identical configuration, so if one dies, we simply don't care, as we can rebuild it very quickly from one of its brothers.
    • Each Caching Server runs Squid and Memcached, and their IPs need to be registered in Mediawiki's LocalSettings.php so that it knows to handle them properly. Failure to do so will break IP address blocks in Mediawiki, and will waste RAM on the Caching servers that could otherwise go to Memcached, a powerful object caching engine that accelerates everything we do by avoiding long database queries.
    • Each Caching Server also needs to be registered with the LoadBalancer, by simply copying the configuration of one of its peers and changing the internal IP address to match.
    • Finally, due to webcrawler abuse, each Caching Server also needs to be updated with new hostnames. If we start adding foreign-language wikis to the mix, we would have to add en.tfwiki.net, es.tfwiki.net (Spanish), and so on. If we ever launch that GoBots sub-wiki, we already have gb.tfwiki.net enabled. These strings need to be added to the mySites ACL definitions in /etc/squid/squid.conf on EVERY Caching Server.
  • Similarly, abusive webcrawlers are filtered by User-Agent via regexes in the /etc/squid/badbrowsers.conf file. Well-behaved bots can be removed accordingly, but again, make sure that your edits are done to ALL Caching Servers!
    • (At some point, this needs to go into a change control system and set to rsync transparently.)
    • If you edit these files, a config reload is all that you'll need. Don't waste time restarting Squid.
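
Since the edits above have to land on ALL Caching Servers, a quick drift check can catch a server that was missed. The following is a minimal sketch, not an existing tool: it assumes you have already collected the text of a file such as /etc/squid/squid.conf or /etc/squid/badbrowsers.conf from each server (how you fetch it is left out), and the host names cache1 and cache2 are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_by_digest(configs: dict[str, str]) -> dict[str, list[str]]:
    """Group host names by the SHA-256 digest of their config text."""
    groups: dict[str, list[str]] = defaultdict(list)
    for host, text in sorted(configs.items()):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(host)
    return dict(groups)


def drifted_hosts(configs: dict[str, str]) -> list[str]:
    """Return hosts whose config differs from the most common version."""
    groups = group_by_digest(configs)
    if len(groups) <= 1:
        return []  # zero hosts, or every host has identical text
    # The largest group is treated as the reference copy. On a tie, the
    # group containing the alphabetically first host wins.
    reference = max(groups.values(), key=len)
    return sorted(host for host in configs if host not in reference)


# Hypothetical example: cache2 is missing a hostname in its ACL line.
example = {
    "cache1": "acl mySites dstdomain tfwiki.net gb.tfwiki.net\n",
    "cache2": "acl mySites dstdomain tfwiki.net\n",
}
print(drifted_hosts(example))
```

With only two Caching Servers there is no real majority, so the check can only say that the copies differ, not which one is right. The `dstdomain` ACL type in the example text is an assumption; the actual mySites definition may use a different type.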

Mighty Feature, Mighty Server Drain

  • The Go-Box!
  • No, seriously, the Go-Box was originally written to poll http://tfwiki.net/mediawiki/index.php?title=Template:Goicons&action=raw each time it runs, as was the generator used for off-site signature blocks. So every time you pull up a page with a Gobox, the server polls itself for that list, using up a second connection, AND taxing the server to execute the PHP code.
  • The GoBoxes are also called from MonacoBook, the TFWiki skin, so every time you pull up a page? You use up an Apache process JUST TO GRAB SOME RARELY-UPDATED XML.
  • We've changed that to poll a file in /var/lib/mediawiki/goicons.xml instead, which is updated via a cron job. Make sure you replicate that whole deal if you ever duplicate the wiki; it's kinda important.
  • Or, you could NOT replicate this step, and suffer the consequences of increased CPU load, and have an annoyed sysadmin. You don't want that.
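
If the cron job ever has to be rebuilt, something along these lines would do the same work. This is a hedged sketch and not the script actually installed on the server: the function names are made up for illustration, and it assumes the job simply downloads the raw Goicons template once and saves it to /var/lib/mediawiki/goicons.xml.

```python
import os
import tempfile
import urllib.request

# The URL the Go-Box originally polled, and the file it reads now.
SOURCE_URL = "http://tfwiki.net/mediawiki/index.php?title=Template:Goicons&action=raw"
TARGET_PATH = "/var/lib/mediawiki/goicons.xml"


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw icon list once."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def write_atomically(path: str, data: bytes) -> None:
    """Write to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".goicons-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # mkstemp creates the file as 0600; the web server must be able to read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    finally:
        # After a successful rename the temp name no longer exists.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)


def refresh(url: str = SOURCE_URL, path: str = TARGET_PATH, fetcher=fetch) -> bool:
    """Refresh the cached file; keep the old copy if the download is empty."""
    data = fetcher(url)
    if not data.strip():
        return False
    write_atomically(path, data)
    return True
```

A cron entry would call `refresh()` on whatever schedule suits a rarely-updated list. The rename step means a page load never sees a half-written file, and an empty download leaves the previous copy in place.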


Coding Standards

  • Transformers Wiki:Semantic linking initiative

Scripting

  • Transformers Wiki's editable JavaScript file is located at MediaWiki:Common.js. It can only be altered by admins.
  • Your personal javascript file can be located at User:Username/monacobook.js or User:Username/monobook.js. There does not appear to be a corresponding User:Username/common.js, annoyingly.
  • User:Derik/javascript — Derik's loose documentation of the likely-to-be-useful Javascript functions on TFWiki. (You will not be able to use any of these functions without editing either the site's or your personal .js file though.)

Hosting

Minor note: Many of these notes are here to keep us all in the loop, and should be wikified a little more than I'm going to get to tonight.

  • The current wiki, post Bookworm, is a metered, unmanaged VPS provided by Slicehost.
    • As of January 2013, service has been relocated to Linode.
    • This provides us with a dedicated server under the Xen virtualization software. Our Linux distribution of choice was Debian 5.0, on a Xen-aware 2.6.24 SMP-capable kernel.
      • Now Debian 6!
  • As of June 2009, it is running in a semi-optimized state, with MediaWiki configured to utilize a Squid reverse proxy cache for all anonymous users, and a small memcached instance for all database queries.
    • As of May 2011, functions are split between two boxes: the Front-end server hosts Squid, Apache, and Postfix, and the Back-end server hosts MySQL and Memcached. Disk IO contention has dropped to usable, if not ideal, levels.
      • Functions are now split to three layers: 1 MySQL DB server, 1 Application server, running Apache, ProFTPd, and Postfix, and 2 Squid/Memcached servers. Load Balancers are run by a Linode-managed NodeBalancer.
    • Squid has also been updated to use the AUFS file handler instead of UFS. This uses more disk I/O, but parallelizes cache access for better concurrency. I'm also investigating the COSS handler, but that's currently untested.
  • MySQL is largely using InnoDB tables, which can be backed up in a live state, with the purchase of a hefty Innobackup license. This is not in the cards, so we are instead using LVM snapshots from Slicehost. This has been tested, and functions as expected.
    • Still being done this way.
  • FULL local filesystem backups are done via snapshot nightly and weekly. While these cannot be exported to another server, they do allow us to revert to the previous day's database/configuration/file library without much hassle.
    • Long term, we should find someone to host an rsync backup of the entire /var/lib/mediawiki directory, along with an offsite replica of the MySQL database. While it would take a while to reupload the data, this would effectively preserve the wiki in the event of a complete catastrophic datacenter failure. (We'd be down for a while, but we wouldn't lose more than a day's work.)
  • PHP session handling is done via the file handler, as the default Debian PHP package does NOT support the mm handler.
    • This was a key issue with many of our page loading glitches when the wiki first launched on this platform.

Configuration Settings

  • Memcached: 192MB allocated per Squid cache
  • APC: 40MB allocated

On Echo (the Apache/Squid server), noatime and data=writeback are enabled in /etc/fstab as of 6/24/2011. This should reduce the overall I/O load on the system, ideally giving us some added performance.

Noatime is still enabled on all servers, as per Linode defaults.
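
To confirm that a server really has these mount options, the fstab text can be checked mechanically. The sketch below is illustrative only: the function name and the sample device names are hypothetical, and it simply reads the standard six-column /etc/fstab layout and reports real filesystems whose option list lacks a given option.

```python
# Pseudo-filesystems where noatime is irrelevant.
SKIPPED_TYPES = {"swap", "proc", "sysfs", "tmpfs", "devpts"}


def mounts_missing_option(fstab_text: str, option: str = "noatime") -> list[str]:
    """Return mount points in fstab text whose option list lacks `option`."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        if fs_type in SKIPPED_TYPES:
            continue
        if option not in options:
            missing.append(mount_point)
    return missing


# Hypothetical fstab contents; the device names are made up.
sample = """# /etc/fstab
/dev/xvda / ext3 noatime,data=writeback,errors=remount-ro 0 1
/dev/xvdb none swap sw 0 0
/dev/xvdc /srv ext3 defaults 0 2
proc /proc proc defaults 0 0
"""
print(mounts_missing_option(sample))
```

Passing `option="data=writeback"` runs the same check for the journaling mode.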

Software versions

  • OS: Debian 6.0, Kernel 3.6.5
  • Squid: 2.7
  • Apache: 2.2.16
  • Memcached: 1.4.5
  • MySQL: 5.1.66
  • MediaWiki: 1.15.0

Future ideas

  • It is possible to host a Mediawiki installation on a Cloud Hosting provider, such as Rackspace's Mosso, or Amazon's EC2. This would basically give us a bomb-proof installation, as long as we also maintain proper backups.
  • Ideally, we should split the site into multiple servers: One for the database, one for the Squid cache(s), and one for the Apache webserver/memcached.
  • As long as we're at it, a local DB replica would allow us to back up just the database more frequently, without any impact on site performance. If we really want to get fancy, we could keep weekly, daily, and hourly backups, which could then be recovered within minutes. Right now, if we roll back the server, we go back in time for both the database and the images.
  • Memcached only gets better as we allocate more RAM. This may become a priority, if we have the RAM to spare on the current box.

Extensions

  • Aside from the ones that were blindly imported from the old wiki, we're now using the SpamBlacklist extension, with a modification to support Reverse DNS blacklist lookups. In short, make the wrong edit, or come from the wrong IP address, and you won't even get to make your edit.
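
For anyone unfamiliar with DNS blacklist lookups, the usual mechanism is sketched below. This is not the code of our SpamBlacklist modification, whose details are not recorded here. It assumes a conventional IPv4 DNSBL, where the octets of the address are reversed and looked up under the blacklist's zone, and the zone name dnsbl.example.org is a placeholder.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets plus the zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 only")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolver=socket.gethostbyname) -> bool:
    """Return True if the blacklist zone has a record for this address."""
    try:
        resolver(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no record: the address is not listed
    return True


print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

The `resolver` parameter exists so the lookup can be swapped out in tests. A real check would also want a timeout so that a slow blacklist cannot stall an edit.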