The Artima Developer Community

Java Community News
Domas Mituzas on the Wikipedia Architecture


Frank Sommers

Posted: Oct 26, 2007 5:49 PM
In a recent presentation at the MySQL users conference, Domas Mituzas explains Wikipedia's architecture.

As the eighth-busiest Web site, Wikipedia is unique in that it relies mostly on free, open-source software for its highly available infrastructure. At the 2007 MySQL users conference, MySQL's Domas Mituzas, who also works with Wikipedia, gave a presentation on Wikipedia's scalable architecture. The presentation, titled Wikipedia: Site Internals, Configuration, Code Examples and Management Issues, is available online.

Mituzas points out that:

The principle of openness forced all operation to use free & open-source software only. Having commercial alternatives out of question, Wikipedia had the challenging task to build efficient platform of freely available components...

Wikipedia’s primary aim is to provide a platform for building collaborative compendium of knowledge. Due to different kind of funding (it is mostly donation driven), performance and efficiency has been prioritized above high availability or security of operation.

Mituzas highlights the key elements of Wikipedia's architecture:

  • Linux - operating system (Fedora, Ubuntu)
  • PowerDNS - geo-based request distribution
  • LVS - used for distributing requests to cache and application servers
  • Squid - content acceleration and distribution
  • lighttpd - static file serving
  • Apache - application HTTP server
  • PHP5 - Core language
  • MediaWiki - main application
  • Lucene, Mono - search
  • Memcached - various object caching
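To make the division of labor in this list more concrete, here is a minimal Java sketch of how a page request might fall through the caching layers. It is an illustration under stated assumptions, not Wikipedia's code: an in-memory map stands in for the Squid page cache, a second map stands in for memcached, and a function stands in for Apache/PHP/MediaWiki rendering. The class and method names (`LayeredLookup`, `get`, `renderCount`) are hypothetical, and a real deployment would also have PowerDNS and LVS in front and a database behind.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: maps stand in for Squid (page cache) and memcached
// (object cache); the renderer stands in for Apache/PHP/MediaWiki.
public class LayeredLookup {
    private final Map<String, String> pageCache = new HashMap<>();   // "Squid"
    private final Map<String, String> objectCache = new HashMap<>(); // "memcached"
    private final Function<String, String> renderer;                 // "MediaWiki"
    private int renders = 0;

    public LayeredLookup(Function<String, String> renderer) {
        this.renderer = renderer;
    }

    public String get(String uri, boolean loggedIn) {
        // Anonymous requests are answered by the front-end page cache when possible;
        // logged-in users bypass it because their pages are personalized.
        if (!loggedIn) {
            String cached = pageCache.get(uri);
            if (cached != null) return cached;
        }
        // Application layer: consult the object cache before doing expensive rendering.
        String html = objectCache.get(uri);
        if (html == null) {
            html = renderer.apply(uri);
            renders++;
            objectCache.put(uri, html);
        }
        // Only anonymous views are stored in the shared page cache.
        if (!loggedIn) pageCache.put(uri, html);
        return html;
    }

    public int renderCount() {
        return renders;
    }

    public static void main(String[] args) {
        LayeredLookup site = new LayeredLookup(uri -> "<html>" + uri + "</html>");
        site.get("/wiki/Main_Page", false); // misses every layer: rendered once
        site.get("/wiki/Main_Page", false); // served from the page cache
        site.get("/wiki/Main_Page", true);  // logged in: skips page cache, hits object cache
        System.out.println(site.renderCount()); // prints 1
    }
}
```

The point of the layering is that the expensive step (rendering) runs once, while each cheaper layer absorbs repeat traffic.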

The presentation focuses on many aspects of caching and content delivery:

Content delivery network is the ‘holy grail’ of performance for Wikipedia. Most of pages (except for logged in users) end up generated in such a manner, where both caching and invalidating the content is fairly trivial...

  • There’re no unaccounted dynamic bits on a content page (if there are, the changes are not invalidated in cache layer, hence causing stale data).
  • Every content page has strict naming, with single URI to the file (good for having uniform linking and not wasting memory on dupe cache entries).
  • Caching is application-controlled (via headers) (simplifies configuration, more efficient selection of what can and cannot be cached).
  • Content purging is completely application-driven (the amount of unpredictable changes in unpredictable areas would render lots of stale data otherwise).
  • Application must support lightweight revalidations (If-Modified-Since requests).
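The caching rules quoted above can be illustrated with a small Java sketch. It is not MediaWiki code: `CachePolicy` and its methods are hypothetical, the `s-maxage` value is an arbitrary example, and the extra history URL in `purgeUrls` is an assumption about which views an edit might invalidate. It shows one canonical URI per page, cache headers chosen by the application, an explicit purge list, and `If-Modified-Since` revalidation answered with a 304.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of application-controlled caching, not MediaWiki's implementation.
public class CachePolicy {

    // One canonical URI per page, so "main Page" and "Main_Page" share one cache entry.
    public static String canonicalUri(String title) {
        String t = title.trim().replace(' ', '_');
        if (t.isEmpty()) throw new IllegalArgumentException("empty title");
        // Assumption: first letter is capitalized, as MediaWiki does by default.
        return "/wiki/" + Character.toUpperCase(t.charAt(0)) + t.substring(1);
    }

    // The application, not the cache configuration, decides what may be cached.
    public static String cacheControl(boolean loggedIn) {
        // Shared caches may keep anonymous views (s-maxage); browsers must revalidate.
        return loggedIn ? "private, must-revalidate, max-age=0"
                        : "s-maxage=86400, must-revalidate, max-age=0";
    }

    // Lightweight revalidation: 304 if the page has not changed since the client's copy.
    public static int status(String ifModifiedSince, Instant pageTouched) {
        if (ifModifiedSince == null) return 200;
        try {
            Instant since = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            // HTTP dates have one-second resolution, so compare whole seconds.
            return pageTouched.getEpochSecond() <= since.getEpochSecond() ? 304 : 200;
        } catch (DateTimeParseException e) {
            return 200; // unparseable header: send the full page
        }
    }

    // Purging is application-driven: on edit, list exactly the URLs to invalidate.
    public static List<String> purgeUrls(String title) {
        String uri = canonicalUri(title);
        List<String> urls = new ArrayList<>();
        urls.add(uri);
        urls.add(uri + "?action=history"); // hypothetical: the history view also changes
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(canonicalUri("main Page")); // prints /wiki/Main_Page
        Instant touched = Instant.parse("2007-10-26T17:00:00Z");
        System.out.println(status("Fri, 26 Oct 2007 17:49:00 GMT", touched)); // prints 304
        System.out.println(status("Fri, 26 Oct 2007 16:00:00 GMT", touched)); // prints 200
        System.out.println(purgeUrls("main Page"));
    }
}
```

Because the application computes both the headers and the purge list, the cache layer needs no per-page configuration, which is the trade-off the quoted slides describe.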

What do you think of Wikipedia's architecture as presented by Mituzas?




Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use