The Artima Developer Community
Python Buzz Forum
Distributing initial configuration information

Phillip Pearson

Phillip Pearson is a Python hacker from New Zealand
Posted: Jul 6, 2008 7:04 PM

This post originated from an RSS feed registered with Python Buzz by Phillip Pearson.
Original Post: Distributing initial configuration information
Feed Title: Second p0st
Feed URL: http://www.myelin.co.nz/post/rss.xml
Feed Description: Tech notes and web hackery from the guy that brought you bzero, Python Community Server, the Blogging Ecosystem and the Internet Topic Exchange

One thing I've never seen satisfactorily explained for distributed fault-tolerant web apps is how to distribute configuration information. For example, if you have a cluster of web servers and a pair of MySQL database servers in a dual-master, single-writer configuration (each configured to replicate from the other for easy failover, but with only one handling writes), how do the web servers figure out which one is currently the master? In particular, when a master server goes down and then comes back online (possibly in an inconsistent state), how does it know whether it should become active again?

One failure model I have in mind is where you have [clusters of] servers in different locations, such that you need to change DNS to fail over from one location to another. In this case, some clients may still have the old IP address for your website in their cache. So if a colo loses its network connection, the site fails over to a different colo, and then the original colo comes back up, that colo needs to know ASAP whether it's meant to be serving traffic or redirecting to the other location. Otherwise there's a danger that it will accept updates from clients, which will be hard to reconcile later. (This assumes you're not using an 'eventual consistency' model, in which case the system will probably sort things out by itself.)

Also, say each cluster hosts DNS. A cluster that has recently lost its network connection shouldn't start serving DNS immediately upon coming back online, in case a failover operation changed the records while it was cut off.

What I'd like [to build?] is a service that distributes configuration information between hosts, in such a way that each host can determine whether it is safe to go online, and which hosts are currently authoritative. So when a web host starts up, it can conclusively figure out where its database servers are and whether it should be redirecting (or answering at all), and when a master database host starts up, it can conclusively figure out whether it should accept updates from clients or rebuild itself from a new master.
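As a rough sketch of the startup decision described above, the following assumes each host already has some way of confirming that its view of the configuration is current (how to do that is the open question of this post). The names `Action`, `web_host_action`, `db_host_action` and their parameters are hypothetical and only illustrate the shape of the decision.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical set of things a host may decide to do at startup."""
    SERVE = "serve"
    REDIRECT = "redirect"
    ACCEPT_WRITES = "accept writes"
    REBUILD_FROM_MASTER = "rebuild from master"
    STAY_OFFLINE = "stay offline"


def web_host_action(my_colo, active_colo, view_confirmed):
    """Decide what a web host should do when it starts up.

    view_confirmed is an assumption: True only if the host has verified
    that its configuration view is current.
    """
    if not view_confirmed:
        # Without a trusted view, answering at all is unsafe.
        return Action.STAY_OFFLINE
    if my_colo == active_colo:
        return Action.SERVE
    return Action.REDIRECT


def db_host_action(my_name, current_master, view_confirmed):
    """Decide what a (former) master database host should do at startup."""
    if not view_confirmed:
        return Action.STAY_OFFLINE
    if my_name == current_master:
        return Action.ACCEPT_WRITES
    # Another host took over while this one was away.
    return Action.REBUILD_FROM_MASTER
```

For example, a web host in a colo that was failed away from would get `Action.REDIRECT`, and a database host that finds another host listed as master would get `Action.REBUILD_FROM_MASTER`. In both cases an unconfirmed view keeps the host offline.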

The best leads I have so far are the Paxos algorithm and Google's Chubby filesystem/lock manager. I think the trick will be for each host to maintain a local database of settings (host locations, active colos, etc.), versioned with the Paxos proposal number, and to continuously poll other hosts. When a host loses contact with so many other hosts that it no longer has a quorum for Paxos, it should deactivate itself.
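The two local pieces of that idea, a versioned settings store and a quorum check, might look something like this. It is not an implementation of Paxos: it assumes the proposal numbers are produced elsewhere, and the names `LocalConfig`, `merge` and `has_quorum` are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LocalConfig:
    """Settings a host keeps locally, tagged with the proposal number
    that last changed them (assumed to come from a Paxos-like protocol)."""
    version: int = 0
    settings: dict = field(default_factory=dict)

    def merge(self, other):
        """Adopt a polled peer's settings only if they are newer than ours."""
        if other.version > self.version:
            self.version = other.version
            self.settings = dict(other.settings)


def has_quorum(reachable_peers, cluster_size):
    """True if this host plus the peers it can reach form a strict majority."""
    return (reachable_peers + 1) * 2 > cluster_size
```

A host in a five-node cluster that can reach two peers still has a quorum (three of five). If it can reach only one, `has_quorum(1, 5)` is false and the host should deactivate itself.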

I haven't thought this through well enough yet, though. Another interesting thing to try would be for all hosts to determine network connectivity, and to vote on which other hosts/colos are down, to determine what to do with DNS and database master selection...
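One possible reading of the voting idea, as a sketch only: each host reports which hosts it cannot reach, and a host counts as down once a strict majority of the whole cluster says so. The function name, the report format and the majority rule are all assumptions, not something worked out in this post.

```python
from collections import Counter


def hosts_voted_down(reports, all_hosts):
    """Return the hosts that a strict majority of the cluster cannot reach.

    reports maps each voting host to the set of hosts it cannot reach.
    The threshold is a majority of all_hosts, not just of the voters,
    so an isolated minority cannot declare the rest of the cluster down.
    """
    votes = Counter()
    for voter, unreachable in reports.items():
        for host in unreachable:
            # Ignore unknown hosts and votes against oneself.
            if host in all_hosts and host != voter:
                votes[host] += 1
    needed = len(all_hosts) // 2 + 1
    return {host for host, count in votes.items() if count >= needed}
```

With five hosts, three reports that `e` is unreachable are enough to mark it down, while a lone `e` reporting everyone else as unreachable marks nothing down.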
