Articles |
News |
Weblogs |
Books |
Forums

Artima Forums | Articles | Weblogs | Java Answers | News

Database partitioning

In our new architecture at Artima, we have thought a lot about scalability, and one of our strategies is database partitioning. To the extent possible, we wanted to avoid having everything in one database. The system we are designing is a network of sites, and we felt that it would be prudent to make it possible for each site to be in its own database. On the other hand, we didn't want to be forced to put every site in a different database, because we were concerned that more databases would be more complicated to manage than fewer databases. As a result, we developed the concept of a "sites" database that can host the data for one to many sites. We can deploy multiple sites databases, each of which will be connected to a cluster of application servers that will host all the sites in that database.

We have been using Postgres as our database at Artima for many years, and would like to continue to do so. We believe that one benefit of a partitioned database architecture is that we won't have to migrate everything over to a commercial database, such as Oracle, to help us scale up. If a particular sites database becomes overwhelmed with load, we can divide it into two databases, putting half of its sites into each new database—somewhat like a cell dividing.

Another benefit that we see of our partitioned database architecture is that if a database goes down, the entire network of sites doesn't go down. Only those sites hosted by that database go down. We would like eventually to set up some kind of database mirroring, so that if a database goes down, the mirror would take over shortly thereafter. But until that day, and even after that day in a catastrophic failure when both a database and its mirror fails, we'll only have a partial outage.

We weren't able to put everything into the sites databases, however, because some data is shared among all sites. In particular, we want to provide single-sign on, such that one Artima ID will work at all sites. Therefore, the user data is shared among all sites. As a result, we have a "shared" database that contains this shared data. The shared database is currently a single point of failure. If it goes down, the entire network of sites will go down. One way to improve the situation would be to replicate the user data into each sites database, such that if the shared database goes down, only the ability to update the user data would go away across the network. Anything that only required reading the user data would still work.

Design tradeoffs

Although database partitioning does provide scalability advantages, it has some drawbacks as well. First, as I mentioned previously, more databases is more databases to maintain, upgrade, back up, etc. Another drawback to partitioning databases is that because we want to avoid distributed transactions (because of their drag on scalability), we lose some ability to use the database to enforce data integrity constraints. For example, given that users are in the shared table, we can't add a foreign key constraint to the author's user ID field of a forum post in the sites table. (Were we to replicate read-only user data to each sites database, on the other hand, we could apply the constraint.)

Another example is back links. If site B publishes a translation of an article previously published in site A, we'd like to place a link in B that points to the article in A, and a link in A that points to the article in B. If both A and B were in the same database, then we could write both ends of the link in one transaction. Given that A and B can be in different databases, however, we either need to perform these writes under a distributed transaction, or write the back link asynchronously by sending a message, and accept that occasionally the back link may not get written. Because we feel scalability trumps integrity in this case, we opted for the latter approach. Back links will be written and modified by sending asynchronous messages, some of which may occasionally not arrive.

Another cost of the database partitioning approach is complexity, because if you want to avoid distributed transations, updating multiple databases is more complicated than updating a single one. A good example of this extra complexity is the writing and updating of back links. Instead of simply updating a back link directly in the same database, we must design a messaging system that allows one site to send a message to another.

Premature or prudent?

Are the manangement, integrity, and complexity tradeoffs I mentioned worth the scalability benefits? If our network needs to scale up, then I'd say yes. If not, then no. It is too soon to say. As developers, we need to make investment decisions based on our experience and the information we have at the time. I decided to take on the tradeoffs up front so that if and when the time comes that a database becomes overwhelmed, we'll have database partitioning as one tool in our scalability arsenal.

How much do you worry about scalability up front? When do you think it is and is not appropriate to complicate your design to carve out paths for future scalability? In particular, what is your opinion on database partitioning as a scalability strategy?

Carsten Saager

Posts: 17
Nickname: csar
Registered: Apr, 2006