The Artima Developer Community

Make Room for JavaSpaces, Part V
Make Your Compute Server Robust and Scalable
by Susan Hupfer
First Published in JavaWorld, June 2000


Adding Transactions to the Worker

Take another look at the original worker code from the compute server and see why it's not fault tolerant:



import net.jini.core.entry.Entry;
import net.jini.space.JavaSpace;
public class Worker {
    private JavaSpace space;
    public static void main(String[] args) {
        Worker worker = new Worker();
        worker.startWork();
    }
    public Worker() {
        space = SpaceAccessor.getSpace();
    }
    public void startWork() {
        TaskEntry template = new TaskEntry();
        for (;;) {
            try {
                TaskEntry task = (TaskEntry)
                    space.take(template, null, Long.MAX_VALUE);
                Entry result = task.execute();
                if (result != null) {
                    space.write(result, null, 1000*60*10);
                }
            } catch (Exception e) {
                System.out.println("Task cancelled");
            }
        }
    }
}

After gaining access to a space and calling the startWork method, the worker repeatedly takes a task entry from the space, computes the task, and writes the result to the space. Note that take and write are both performed under a null transaction, which means each of those operations consists of one indivisible action (the operation itself). Step back and think about one scenario that can occur in networked environments, which are prone to partial failure. Consider the case in which a worker removes a task and begins executing it, and then failure occurs (maybe the worker dies unexpectedly or gets disconnected from the network). In this scenario, the task entry is lost for good, and as a result the overall computation won't ever be fully solved.
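To make the failure concrete, here is a minimal sketch that simulates it without Jini. It is a toy model, not the JavaSpaces API: the class `LostTaskDemo`, the method `workOnce`, and the string "entries" are hypothetical stand-ins, with `poll` playing the role of a `take` under a null transaction and `add` playing the role of a `write`. The first simulated worker removes task-1 and then fails before writing a result.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only (NOT the JavaSpaces API): an in-memory "space" of strings,
// where taking an entry removes it at once because no transaction can undo it.
public class LostTaskDemo {
    static final Deque<String> tasks = new ArrayDeque<>();
    static final Deque<String> results = new ArrayDeque<>();

    // One worker iteration. When crashAfterTake is true, the worker "dies"
    // after removing the task but before writing a result.
    static void workOnce(boolean crashAfterTake) {
        String task = tasks.poll();            // like take() under a null transaction
        if (task == null) {
            return;                            // nothing left to do
        }
        if (crashAfterTake) {
            throw new IllegalStateException("worker died while computing " + task);
        }
        results.add(task + " done");           // like write() under a null transaction
    }

    public static void main(String[] args) {
        tasks.add("task-1");
        tasks.add("task-2");
        try {
            workOnce(true);                    // the first worker fails mid-task
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        workOnce(false);                       // another worker takes task-2
        workOnce(false);                       // the space is now empty
        // prints "tasks left: 0, results: [task-2 done]"; task-1 is gone for good
        System.out.println("tasks left: " + tasks.size() + ", results: " + results);
    }
}
```

Because the removal had already happened when the simulated worker failed, no later worker can ever find task-1, which is the lost-task scenario described above.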

You can make the worker more robust by using transactions. (The complete code for the compute server that has been reworked with transactions can be found in Resources and forms the javaworld.simplecompute2 package.) First you'll modify the worker's constructor to obtain a TransactionManager proxy object and assign it to the variable mgr, and you'll define a getTransaction method that creates and returns new transactions:



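import java.rmi.RemoteException;
import net.jini.core.lease.LeaseDeniedException;
import net.jini.core.transaction.Transaction;
import net.jini.core.transaction.TransactionFactory;
import net.jini.core.transaction.server.TransactionManager;
import net.jini.space.JavaSpace;
public class Worker {
    private JavaSpace space;
    private TransactionManager mgr;
    . . .
    public Worker() {
        space = SpaceAccessor.getSpace();
        mgr = TransactionManagerAccessor.getManager();
    }
    public Transaction getTransaction(long leaseTime) {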
        try {
            Transaction.Created created =
                TransactionFactory.create(mgr, leaseTime);
            return created.transaction;
        } catch(RemoteException e) {
            e.printStackTrace();
            return null;
        } catch(LeaseDeniedException e) {
            e.printStackTrace();
            return null;
        }
    }
}

Most of the getTransaction method should be familiar if you have read Make Room for JavaSpaces, Part IV. Note that the method takes a leaseTime parameter, which specifies how long (in milliseconds) you'd like the transaction's lease to last.

Now let's modify the startWork method to add support for transactions:



public void startWork() {
    TaskEntry template = new TaskEntry();
    for (;;) {
        // try to get a transaction with a 10-min lease time
        Transaction txn = getTransaction(1000*60*10);
        if (txn == null) {
            throw new RuntimeException("Can't obtain a transaction");
        }
        try {
            try {
                // take the task under a transaction
                TaskEntry task = (TaskEntry)
                    space.take(template, txn, Long.MAX_VALUE);
                // perform the task
                Entry result = task.execute();
                // write the result into the space under a transaction
                if (result != null) {
                    space.write(result, txn, 1000*60*10);
                }
            } catch (Exception e) {
                System.out.println("Task cancelled: " + e);
                txn.abort();
                // the transaction has been aborted, so skip the commit below
                continue;
            }
            // the take and the write take effect together, or not at all
            txn.commit();
        } catch (Exception e) {
            System.out.println("Transaction failed: " + e);
        }
    }
}

Each time startWork iterates through its loop, it calls getTransaction to attempt to get a new transaction with a lease time of 10 minutes. If an exception occurs while creating the transaction, then the call to getTransaction returns null, and the worker throws a runtime exception. Otherwise, the worker has a transaction in hand and can continue with its work.

First, you call take (passing it the transaction) and wait until it returns a task entry. Once you have a task entry, you call the task's execute method and assign the returned value to the local variable result. If the result entry is non-null, then you write it into the space under the transaction, with a lease time of 10 minutes.

From this point, three things can happen. The first possibility is that the operations complete without throwing any exceptions, and you call the transaction's commit method, which asks the transaction manager to commit the transaction. If the commit succeeds, all the operations invoked under the transaction (in this case, the take and the write) take effect in the space as one atomic operation.

The second possibility is that an exception occurs while carrying out the operations. In this case, the inner catch clause explicitly asks the transaction manager to abort the transaction. If the abort succeeds, none of the operations take effect: the task is still in the space, as if it had never been touched.

A third possibility is that an exception occurs while committing or aborting the transaction. In this case, the outer catch clause catches the exception and prints a message indicating that the transaction failed. The transaction will expire when its lease ends (here, after 10 minutes), and none of its operations will take effect, so the task entry becomes available to other workers again. The transaction will also expire if the worker unexpectedly dies or becomes disconnected from the network partway through the series of calls.
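The three outcomes can be illustrated with a small simulation. This is a toy model written for illustration, not the JavaSpaces or Jini transaction API: `ToyTxnSpace`, `Txn`, `expireIfLapsed`, and the simulated millisecond clock are all hypothetical. It assumes the JavaSpaces semantics discussed here: an entry taken under a transaction is hidden from other clients, an entry written under it stays invisible outside it until commit, and an abort or a lapsed lease leaves the space as it was before the transaction began.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (NOT the JavaSpaces or Jini API) of transactional take/write:
// a take hides the entry from everyone else, a write stays private to the
// transaction, commit makes both permanent together, and abort or an expired
// lease undoes them.
public class ToyTxnSpace {
    final List<String> entries = new ArrayList<>();   // entries visible to all

    class Txn {
        final List<String> taken = new ArrayList<>();
        final List<String> written = new ArrayList<>();
        final long expiresAt;                          // simulated lease deadline (ms)
        boolean done;

        Txn(long now, long leaseTime) {
            expiresAt = now + leaseTime;
        }

        // Takes the first visible entry starting with prefix, or returns null.
        String take(String prefix) {
            for (int i = 0; i < entries.size(); i++) {
                if (entries.get(i).startsWith(prefix)) {
                    String e = entries.remove(i);      // hidden until commit/abort
                    taken.add(e);
                    return e;
                }
            }
            return null;
        }

        void write(String e) {
            written.add(e);                            // invisible until commit
        }

        void commit(long now) {
            if (done) throw new IllegalStateException("transaction already finished");
            if (now >= expiresAt) {                    // a lapsed lease acts like an abort
                rollBack();
                throw new IllegalStateException("transaction lease expired");
            }
            entries.addAll(written);                   // results become visible...
            taken.clear();                             // ...and the takes become permanent
            done = true;
        }

        void abort() {
            if (done) throw new IllegalStateException("transaction already finished");
            rollBack();
        }

        // Stands in for the manager noticing that nobody committed or aborted in time.
        void expireIfLapsed(long now) {
            if (!done && now >= expiresAt) rollBack();
        }

        private void rollBack() {
            entries.addAll(taken);                     // taken entries reappear
            written.clear();                           // tentative writes are dropped
            done = true;
        }
    }

    public static void main(String[] args) {
        ToyTxnSpace space = new ToyTxnSpace();
        space.entries.add("task-1");
        space.entries.add("task-2");
        long leaseTime = 1000 * 60 * 10;               // 10 minutes, as in startWork

        // 1. Everything succeeds: the take and the write commit together.
        Txn t1 = space.new Txn(0, leaseTime);
        String task = t1.take("task");
        t1.write("result of " + task);
        t1.commit(1000);

        // 2. The work fails: abort returns task-2 to the space untouched.
        Txn t2 = space.new Txn(0, leaseTime);
        t2.take("task");
        t2.abort();

        // 3. The worker vanishes: the lease lapses and task-2 reappears again.
        Txn t3 = space.new Txn(0, leaseTime);
        t3.take("task");
        t3.expireIfLapsed(leaseTime);

        System.out.println(space.entries);             // prints [result of task-1, task-2]
    }
}
```

In the real system it is the transaction manager, not the worker, that tracks the lease and aborts an expired transaction; the toy's `expireIfLapsed` method merely stands in for that bookkeeping.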

Now that you've made the worker code robust, let's turn to the master code and show how you can improve it as well.

Copyright © 1996-2014 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use - Advertise with Us