Concurrent transactional behavior, thoughts on RAMP
I recently came across a blog post on distributed transactions here that was promoting a new type of isolation guarantee for transcational behavior that is designed specifically for concurrent transactions to avoid locking style behavior. The goal of this new design is to provide atomic visibility during a transaction. This behavior is of interest to me because it allows a way to do a transactional read on data without blocking incoming transactional writes. The author Peter Bailis wrote the post based on his published paper and provided the implementations that were used for the paper’s performance metrics (7 million ops/sec across a cluster of 100 machines).
So what is RAMP and what does it do? RAMP stands for Read Atomic Multi-Partition Transactions and is an algorithm to ensure atomic visibility in sharded (partitioned) databases.
RAMP provides a clever way to allow concurrent transactions to be aware of each other but not block each other. RAMP transactions help solve a problem you run into often with distributed systems: How do I make sure that people are seeing the same state at a given time?
A classic example taken from the paper is where Tom friends Tim on a social media platform. When you go to read Tom or Tim’s list of friends you don’t want to display that Tom is friends with Tim while displaying that Tim is not friends with Tom. You have to update two records separately but then on the read you want to make sure any changes to either for the given time you’re looking at it are observable (what people see is consistent between people).
The basic algorithm uses a two phase commit protocol combined with some meta data to allow transactions to proceed concurrently. The paper provided three types of RAMP algorithms (Fast, Small, Hybrid) to address meta information size and the number of RTTs required on read (1, 2 always, or 2 some times). Bailis also provided detailed statistics from testing the three RAMP algorithms and several close to RAMP algorithms (who either provide RA or close to RA semantics) in a cluster environment under heavy workloads to provide some empirical data on performance. RAMP did very well compared to the other lock based algorithms with often out performing them on throughput by two to three times the number of transactions per second.
The RAMP-F, or RAMP Fast algorithm is extremely simple. First a client proposes a change to any set of keys with a given client local time stamp for the transaction. The server records that intent and if any other concurrent write transactions occurs that transaction is aware that another transaction is attempting a write and can then get the other’s writes before proceeding. If you are performing a transactional read of the data you can also ensure that you are aware of any incoming data changes and retrieve those before you return. The key is that your items can be across multiple partitions which allows you to be aware of multiple indexes that may be updating.
Here is RAMP-F in Pseudo-ish code
Server – (simple python implementation here)
versions = {}{} latestCommit = {} def prepare(key, value, timestamp) versions[key][timestamp] = value def commit(key, timestamp) if timestamp < latestCommit[key] return latestCommit[key] = timestamp def get(key, tx) if tx return versions[key][tx] else return versions[key][latestCommit[key]]
Client - (simple python implementation here)
def put(set of items (k, v)) timestamp = newTimestamp() parallel-for each item (in parallel) responses = append(responses, prepare(key, value, timestamp)) parallel-for each response in responses (in parallel) if response == 200 commit(key, timestamp) def get(set of keys) parallel-for key in keys results[key] = Server.get(key, nil) parallel-for result in results (in parallel) for key in result.has_pending_tx() results[result.key] = Server.get(key, result.timestamp) return results has_pending basically determines if based on the meta data there is a pending write to go then go get that write's data.
In short, what RAMP gives you is the ability to get multiple different values (and handle detecting incoming writes) from many partitions and write in a last man wins fashion. In contrast something like Google's Datastore would block or have high transaction collisions even if those transactions were reading the entity and aborting themselves if they had detected that the entity was in the desired state already. The RAMP approach allows for more concurrent style transactions across a database and opens up an interesting, new and powerfully performant option for manipulating data in distributed systems.