Iowa Code Camp 2016 “Correctness, Consistency, and other lies”

by Ross · November 4, 2016

I had the opportunity to speak at Iowa Code Camp in October. My talk was “Correctness, Consistency, and other lies”. My slides are here. It was a good experience, and I enjoyed speaking to a small audience: there were about 15 people, which gave me time to answer questions and make sure everyone followed the material.

During the talk I spoke about a few major distributed consensus protocols, along with some design practices for building distributed systems that help manage the complexity of establishing the ‘correctness’ of your data. I covered Paxos, Raft, and some CRDT data structures, and touched on the CALM theorem. I also went over how most databases’ default, or even highest, ‘consistency’ guarantees still allow common data errors to creep into your system.
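To give a feel for what a CRDT looks like in code, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. It is an illustration rather than anything taken from the slides, and the class and method names are my own. Each replica only bumps its own slot, and merging takes the per-replica maximum, so replicas can exchange state in any order and still converge.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""
    counts: dict = field(default_factory=dict)

    def increment(self, replica_id: str, amount: int = 1) -> None:
        # A replica only ever increases its own slot.
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # Element-wise max is commutative, associative, and idempotent,
        # which is what lets replicas converge without coordination.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )


a, b = GCounter(), GCounter()
a.increment("a", 2)
b.increment("b", 3)
print(a.merge(b).value())  # 5, and b.merge(a) gives the same result
```

Because the merge never loses an increment and never double-counts one, no consensus round is needed to keep the replicas in agreement.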

I think the CALM theorem drives home the importance of really asking yourself what needs to be coordinated in your system. I used a shopping cart example to talk through how we can avoid transactions as we add and remove items; it is only when we go to check out that we actually need to snapshot things. I also introduced the concept of a commit-log-based architecture, which lets your data flow more easily through different systems. I really feel that we should look at our systems as all being part of a database turned inside out. Database transaction-avoidance algorithms can be applied to large-scale distributed systems because it oftentimes feels like all problems are the same, just in a different theater.
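The shopping cart idea can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the exact example from the talk: the names `CartEvent`, `CartReplica`, and `checkout_snapshot` are hypothetical, and the "commit log" is just an in-memory set of events. Adds and removes are appended as immutable events with unique ids, replicas merge by set union, and only checkout folds the log into a snapshot.

```python
import uuid
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CartEvent:
    event_id: str  # unique id, so merging logs never double-counts an event
    item: str
    delta: int     # +n for an add, -n for a remove


class CartReplica:
    """One copy of the cart (say, a phone or a laptop session)."""

    def __init__(self):
        self.log = set()  # append-only in spirit: events are never mutated

    def add(self, item, qty=1):
        self.log.add(CartEvent(uuid.uuid4().hex, item, qty))

    def remove(self, item, qty=1):
        self.log.add(CartEvent(uuid.uuid4().hex, item, -qty))

    def merge(self, other):
        # Set union is order-independent, so no transaction is needed here.
        self.log |= other.log

    def checkout_snapshot(self):
        # The one place a real system would coordinate: it must be sure it
        # has seen every replica's events before taking the snapshot.
        totals = Counter()
        for event in self.log:
            totals[event.item] += event.delta
        return {item: qty for item, qty in totals.items() if qty > 0}


phone, laptop = CartReplica(), CartReplica()
phone.add("book", 2)
laptop.add("pen")
laptop.remove("book")
phone.merge(laptop)
print(phone.checkout_snapshot() == {"book": 1, "pen": 1})  # True
```

Summing deltas per item gives the same answer regardless of the order in which events arrive, which is why the add and remove paths can skip coordination entirely and leave it to checkout.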

What I mean is that when you break down your challenges, they center on the same fundamentals: working with IO, coordinating between processes, and ensuring some form of correctness and availability. The underlying technology may change (you may be using Docker or a new NoSQL system), but often it feels like we are merely taking user-generated behavior, writing it out to some shared state, and throwing some lightweight processing around it. What Uber does from an algorithmic perspective is fairly simple. Making it scale and stay stable under the load they incur is not. Either way, using new systems and algorithms can be exciting, interesting, and challenging, but we should always remember that sometimes the problem is not as hard as we would like it to be. We should stay focused on solutions that we can complete quickly and that are low in complexity, so that when they break (and they will) we can easily debug and fix them.