Apache Cassandra Review
Updated: Sep 14
Apache Cassandra is a open source NOSQL distributed database.
If you are considering moving from a single node database like MYSQL to a distributed database, then Cassandra is definitely a database you want to evaluate and consider.
The inspiration for Cassandra came from the Dynamo paper and many of its architectural concepts are from that paper.
Should you use Cassandra ?
Cassandra might be a good choice for you when
High availability is a very important requirement.
Data size is so large that it cannot fit on a single node.
Eventual consistency can be tolerated.
You are ok with de-normalized data model , data duplication and issues related to it.
Concurrent updates to same data item is not often.
Write model is append rather than read - update - write.
Cassandra would not be a good choice for you if
Your data size is small enough to fit on a single node.
Your application needs ACID transactions.
You need strong consistency.
Your schema needs to be fully normalized.
Data duplication can cause problems in applications.
Usage pattern is read - update - write.
Apache Cassandra Architecture
Master less Distributed architecture
No Master. All nodes are equal.
Data partitioning based on consistent hashing.
Clients can connect to any node.
Node can be added and removed from cluster easily.
You can visualize a cluster as a set a nodes on a ring.
In the diagram below, A,B,C ... H are the node.
The hash values by applying a hash function are also arranged on the ring. Each node is responsible for storing a range of values.
For example node B stores key, value pairs whose hash(key) falls in the range 0 -100. C has 101 -200. D has 201 - 300 and so on.
A hash of the key is used to determine the position of the hash in the ring and the key value is stored in the next node on the ring.
Additionally the key value is replicated to multiple nodes for resilience. So a key written to node B might be replicated to nodes C and D.
Client can connect to any node and do a write.
Similarly a client can connect to any node and do a read. That node will hash the key to determine the node that can service the read request and delegate the request to that node.
Nodes can be added and removed from the cluster anytime. To minimize the amount the data that needs to be redistributed when that happens, virtual nodes are used. In the above diagram think of A, B .... H as virtual nodes. A real node N1 stores keys from virtual nodes B,F. Real node N2 stores keys from virtual node C, G and so on.
Resiliency, high availability, replication
Key values are replicated to multiple nodes. So a key written to node B might be replicated to nodes C and D. Replication factor is tunable. If the node that stores a particular key goes down, that key is still available.
Horizontally scalable by easy addition or removing nodes to the cluster. Cassandra takes care of the redistribution of keys as needed. The use of virtual nodes minimizes the need to redistribute keys.
Cassandra node communicate with each other using the gossip protocol and exchange metadata and other cluster information. This article provides a good overview of gossip.
Within each node, the storage data structures are memtable ( in memory ) and SSTable (disk). This is different from the B+ tree we are used to in relational databases. This article provides a good overview of the read and write process within each node in Cassandra.
Keys differences with respect to RDBMSs are
Joins are not supported.
Data is de-normalized.
Query first approach -- You design the schema based on what queries the application need. Schema is designed to make those queries efficient.
storage - In Cassandra when you design your table, you need to keep storage and partitioning in mind. Queries that can be served from 1 partition as fast. In RDBMS, the developer never thinks of storage.
Sort order is decided by the clustering key. Changing the sort order not as simple as changing "order by" in the query
Amazons dynamo on which Cassandra is based was a pure key value store, with the hash of the key used to partition data across nodes. To make it more acceptable to the relational crowd, Cassandra has slapped a table , row , column like structure on top of what is really under the hood a key value like store.
So you do design your schema as a table with columns. But the key has 2 parts - a partition key and a clustering key. The partition key is used to partition data across nodes. Clustering key is used for sorting the data within a node. The first field in the key is the partition key, The rest is the clustering key
Writes are atomic at a partition or row level.
There is support for lightweight transactions with linearizable consistency.
INSERT INTO user ( id, name, password) VALUES ( 12, "abc","xyz) IF NOT EXISTS;
This is like a compare and set operation to prevent duplicates or out of order updates. There is a performance cost as Paxos is used for distributed consensus.
Without that, last write wins, which can lead to unpredictable results. Concurrent writes to same cell can lead to consistency issues.
If transactions are important to you, you must read https://aphyr.com/posts/294-call-me-maybe-cassandra. It is a little dated but will give you a good idea as to what issues you will encounter with an eventually consistent database.
Cassandra is a great database for the use case that is was intended for - which is high availability and high write throughput which sacrificing consistency and isolation.
Think Amazon shopping cart. I do a lot of shopping at Amazon. My checkout process has never ever failed or hung. However if I leave items in the shopping cart or put things in the cart and come back later and try to complete the transaction from another device, sometimes I find duplicate or old items in the cart. You get the picture. This minor glitch does not stop me from doing business for amazon.
Cassandra can be great database for consumer applications that need to scale to millions of users when issues like the one above can be tolerated. But it is not a drop in replacement for your existing relational database.