MongoDB Review: Is MongoDB right for you?
Updated: Sep 17
A few years ago, I asked the CTO of a Silicon Valley startup why they had picked MongoDB as their database. The answer was "It is an awesome database". I asked another CTO why MongoDB was not considered at their company. The answer was "It is a terrible database". In both cases, the CTOs were unable to provide any serious technical reasons. The coverage of MongoDB on the internet follows the same pattern: you see articles that praise the database, and others that rip it apart, without giving good reasons.
In this blog we go beyond the hype (in either direction) and look at the reasons why you should or should not consider MongoDB as your distributed database.
The highlights of MongoDB are:
- It is a document database. It stores JSON documents in collections, which are analogous to tables.
- It supports horizontal scalability by sharding data across nodes.
- It supports high availability using replica sets.
- Transactions: operations on a single document are atomic.
- The programming model is very easy for the application programmer, but the database has multiple moving parts, and administering a multi-node cluster is not that easy.
2.0 When to use MongoDB?
You should consider MongoDB if:
- Your application data model is JSON documents.
- You do not care about a relational model and do not want to waste resources converting JSON documents to a relational model and back.
- You are fine with denormalization and duplication of data.
- Your data is large enough that you need to scale beyond a single node.
- You can tolerate weaker isolation levels.
You may want to pass on MongoDB if:
- Your application data model is NOT JSON.
- Your data fits on a single node. Stick to MySQL or PostgreSQL.
- Your organization is rigidly married to the relational model and normalized data.
- You need strong isolation levels, or you rely heavily on transactions that span multiple documents or nodes.
A MongoDB database has collections, which are analogous to tables in other databases. A collection stores JSON documents or, more specifically, BSON (a binary representation of JSON). It is essentially schemaless: schema validation is not required, but it is possible.
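To make the document model and optional validation concrete, here is a minimal sketch in plain Python. The document, the field names and the `validate` helper are all hypothetical. The schema is written in the style of a `$jsonSchema` validator, but the checker below is a toy stand-in and not how the server evaluates schemas.

```python
# A hypothetical "users" document: nested data lives inside the record
# instead of being normalized into separate tables.
user_doc = {
    "_id": 1,
    "name": "Ada",
    "emails": ["ada@example.com"],
    "address": {"city": "London", "zip": "NW1"},
}

# An optional schema, written in the $jsonSchema style (assumed shape).
user_schema = {
    "bsonType": "object",
    "required": ["name", "emails"],
    "properties": {
        "name": {"bsonType": "string"},
        "emails": {"bsonType": "array"},
        "address": {"bsonType": "object"},
    },
}

# Illustrative mapping from a few BSON type names to Python types.
_PY_TYPES = {"string": str, "array": list, "object": dict, "int": int}


def validate(doc, schema):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    # Every required field must be present.
    for field in schema.get("required", []):
        if field not in doc:
            problems.append(f"missing required field: {field}")
    # Fields that are present must have the declared type.
    for field, rule in schema.get("properties", {}).items():
        if field in doc and not isinstance(doc[field], _PY_TYPES[rule["bsonType"]]):
            problems.append(f"wrong type for field: {field}")
    return problems
```

Without a schema, any document shape would be accepted by the collection; with one, a document such as `{"name": 5}` would be rejected for a missing field and a wrong type.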
3.1 Programming Model
The client programming model, while not SQL, is a fairly simple API that is well described in the MongoDB documentation. It is quite easy to use. The thing to remember is that you will be coding to JSON documents and not to rows and columns.
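As a rough illustration of what "coding to JSON" means, the sketch below queries a list of dictionaries with a filter that is itself a document, using a dotted path to reach a nested field. This is an in-memory toy, not the MongoDB driver API: `get_path` and `find` are hypothetical helpers, and only exact-match filters are handled.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as "address.city" through nested dicts."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # path does not exist in this document
        current = current[key]
    return current


def find(collection, query):
    """Return the documents whose fields equal every value in the query."""
    return [
        doc
        for doc in collection
        if all(get_path(doc, path) == value for path, value in query.items())
    ]


# Documents in the same collection do not need identical shapes.
users = [
    {"name": "Ada", "address": {"city": "London"}},
    {"name": "Linus", "address": {"city": "Helsinki"}, "tags": ["kernel"]},
]

# The query is a document too, not a SQL string.
londoners = find(users, {"address.city": "London"})
```

The point of the sketch is the shape of the interaction: both the data and the query are nested documents, so there is no mapping step between objects and rows.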
Storage is very similar to MySQL. It is B+tree based, except that the record is BSON (binary JSON). Early versions used the original storage engine, MMAPv1, which had a lot of issues. WiredTiger was introduced as a storage engine in v3.0 and became the default in v3.2, which was a big improvement; MMAPv1 was removed in v4.2. Clustered collections were added in v5.3. In storage, MongoDB is closer to the traditional databases than to the modern distributed databases.
Unlike Cassandra or CockroachDB, where all nodes are equal, a sharded cluster in MongoDB has 3 moving parts: the shards, the query router (mongos) and the config servers. This adds to the cost of administration. A shard key is used to partition the documents in a collection into chunks, and the chunks are distributed across shards. When a chunk reaches a certain size, it is split into 2 chunks. Clients interact with a sharded cluster by connecting to the router. A typical configuration could be 3 replicas for each shard, a 3-member config server replica set and 1 or more mongos routers. Unlike in many other databases, sharding does not happen by default. It needs to be enabled explicitly for each collection.
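The chunking and splitting described above can be sketched as a small simulation. This is a toy model under stated assumptions, not MongoDB's implementation: the threshold counts documents rather than bytes, shard keys are distinct integers, new chunks are placed round-robin (the real system moves chunks between shards separately), and the class and method names are hypothetical.

```python
import bisect

MAX_CHUNK_DOCS = 4  # hypothetical split threshold, counted in documents


class ShardedCollection:
    """Toy range-partitioned collection keyed by a numeric shard key."""

    def __init__(self, shard_names):
        self.shard_names = shard_names
        # Chunk i holds keys >= lower_bounds[i] and < lower_bounds[i + 1].
        self.lower_bounds = [float("-inf")]
        self.chunks = [[]]  # sorted shard-key values, one list per chunk
        self.owners = [shard_names[0]]  # shard that owns each chunk

    def _chunk_index(self, key):
        # The last chunk whose lower bound is <= key.
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def insert(self, key):
        i = self._chunk_index(key)
        bisect.insort(self.chunks[i], key)
        if len(self.chunks[i]) > MAX_CHUNK_DOCS:
            self._split(i)

    def _split(self, i):
        # Split an oversized chunk into two halves at its median key.
        keys = self.chunks[i]
        mid = len(keys) // 2
        left, right = keys[:mid], keys[mid:]
        self.chunks[i] = left
        self.chunks.insert(i + 1, right)
        self.lower_bounds.insert(i + 1, right[0])
        # Naive placement: hand the new chunk to the next shard in turn.
        new_owner = self.shard_names[(len(self.chunks) - 1) % len(self.shard_names)]
        self.owners.insert(i + 1, new_owner)

    def route(self, key):
        """What a router does conceptually: map a shard key to a shard."""
        return self.owners[self._chunk_index(key)]


coll = ShardedCollection(["shardA", "shardB"])
for k in range(1, 6):
    coll.insert(k)  # the fifth insert pushes the only chunk over the limit
```

After these five inserts the single chunk has split at key 3, so `coll.route(2)` and `coll.route(4)` resolve to different shards. The sketch also shows why the choice of shard key matters: routing is only as even as the key distribution.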
A replica set in MongoDB is a set of servers that manage the same data. Replication provides high availability for the data. In a replica set, one node is the primary and the others are secondaries. The primary receives all writes and replicates them to the secondaries asynchronously. By default, clients read from the primary as well; however, secondaries can also accept reads. If the primary becomes unavailable, one of the secondaries takes over as primary. Compared to, say, Cassandra or CockroachDB, configuring a MongoDB replica set requires a little bit of work. It is not as simple as adding a node to a cluster.
MongoDB has the concepts of "read concern" and "write concern", which determine what data concurrent applications will see.
3.5.1 Read concerns
"local" means you read the latest data on the node you are reading from, even though it may not yet be replicated to a majority and can be rolled back in the future. "majority" means you read data that has been acknowledged by a majority of the replicas; such data is durable and will not be rolled back, but it may not be the very latest. "snapshot" means you read from a snapshot of majority-committed data. This can sound sketchy, because with the default "local" read concern what you read can be rolled back. A read concern really needs to be used in conjunction with the corresponding write concern for a read to reliably return committed data.
3.5.2 Write concerns
A write concern of w:1 means the client gets an acknowledgement after the write is committed by the primary. A write concern of w:majority means the client gets an acknowledgement only after the write is committed by a majority of the replicas. The write concern has an impact on reads no matter what the read concern is. Clearly the isolation story is not the strictest, and in highly concurrent applications there is a risk of reading stale or uncommitted data. Application developers need to take care to protect themselves.
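The interaction between write concern and read concern can be sketched with a toy replica set. This is a simplified model under loud assumptions, not MongoDB's replication protocol: replication is modelled as an immediate copy to however many secondaries are reachable, written values are assumed unique, and the `ReplicaSet` class and its parameters are hypothetical.

```python
class ReplicaSet:
    """Toy model: node 0 is the primary, the rest are secondaries."""

    def __init__(self, size=3):
        self.logs = [[] for _ in range(size)]  # writes applied on each node

    def majority(self):
        return len(self.logs) // 2 + 1

    def write(self, value, w=1, reachable_secondaries=0):
        """Apply on the primary, copy to reachable secondaries, and report
        whether the requested write concern was satisfied."""
        self.logs[0].append(value)
        for log in self.logs[1:1 + reachable_secondaries]:
            log.append(value)
        acks = 1 + reachable_secondaries
        needed = self.majority() if w == "majority" else w
        return acks >= needed

    def read(self, concern="local"):
        """Read the newest value visible under the given read concern."""
        if concern == "local":
            # Whatever the primary has, replicated or not.
            return self.logs[0][-1] if self.logs[0] else None
        # Majority: newest value that a majority of nodes hold.
        for value in reversed(self.logs[0]):
            holders = sum(value in log for log in self.logs)
            if holders >= self.majority():
                return value
        return None


rs = ReplicaSet(size=3)
ok_a = rs.write("a", w="majority", reachable_secondaries=2)  # fully replicated
ok_b = rs.write("b", w=1, reachable_secondaries=0)  # only on the primary
latest_local = rs.read("local")
latest_majority = rs.read("majority")
```

In this run both writes are acknowledged, but a "local" read sees "b" while a "majority" read still sees "a". If the primary failed before "b" was replicated, "b" is the write that could be lost, which is why w:1 combined with "local" reads carries the risk described above.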
Operations on a single document are atomic. MongoDB's position is that because you can embed documents within documents, the need and use cases for multi-document transactions are few. But they are supported: on replica sets from version 4.0, and on sharded clusters from version 4.2.
The cost of administration is higher when compared to competing databases. Setting up replicas and shards requires an administrator to type a few commands. Not all nodes are equal: there are different types of servers (shards, config servers and routers). It is not as simple as starting a node and pointing it to an existing cluster.
MongoDB is a reasonable database to use if your currency is JSON documents. The JSON-based programming model is very easy for application programmers. However, its storage is based on the B+tree, which makes it closer to MySQL than to the write-optimized NoSQL databases. It does support server-side sharding and horizontal scalability, as well as high availability using replication. However, it has more moving parts, and the cost of administration is higher than for comparable databases. In the early days, MongoDB got a lot of bad press. When it comes to database technologies like storage or distributing data, MongoDB has generally had to play catch-up. But it has settled down to become a stable database, especially after the move to the WiredTiger storage engine.