Replication and Sharding in MongoDB

There are two replication models used in NoSQL databases, master-slave and master-master. In master-slave replication, all write operations are performed on master copy and then propagated to slave copies while in master-master write operations can be done on both master copies, but read operations are not guaranteed to be consistent values.

Replica set is concept used in MongoDB to achieve replication i.e. sharing multiple copies of same data across multiple nodes and it is based on slight variation of master-salve technique.

A replica set will have one primary copy of the collection C stored in one node N1, and at least one secondary copy (replica) of C stored at another node N2. The cost of storage and update (write) increases with the number of replicas. The total number of participants in a replica set must be at least three, so if only one secondary copy is needed, a participant in the replica set known as an arbiter must run on the third node N3.

Arbiter does not hold a replica of the collection but participates in elections to choose a new primary if the node storing the current primary copy fails.

MongoDB Replica Set

Sharding is vertical partitioning of collection of documents or objects in NoSQL database,

“Sharding of the documents in the collection—also known as horizontal partitioning— divides the documents into disjoint partitions known as shards. This allows the system to add more nodes as needed by a process known as horizontal scaling of the distributed system, and to store the shards of the collection on different nodes to achieve load balancing”Elsmari (2017).

Three are two mechanisms used in MongoDB to partition a collection, rang-partitioning and hash-partitioning, both required to have shard key, which in MongoDB required to have two characteristics, it must have in every document in a collection and it must have index as well.

The values for shared key are divided into chunks through rand or hash mechanism.

Horizontal Scalability Through Sharding