Comparison Between the HBase, Cassandra, and MongoDB Databases

HBase:

What is HBase:

HBase is an open-source, distributed, column-oriented database built on top of Hadoop. It scales horizontally. Following the design of Google's Bigtable, it provides fast random access to huge amounts of structured data. Fault tolerance comes from its storage layer, the Hadoop Distributed File System (HDFS).

HBase provides real-time random read and write access to data in HDFS. Data can be stored in HDFS either directly or through HBase. HDFS on its own is built for sequential, batch-oriented access, so HBase is what lets applications read and write individual records at random.

How HBase stores data:

HBase is a column-oriented database whose tables are sorted by row key. A table's schema defines only its column families. The columns inside a family are not declared in advance, so each family can hold any number of columns, and new columns can be added at any time. The values of a column family are stored together on disk, and every cell value carries a timestamp that identifies its version. The data model is organized as follows:

  • A table is a collection of rows.
  • Each row is a collection of column families.
  • Each column family is a collection of columns.
  • Each column holds key/value pairs (cells), and each version of a value is identified by its timestamp.
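The hierarchy above runs from row, to column family, to column, to timestamped value. The minimal in-memory sketch below makes that hierarchy concrete. It is not the HBase client API. The class name ToyHTable, its methods, and the sample row keys are hypothetical, chosen only to mirror the model described above. Row keys are kept sorted so that a range scan returns them in order, in the same way that HBase sorts its row keys lexicographically.

```python
import time


class ToyHTable:
    """In-memory model of HBase's logical data model (not the HBase client API).

    Layout: row key -> column family -> column qualifier -> {timestamp: value}
    """

    def __init__(self, column_families):
        # The table schema declares only column families; qualifiers are free-form.
        self.families = set(column_families)
        self.rows = {}

    def put(self, row, family, qualifier, value, ts=None):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if ts is None:
            ts = time.time_ns()  # every cell version carries a timestamp
        family_map = self.rows.setdefault(row, {}).setdefault(family, {})
        family_map.setdefault(qualifier, {})[ts] = value

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Return row keys in sorted order within [start_row, stop_row)."""
        return [row for row in sorted(self.rows) if start_row <= row < stop_row]


table = ToyHTable(["info", "stats"])
table.put("user#001", "info", "name", "Ada", ts=1)
table.put("user#001", "info", "name", "Ada L.", ts=2)           # newer version, same cell
table.put("user#002", "info", "email", "bob@example.com", ts=1)  # rows may use different columns
table.put("user#010", "stats", "logins", 7, ts=1)

print(table.get("user#001", "info", "name"))   # Ada L.
print(table.scan("user#000", "user#005"))      # ['user#001', 'user#002']
```

Only the column families are fixed, so the rows above can use entirely different columns. Writing the same cell twice keeps both versions, and get() returns the newer one.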

Features of HBase:

  • HBase scales linearly.
  • It handles failures automatically.
  • It provides consistent reads and writes.
  • It integrates with Hadoop, both as a source and as a destination.
  • It offers an easy-to-use Java API for clients.
  • It supports data replication across clusters.

Uses of HBase:

  • Apache HBase is used to read and write Big Data in real time.
  • It hosts very large tables on clusters of commodity hardware.
  • Apache HBase is a non-relational database modeled after Google's Bigtable. Bigtable runs on the Google File System, and in the same way HBase runs on top of Hadoop and HDFS.

Application areas for HBase include:

  • HBase is used for write-heavy applications.
  • HBase is used when fast random access to data is needed.
  • Among the companies that use HBase internally are Facebook, Twitter, Yahoo, and Adobe.

Cassandra:

What is Cassandra:

Apache Cassandra is an open-source, distributed database designed to manage very large amounts of structured data across many commodity servers, which may be spread across data centers around the world. Unlike traditional databases, Cassandra provides highly available service with no single point of failure.

Following is a list of Apache Cassandra's most notable features:

  • It is scalable and fault-tolerant, with tunable consistency.
  • It is a column-oriented (wide-column) database that delivers high performance.
  • Its data model follows Google's Bigtable, and its distribution design follows Amazon's Dynamo.
  • It was created at Facebook and differs sharply from traditional relational database management systems.
  • It implements Dynamo-style replication with no single point of failure, and it adds a more powerful "column family" data model.
  • Many large companies use Cassandra, including Facebook, Twitter, Cisco, Rackspace, eBay, and Netflix.
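Dynamo-style distribution is usually explained with a token ring. Each node owns a range of hash values. A key is stored on the node that owns its hash, plus the next nodes clockwise, up to the replication factor. The sketch below illustrates this under stated assumptions: MD5 hashing, one token per node, and no rack or data-center awareness. TokenRing and replicas() are hypothetical names, not part of Cassandra or any of its drivers.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Assumption: MD5 as the hash, as in Cassandra's older RandomPartitioner.
    # Current Cassandra versions default to Murmur3Partitioner instead.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class TokenRing:
    def __init__(self, nodes):
        # Simplification: one token per node (real clusters usually use vnodes).
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str, rf: int) -> list:
        """The owner is the first node whose token >= the key's token (wrapping
        around the ring); the next rf - 1 nodes clockwise hold the copies."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        count = min(rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42", rf=3))  # three distinct nodes, chosen by hash
```

In this scheme, adding a node moves only the keys in the range the new node takes over, and every other key keeps its owner. No node plays a special role, which is why the design has no single point of failure.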

Cassandra's features:

Cassandra owes much of its popularity to its technical features. A few of them are described below.

Elastic Scalability − Cassandra is highly scalable: you can add hardware as needed to accommodate more customers and more data.

Always-on Architecture − Cassandra has no single point of failure and remains continuously available. This makes it suitable for business-critical applications that cannot afford downtime.

Fast Linear-Scale Performance − Cassandra scales linearly: throughput increases as you add nodes to the cluster, while response times stay low.

Flexible Data Storage − Cassandra can store structured, semi-structured, and unstructured data. It can accommodate changes to your data structures as your needs evolve.

Easy Data Distribution − Cassandra can replicate data across multiple data centers, so you can place data wherever you need it.

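Transaction Support − Cassandra offers more limited transactional guarantees than a relational database:
  • Writes within a single partition are atomic and isolated.
  • Writes are made durable by the commit log.
  • Lightweight transactions provide compare-and-set operations.
Cassandra does not, however, offer multi-row ACID transactions with commit and rollback.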

Fast Writes − Cassandra was designed to run on inexpensive commodity hardware. It writes data very quickly because each write is appended to a commit log and to an in-memory table rather than updated in place. It can store hundreds of terabytes of data without sacrificing read efficiency.
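The fast-write claim follows from Cassandra's log-structured write path:
  • A write is appended to a commit log and applied to an in-memory memtable.
  • The memtable is later flushed to an immutable, sorted SSTable on disk.

The sketch below models that path with plain Python containers. ToyLSMStore is a hypothetical name. The model leaves out compaction, timestamps, tombstones, and replication. Reads simply check the memtable and then the SSTables from newest to oldest.

```python
class ToyLSMStore:
    """Simplified model of a log-structured write path (not Cassandra's code)."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []    # append-only log that would be replayed after a crash
        self.memtable = {}      # recent writes, kept in memory
        self.sstables = []      # immutable, key-sorted tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append, no in-place update
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a new sorted, immutable table.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table first
            if key in table:
                return table[key]
        return None


store = ToyLSMStore(memtable_limit=2)
store.write("a", 1)
store.write("b", 2)    # memtable is full, so it is flushed to an SSTable
store.write("a", 10)   # newer value lives in the memtable
print(store.read("a"), store.read("b"), len(store.sstables))  # 10 2 1
```

Every write is a cheap append plus an in-memory update, and the sorting work happens at flush time. This is the trade-off that lets writes stay fast.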

MongoDB:

What is MongoDB:

MongoDB is a document-oriented NoSQL database used to store large volumes of data. Instead of the tables and rows of relational databases, it uses collections and documents. A document, the basic unit of data, is made up of key-value pairs. Collections, the equivalent of relational database tables, hold sets of documents. MongoDB emerged in the mid-2000s.

Features of MongoDB:

  • Each database contains collections, which in turn contain documents. Each document has a number of fields, and documents can differ in size and content.
  • A document's structure mirrors the way developers build classes and objects in their programming languages.
  • Developers often point out that their classes are not rows and columns but structures of key-value pairs, which map naturally onto documents.
  • MongoDB does not require a schema to be defined before documents are stored, so fields can be added as they are needed.
  • MongoDB's data model can represent hierarchical relationships, arrays, and other complex structures.
  • Scalability − MongoDB environments scale well. Hundreds of companies around the world run clusters of 100 or more nodes that store millions of documents.
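Schema flexibility and nested structures are easiest to see with concrete documents. In the sketch below, plain Python dicts stand in for documents and a list stands in for a collection, so it runs without a MongoDB server. The find() helper is a hypothetical, heavily simplified imitation of MongoDB's equality matching. It handles dotted paths such as "address.city" and matches on array elements. With a real server, the PyMongo driver's insert_one() and find() methods would play these roles.

```python
# Plain dicts stand in for documents and a list for a collection (no server needed).
users = [
    {"_id": 1, "name": "Ada", "languages": ["en", "fr"],
     "address": {"city": "London", "zip": "N1"}},          # nested document + array
    {"_id": 2, "name": "Bob", "email": "bob@example.com"},  # different fields, same collection
]


def get_path(doc, dotted):
    """Follow a dotted path such as 'address.city' into nested documents."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def find(collection, field, expected):
    """Hypothetical, simplified equality match: the field equals `expected`,
    or the field is an array that contains `expected`."""
    matches = []
    for doc in collection:
        value = get_path(doc, field)
        if value == expected or (isinstance(value, list) and expected in value):
            matches.append(doc)
    return matches


print([d["name"] for d in find(users, "address.city", "London")])    # ['Ada']
print([d["name"] for d in find(users, "languages", "fr")])           # ['Ada']
print([d["name"] for d in find(users, "email", "bob@example.com")])  # ['Bob']
```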

MongoDB Architecture: Key Components

The following terms are commonly used in MongoDB:

_id – Every MongoDB document must have an _id field. Its value is unique within the collection and serves as the document's primary key. If you insert a document without an _id, one is created automatically.
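MongoDB's default _id is a 12-byte ObjectId with three parts:
  • a 4-byte timestamp (seconds since the Unix epoch),
  • a 5-byte random value generated once per process,
  • a 3-byte counter that starts at a random value.

The sketch below imitates that layout, and the "add an _id if missing" behavior, in pure Python. new_object_id() and insert() are hypothetical helpers, not the bson or PyMongo API. In practice the driver usually generates the _id before the document reaches the server.

```python
import itertools
import os
import time

# Hypothetical helpers that imitate the ObjectId layout; not the bson/PyMongo API.
_PROCESS_RANDOM = os.urandom(5)  # 5 random bytes, generated once per process
_COUNTER = itertools.count(int.from_bytes(os.urandom(3), "big"))  # random start


def new_object_id() -> str:
    """Return a 12-byte ObjectId-style value as 24 hexadecimal characters."""
    timestamp = int(time.time()).to_bytes(4, "big")        # seconds since epoch
    counter = (next(_COUNTER) % 2**24).to_bytes(3, "big")  # wraps at 3 bytes
    return (timestamp + _PROCESS_RANDOM + counter).hex()


def insert(collection: list, doc: dict) -> dict:
    """Add an _id if the document has none, then enforce _id uniqueness."""
    doc = dict(doc)  # copy so the caller's dict is left unchanged
    doc.setdefault("_id", new_object_id())
    if any(existing["_id"] == doc["_id"] for existing in collection):
        raise ValueError(f"duplicate _id: {doc['_id']}")
    collection.append(doc)
    return doc


people = []
ada = insert(people, {"name": "Ada"})            # _id generated
bob = insert(people, {"_id": 7, "name": "Bob"})  # caller-supplied _id kept
print(len(ada["_id"]), bob["_id"])               # 24 7
```

Because the timestamp comes first, the creation time can be read back from the first 8 hex characters of a generated _id.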

Collection – A group of MongoDB documents. Collections are the equivalent of tables in relational database management systems (RDBMS) such as Oracle or MS SQL Server. A collection exists within a single database and does not enforce a fixed structure.

Cursor – A pointer to the result set of a query. Clients iterate through a cursor to retrieve the results.
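Cursors are lazy. The server returns results in batches, and the client requests the next batch only after it has consumed the current one. The sketch below imitates that behavior with a Python generator over an in-memory list. ToyCursor and batches_fetched are hypothetical names. In the real PyMongo driver, collection.find() returns a Cursor whose batch size can be set with batch_size().

```python
from typing import Iterator, List


class ToyCursor:
    """Lazy, batched iteration over a result set (illustration only)."""

    def __init__(self, results: List[dict], batch_size: int = 2):
        self._results = results                 # stands in for the server-side result set
        self._batch_size = max(1, batch_size)   # guard against a zero batch size
        self._position = 0
        self.batches_fetched = 0                # how many round trips we have made

    def _fetch_batch(self) -> List[dict]:
        # Imitates one request to the server for the next batch of results.
        batch = self._results[self._position:self._position + self._batch_size]
        self._position += len(batch)
        self.batches_fetched += 1
        return batch

    def __iter__(self) -> Iterator[dict]:
        # Fetch a new batch only after the previous one has been consumed.
        while self._position < len(self._results):
            yield from self._fetch_batch()


cursor = ToyCursor([{"n": i} for i in range(5)], batch_size=2)
it = iter(cursor)
print(next(it), cursor.batches_fetched)       # {'n': 0} 1
print(len(list(it)), cursor.batches_fetched)  # 4 3
```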

Database – A container for collections, just as an RDBMS database is a container for tables. Each database gets its own set of files on the file system, and a single MongoDB server can host many databases.

Document – A record in MongoDB is called a document. A document consists of field names and their values.

Field – A name-value pair in a document. A document has zero or more fields. Fields are analogous to the columns of a relational database.

JSON – JavaScript Object Notation, a human-readable, plain-text format for expressing structured data. Many programming languages support JSON. MongoDB documents use a JSON-like structure and are stored internally in a binary form called BSON.

Comparison Between HBase, Cassandra, and MongoDB:

Key Characteristics:

HBase:
  • Distributed and scalable big data store
  • Strong consistency
  • Built on top of Hadoop HDFS
  • CP in terms of the CAP theorem

Cassandra:
  • High availability
  • Incremental scalability
  • Eventually consistent
  • Trade-offs between consistency and latency
  • Minimal administration
  • No single point of failure (SPOF): all nodes are equal

MongoDB:
  • Schema-free: schemas can change as applications evolve
  • Full index support for high performance
  • Replication and failover for high availability
  • Auto-sharding for easy scalability
  • Rich, readable document-based queries
  • Master-slave (primary-secondary) replication model
  • CP in terms of the CAP theorem

Good For:

HBase:
  • Read-optimized workloads
  • Range-based scans
  • Strict consistency
  • Fast reads and writes at scale

Cassandra:
  • Simple setup and maintenance
  • Fast random reads and writes
  • Flexible parsing and wide-column requirements
  • Workloads that do not need multiple secondary indexes

MongoDB:
  • RDBMS replacement for web applications
  • Semi-structured content management
  • Real-time analytics, high-speed logging, caching, and high scalability
  • Web 2.0, media, SaaS, and gaming

Not Good For:

HBase:
  • Classic transactional applications, or even relational analytics
  • Applications that need full table scans
  • Data that must be aggregated, rolled up, or analyzed across rows

Cassandra:
  • Secondary indexes
  • Relational data
  • Transactional operations (rollback, commit)
  • Primary and financial records
  • Data that requires stringent security and authorization
  • Dynamic queries and searches on column data
  • Low latency

MongoDB:
  • Highly transactional systems
  • Applications with traditional database requirements such as foreign key constraints

Use Cases:

  • HBase: Facebook Messages
  • Cassandra: Twitter, travel portals
  • MongoDB: Craigslist, Foursquare
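The Cassandra entries in this comparison mention eventual consistency and a trade-off between consistency and latency. Cassandra lets each request choose a consistency level such as ONE, QUORUM, or ALL. A read is guaranteed to overlap the latest acknowledged write when R + W > RF, that is, when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below checks that overlap by brute force. The function names are hypothetical, and the model ignores node failures, hinted handoff, and read repair.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """Majority of rf replicas, as used by Cassandra's QUORUM level."""
    return rf // 2 + 1


def reads_see_latest_write(rf: int, r: int, w: int) -> bool:
    """Brute force: does every set of r replicas read intersect
    every set of w replicas that acknowledged the write?"""
    replicas = range(rf)
    return all(set(read_set) & set(write_set)
               for write_set in combinations(replicas, w)
               for read_set in combinations(replicas, r))


rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))  # True  (QUORUM / QUORUM)
print(reads_see_latest_write(rf, 1, 1))                    # False (ONE / ONE)
print(reads_see_latest_write(rf, 1, rf))                   # True  (ONE read, ALL write)
```

Waiting for fewer replicas (ONE) gives faster responses but can return stale data. Using QUORUM for both reads and writes waits on more replicas but guarantees that a read overlaps the latest acknowledged write.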

