Introduction to MongoDB:
MongoDB is an open-source document database and a leading NoSQL database. It is written in C++, runs on multiple platforms, and provides high performance, high availability, and easy scalability. MongoDB works on the concepts of collections and documents.
MongoDB is a good fit if you expect a large volume of read and write operations from your application and can tolerate losing a small amount of recent data if the server crashes.
You can also use MongoDB when you plan to integrate hundreds of different data sources, since its document-based model is well suited to providing a single unified view of your data. You can even use it to store clickstream data for customer behavior analysis.
Architecture of MongoDB:
1. Database:
A database is a physical container for collections. In addition to grouping documents by collection, MongoDB groups collections into databases. A single MongoDB server typically hosts multiple databases. Each database has its own permissions and is stored in its own set of files on the file system. A good rule of thumb is to store all data for a single application in the same database. Separate databases are useful when storing data for several applications or users on the same MongoDB server.
Like collections, databases are identified by name. Database names can be any UTF-8 string, with the following restrictions:
Naming Rules:
1. The empty string is not a valid database name.
2. A database name cannot contain special symbols such as /, \, ., ", *, <, >, :, |, ?, $, a space, or \0 (the null character).
3. Database names are limited to a maximum of 64 bytes.
4. Database names are case-sensitive, even on non-case-sensitive file systems.
There are also several reserved database names, which you can access but which have special semantics. These are as follows:
1. admin: This is the root database in terms of authentication. If a user is added to the admin database, the user automatically inherits permissions for all databases. There are also certain server-side commands that can be run only from the admin database, such as listing all of the databases or shutting down the server.
2. local: This database is never replicated and can be used to store any collections that should be local to a single server.
3. config: When MongoDB is used in a sharded setup, it uses the config database to store information about the shards.
By concatenating a database name with the name of a collection in that database, you get a fully qualified collection name called a namespace.
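The naming rules and the namespace idea can be sketched in a few lines of Python. This is only an illustration, not MongoDB's own validation logic: the exact set of forbidden characters, the helper names `is_valid_db_name` and `namespace`, and the sample names `cms` and `blog.posts` are all assumptions made for the example.

```python
# Assumed set of characters that may not appear in a database name:
# / \ . space " * < > : | ? $ and the null character.
FORBIDDEN_DB_CHARS = set('/\\. "*<>:|?$\0')
MAX_DB_NAME_BYTES = 64  # size limit taken from the naming rules


def is_valid_db_name(name: str) -> bool:
    """Return True if `name` passes the database naming rules sketched here."""
    if name == "":
        return False
    if any(ch in FORBIDDEN_DB_CHARS for ch in name):
        return False
    # The limit is expressed in bytes, so measure the UTF-8 encoding.
    return len(name.encode("utf-8")) <= MAX_DB_NAME_BYTES


def namespace(db_name: str, collection_name: str) -> str:
    """Join a database name and a collection name into a namespace."""
    return f"{db_name}.{collection_name}"


print(is_valid_db_name("cms"))         # True
print(is_valid_db_name("my db"))       # False (contains a space)
print(namespace("cms", "blog.posts"))  # cms.blog.posts
```

Note that a collection name may itself contain dots, so the namespace of the hypothetical `blog.posts` collection in the `cms` database is `cms.blog.posts`.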
2. Collection:
A collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Documents within a collection can have different fields. Typically, all documents in a collection have a similar or related purpose.
Naming Rules:
A collection is identified by its name. Collection names can be any UTF-8 string, with the following restrictions:
1. The empty string ("") is not a valid collection name.
2. A collection name may not contain the character \0 (null), because it delineates the end of a collection name.
3. Users should not create collections whose names start with system., a prefix reserved for internal collections. For example, the system.users collection contains the database's users, and the system.namespaces collection contains information about all of the database's collections.
4. Users should not create collections with the reserved character $ in the name. The various drivers available for the database do support $ in collection names because some system-generated collections contain it, but you should avoid it in your own collection names.
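As a rough illustration, the four collection naming rules can be written as a small check. The function name `collection_name_problem` and the wording of the returned messages are made up for this sketch; a real server or driver may report these cases differently.

```python
from typing import Optional


def collection_name_problem(name: str) -> Optional[str]:
    """Return the first rule that `name` breaks, or None if it looks acceptable."""
    if name == "":
        return "empty name"
    if "\0" in name:
        return "contains the null character"
    if name.startswith("system."):
        return "uses the reserved system. prefix"
    if "$" in name:
        return "contains the reserved character $"
    return None


# Hypothetical names, checked against the rules in order.
for candidate in ["posts", "", "system.mydata", "price$list"]:
    print(repr(candidate), "->", collection_name_problem(candidate))
```

Returning the reason rather than a plain True/False makes it easier to tell the user which rule was violated.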
3. Document:
A document is a set of key-value pairs. Documents have a dynamic schema, which means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
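To make the idea of a dynamic schema concrete, here is a minimal sketch that uses plain Python dictionaries as stand-ins for documents and a list as a stand-in for a collection. The collection name `people` and all field values are invented for the example.

```python
# Two documents in the same "collection": the second has an extra field,
# and its "age" field holds a string instead of an integer.
people = [
    {"_id": 1, "name": "Tejas", "age": 28},
    {"_id": 2, "name": "Asha", "age": "twenty-nine", "email": "asha@example.com"},
]

# The sets of field names differ from one document to the next.
field_sets = [set(doc) for doc in people]
print(field_sets[0] == field_sets[1])  # False

# The common field "age" holds different types of data.
age_types = sorted({type(doc["age"]).__name__ for doc in people})
print(age_types)  # ['int', 'str']
```

Because nothing enforces a common structure, code that reads such documents usually has to check whether a field is present and what type it holds.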
Terms related to RDBMS and MongoDB:
RDBMS | MongoDB
Database | Database
Table | Collection
Tuple/Row | Document
Column | Field
Table Join | Embedded Documents
Primary Key | Primary Key (default key _id provided by MongoDB itself)
Database Server and Client:
mysqld/Oracle | mongod
mysql/sqlplus | mongo
Any relational database has a typical schema design that shows the number of tables and the relationships between those tables. In MongoDB, by contrast, there is no built-in concept of relationships.
Features of MongoDB:
1. Queries: MongoDB supports ad-hoc queries and document-based queries.
2. Index support: Any field in a document can be indexed.
3. Replication: MongoDB maintains multiple copies of data through replica sets, in which a primary replicates to one or more secondaries (older releases also offered master-slave replication). If the primary fails, a secondary is elected automatically, which helps prevent database downtime.
4. Multiple servers: The database can run across multiple servers. Data is duplicated to protect the system against hardware failure.
5. Auto-sharding: Sharding distributes data across multiple physical partitions called shards, and MongoDB balances the data across the shards automatically.
6. Map-reduce: MongoDB supports MapReduce and flexible aggregation tools.
7. Failure handling: Keeping multiple replicas increases protection and data availability against rack failures, multiple machine failures, data center failures, and even network partitions.
8. GridFS: Files of any size can be stored without complicating your stack. GridFS divides a file into smaller parts and stores each part as a separate document.
9. Schema-less database: MongoDB is a schema-less database written in C++.
10. Document-oriented storage: Data is stored as JSON-style documents.
11. Procedures: MongoDB uses JavaScript in place of stored procedures.
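The GridFS item in the list above describes splitting a file into parts that are stored as separate documents. The sketch below imitates that idea with plain Python; it does not talk to a MongoDB server. The 255 KiB chunk size, the field names `files_id`, `n` and `data`, and the helper names are assumptions chosen for illustration.

```python
CHUNK_SIZE = 255 * 1024  # assumed chunk size in bytes


def split_into_chunks(file_id, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split `data` into chunk documents numbered from 0."""
    return [
        {"files_id": file_id, "n": i, "data": data[start:start + chunk_size]}
        for i, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks) -> bytes:
    """Rebuild the original bytes by concatenating the chunks in order of n."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = bytes(600 * 1024)  # a 600 KiB dummy "file" made of zero bytes
chunks = split_into_chunks("file-1", payload)
print(len(chunks))                    # 3
print(reassemble(chunks) == payload)  # True
```

Storing the sequence number `n` with each part is what allows the parts to be fetched in any order and still be reassembled correctly.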
Advantages of MongoDB over RDBMS:
1. Schema-less: MongoDB is a document database in which one collection holds different documents. The number of fields, content, and size of the document can differ from one document to another. Data is stored in the form of JSON-style documents.
2. The structure of a single object is clear.
3. No complex joins.
4. Deep query ability: MongoDB supports dynamic queries on documents using a document-based query language that is nearly as powerful as SQL.
5. Tuning.
6. Ease of scale-out: MongoDB is easy to scale.
7. Conversion or mapping of application objects to database objects is not needed.
8. MongoDB uses internal memory for storing the working set, enabling faster access to data.
Where to use MongoDB:
1. Big Data
2. Content management and delivery
3. Mobile and social infrastructure
4. User data management
5. Data hub
Data Model in MongoDB:
Data in MongoDB has a flexible schema. Documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
MongoDB supports two types of data models:
1. Embedded Data Model:
In this model, you embed all the related data in a single document. It is also known as the de-normalized data model. For example, suppose the details of an employee are held in three different documents, namely Personal_details, Contact and Address. We can embed all three documents in a single one, like this:
{
    _id: <ObjectId>,
    Emp_ID: "10025AE336",
    Personal_details: {
        First_Name: "Tejas",
        Last_Name: "Sharma",
        Date_Of_Birth: "1995-09-26"
    },
    Contact: {
        "e-mail": "Tejas123@gmail.com",
        phone: "9000022338"
    },
    Address: {
        city: "Mathura",
        Area: "Mathura",
        State: "Uttar Pradesh"
    }
}
2. Normalized Data Model:
In this model, related data is kept in separate documents that are linked to the original document using references. For example, we can rewrite the above document in the normalized model as:
Employee:
{
    _id: <ObjectId101>,
    Emp_ID: "10025AE336"
}
Personal_details:
{
    _id: <ObjectId102>,
    empDocID: <ObjectId101>,
    First_Name: "Tejas",
    Last_Name: "Sharma",
    Date_Of_Birth: "1995-09-26"
}
Contact:
{
    _id: <ObjectId103>,
    empDocID: <ObjectId101>,
    "e-mail": "Tejas123@gmail.com",
    phone: "9000022338"
}
Address:
{
    _id: <ObjectId104>,
    empDocID: <ObjectId101>,
    city: "Mathura",
    Area: "Mathura",
    State: "Uttar Pradesh"
}
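The difference between the two models shows up when the data is read back. The following sketch uses Python lists of dictionaries as stand-ins for the four collections and resolves the references by hand to rebuild the embedded form. The integer ids 101 to 104 replace real ObjectId values, and the helper names `find_one`, `without_keys` and `embed_employee` are hypothetical.

```python
# Normalized model: each list stands in for a collection, and empDocID
# points back at the _id of the owning employee document.
employees = [{"_id": 101, "Emp_ID": "10025AE336"}]
personal_details = [{"_id": 102, "empDocID": 101, "First_Name": "Tejas",
                     "Last_Name": "Sharma", "Date_Of_Birth": "1995-09-26"}]
contacts = [{"_id": 103, "empDocID": 101, "e-mail": "Tejas123@gmail.com",
             "phone": "9000022338"}]
addresses = [{"_id": 104, "empDocID": 101, "city": "Mathura",
              "Area": "Mathura", "State": "Uttar Pradesh"}]


def find_one(collection, field, value):
    """Return the first document whose `field` equals `value`, else None."""
    return next((doc for doc in collection if doc.get(field) == value), None)


def without_keys(doc, *keys):
    """Copy `doc`, dropping keys that only exist to support the references."""
    return {k: v for k, v in doc.items() if k not in keys}


def embed_employee(emp_id):
    """Follow the references and build the embedded form of one employee."""
    employee = find_one(employees, "Emp_ID", emp_id)
    ref = employee["_id"]

    def part(collection):
        return without_keys(find_one(collection, "empDocID", ref), "_id", "empDocID")

    return {
        "_id": ref,
        "Emp_ID": employee["Emp_ID"],
        "Personal_details": part(personal_details),
        "Contact": part(contacts),
        "Address": part(addresses),
    }


embedded = embed_employee("10025AE336")
print(embedded["Contact"]["phone"])  # 9000022338
```

With the embedded model a single lookup returns everything, while the normalized model needs one lookup per referenced collection. That extra work is the price of keeping each piece of data in exactly one place.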
Designing Schema in MongoDB:
1. Design your schema according to user requirements.
2. Combine objects into one document if you will use them together; otherwise, separate them (but make sure joins are not needed).
3. Duplicate data (within limits), because disk space is cheap compared to compute time.
4. Do joins on write, not on read.
5. Optimize your schema for the most frequent use cases.
6. Do complex aggregation in the schema.
Example:
Suppose a client needs a database design for a blog or website, and we want to see the difference between RDBMS and MongoDB schema design. The website has the following requirements:
- Every post has a unique title, description and URL.
- Every post can have one or more tags.
- Every post has the name of its publisher and the total number of likes.
- Every post has comments given by users, along with their name, message, date-time and likes.
- Each post can have zero or more comments.
In an RDBMS, the schema design for the above requirements will need at least three tables.
In MongoDB, the schema design will have a single collection, post, with the following structure:
{
    _id: POST_ID,
    title: TITLE_OF_POST,
    description: POST_DESCRIPTION,
    by: POST_BY,
    url: URL_OF_POST,
    tags: [TAG1, TAG2, TAG3],
    likes: TOTAL_LIKES,
    comments: [
        {
            user: 'COMMENT_BY',
            message: TEXT,
            dateCreated: DATE_TIME,
            like: LIKES
        },
        {
            user: 'COMMENT_BY',
            message: TEXT,
            dateCreated: DATE_TIME,
            like: LIKES
        }
    ]
}
So, to display the data, an RDBMS needs to join three tables, whereas MongoDB reads it from a single collection.
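The contrast can be sketched in Python, with lists of dictionaries standing in for the three relational tables and a single dictionary standing in for the MongoDB document. All titles, names and values below are invented for the example, as are the function names `summary_from_tables` and `summary_from_document`.

```python
# Relational style: three "tables", linked by post_id.
posts = [{"id": 1, "title": "Intro to MongoDB", "by": "author1", "likes": 100}]
tags = [{"post_id": 1, "tag": "mongodb"}, {"post_id": 1, "tag": "nosql"}]
comments = [
    {"post_id": 1, "user": "user1", "message": "Nice post", "like": 2},
    {"post_id": 1, "user": "user2", "message": "Thanks", "like": 0},
]


def summary_from_tables(post_id):
    """Gather one post by matching rows across all three tables (a manual join)."""
    post = next(p for p in posts if p["id"] == post_id)
    return {
        "title": post["title"],
        "tags": [t["tag"] for t in tags if t["post_id"] == post_id],
        "comment_count": sum(1 for c in comments if c["post_id"] == post_id),
    }


# Document style: the same post with its tags and comments embedded.
post_document = {
    "_id": 1,
    "title": "Intro to MongoDB",
    "by": "author1",
    "likes": 100,
    "tags": ["mongodb", "nosql"],
    "comments": [
        {"user": "user1", "message": "Nice post", "like": 2},
        {"user": "user2", "message": "Thanks", "like": 0},
    ],
}


def summary_from_document(doc):
    """Read everything needed from the single document, with no join."""
    return {
        "title": doc["title"],
        "tags": list(doc["tags"]),
        "comment_count": len(doc["comments"]),
    }


print(summary_from_tables(1) == summary_from_document(post_document))  # True
```

Both functions produce the same summary, but the document version touches one structure instead of three.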