Big Data in a Nutshell
Introduction
Presently, the importance of big data is being realised only slowly. Big data is complex
data for which advanced methods are required to extract value. Here the size of the data is such that it goes
beyond the ability of common software tools to capture, process and analyse it.
As the data fluctuates every moment, accuracy plays a very important role: better
accuracy leads to better decisions, and as a result growth takes
place. Below I have summarised the definition of big data as given in various research
papers.
In
2010, Apache Hadoop defined big data as "data sets which could not be captured,
managed and processed by general computers within an acceptable scope". Here
Apache is saying that big data is complex data which cannot
be managed with the help of ordinary software; it requires high-level techniques and
technology. The Gartner group defines big data in terms of three-dimensional data growth
challenges and opportunities, the 3 Vs:
(1) Volume – This refers to the ever-increasing amount of the data. This data is
not sampled; it simply grows, and we observe and track what happens to it.
(2) Velocity – Velocity refers to the speed at which the data is generated and
must be processed.
(3) Variety – Variety refers to the
range of the big data. It is very important to understand the variety of big
data, because it tells us where the big data will be drawn from.
Nowadays data is growing very rapidly.
Currently the world's per capita capacity to store data is doubling roughly every
40 months, which implies that new storage technology is needed again and again.
About 2.5 exabytes of data are created every day. This data
is gathered from various sources: software logs,
cameras, microphones, radio-frequency identification (RFID) devices and mobile
devices.
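The doubling figure above can be turned into a rough growth estimate. A minimal sketch, taking the 40-month doubling period and the 2.5-exabytes-per-day figure from the text above (everything else is illustrative):

```python
# Rough projection of storage capacity growth, assuming a fixed
# doubling period of 40 months (as cited above).
def growth_factor(months, doubling_period=40):
    """How many times capacity multiplies over `months` months."""
    return 2 ** (months / doubling_period)

# Over 10 years (120 months), capacity grows 2^(120/40) = 8x.
print(growth_factor(120))  # → 8.0

# At 2.5 exabytes/day, data created in one 365-day year:
print(2.5 * 365)  # → 912.5 exabytes
```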
Many important fields of technology are related to big data. Cloud computing
and big data together play a vital role in the advancement
of technology. Big data also has certain challenges which need to be
overcome.
Big
data is one field which directly relates to decision making. Simply with the
help of charts and graphs we can arrive at excellent decisions.
History
This
started seven decades ago. Earlier it was referred to as the "information explosion",
a term first recorded in the Oxford English Dictionary. Our increasing ability
to store and analyse data has been a gradual evolution. The main evolution of
big data started at the end of the last century, and it took place because of
the invention of digital storage and the internet.
Although the usage of the term big data has
increased only recently, it all began with the literature: various novels and articles were
written for a better understanding of this term. In 2008, it was estimated
that 14.7 exabytes of information were produced, and according to the reports the
amount of data kept on increasing.
In 2014, the rise of mobile devices took
place, and people started using them to access digital data. Now big
data analytics is becoming a top priority for business. Big data
is thus not a new phenomenon but one with a long evolution of capturing and
using data, and it is laying the foundations on which many further evolutions
will be built.
Four Layers
There
are four layers in big data. They are as follows:-
(1) Data source layer:-
This
is the first layer of big data, in which the arrival of data takes
place. For this we first need to analyse whatever data we already have, and then
find out what questions we need to answer. Analysing the questions is very
important, as it helps to establish new
sources of data.
(2) Data storage layer:-
After
collecting data from the first layer, the next step begins. In this step the volume
of data that enterprises generate begins to explode, and so does the storage required. For
smaller data sets all that is required is a bigger hard disk.
When you move on to huge data, the
requirement of a distributed file system arises: you must have a system that understands the
file system and can handle the database that is being generated.
Depending
on the amount of data you are storing, you also need to decide what
your security and privacy requirements are.
(3) Data processing layer:-
As
the name suggests, the analysis of data takes place in this layer. This is the
most crucial layer of big data, as it enables us to reach a
particular solution.
(4) Data output layer:-
This is where the results are presented, which
could be done by preparing charts and graphs from the analysed data. Presenting the data
as simply as possible is the key feature which allows quick and
correct decisions to be taken.
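The four layers described above can be put together as a small pipeline. This is only an illustrative sketch, with hypothetical function names and toy in-memory data; a real stack would use dedicated tools at each layer:

```python
# Illustrative sketch of the four layers: source -> storage -> processing -> output.
# All names and data here are hypothetical.

def data_source_layer():
    """Layer 1: data arrives from logs, sensors, devices, etc."""
    return [{"device": "rfid", "value": 3}, {"device": "camera", "value": 7}]

def data_storage_layer(records):
    """Layer 2: persist the raw records (here, just an in-memory copy)."""
    return list(records)

def data_processing_layer(store):
    """Layer 3: analyse the stored data, e.g. aggregate values per device."""
    totals = {}
    for rec in store:
        totals[rec["device"]] = totals.get(rec["device"], 0) + rec["value"]
    return totals

def data_output_layer(totals):
    """Layer 4: present results simply so decisions can be made quickly."""
    return [f"{device}: {total}" for device, total in sorted(totals.items())]

records = data_source_layer()
print(data_output_layer(data_processing_layer(data_storage_layer(records))))
# → ['camera: 7', 'rfid: 3']
```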
Technology
Aim:- Real-time or near-real-time delivery of information.
For handling data, various technologies were
used, such as relational database management systems (RDBMS) and desktop statistics
and visualisation packages, but these fail when big data comes into play. Users
of big data instead prefer direct-attached storage (DAS) in its various forms, from
solid-state drives (SSD) to high-capacity SATA disks buried inside
parallel processing nodes. While using such
technology it must be ensured that latency is kept in mind; wherever possible, latency
is avoided. Advancement of
technology in big data is very crucial. Proper advancement could lead
to exact conclusions, and by producing exact conclusions one could easily recognise
the trends which the market is following, which is very helpful in
predicting the market.
Various
technologies such as the following are used in handling big data:-
(1) A/B Testing
(2) Machine learning
(3) Natural Language Processing (NLP)
(4) Cloud Computing
(5) Business Intelligence
(6) Charts
(7) Graphs
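As an illustration of the first item in the list, A/B testing typically compares conversion rates between two variants with a two-proportion z-test. A minimal sketch using only the standard library; the conversion numbers below are invented for illustration:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical experiment: variant A converts 200/1000 users, variant B 250/1000.
z = two_proportion_z(200, 1000, 250, 1000)
print(round(z, 2), abs(z) > 1.96)  # |z| > 1.96 => significant at the 5% level
```

Here the decision rule (|z| > 1.96) corresponds to a two-sided test at the 5% significance level.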
Applications
Cloud Computing :-
Cloud
computing is the delivery of computing services over the internet. It
allows users to access software and hardware that are hosted by
third parties at remote locations.
It is a model for enabling convenient, on-demand
network access to a shared pool of configurable computing resources that can be rapidly
provisioned and released with minimal management effort or service provider
interaction.
The cloud model has five essential characteristics:-
(1) On demand self service:-
A consumer can unilaterally provision computing
capabilities, such as server time and network storage, as needed automatically,
without requiring human interaction with each service
provider.
(2) Broad network access:-
Capabilities are available over the network and
accessed through standard mechanisms that promote use by heterogeneous thin or
thick client platforms (such as mobile phones, tablets and laptops).
(3) Resource pooling:-
The provider's
computing resources are pooled to serve multiple consumers using a multi-tenant
model, with different physical and virtual resources dynamically
assigned and reassigned according to consumer demand.
(4) Rapid Elasticity:-
Capabilities can
be elastically provisioned and released, in some cases
automatically, to scale rapidly outward and inward commensurate
with demand.
(5) Measured services:-
Cloud systems automatically control and optimise
resource use by leveraging a metering capability at some level of
abstraction appropriate to the type of service.
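The "measured services" characteristic above amounts to metering resource usage per consumer at an appropriate level of abstraction (e.g. storage in gigabytes, compute in hours). A minimal sketch; the class, method names and figures are hypothetical:

```python
# Hypothetical metering sketch: the provider records usage per consumer
# and can report (or bill) it per resource type.

class Meter:
    def __init__(self):
        self.usage = {}  # (consumer, resource) -> accumulated amount

    def record(self, consumer, resource, amount):
        key = (consumer, resource)
        self.usage[key] = self.usage.get(key, 0) + amount

    def report(self, consumer):
        """Total metered usage per resource for one consumer."""
        return {res: amt for (cons, res), amt in self.usage.items()
                if cons == consumer}

meter = Meter()
meter.record("alice", "storage_gb", 120)
meter.record("alice", "compute_hours", 6)
meter.record("alice", "storage_gb", 30)
print(meter.report("alice"))  # → {'storage_gb': 150, 'compute_hours': 6}
```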
Relationship between Cloud Computing and Big Data
The development of cloud computing can directly lead
to solutions for the challenges big data is facing, so it is very crucial to
enhance the development of cloud computing. With the help of cloud computing
the storage issue, one of the biggest issues big
data faces, can be solved. Another key thing which affects big data is distributed
storage, which can effectively manage big data.
Cloud computing
mainly affects the architecture of the IT industry, whereas big data
plays a vital role in decision making.
The two are indirectly connected; therefore the development of both
could lead to further advancement and enhancement in the field of
technology.
Relationship between IoT and Big Data
In
IoT, a huge number of sensors are embedded into various devices and machines in the
real world. The sensors fixed in these devices and machines produce a
huge amount of data from many fields; this could
be environmental data, transport data and many others.
This huge amount of generated data can itself be referred to as big data. It has
its own characteristics: it could be structured, unstructured or semi-structured, and
it needs to be analysed. For the analysis, certain graphs and charts are prepared to
reach a certain solution or conclusion, and this conclusion could be helpful
in bringing a solution to a problem.
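The analysis step described above can be sketched with a few invented sensor readings, aggregating environmental data per sensor before charting it. The sensor IDs and temperatures are hypothetical:

```python
# Hypothetical IoT readings: each tuple is (sensor_id, temperature_celsius).
readings = [
    ("env-1", 21.0), ("env-1", 23.0),
    ("env-2", 18.5), ("env-2", 19.5), ("env-2", 20.0),
]

# Aggregate per sensor: count and sum, then the mean -- the kind of
# summary a chart or graph would present.
summary = {}
for sensor, temp in readings:
    count, total = summary.get(sensor, (0, 0.0))
    summary[sensor] = (count + 1, total + temp)

means = {sensor: total / count for sensor, (count, total) in summary.items()}
print({sensor: round(mean, 2) for sensor, mean in means.items()})
# → {'env-1': 22.0, 'env-2': 19.33}
```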
Currently
the data processing capacity of IoT has fallen behind the volume of data it generates. It is very important to
introduce new technology in this field, which could lead to further development and hence
produce good conclusions that would ultimately give good solutions to
certain problems.