Big Data — Technology or Problem?
A brief introduction to the problems faced by tech giants in handling petabytes of data every day.
We are all familiar with the term DATA. By 2010, we had approximately 5 ZB of data. Today, we are dealing with around 50 ZB. That is already a huge number, and it is growing exponentially.
Data can be anything, ranging from text, mail, and phone calls to music, videos, and so on. Every minute, Skype users make 231,840 calls and people send out 511,200 tweets. For comparison, smartphone users worldwide are estimated to consume around 40 exabytes of data every month.
According to a report, “The world’s internet population is growing significantly year-over-year. As of January 2019, the internet reaches 56.1% of the world’s population and now represents 4.39 billion people — a 9% increase from January 2018.”
When we hear about IT giants, the firms that come to mind are Google, Facebook, Microsoft, etc. But have we ever asked ourselves how these companies handle and manipulate our data?
Thus, Big Data is not a technology; it is a problem.
Some Facts related to the ocean of Data
Have you ever wondered how much data (photos, videos, etc.) Facebook receives daily?
Facebook recently unveiled some statistics on the amount of data its systems process and store. According to Facebook, it processes 2.5 billion pieces of content each day, amounting to 500+ terabytes of new data daily. Facebook generates 2.7 billion Like actions per day, and 300 million new photos are uploaded daily. Breaking the numbers down a bit, Facebook says that it scans roughly 105 TB of data every half hour.
The same is true of YouTube and Instagram:
YouTube has around 1.3 billion users. 300 hours of video are uploaded every minute, almost 5 billion videos are watched every single day, and the site gets over 30 million visitors per day.
95 million photos and videos are shared on Instagram per day. Over 40 billion photos and videos have been shared on the Instagram platform since its inception.
Solution or Strategies to handle Big Data
Distributed systems are the solution
For example, you can distribute a set of programs on the same physical server and use messaging services to enable them to communicate and pass information. It is also possible to have many different systems or servers, each with its own memory, that can work together to solve one problem.
The server to which all the other servers are connected is known as the master, and the other servers are known as slaves; collectively, all the servers are called nodes. In Hadoop terms, the master is known as the NameNode and the slaves are known as DataNodes. Together, the NameNode and its DataNodes are called a cluster.
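The master/slave arrangement described above can be sketched in miniature with Python's multiprocessing module: a coordinator function (standing in for the NameNode) splits the data into blocks, hands each block to a separate worker process (standing in for a DataNode, with its own memory), and then aggregates the partial results. This is only an illustrative sketch of the idea, not Hadoop itself; all function names here are invented for the example.

```python
from multiprocessing import Pool

def count_words(block):
    """Worker ("DataNode"): count the words in one block of lines."""
    return sum(len(line.split()) for line in block)

def run_cluster(lines, n_workers=3):
    """Coordinator ("NameNode"): split the input into blocks, farm
    them out to worker processes, then combine the partial results."""
    # One block per worker, built by striding through the input.
    blocks = [lines[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partials = pool.map(count_words, blocks)
    return sum(partials)

if __name__ == "__main__":
    data = ["big data is a problem", "hadoop helps solve it"]
    print(run_cluster(data))  # prints 9, the total word count
```

Each worker only ever sees its own block, so no single machine needs to hold the whole dataset; that is the essence of why a cluster can handle data far larger than any one node's memory.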
The Hadoop Framework
Apache Hadoop is an open-source framework intended to make interaction with big data easier. It enables the processing of large data sets that reside across clusters of machines. Being a framework, Hadoop is made up of several modules supported by a large ecosystem of technologies. Hadoop has earned its place in industries and companies that need to work on large, sensitive data sets requiring efficient handling.
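Hadoop's core processing model is MapReduce, and the canonical example is a word count: a map step emits a (word, 1) pair for every word, and a reduce step sums the counts per word. The sketch below imitates that two-phase flow in plain Python; it does not use Hadoop itself, and the function names are invented for illustration.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big problem", "big cluster"]
print(reduce_phase(map_phase(lines)))
# {'big': 3, 'data': 1, 'problem': 1, 'cluster': 1}
```

In a real Hadoop job, the map tasks run in parallel on the DataNodes holding the input blocks, and the framework shuffles all pairs with the same key to the same reducer, which is what lets the same simple logic scale to terabytes.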
Big Data is data so huge and complex that traditional data processing applications are inadequate to deal with it. Managing such a volume of data poses challenges at every stage: capture, storage, analysis, transfer, sharing, and more. Big Data is often described by the 3V model: high Volume, high Velocity, and high Variety.
The importance of Big Data is not about how much data you have; it is about what you do with it.