BIG DATA

Big Data: Technology or Problem?

We are all familiar with the term “data”. Up to 2010, the world had approximately 5 ZB of data. Today, we are dealing with about 50 ZB. That is already a huge number, and it keeps growing exponentially.

Data can be anything from text, email, and phone calls to music and videos. Every minute, Skype users make 231,840 calls and people post 511,200 tweets. For comparison, smartphone users worldwide together transfer roughly 40 exabytes of data in a month.

According to a report, “The world’s internet population is growing significantly year-over-year. As of January 2019, the internet reaches 56.1% of the world’s population and now represents 4.39 billion people — a 9% increase from January 2018.”

When we hear about IT giants, the firms that come to mind are Google, Facebook, Microsoft, etc. But have we ever asked ourselves how these companies handle and manipulate our data?

Thus, Big Data is not a technology; it is a problem.

Some Facts About the Ocean of Data

Have you ever wondered how much data (photos, videos, etc.) Facebook receives daily?

Facebook has shared some statistics on the amount of data its systems process and store. According to Facebook, its data system processes 2.5 billion pieces of content each day, amounting to 500+ terabytes of data daily. Facebook generates 2.7 billion Like actions per day, and 300 million new photos are uploaded daily. Breaking the data down a bit, Facebook says that it scans roughly 105 TB of data each half hour.

The same is true of YouTube and Instagram:

YouTube has about 1.3 billion users. 300 hours of video are uploaded to YouTube every minute, and almost 5 billion videos are watched on YouTube every single day. YouTube gets over 30 million visitors per day.

95 million photos and videos are shared on Instagram per day. Over 40 billion photos and videos have been shared on the Instagram platform since its inception.

Solutions and Strategies to Handle Big Data

A Distributed System Is the Solution

In a distributed system, the work is split across several programs or machines that cooperate with each other. For example, you can distribute a set of programs on the same physical server and use messaging services to let them communicate and pass information. It is also possible to have many different systems or servers, each with its own memory, that work together to solve one problem.
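
To make the message-passing idea concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: threads inside one process stand in for the separate programs, and the standard library `queue.Queue` stands in for a messaging service. The names `worker` and `distributed_sum` are made up for this example.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    """Receive chunks of numbers as messages and reply with partial sums."""
    while True:
        chunk = tasks.get()
        if chunk is None:  # sentinel message: no more work for this worker
            break
        results.put(sum(chunk))


def distributed_sum(numbers: list[int], n_workers: int = 3, chunk_size: int = 4) -> int:
    """Split one big sum into chunks and let several workers solve it together."""
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()

    # Send each chunk of the problem to the workers as a message.
    chunks = [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]
    for chunk in chunks:
        tasks.put(chunk)
    for _ in threads:
        tasks.put(None)  # one stop message per worker
    for t in threads:
        t.join()

    # Combine the partial answers the workers sent back.
    return sum(results.get() for _ in chunks)


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 101))))  # 5050
```

No worker ever sees the whole list. Each one only handles the chunks it receives, and the final answer is assembled from the replies. Real systems would replace the in-process queues with a network messaging layer.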

The server that coordinates all the others is known as the master, and the other servers are known as slaves. All of these servers can also be called nodes. In Hadoop, described in the next section, the master is known as the NameNode and the slaves are known as DataNodes. Together, the master node and all the data nodes form a cluster.
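
The division of labour between the master and the data nodes can be sketched with a toy model in Python. This is not the HDFS API. `ToyNameNode`, `ToyDataNode`, the 8-byte block size, and the round-robin placement are all assumptions chosen for illustration. A real cluster also replicates blocks and lets clients talk to the data nodes directly, which is left out here.

```python
class ToyDataNode:
    """A worker node that stores raw blocks of data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[str, bytes] = {}


class ToyNameNode:
    """The master: keeps the metadata about where each block lives."""

    def __init__(self, data_nodes: list[ToyDataNode], block_size: int = 8) -> None:
        self.data_nodes = data_nodes
        self.block_size = block_size
        # filename -> ordered list of (block id, node holding that block)
        self.block_map: dict[str, list[tuple[str, ToyDataNode]]] = {}

    def write(self, filename: str, data: bytes) -> None:
        """Split the file into blocks and spread them over the data nodes."""
        placements = []
        for start in range(0, len(data), self.block_size):
            index = start // self.block_size
            block_id = f"{filename}#{index}"
            node = self.data_nodes[index % len(self.data_nodes)]  # round-robin
            node.blocks[block_id] = data[start:start + self.block_size]
            placements.append((block_id, node))
        self.block_map[filename] = placements

    def read(self, filename: str) -> bytes:
        """Look up the block locations and reassemble the file in order."""
        return b"".join(
            node.blocks[block_id] for block_id, node in self.block_map[filename]
        )


nodes = [ToyDataNode(f"datanode-{i}") for i in range(3)]
master = ToyNameNode(nodes)
master.write("log.txt", b"big data needs many machines")
print(master.read("log.txt"))
print({node.name: len(node.blocks) for node in nodes})
```

The 28-byte file is cut into four blocks, so no single data node holds the whole file, while the master only remembers which node holds which block.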

The Hadoop Framework

Apache Hadoop is an open source framework intended to make working with big data easier. It enables the processing of large data sets that are stored across clusters of machines. Being a framework, Hadoop is made up of several modules that are supported by a large ecosystem of technologies. Hadoop has earned its place in industries and companies that work with large, sensitive data sets that need efficient handling.
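
One of Hadoop's modules is MapReduce, a programming model that this article does not cover in detail. The sketch below imitates its three steps (map, shuffle, reduce) in plain Python on a word count. It runs on a single machine and does not use any Hadoop API. The function names are hypothetical and only meant to show the shape of the idea.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

On a cluster, each line (or block of lines) could be mapped on a different node, and the grouped keys could be reduced on different nodes as well. That independence is what lets the model scale across many machines.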

Conclusion

Big Data is a data set so huge and complex that traditional data processing applications are inadequate to deal with it. Managing such a huge volume of data brings challenges in capture, storage, analysis, transfer, sharing, etc. Big Data is commonly described by the 3V model: “High Volume”, “High Velocity”, and “High Variety”.

The importance of Big Data lies not in how much data is present but in what you do with that data.

You can reach out to me on Twitter, Instagram, or LinkedIn if you need more help. I would be more than happy to assist.

If you have read this far, do drop a 👏 if you liked this article.

Good Luck 😎 and happy coding 👨‍💻

Software Engineer @ Red Hat | Learning, Sharing & Contributing to the Open Source

Avik Kundu
