Big Data and Hadoop

What is Big Data?

Big data is a collection of data sets so large and complex that they become very difficult to capture, store, process, retrieve and analyze.

The 5 V’s of Big Data

  1. Velocity: The speed at which data is generated and at which changes occur across diverse data sets

  2. Volume: The sheer amount of data being generated every second

  3. Variety: Data comes in structured as well as unstructured forms

  4. Veracity: The reliability and trustworthiness of the data, established by verifying and validating it

  5. Value: Having access to big data is all well and good, but it is only useful if we can turn it into value

Challenges in Big Data

  • Dealing with data growth
  • Analyzing data in a timely manner
  • Integrating disparate data sources
  • Validating Data
  • Securing Data

What is Hadoop?

Apache Hadoop is an open-source software platform for distributed storage and distributed processing of very large data sets. It runs on clusters built from commodity hardware.
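To make "distributed processing" a little more concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps used by MapReduce, Hadoop's classic processing model. It is plain Python and does not use Hadoop itself. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are made up for illustration. On a real cluster each chunk would live on a different node and the phases would run in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Here the chunks are processed one after another in a single process;
    # this only simulates the per-node map tasks of a real cluster.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical input, split into two chunks as if stored on two nodes
    chunks = ["big data needs big clusters", "hadoop stores big data"]
    print(word_count(chunks))
```

The point of the split into phases is that each map call only needs its own chunk, so the work can be spread across many machines, and only the grouped intermediate results have to be brought together for the reduce step.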

When to use Hadoop?

  • Data Size and Data Diversity
  • Multiple Frameworks for Big Data
  • Lifetime Data Availability
  • Batch Processing

When NOT to use Hadoop?

  • Real Time Analytics
  • Not a Replacement for Existing Infrastructure
  • Multiple Smaller Datasets

Hadoop Ecosystem

Architecture of Hadoop

HDFS Concept

Limitations of Classic Hadoop

YARN Concepts