
The Big Data Cheat Sheet

Posted November 21, 2012

Definition: What is big data?



Big data is a dataset that's so large that it can't be managed with traditional information technology tools.

Business Definition

The term big data describes business processes that capture, process, store, search, share, analyze or visualize large datasets.

Big data business processes have fed demand for technologies that are specifically designed to handle large datasets. As a result, big data can also be defined as a technology:

Technology Definition

The term big data describes technologies built to efficiently process large, complex datasets.


How big is big?



You're probably wondering: how big is big data?

Unfortunately, there isn't a precise answer to this question. Big data is any dataset that's large enough to push the limits of standard technologies. In other words, you need specialized technologies to tackle big data.

What's considered big data today won't be considered big data tomorrow.

The world's capacity to capture, store, transport and process data has grown rapidly ever since the early 1960s. Over the same time period, new business drivers have doubled the amount of information the average business stores every 1.2 years.
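To put the 1.2-year doubling figure in annual terms, a quick calculation helps. The following is a minimal Python sketch, not a forecast: the function names are illustrative, and it assumes smooth exponential growth, which real data stores only approximate.

```python
def annual_growth_factor(doubling_period_years: float) -> float:
    """Factor by which stored data grows in one year, given its doubling period."""
    return 2 ** (1 / doubling_period_years)


def growth_over(years: float, doubling_period_years: float) -> float:
    """Total growth multiple after `years` of steady exponential growth."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Doubling every 1.2 years is the figure quoted in the text above.
    print(f"Yearly growth factor: {annual_growth_factor(1.2):.2f}x")
    print(f"Growth over 6 years: {growth_over(6, 1.2):.0f}x")
```

Under that assumption, doubling every 1.2 years works out to roughly 1.78x growth per year, or about 32x over six years.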

Yeah, but how big is it?

When people speak of big data they're usually talking about data in the petabyte (1 million gigabytes) to exabyte (1 billion gigabytes) range.

In some cases, a terabyte can be considered big data when there's a requirement to process the data in a short time interval. For example, you may need big data technologies to process a terabyte of data in a matter of minutes.
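For readers who want to check the unit arithmetic, here is a small Python sketch. It assumes decimal (SI) units, where each step up is a factor of 1,000; the `UNITS` table and `convert` helper are illustrative names, not part of any standard library.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
}


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert a data size between the decimal storage units in UNITS."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    print(f"1 PB = {convert(1, 'PB', 'GB'):,.0f} GB")
    print(f"1 EB = {convert(1, 'EB', 'GB'):,.0f} GB")
    print(f"1 TB = {convert(1, 'TB', 'PB')} PB")
```

Some vendors use binary units (factors of 1,024) instead, which shifts these figures by a few percent.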

The Laws of Big Data



Big data is often explained in terms of three laws:

1. Volume
The more data the better. Large volumes of data can produce business results.

Big data is characterized by large volumes of data. The challenge is to reduce large volumes of data to business results. For example, how can a business use social profiles and click-streams to improve online sales?

2. Variety
Data diversity improves big data's business value.

Big data is often captured from a wide variety of sources and stored in a variety of formats.

For example, let's say a grocery chain wanted to understand shopper behavior to improve sales. They might collect and analyze video, audio, sensor data, user profiles and point of sale transactions.

3. Velocity
The faster big data can be processed, the greater its business value.

Organizations have dealt with petabytes of data for a long time. The big difference with big data is data velocity: the speed at which data must be captured, stored or processed.

There's a big difference between maintaining a petabyte of historical data and collecting and processing a petabyte of data in an hour.
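The velocity point becomes concrete when expressed as sustained throughput. The sketch below is a rough Python illustration under stated assumptions: decimal units, a perfectly even processing rate, and a hypothetical helper name. It compares a terabyte processed in one minute with a petabyte processed in one hour.

```python
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15


def required_throughput_gb_per_s(size_bytes: float, seconds: float) -> float:
    """Sustained throughput (GB/s) needed to process size_bytes within seconds."""
    return size_bytes / seconds / BYTES_PER_GB


if __name__ == "__main__":
    # One terabyte in one minute (assumed interval).
    print(f"1 TB in 1 minute: {required_throughput_gb_per_s(BYTES_PER_TB, 60):.1f} GB/s")
    # One petabyte in one hour, as in the example above.
    print(f"1 PB in 1 hour:   {required_throughput_gb_per_s(BYTES_PER_PB, 3600):.1f} GB/s")
```

The terabyte case needs about 16.7 GB/s and the petabyte case about 277.8 GB/s, sustained for the whole interval.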

Trends: Where is big data going?



Business trends such as social media, internet of things, crowdsourcing, data integration, natural language processing, analytics and visualization are expected to drive the continued growth of big data.

Corporate data is projected to almost double each year for the next decade. High demand for data scientists and data-literate managers is expected over the same time period.

Why is everyone so excited about big data?

It isn't unusual for large and mid-sized organizations to handle petabytes of data. In other words, big data is a common business problem.

Big data-related IT spending is estimated at around $100 billion globally, growing at more than 9% a year.

Real world examples of big data include:

Google's search index increased from 11 billion to 50 billion pages between 2008 and 2012.

Facebook has more than 1 billion active users.

Walmart often processes more than 1 million transactions an hour, feeding databases estimated to hold over 2 petabytes of data.

Twitter handles 400 million tweets a day.

An estimated 100 trillion emails are sent each year. That's 14,285 emails for every human on the planet.
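The figures above are easier to compare once they are converted to per-second and per-person rates. The following Python sketch does that arithmetic. The helper names are illustrative, and the world population of 7 billion is an assumption implied by the email figure rather than stated in the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(count_per_day: float) -> float:
    """Average events per second, given a daily total."""
    return count_per_day / SECONDS_PER_DAY


def per_person(total: float, population: float) -> float:
    """Average share of a total per member of a population."""
    return total / population


if __name__ == "__main__":
    # 400 million tweets a day, from the list above.
    print(f"Tweets per second: {per_second(400e6):,.0f}")
    # 100 trillion emails a year, spread over an assumed 7 billion people.
    print(f"Emails per person per year: {int(per_person(100e12, 7e9)):,}")
```

With these inputs, 400 million tweets a day averages about 4,630 tweets per second, and the email figure truncates to the 14,285 per person quoted above.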



Pitfalls: Common big data myths



1. Big data is a technology
Big data is both a business and a technology problem (and opportunity).

2. There's a silver bullet for big data
Big data isn't a product. It's a series of complex and diverse business problems that are addressed with numerous tools and architectures.

3. Big data is only a concern for large organizations
Big data is quickly becoming an issue for organizations large and small.

4. Relational databases (RDBMS) can't handle big data
Relational databases are a common component of big data solutions. For example, massive parallel-processing (MPP) databases are often built with relational technologies.

5. Big data is all hype
Big data is a relatively new term for one of information technology's oldest trends: the exponential growth of business data. In fact, business data has grown dramatically for the past 40+ years.


As hardware storage capacity has grown and prices have fallen, demand for storage has increased. As business data has grown, so has its value.

Quick Summary



A few key points to remember about big data:

Big data is a business and technology term that describes processes and tools for achieving value from large volumes of data.


Big data typically implies datasets of a petabyte or more. However, the term might also be used to describe processing smaller datasets at high speed, such as processing a terabyte of data in a minute.


Big data is defined by its volume, variety and velocity.


Corporate data is almost doubling each year, driving demand for big data technologies.


Strong demand for data scientists and data-literate managers is expected over the coming decade.



