The Big Data Cheat Sheet
Posted by John Spacey, November 21, 2012
Definition: What is big data?
Big data is a dataset that's so large that it can't be managed with traditional information technology tools.
Big data business processes have fed demand for technologies that are specifically designed to handle large data sets. As a result, big data can also be defined as a technology:
How big is big?
You're probably wondering: how big is big data?
Unfortunately, there isn't a precise answer to this question. Big data is any dataset that's large enough to push the limits of standard technologies. In other words, you need specialized technologies to tackle big data.
What's considered big data today won't be considered big data tomorrow.
The world's capacity to capture, store, transport and process data has grown rapidly ever since the early 1960s. Over the same time period, new business drivers have caused the amount of information the average business stores to double every 1.2 years.
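The doubling figure above is compound-growth arithmetic, and a small sketch makes the scale easier to see. This is an illustration only: the function name `growth_factor` and the horizons in the loop are assumptions chosen for the example, while the 1.2-year doubling period is the figure quoted above.

```python
def growth_factor(years: float, doubling_period_years: float = 1.2) -> float:
    """Return how many times larger a dataset becomes after `years`,
    assuming it doubles once every `doubling_period_years`."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical horizons, chosen only for illustration.
    for horizon in (1.2, 6, 12):
        print(f"after {horizon} years: x{growth_factor(horizon):,.0f}")
```

Under that assumption, stored data grows roughly 32-fold in 6 years and roughly 1,000-fold in 12 years, which is why a dataset that strains today's tools may look routine a decade later.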
Yeah, but how big is it?
When people speak of big data they're usually talking about data in the petabyte (1 million gigabytes) to exabyte (1 billion gigabytes) range.
In some cases, a terabyte can be considered big data when there's a requirement to process the data in a short time interval. For example, you may need big data technologies to process a terabyte of data in a matter of minutes.
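To make the terabyte example concrete, the sketch below converts a dataset size and a time window into the sustained throughput required. The unit constants follow the decimal definitions used above; the helper name `required_throughput` and the 5-minute window are assumptions for illustration, not figures from any particular system.

```python
# Decimal units, matching "petabyte = 1 million gigabytes" above.
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15  # 1 million gigabytes
EXABYTE = 10**18   # 1 billion gigabytes


def required_throughput(size_bytes: int, window_seconds: float) -> float:
    """Bytes per second needed to process `size_bytes` within `window_seconds`."""
    if window_seconds <= 0:
        raise ValueError("time window must be positive")
    return size_bytes / window_seconds


if __name__ == "__main__":
    # ASSUMPTION: "a matter of minutes" is taken here as 5 minutes.
    rate = required_throughput(TERABYTE, 5 * 60)
    print(f"{rate / GIGABYTE:.2f} GB/s sustained")
```

With a 5-minute window this works out to a little over 3 GB/s sustained, so the time constraint, not the raw size, is what pushes a terabyte into big data territory.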
The Laws of Big Data
Big data is often explained in terms of three laws:
Volume
Big data is characterized by large volumes of data. The challenge is to reduce large volumes of data to business results. For example, how can a business use social profiles and click-streams to improve online sales?
Variety
Big data is often captured from a wide variety of sources and stored in a variety of formats.
For example, let's say a grocery chain wanted to understand shopper behavior to improve sales. They might collect and analyze video, audio, sensor data, user profiles and point of sale transactions.
Velocity
Organizations have dealt with petabytes of data for a long time. The big difference with big data is data velocity: the speed at which data must be captured, stored or processed.
There's a big difference between maintaining a petabyte of historical data and collecting and processing a petabyte of data in an hour.
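One way to see the gap between storing a petabyte and processing it in an hour is to estimate how many parallel workers each case would need. The sketch below is a back-of-the-envelope calculation, not a sizing method: the function name `nodes_needed` and the 500 MB/s per-worker rate are hypothetical, and coordination overhead is ignored.

```python
import math

PETABYTE = 10**15  # decimal petabyte, in bytes


def nodes_needed(size_bytes: int, window_seconds: float,
                 per_node_bytes_per_s: float) -> int:
    """Minimum number of equally fast workers needed to process `size_bytes`
    within `window_seconds`, ignoring coordination overhead."""
    if window_seconds <= 0 or per_node_bytes_per_s <= 0:
        raise ValueError("window and per-node rate must be positive")
    required_rate = size_bytes / window_seconds
    return math.ceil(required_rate / per_node_bytes_per_s)


if __name__ == "__main__":
    # ASSUMPTION: each worker sustains 500 MB/s; real figures vary widely.
    per_node = 500 * 10**6
    one_hour = 3600
    one_year = 365 * 24 * 3600
    print("petabyte in an hour:", nodes_needed(PETABYTE, one_hour, per_node))
    print("petabyte in a year: ", nodes_needed(PETABYTE, one_year, per_node))
```

Under these assumptions the one-hour case needs several hundred workers running in parallel, while the one-year case fits on a single worker. The volume is identical; only the velocity requirement changes.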
Trends: Where is big data going?
Business trends such as social media, internet of things, crowdsourcing, data integration, natural language processing, analytics and visualization are expected to drive the continued growth of big data.
Corporate data is projected to almost double each year for the next decade. High demand for data scientists and data-literate managers is expected over the same time period.
Pitfalls: Common big data myths
1. Big data is a technology
Big data is both a business and a technology problem (and opportunity).
2. There's a silver bullet for big data
Big data isn't a product. It's a series of complex and diverse business problems that are addressed with numerous tools and architectures.
3. Big data is only a concern for large organizations
Big data is quickly becoming an issue for organizations large and small.
4. Relational databases (RDBMS) can't handle big data
Relational databases are a common component of big data solutions. For example, massive parallel-processing (MPP) databases are often built with relational technologies.
5. Big data is all hype
Big data is a relatively new term for one of information technology's oldest trends: the exponential growth of business data. In fact, business data has grown dramatically for the past 40+ years.
As hardware storage capacity has grown and prices have fallen, demand for storage has increased. As business data has grown, so has its value.
A few key points to remember about big data:
Big data is a business and technology term that describes processes and tools for achieving value from large volumes of data.
Big data typically implies datasets of a petabyte or more. However, the term might also be used to describe processing smaller datasets at high speed, such as processing a terabyte of data in a matter of minutes.
Big data is defined by its volume, variety and velocity.
Corporate data is projected to almost double each year, driving demand for big data technologies.
Strong demand for data scientists and data-literate managers is expected over the coming decade.