Big Data for Everyone: The Midmarket is Included

A new catchphrase has been getting a great deal of attention in the information technologies market recently. Let's consider what this phrase refers to, whether it is really something new, the up-and-coming tools that allow organizations to apply Big Data concepts, and what midmarket organizations should do about it, if anything.

What is Big Data?

The catchphrase "Big Data" refers to the trend of gathering, and then closely analyzing, data generated by operational systems, transactional systems and even automated systems, such as retail point-of-sale (POS) systems. Proponents would say this differs from traditional data processing in three ways, summed up by the three "Vs": Volume, Variety and Velocity.

Let's examine each of these separately.

Volume

Big Data proponents would point out that all computer-based systems are creating and storing huge amounts of data. As new systems are added to an organization's IT portfolio, the amount of data being collected and stored grows every day.

Variety

Today's workloads are creating both structured and unstructured data. Both types contain useful information, if it can be teased out of the stream of data being generated by workloads and put to use. Data can come from collaborative, transactional, and even retail point-of-sale systems.

Big Data tools must be able to use data regardless of its format or which workload generated it. Furthermore, these tools must make it easy for business and financial analysts to examine the data, test how different streams of data may relate to one another, and allow a deeper understanding of what is happening to emerge.

Velocity

Big Data proponents would point out that one of the key differences from previous attempts to understand what the organization is doing and what customers want, such as business intelligence (BI) systems, is how quickly the streams of data are being generated.

How is this different from other systems?

One of the biggest differences between Big Data and other information systems is that the questions analysts might pose are not likely to be known beforehand. Traditional systems require a clear understanding, up front, of what data will be processed, how it will be produced and formatted, and how it will be reported.

Why is Big Data interesting?

Big Data systems make it possible to sift through huge amounts of data looking for new, previously unknown relationships among data items or data streams. For example, it might be possible to relate streams of email and text messages to the sales and support calls for specific products. Could an increase in communication with customers be related to increased sales, returns or increased calls for support? Unless those questions were included in the initial design of a traditional system, it would be hard to find an answer in that system's outputs.

One could imagine that a merchant with several thousand stores, each serving thousands of customers daily, would produce an amazingly large amount of shopping cart data. Organizations of this type would love to better understand what is being sold in each store, which products generate the most sales, which products are seen most often in customer shopping carts and the like, so that effective, real-time marketing and promotional systems can be deployed.
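A much-simplified sketch of that kind of shopping-cart analysis, using hypothetical data and plain Python rather than a Big Data platform, might count how often each product appears across carts:

```python
from collections import Counter

# Hypothetical point-of-sale records: one list of product names per cart
carts = [
    ["milk", "bread", "eggs"],
    ["bread", "butter"],
    ["milk", "bread", "coffee"],
    ["coffee", "eggs"],
]

# Count how often each product appears across all carts
product_counts = Counter(item for cart in carts for item in cart)
print(product_counts.most_common(2))  # → [('bread', 3), ('milk', 2)]
```

At the scale described above, the same counting logic would be spread across many machines, but the idea is the same.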

An online merchant might be better able to discover when and why customers abandon shopping carts rather than completing a purchase. They might be better able to discover whether specific products or combinations of products appear to be related to customers "walking away" before making a purchase. This could make it possible to develop better pricing, promotional or sales strategies. Similar examples can be found in many markets, including financial services, healthcare, manufacturing, transportation and hospitality.

Companies of all sizes can make use of Big Data because many suppliers now offer cloud-based Big Data services, so companies don't have to acquire new systems or software, or hire new staff.

Is this something new?

In a single word, the answer is "No." Today's Big Data systems can be seen as an outgrowth of tools that examine how operating systems, database engines, application frameworks, applications and even networking systems are operating. It is fairly clear that end user experience management, application performance management, systems and network management tools that scan through and learn from operational system log files are predecessors to today's Big Data solutions.

What, then, is new?

Although Big Data concepts have been around the industry for decades, the industry has seen the emergence of new and innovative tools that make it easier than ever before to collect, integrate and examine data streams of all types, and to glean new kinds of understanding from them.

For example, the Apache Software Foundation hosts quite a number of interrelated open source projects whose goal is to make this whole process easier. Here is a partial list of Apache Software Foundation projects related to Big Data:

  • Apache Hadoop — a distributed computing platform making it possible to harness together the power of many computer systems to collect, integrate and analyze massive streams of data. This project also includes the Hadoop Distributed File System (HDFS) that allows systems to share and update data simultaneously. It also includes an implementation of MapReduce that makes it possible to sift through data rapidly and efficiently.
  • Apache HBase — HBase is a tool that works with both Hadoop and HDFS to offer a column-oriented view of Big Data resources. This makes it possible for developers who are already familiar with traditional relational database systems to more easily use Hadoop-based Big Data.
  • Apache Hive — Hive is a tool designed to work with Hadoop to help analysts use highly distributed, Big Data repositories.
  • Apache Lucene — Lucene is a tool that makes searching through unstructured, text-based resources simpler to build and use.

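To give a feel for the MapReduce pattern that Hadoop implements, here is a toy, single-process sketch. The data is illustrative, and a real Hadoop job would distribute the map and reduce phases across many machines:

```python
from collections import defaultdict

def map_phase(record):
    # Emit (word, 1) pairs for each word in one log line
    for word in record.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    # Sum the counts emitted for each distinct word
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

log_lines = ["error disk full", "warning disk slow", "error network down"]
pairs = [pair for line in log_lines for pair in map_phase(line)]
counts = reduce_phase(pairs)
print(counts["error"])  # → 2
```

The appeal of the pattern is that the map and reduce steps are independent enough to run in parallel over enormous inputs, which is exactly what Hadoop manages on the organization's behalf.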
Many of these tools are only beginning to be packaged as commercial products that include simple installation, management tools, example applications, training and support. It would be helpful for midmarket companies to work with partners such as IBM, HP and MapR Technologies to make use of these tools.

The Promise of Big Data

Big Data offers a number of promises that center on helping organizations move from making important decisions based upon seat-of-the-pants analysis to using systematic, repeatable processes that make actual operational data available to support decisions.

The use of Big Data offers the promise that organizations of all sizes can uncover emerging trends early on. This means being able to better understand customer needs and respond to those needs more quickly. It also means being able to quickly learn more about customer decision making and behavior.

Taken together, these promises could help organizations avoid being blindsided by rapidly emerging trends, and could lead to higher revenues and greater customer satisfaction.

---

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.


Kusnetzky Group LLC © 2006-2014