
7 Big Names in the Big Data World


The big data world is no longer a territory accessible only to big, well-established database and data warehouse companies. Pure-play big data startups, too, are emerging as innovative thinkers, creative and technically sound enough to create a buzz in the marketplace.

In this post, we're going to talk about the big shots in the game.

Big Names in the Big Data Industry, 2013

Here’s the list of 7 BIG Names in the Big Data World:


1. IBM

The biggest Big Data vendor as per 2012 revenue figures, IBM generated about $1.3 billion from Big Data related products and services, according to the Wikibon report. IBM's product range includes a data warehouse with its own built-in data mining and cubing capability, and its PureData systems come with packaged analytic integration.

IBM's best-known products include the DB2, Informix and InfoSphere database platforms, SPSS statistical software designed to support real-time predictive analysis, and the Cognos Business Intelligence application with its big data platform capabilities.


2. Oracle

Famous for its flagship database, Oracle is amongst the big players in the Big Data space. Oracle's Big Data related revenue in 2012 was approximately $415 million, making it the fifth biggest Big Data vendor for the year. Oracle's Big Data Appliance combines Intel servers, Oracle's NoSQL database and Cloudera's Hadoop distribution.

Oracle offers a wide range of tools to complement its Big Data platform, Oracle Exadata. These include advanced analytics via the R programming language, an in-memory database option with Oracle's Exalytics in-memory machine, and Oracle's data warehouse.


3. Splunk

Specializing in machine data analysis, Splunk had the biggest market share among pure-play Big Data vendors in 2012, with total revenue of about $186 million, according to the Wikibon report.


4. Google

Google effortlessly made its place amongst the top 7 names in the Big Data world. Google's Big Data offering includes BigQuery, a cloud-based Big Data analytics platform. The Big Data related revenue generated by Google in 2012 was about $36 million, as per the Wikibon report.


5. 10Gen

10Gen is best known for its leading NoSQL database, the open-source MongoDB, which is distinguished as the prime document-oriented database. MongoDB can handle semi-structured information encoded in JavaScript Object Notation (JSON). What makes it different is its ease of use, speed and flexibility.
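To make "semi-structured" concrete, here is a minimal sketch using only Python's standard library. The documents below are hypothetical examples, not data from any real MongoDB deployment; with the official `pymongo` driver you would pass dictionaries like these to a collection's insert methods, and MongoDB would persist them as BSON (a binary superset of JSON).

```python
import json

# A semi-structured "document" of the kind MongoDB stores natively.
user_doc = {
    "name": "Ada",
    "tags": ["analytics", "nosql"],     # arrays need no schema change
    "address": {"city": "London"},      # nested sub-documents are fine
}

# Fields can vary from document to document: no fixed schema is required,
# which is what makes document stores flexible for evolving data.
other_doc = {"name": "Grace", "employer": "Navy"}

serialized = json.dumps(user_doc, sort_keys=True)
print(serialized)
```

The point of the sketch: two documents in the same logical collection can carry different fields, which a rigid relational schema would not allow without migrations.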

10Gen's strategic investors include Intel, In-Q-Tel and Red Hat. 10Gen was ranked third amongst Hadoop- and NoSQL-only vendors last year, generating about $36 million in revenue in 2012, according to the Wikibon report.


6. Hortonworks

Another big name in the Big Data world is Hortonworks. A Hadoop vendor, Hortonworks received over $70 million in venture capital investment after spinning off from Yahoo in 2011. Hortonworks runs its own certification courses and has built a legion of developers around its platform.

Hortonworks is growing rapidly against Cloudera and is known for its partnerships with Rackspace, Microsoft, Red Hat and other companies.


7. MapR

Best known for its NoSQL database M7, MapR works with Google Compute Engine and Amazon's cloud platform. MapR was ranked fourth on the Wikibon report's list of Hadoop- and NoSQL-only vendors last year. According to Wikibon, MapR's total revenue in 2012 was about $23 million.

Hadoop – A Brief Introduction


Storing enormous data sets on distributed server clusters used to be a tough job. With the technological advancements of the last two decades, however, it has become feasible to both store and analyze big chunks of data without shelling out a hefty budget.

What is Hadoop and How Does it Work?

Hadoop is one of the techniques that enables easy storage of massive data sets and runs distributed analysis applications on each unit of a cluster. It IS a big deal in big data, and many experts recognize it as a major force.

Let’s get down to the basics.

What is Hadoop?

Basically, Hadoop is an open source software platform introduced by the Apache Software Foundation. It is a simple yet effective technological solution that has proved highly useful in managing huge data, particularly mixtures of structured and complex data, efficiently and cheaply.

Hadoop has been specially designed to be robust enough that big data applications keep running despite the failure of individual servers. The platform is also highly efficient: it does not require applications to ship big data volumes across the network.

How does it Work?

The Hadoop software library can be described as a framework which uses simple programming models to facilitate the distributed processing of huge data sets across clusters of computers. The library does not depend on hardware for high availability, because it can detect and handle failures in the application layer itself. In effect, it delivers a highly available service on top of a cluster of computers, each of which is prone to failure.
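The application-layer failure handling described above can be sketched in miniature. This is a toy stand-in for what Hadoop does across real machines, with hypothetical function names of my own choosing, not Hadoop APIs: when a worker fails, its chunk of work is simply reassigned to a healthy worker.

```python
# Toy sketch of application-layer failure handling, loosely inspired by
# Hadoop's approach. A "worker" is just a function call here; in a real
# cluster it would be a process on another machine.

def process_chunk(chunk, fail=False):
    """Pretend worker: sums a chunk of numbers, or 'crashes'."""
    if fail:
        raise RuntimeError("worker died")
    return sum(chunk)

def run_job(chunks, failing_workers):
    """Run each chunk on a worker; reassign any chunk whose worker fails."""
    results = []
    for i, chunk in enumerate(chunks):
        try:
            results.append(process_chunk(chunk, fail=(i in failing_workers)))
        except RuntimeError:
            # The framework, not the hardware, notices the failure and
            # hands the same chunk to another (healthy) worker.
            results.append(process_chunk(chunk))
    return sum(results)

chunks = [[1, 2], [3, 4], [5, 6]]
print(run_job(chunks, failing_workers={1}))  # worker 1 fails; job still finishes
```

The job completes with the correct total whether or not a worker fails, which is the whole point: availability comes from the software layer, not from expensive fault-tolerant hardware.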

Since Hadoop is fully modular, it allows you to swap out nearly any of its components for a totally different software tool. The architecture is robust, flexible and efficient.

What is the Hadoop Distributed File System?

Hadoop has two main parts: a distributed file system for storing data, and a framework for processing that data. These two components play the most important roles.

Technically, the distributed file system is a collection of storage clusters holding the actual data. Although Hadoop can use other file systems, it defaults to the Hadoop Distributed File System (which is cleverly named). Once placed in HDFS, your data stays right there until some operation needs to be performed on it: you can run an analysis on it or export it to another tool, all from within Hadoop.

Hadoop – Data Processing Framework

MapReduce is the name of the Java-based system that serves as the data processing framework. We hear more about MapReduce than about HDFS because it is the tool that actually processes the data, and it is a wonderful platform to work with.
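As a minimal sketch of the MapReduce model (written in Python rather than Hadoop's native Java, and run locally instead of on a cluster), the classic word-count example maps each word to a count of 1, groups the pairs by word, and reduces each group by summing:

```python
from collections import defaultdict

def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Shuffle + reduce step: group pairs by key and sum the counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big deal", "hadoop handles big data"]
word_counts = reduce_phase(map_phase(lines))
print(word_counts["big"])   # 3
print(word_counts["data"])  # 2
```

In real Hadoop the map and reduce functions run on many machines at once, with the framework handling the shuffle between them; the programming model the developer sees, however, is essentially this simple.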

Unlike a regular database, Hadoop does NOT involve queries, SQL (Structured Query Language) or otherwise. Instead, it simply stores data, which can be pulled out when required. It is a data warehousing system that needs a mechanism such as MapReduce to actually process the data.