
Archive for category: MapR

Unabated Experimentation Is the Way Forward in Big Data


While it is true that analytical modeling calls for nonstop testing against big data, the equation isn’t that straightforward and holds some real challenges.

The need of the hour is active experimentation in the big data space, helping analytical models in progress to identify precise correlations. But since statistical models carry risks of their own, applying them astutely is a must if we want the results to hold up.

While a few groups are still hesitant, most large organizations have honed their insight enough to realize that big data calls for incessant experimentation, and they are all in support of the change. At the same time, they know that the practical reality of this booming field involves certain risks associated with statistical models, especially when their implementation is less than flawless.

Statistical Modeling – Practicality and Risks

Statistical models are simplified tools employed by data scientists to recognize and validate the major correlations at work in a particular field. They can, however, give data scientists a false sense of validation at times.

And despite fitting the observational data quite well, many such models have been found to miss the real causative factors in action. A model may capture a correlation, say between ice cream sales and drownings, while missing the common cause, summer weather. This is why predictive validity is often missing from the illusion of insight offered by such a model!

What May Go Wrong?

Even though the application of a statistical model is practical in business, there is always a need to scrutinize the true, fundamental causative factors.

Lack of confidence may prove to be the biggest risk, particularly when you doubt that the historical correlations constituting your statistical model will remain relevant in the near future. And obviously, a predictive model of product demand and customer response that you have low confidence in will never pull in big investments during a product launch!

What is the Scope?

Even though certain risks are involved, statistical modeling is far from dead. To detect causative factors more quickly and effectively, statistical modeling will need to be grounded in real-world experimentation. This approach, employing a boundless series of real-world experiments, will go a long way toward making the big data business model and economy more authentic and reliable.

So How’s Real-world Experimentation Going to Be Possible? 

Just as data scientists have developed advanced operational practices for ceaseless experimentation, big organizations are looking to encourage their business executives to lead the charge in running nonstop experiments for better output. Adding to the convenience, the big data revolution has already delivered in-database platforms for executing models, along with economical, high-output computing power, making real-world experimentation feasible everywhere, in scientific and business domains alike.

The basic idea is to spend time, capital and other resources conducting many low-risk experiments rather than putting extra effort into building the same models over and over again!
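Here is a minimal, hedged sketch of how one such low-risk experiment might be scored: a two-variant (A/B) test evaluated with a standard two-proportion z-test. The class name, conversion counts and sample sizes are all hypothetical, purely for illustration.

```java
public class AbTest {
    /** z-statistic for the difference between two conversion rates. */
    static double zStatistic(long convA, long nA, long convB, long nB) {
        double pA = (double) convA / nA;
        double pB = (double) convB / nB;
        // Pooled conversion rate under the null hypothesis of no difference.
        double pooled = (double) (convA + convB) / (nA + nB);
        double stdErr = Math.sqrt(pooled * (1 - pooled) * (1.0 / nA + 1.0 / nB));
        return (pB - pA) / stdErr;
    }

    public static void main(String[] args) {
        // Hypothetical data: control converted 480/10000, variant 540/10000.
        double z = zStatistic(480, 10_000, 540, 10_000);
        // |z| > 1.96 means significance at the 5% level (two-sided test).
        System.out.printf("z = %.2f, significant: %b%n", z, Math.abs(z) > 1.96);
    }
}
```

Run enough cheap, clearly scored experiments like this, and the causative factors start to separate from the merely correlative ones.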

Monetizing Big Data: What 2014 Might Have in Store


Once we have invested in big data technology after analyzing it carefully, the next move is to monetize it. To learn the scope of big data monetization in 2014 and beyond, read on!

‘Big Data’ is already a familiar term to most of us, especially those in serious business. It was a hot topic in the media almost throughout 2013.


All businesses, small and big, are still trying to deepen their knowledge of what big data actually is, what they should be doing about it, and how. Adding to the complications are the challenges involved in the process of investing in big data.

For the most part, businesses don’t know how to obtain value from data and have a long way to go before they can define the much-awaited big data strategy. Even more importantly, they will have to acquire the required skills and then apply them deftly to make the most of the strategies they’re working on!

Big Data – Future and Monetary Equivalent

We are already in the first phase of the grand big data revolution, marked by big investments in the technology; the next important step is to generate revenue through big data.

With a lot in store, the year 2014 is ready to play an important role in this regard:

Revenue Generation

Though businesses are all for huge investments in big data, they still need to predict how quickly those investments can generate revenue. Finding an effective way to measure ROI over a specific period may prove to be one of the biggest challenges! (For instance, a hypothetical $2 million big data program that drives $2.6 million in attributable gains over a year yields an ROI of (2.6 − 2.0) / 2.0 = 30%.)

But despite these reservations, most business leaders expect big data to be highly helpful in making the right business decisions. They believe, however, that it won’t be possible to predict the time and money associated with an ROI target without a guiding hand. This may cause giant businesses to opt for big data-based solutions in 2014, rather than using raw big data directly as the only solution. The ultimate goal will be to boost overall revenue while saving on costly technologies and data consultants.

Big Data as a Marketing Investment 

While it is true that big data has been more of a technology investment until now, we will see it treated as a marketing investment in 2014 and beyond, and retail brands will lead the charge.

The key will be to persuade people to ‘buy’ by making all the offers directly customer-oriented. Big companies have already begun to prepare for the shift by motivating their CMOs, technology officers and information executives to work in unison to derive the best results.

Utilization of Big Data-based Solutions

With big data-based solutions surfacing quickly, all businesses will have to adopt data analytics sooner or later. Though Google Analytics has been used for this purpose for years, the latest big data-based solutions will give companies of every size access to methods that can ‘practically improve revenue.’ Hopefully, 2014 will be a big year for both startups and well-established businesses in terms of using big data to get the best results!

Be Smart With Big Data


Some companies are scared of big data. They think that since data is inherently dumb, a lot of it must be dumber still. But by being smart about big data, analysts can make sure they get the most out of it. Big data can also be a security risk, so it needs to be handled smartly.

The Present Way of Doing Things

Companies usually handle data in one of three ways. The first is the Heroic Model, in which individuals take charge of requests and make decisions on their own without consulting others. This model can work well for small businesses, where individuals are usually aware of most situations across all areas of the business, but in bigger businesses it can lead to confusion and chaos.

The Culture of Discipline, on the other hand, is one where individuals don’t make any decisions and instead follow a set of rules laid down by management. Employees in this model can’t use data for their own decision making and just have to follow the processes set up for them.

The best way to handle data is the Data Smart Model, in which data is managed through evidence-based management. A combination of the first two methods, it relies on disciplined processes while still allowing decision making at the individual level. This is the method that should be used to handle big data, and it can result in smooth operation without much hassle.

How to Cultivate the Data Smart Culture

Certain steps need to be taken to create the data smart culture.

  • There should be a single source of truth. Decision making can be moved to the employee level but the guiding principles should be set from a single source.
  • Use ways to keep track of progress. A scorecard system, even one updated daily, can help managers across different branches see how they are performing relative to other departments, and they can then send in better data to record their progress.
  • Rules are important but there should be enough flexibility. Rules and guiding principles are needed but there should be flexibility to know when to bend the rules and when to break them. Sometimes what works in most parts of the country might not be best for a certain area. Businesses need to be able to adapt to such situations and change their rules accordingly.
  • Work on cultivating human resources. People are a company’s biggest asset, and it is important to educate them and give them the know-how to handle data. Managers need to be trained to educate the people working under them and to engage with them one-on-one.

These steps can help businesses handle big data smartly and without much confusion. Every level of the organization needs to be trained to handle big data, because the future is going to be all about it.

Is Big Data a Threat to Your Privacy?


Big Data is growing bigger every day, and along with it grows the concern over invasion of privacy. Tracking the data generated by your mobile and other devices, and your interactions on social media, helps advertisers tailor their ads to suit you. But there’s more to the story than that: companies have now begun to come up with very creative ways to use real-time data.

Let’s look at some interesting examples.

Smart Rubbish Bins in London

An advertising firm in London came up with the idea of using strategically placed dustbins to track the Wi-Fi signals of the phones of people passing by. Using each phone’s unique identifier, they could track the movement of every individual, and then use this data to show advertisements on the bins’ screens targeted at the person passing by.


Now even dustbins are becoming smart!

Officials have since asked Renew, the ad firm responsible, to take down the smart dustbins amid widespread concern about the invasion of people’s privacy.

Police Cars in Australia Get Number Plate Recognition Cameras

The Australians have come up with another striking use of Big Data: number plate recognition cameras that can read multiple number plates simultaneously and search a database for all the information about each driver. They can tell if a car is stolen or if you have unpaid parking tickets just by looking at your car’s number plate.


The hand of the law gets longer.

Are Such Examples a Threat to Your Privacy?

When CCTV cameras first came on the scene, the public responded with outrage similar to what we now see over Big Data. But once people got used to the new technology and saw its benefits in solving crimes and catching miscreants swiftly, the fear of Big Brother always watching subsided.

The truth is that people will allow collection of any data as long as it is collected with their permission and it is used to create value for them. Instead of shoving ads in people’s faces, companies should try to find other ways to use Big Data, not only to reduce costs for the company but also to provide quality to the customer.

One great example of the creative use of Big Data is its potential for insurance companies. Today, natural and man-made calamities generate a lot of data on social media.


Data about Hurricane Sandy

Insurance companies can use this data along with before and after images on Google Maps Street View, Flickr, Instagram etc. to find out how much destruction of property their clients have suffered.

They can estimate the number and value of the claims they will have to deal with, and can provide quick claim settlements to their customers. That will be appreciated by all, and people will readily agree to data collection if they are told of such rewards.

Great Opportunities

A Westpac survey showed that it took only 30 months for mobile usage to reach the 1 million mark, compared with the 80 months it took online usage to get there.

This means there are great opportunities to use this rapidly growing Big Data, but it will have to be done with care, keeping the interests of the consumer in mind.

All Are Valuable Members of Hadoop Community says Cloudera CEO


Within three months of taking over the leadership of the company, Cloudera CEO Tom Reilly has already visualized where it is headed.

According to him, one needs a strong, far-sighted vision for the company if it is to compete against the likes of Hortonworks and MapR for a share of the pie in the rapidly evolving Hadoop market.

Despite the tough competition, Reilly remains a well-wisher of his rivals, whom he views as valuable members of the Hadoop community. His message to his employees is the same: consider all your competitors valuable contributors to the success of the community.

Interestingly, Reilly credits Hortonworks, a fellow startup and rival, with driving the development of YARN, which has given much-needed impetus to every major player in Hadoop.

He also affirmed that the real competition his company faces comes from information giants such as Pivotal and IBM, not from other startup rivals.

Cloudera’s CEO was a little shy about sharing details of his company’s change of focus, saving them for a public announcement at the Hadoop World conference to be held next week.

Nevertheless, industry watchers reckon that Reilly’s plans for Cloudera are bigger than before. He doesn’t want the company to become just another Hadoop distribution company. With an ever-growing list of features and over 700 partners, he aims to make it a data giant that delivers real value to enterprises.

Confronted with the question of Hortonworks luring Spotify away from Cloudera, Reilly has an altogether different take. He concedes that the development hurt from a public relations perspective, but insists it isn’t something that will weigh the company down.


He explained that Spotify wanted comprehensive enterprise support and was no longer interested in using the free version of Cloudera’s software. Along with Hortonworks, his company quoted a price for the deal. Hortonworks managed to put up a slightly better contract, and Cloudera intentionally didn’t try to match it, claiming that doing so didn’t make good business sense.

Reilly sounds somewhat dismissive when he states that the deal didn’t matter much to them and that all Spotify gained was a low-priced vendor. But deep down, Reilly knows that to steer his company toward profitability, he needs to outperform his competitors.

Experts believe that even though Cloudera has a lot on its plate, the 800-pound Hadoop startup can’t distance itself from the present competition.

Unless the company takes a big leap to stand alongside the information giants, it will have to live with the image of a Hadoop startup.

Data Management & Analysis at LinkedIn



Data management enables LinkedIn to provide hiring solutions, marketing solutions and networking opportunities to its members

LinkedIn is the world’s largest professional network, with over 187 million members from over 200 countries. Its members include everyone from freelancers to CEOs of Fortune 500 companies. The company started out in Mountain View, California, in 2003 with the mission of connecting the world’s professionals, and it has surely achieved that over the last 10 years.

Today LinkedIn earns $252 million in revenue every year and employs over 3,200 people worldwide. It has become the go-to resource for HR executives whenever they have to fill a position. Members’ profiles serve as online résumés that any employer can see and access. LinkedIn also provides opportunities for people to connect with the right people to take their careers forward.

All this is possible because of data collection and management. All information provided by a member in their profile is collected, analyzed and sorted so that whenever anyone wants to access it, they can do so quickly and effortlessly. This data management enables LinkedIn to provide hiring solutions, marketing solutions and networking opportunities to its members.

Not only is the data invaluable to employers, but individuals too can use it to search for talent matches, similar jobs, interesting events and networking opportunities. This huge amount of data also allows LinkedIn to customize products throughout the world.

LinkedIn employs data scientists to analyze the data it collects, so the company can rapidly make sense of it, recognize opportunities and take advantage of them. These data scientists are qualified in data analysis and statistics, and also need business skills and knowledge to interpret what they find.

LinkedIn’s success can be attributed to its decision to develop its own data management stack. The company took market solutions and customized them for its own particular use to collect, sort and analyze data. It stores data online using Oracle and Espresso, supported by services such as Voldemort, Zoie, Bobo, Sensei, D-Graph, Kafka and Databus. The offline data store uses Hadoop for machine learning and ranking & relevance, alongside Teradata. It also uses MapReduce analytics and clickstream analysis for A/B site testing.
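Kafka, in fact, was originally built at LinkedIn before being open-sourced, and it is the backbone of this kind of activity-event pipeline. Below is a minimal sketch of publishing a profile-view event, using today’s Kafka Java client rather than the API of this article’s era; the broker address, topic name and JSON payload are hypothetical.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProfileViewPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical topic and event: member 7 viewed member 42's profile.
            producer.send(new ProducerRecord<>("profile-views",
                    "member-42", "{\"viewer\":\"member-7\",\"ts\":1385000000}"));
        }
    }
}
```

Every such event lands in a durable log that downstream systems, such as Hadoop jobs and recommendation engines, can consume at their own pace; that decoupling is what lets the pipeline scale.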

Corporations and businesses use LinkedIn to search for people to fill key positions and for people with social influence to test new products. Analyzing viral marketing results and optimizing recommendation engines are two other services LinkedIn offers to businesses, helping create specialized marketing services for different industries.

LinkedIn’s value creation rests on this data management and on delivering its analysis of the data to key players in a short amount of time. As long as it can analyze and manage all this data, it will continue to grow and to market new, customized products. To maintain its edge, LinkedIn needs to find ways to handle this ever-growing stream of data while also improving the quality of its data analysis.

7 Big Names in the Big Data World


The big data world is no longer a territory accessible only to big, well-established database and data warehouse companies. Pure-play big data startups, too, are emerging as innovative thinkers, creative and technically sound enough to create a buzz in the marketplace.

In this post, though, we’re going to talk about the big shots in the game.


Here’s the list of 7 BIG Names in the Big Data World:

IBM

The biggest big data vendor by 2012 revenue, IBM earned about $1.3 billion from big data-related services and products, according to reports from Wikibon. IBM’s product range includes a warehouse with its own built-in data mining and cubing capabilities, and its PureData systems include packaged analytic integration features.

IBM’s best-known products include the DB2, InfoSphere warehouse and Informix database platforms; SPSS statistical software, designed to support real-time predictive analysis; and the Cognos business intelligence application with its big data platform capabilities.

Oracle

Famous for its flagship database, Oracle is among the big players in the big data space. Oracle generated approximately $415 million in big data revenue in 2012, making it the fifth biggest big data vendor for the year. Oracle’s Big Data Appliance combines Intel servers, Oracle’s NoSQL database and Cloudera’s Hadoop distribution.

Oracle has a wide range of tools to complement its big data platform, Oracle Exadata. These include advanced analytics via the R programming language, an in-memory database option with Oracle’s Exalytics in-memory machine, and Oracle’s data warehouse.

Splunk

Specializing in machine data analysis, Splunk had the biggest market share among pure-play big data vendors in 2012, with total revenue of about $186 million, according to the Wikibon report.

Google

Google effortlessly made its place among the top 7 names in the big data world. Google’s big data offering includes BigQuery, a cloud-based big data analytics platform. Google’s big data-related revenues in 2012 were about $36 million, as per the Wikibon report.

10Gen

10Gen is best known for its leading NoSQL database, the open source MongoDB, distinguished as the premier document-oriented database. MongoDB can handle semi-structured information encoded in JavaScript Object Notation (JSON). What sets it apart is its ease of use, speed and flexibility.
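To make the document-oriented idea concrete, here is a minimal sketch using the current MongoDB Java driver (which postdates this article); the connection string, database, collection and field names are all hypothetical.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class MongoSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> users =
                    client.getDatabase("demo").getCollection("users");

            // Documents are schemaless: each record is essentially a JSON object.
            users.insertOne(new Document("name", "Ada")
                    .append("skills", java.util.List.of("hadoop", "statistics")));

            // Query by field value; no SQL, no fixed schema.
            Document found = users.find(Filters.eq("name", "Ada")).first();
            System.out.println(found.toJson());
        }
    }
}
```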

The list of 10Gen’s strategic investors includes Intel, In-Q-Tel and Red Hat. 10Gen ranked third among the Hadoop- and NoSQL-only vendors last year, generating about $36 million in revenue in 2012, according to the Wikibon report.

Hortonworks

Another big name in the big data world is Hortonworks. A Hadoop vendor, Hortonworks received over $70 million in venture capital after spinning off from Yahoo in 2011. Hortonworks runs its own certification courses and has a legion of developers working with its VirtualBox-based sandbox.

Hortonworks is growing exponentially in competition with Cloudera and is known for its partnerships with Rackspace, Microsoft, Red Hat and other companies.

MapR

Best known for its NoSQL database M7, MapR works with Google Compute Engine and Amazon’s cloud platform. MapR ranked fourth in the Wikibon report’s list of Hadoop- and NoSQL-only vendors last year. According to Wikibon, MapR generated about $23 million in total revenue in 2012.

Hadoop – A Brief Introduction


Storing enormous data sets on distributed server clusters used to be a tough job. With the technological advances of the last two decades, however, it has become feasible to both store and analyze big chunks of data without shelling out a hefty budget.


Hadoop is one of the remarkable techniques that enables easy storage of massive data sets and runs distributed analysis applications on every node of a cluster. It IS a big deal in big data, and many experts recognize it as a major force.

Let’s get down to the basics.

What is Hadoop?

Basically, Hadoop is an open source software platform introduced by the Apache Software Foundation. It is a simple yet effective solution that has proved highly useful for managing huge amounts of data, particularly mixtures of structured and complex data, quite efficiently and cheaply.

Hadoop is specially designed to keep big data applications running smoothly despite the failure of individual servers. The platform is also highly efficient, because it does not require applications to ship big data volumes across the network; instead, the computation moves to where the data lives.

How does it Work?

The Hadoop software library can be described as a framework that uses simple programming models to facilitate the distributed processing of huge data sets across clusters of computers. The library does not depend on hardware for high availability, because it can detect and handle failures in the application layer itself. In effect, it delivers a readily available service on top of a cluster of computers, each of which is prone to failure.

Since Hadoop is fully modular, you can swap out nearly any of its components for a totally different software tool. The architecture is robust, flexible and efficient.

What is the Hadoop Distributed File System?

Hadoop has two main parts: a distributed file system for storing data and a data processing framework. These two components play the most important roles.

Technically, the distributed file system is a collection of storage clusters holding the actual data. Although Hadoop can use different file systems, it typically uses the (cleverly named) Hadoop Distributed File System, or HDFS. Once placed in HDFS, your data stays right there until operations need to be performed on it. You can run an analysis on your data or export it to another tool right there within Hadoop.
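As a hedged sketch of what ‘placing data in HDFS’ looks like in practice, here is Hadoop’s Java FileSystem API copying a local file into the cluster and reading it back. The namenode address and file paths are hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed cluster address
        FileSystem fs = FileSystem.get(conf);

        // Copy a local file into HDFS; it stays there until you operate on it.
        fs.copyFromLocalFile(new Path("/tmp/events.log"),
                             new Path("/data/events.log"));

        // Read it back through the same API, wherever its blocks actually live.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/data/events.log"))))) {
            System.out.println(reader.readLine());
        }
    }
}
```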

Hadoop – Data Processing Framework

MapReduce is the name of the Java-based system that serves as the data processing framework. We hear more about MapReduce than about HDFS because it is the tool that actually processes the data, and it is a wonderful platform to work with.
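The classic illustration is word counting: the map step emits a count of 1 for every word it sees, and the reduce step sums those counts per word. The sketch below follows the standard example from the Hadoop documentation; the input and output HDFS paths are supplied as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: for each word in a line of input, emit the pair (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: sum the 1s emitted for each distinct word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The framework handles everything between map and reduce, splitting the input, scheduling tasks on the nodes that hold the data, and regrouping intermediate pairs by key, which is exactly the ‘simple programming model’ described above.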

Unlike a regular database, Hadoop does NOT involve queries, in SQL (Structured Query Language) or otherwise. Instead, it simply stores data that can be pulled out when required. It is a data warehousing system that simply needs a mechanism such as MapReduce for data processing.