Big Data is a Big Deal preso

In March 2012 I was in Tel-Aviv to present Oracle’s strategy on Big Data at a Techeads event. I had only 15 minutes! It was a challenge to speak to such a large audience in Israel, but the feedback was good and I managed to stir the waters enough for people to look at it from a different perspective.

Here’s the preso I used:

“Big Data is a Big Deal” – Tel-Aviv Israel, 6th March 2012, Techeads Multi-brand Event (Avenue, Airport City)

LMC

I’ve touched a Big Data Appliance… and I’m still the same

This week I’m in Reading, UK, to be hadoopized by Cloudera, and all of a sudden: wham! bam! The Oracle Big Data Appliance (BDA) arrives at the Oracle Solution Center! Only thanks to Mr. Bayliss’s courtesy did I manage to get inside the data center with other colleagues and see/touch the latest and greatest Engineered System by Oracle. It’s 36 rack units’ worth of raw computing power, with 216 Intel 5675 cores and 648 terabytes of capacity. Each node has 48GB of RAM, and all the nodes are connected with InfiniBand networking technology, both inside the cluster and to the outside world. That means it can easily connect to an Exadata cluster and maintain the high throughput numbers.

Nevertheless, after touching it and photographing it, I managed to pull myself together :-). The magic is not in the hardware itself, of course, but in what runs inside! And that is Cloudera’s Distribution of Hadoop version 3 (CDH3), plus Oracle’s Big Data connectors and, eventually, the Oracle NoSQL database. I say eventually because it’s not a mandatory component, whereas the connectors are.

Even more magic is what you can do with CDH3. We’ve been toying around with it, and so far I have to say that I totally subscribe to Cloudera’s view that the biggest elephant in the room is not the yellow one, but the lack of manpower to master MapReduce programming. In the near future I’ll drill down on this blog into this plethora of technology, which looks like a zoo or a Pokémon mise-en-scène. Stay tuned.

In the following days I hope I can accomplish a bit more than just counting words with CDH3! Although I still don’t know what I will do with the invaluable information and insight that came from knowing that Shakespeare mentions the word “Oracle” 27 times in his entire body of work. Welcome to Big Data.
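For readers who have never seen the word count exercise, here is a minimal sketch of its shape in plain Python. It is not the Hadoop API and not the job I ran on CDH3: the function names are mine, the three phases only mimic what the framework does across a cluster, and the sample lines are made up rather than taken from Shakespeare.

```python
import re
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1

def shuffle_phase(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle_phase(pairs).items())

# Made-up sample lines standing in for the real corpus.
sample = [
    "The oracle speaks",
    "Ask the Oracle and the oracle answers",
]
counts = word_count(sample)
print(counts["oracle"])
```

On a real cluster the map and reduce functions run in parallel on many nodes and the shuffle moves data over the network; the logic per word stays this simple, which is exactly why it is the "hello world" of MapReduce.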

Here are some snaps:

The expression Big Data materialised onto a rack looks sci-fi, but it’s not!

What might look sci-fi is a yellow elephant and what is he doing in my tennis shoe?

David Segleau explains how Berkeley DB is ubiquitous

In September 2011 I flew to London to be part of a workshop run by Oracle Product Management, and amongst them was David Segleau, Director of Product Management. His new baby was the Oracle NoSQL Database. What struck me was that it is based on a very powerful technology that Oracle got through the Sleepycat acquisition, called Berkeley DB.

Here’s a step back in time to watch a very important video where the very same David Segleau explains how Berkeley DB is ubiquitous!

LMC

Big Data, Blink Decisions

This article was originally published in an internal Oracle EMEA newsletter, but since it does not disclose any sensitive information, I thought I could re-publish it for the general audience of this blog. Enjoy.

When in 2005 Malcolm Gladwell introduced the concept of “Rapid Cognition” through his book “Blink”, people started to take this “glimpse” of thought, or first impressions, more seriously. This “flurry of thought and images and preconceptions”, as Gladwell calls it, makes up an unconscious process that happens in the blink of an eye. It’s about the very quick decisions we make, based on lots of different information, sometimes completely unrelated, and in an instant: bang! We make a decision, an assumption, a judgement.

Gladwell then adds that “we are innately suspicious of this kind of rapid cognition. We live in a world that assumes that the quality of a decision is directly related to the time and effort that went into making it”. The world that tends to value the outcome of slow thinking is the same world that is in desperate need of the exact opposite in the information management arena.

The very definition of a classical Decision Support System (DSS) outlines a slow process, in business terms. The problem with this approach is that the speed at which businesses need to make decisions has grown in proportion to the amount of information that has to be taken into account in the decision-making process. Not just the amount, but also the variety of information. This is also true because businesses have changed, and the challenges of manufacturing are different from those faced by the real survivors of the dotcom era. In the wine-producing industry, the impact of new products has to be assessed in a completely different way than in, shall we say, digital businesses. But at the same time classical businesses, like pharmaceutical companies, still need to crunch lots of data in order to assess the potential correlations between meds. So everything points to a new world of fast decisions, based on an amount of information that is disproportionate to the speed at which these decisions need to be made, assessed, corrected and assessed again. A world where businesses mimic Rapid Cognition in order to be there: at the fingertips of the customer. This is the world that created the Big Data concept.

Big Data means loads of unrelated, unstructured (not necessarily media), non-transactional data that needs to be crunched and transformed into information. From this information it should be possible to extract behaviour or value, and to visualise patterns.

Why is this not the “normal” DSS chain we all know? First, because the source is not transactional data, and second, because structured data needs a data model, whereas here it’s the interpretation of the data that sets the model.

The computational challenge is the same though:

  1. Intensive Load
  2. Fast Transformation
  3. Near real time analysis

But with Big Data there’s a new one, though: visualise. For each phase you have a new or reborn challenge. Challenges like sessionising (a new challenge and a new verb), data mining (reborn) and visualisation techniques (some new and some reborn, like statistical languages) are all coming together in one single bag called Big Data. Examples of Big Data might be near real-time processing of sensor information (the Oracle-BMW sailing boat has 250 sensors that take into consideration more than 40,000 variables per second!); log processing systems (network tools, website analysis); image analysis; scientific research that crosses various fields of knowledge; the list goes on and on, reaching out to all the areas where information that needs “crunching” is popping up like popcorn!
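Since the sessionising verb is new, here is a minimal sketch of what it usually means for log data: grouping each user’s events into sessions that are split whenever the user stays idle for too long. The function name, the event format and the 30-minute gap are my assumptions for illustration, not something taken from a specific tool.

```python
from collections import defaultdict

def sessionise(events, gap_seconds=1800):
    """Split each user's events into sessions separated by idle gaps.

    events: iterable of (user_id, timestamp_in_seconds) pairs, in any order.
    gap_seconds: assumed idle threshold (30 minutes) that starts a new session.
    Returns {user_id: [[t1, t2, ...], [t5, ...], ...]}.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        user_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - user_sessions[-1][-1] > gap_seconds:
                user_sessions.append([ts])    # idle too long: open a new session
            else:
                user_sessions[-1].append(ts)  # still within the current session
        sessions[user] = user_sessions
    return sessions

# Hypothetical click log: (user, seconds since the start of the log).
clicks = [("ana", 0), ("ana", 600), ("bob", 100), ("ana", 5000), ("bob", 200)]
result = sessionise(clicks)
print(result)
```

On a single machine this is trivial; the Big Data part is doing the same grouping and sorting when the log no longer fits on one box.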

Once I had a chat with a network admin about the tracing capabilities of network management tools and he said: “I could pop off the lid and start tracing my network to troubleshoot problems, but I can only do it for a short period. Why? Because I don’t have the storage space, and these sniffing/tracing tools dump ‘A LOT’ of data out”. And then he added: “Even if I had the ability to keep all this data, say, a whole day’s worth of tracing the network, how on earth would I even get the intelligence and processing power to make some sense out of it? Or even to draw conclusions on what’s happening, let alone visualise it!”

Should I tell my network admin friend that he could do it if he had a Big Data system in place? Well, I guess he would like to have such a system, but in the end you all know what network admins say after tracing and analysing: “Nope! Nothing’s wrong down here at the network level. You should probably talk with the DBAs or the apps guys”.

LMC