According to Dan Woods, writing for Forbes, there is an unspoken rule that startups do not combine open source technology with commercial technology. He explains that startups dependent on Big Data may have to break this rule, as suitable open source applications are not yet available to work with MapReduce and Hadoop.
So what does MapReduce do?
A simple example of how it is used is given by Jen Cohen Crompton in a SAP blog:
A consumer retail brand is looking to identify the most frequently purchased products (the top three) from a cross-section of customers as part of a market research initiative focused on merchandising. Let’s say they are looking for data on women within a specific geographic area, which is information provided in each customer profile stored in their CRM database. There might be 2,000 women meeting the identified qualifications and therefore, this big data set needs to be sorted.
The input data for this query would be the profiles of the individual customers within the specifications. After the query is created and sent, the mapping function would sort through the profiles, then identify and send the most frequently purchased products to the reducer. The reducer would compare and aggregate the data generated from each map task and return an output file featuring the top three most frequently purchased products from the cross-section.
The MapReduce process is key in sorting through the big data that might be available when submitting a query. The goal is to create the most accurate output in the shortest amount of time.
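The retail example above can be sketched in a few lines of Python. This is only an in-process illustration of the map and reduce steps, not Hadoop code: the profile fields (`gender`, `region`, `purchases`), the sample data and the function names are all assumptions made for the sketch, and here each map task emits full purchase counts rather than a pre-selected shortlist.

```python
from collections import Counter
from functools import reduce

# Hypothetical customer profiles; field names and values are illustrative only.
PROFILES = [
    {"id": 1, "gender": "F", "region": "north", "purchases": ["jeans", "scarf", "jeans"]},
    {"id": 2, "gender": "F", "region": "north", "purchases": ["jeans", "boots", "scarf"]},
    {"id": 3, "gender": "M", "region": "north", "purchases": ["jeans", "hat"]},
    {"id": 4, "gender": "F", "region": "south", "purchases": ["boots"]},
    {"id": 5, "gender": "F", "region": "north", "purchases": ["jeans", "scarf", "boots", "hat"]},
]


def map_profile(profile):
    """Map step: turn one profile into a table of product -> purchase count."""
    return Counter(profile["purchases"])


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables into one."""
    return left + right


def top_products(profiles, gender, region, n=3):
    """Select the matching profiles, map each one, then reduce to the top n products."""
    selected = [p for p in profiles if p["gender"] == gender and p["region"] == region]
    mapped = map(map_profile, selected)
    totals = reduce(reduce_counts, mapped, Counter())
    return totals.most_common(n)


if __name__ == "__main__":
    print(top_products(PROFILES, "F", "north"))
```

In a real cluster the map tasks would run in parallel on the machines that hold the data and the reducer would combine their partial results, but the division of work is the same as in this sketch.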
Rafael Coss of IBM gives a similar explanation in his YouTube video:
And what does Hadoop do?
The official explanation is available here.
A good explanation is given by Jen Cohen Crompton here:
Overall, Hadoop enables applications to work with huge amounts of data stored on various servers. Hadoop’s functions allow the existing data to be pulled from various places (since now, data is not centralized, but distributed in places using cloud technology) and use the MapReduce technology to push the query code and run a proper analysis, therefore returning the desired results.
As for the more specific functions, Hadoop has a large-scale file system (Hadoop Distributed File System or HDFS), it can write programs, it manages the distribution of programs, then accepts the results, and generates a data result set.
Hadoop’s main shortcoming is that it is a batch system and is not suited to real-time processing, which real-time analytics requires if business value is to be extracted from Big Data. Hadoop development also requires advanced expertise, and MapReduce is a complex form of programming. There are projects underway that will, in time, provide open source applications to deal with these problems. In the meantime, SAP HANA is available to startups for whom time to market is crucial, but it is a licensed solution. According to SAP, their solution will:
Run your business in real real time. SAP and partner solutions powered by SAP HANA can help you dramatically accelerate analytics, business processes, predictive analysis, and sentiment data processing – all on a single in-memory computing platform.
SAP is doing its best to answer the challenges presented by the arrival of the cloud and Big Data. Most of the big companies with which it has long-established relationships want their data kept safely in SAP’s data storage centres, and some have more data than a few of the biggest cloud users. SAP has started running existing corporate applications on HANA for these customers in its own data centres. This is giving its staff direct contact with office workers for the first time, which should improve both the usability of its products and its profile.
New cloud application companies appeal directly to office workers by giving them free applications and charging further down the line for enhanced features. Young companies have no ties to old IT and are free to build on new computing architecture, thereby depriving companies like SAP of future customers. SAP has launched a $155 million fund, partly to encourage startups to use SAP HANA. For startups to adopt it, SAP HANA needs a reasonable pricing model and must be easy to download and experiment with, so that they can see where Hadoop and SAP HANA can work together.
Can SAP Dance in the Cloud?
Last month, for the first time, SAP broke out the sales figures for its cloud business from its other sales, and this showed that revenue from HANA had tripled year-on-year to £86 million. When Bill McDermott, co-chief executive, was asked to comment by the FT, he responded: “We said that we would focus on the cloud, mobile and big data. We can grow as fast as them and gain share against them, so why not break it out and show SAP can dance – and that’s what we’re doing, we’re dancing.”
Click How Hadoop and SAP HANA can Accelerate Big Data Startups to read the full article by Dan Woods.