Aaron YK Yee, Technical Sales Lead for Big Data & Analytics at IBM ASEAN
Apache Hadoop is an open source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.
IBM provides the industry’s premier open Hadoop solution that delivers critical insights, deployed anywhere. IBM is a proud member of ODPi, a shared industry effort focused on promoting and advancing Apache Hadoop for the enterprise.
While more companies are using the open source tool for collecting and storing very large sets of variable data, IBM argues companies are struggling "to realise its full potential in every part of their business".
As examples it cites business analysts who need to quickly find relevant information, and data scientists who need to make sense of the data with statistical modelling, with the corollary that the highly complex environments this creates need to be easy for IT to manage and deploy for everyone in the organisation. No small task.
According to IBM, its BigInsights for Apache Hadoop includes "a broad data science toolset to query data, visualise, explore and conduct distributed machine learning at scale".
Aaron YK Yee, Technical Sales Lead for Big Data & Analytics at IBM ASEAN, further shares his opinion on Hadoop.
DataStorageAsean: In simple terms, what's the difference between Hadoop and a normal database?
Aaron YK Yee: Hadoop is an open source platform that enables distributed processing of large data sets across clusters of commodity servers.
A traditional database (RDBMS) excels at processing predictable workloads on structured data, but the beauty of Hadoop is its ability to process rapidly changing, large volumes of structured and unstructured data. Hadoop can also be easily scaled out with commodity servers, making it readily adaptable to fluctuating workloads.
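To make the contrast concrete, Hadoop's core programming model is MapReduce: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group. The word count below is a minimal, single-machine sketch of that model in plain Python, not actual Hadoop code; on a real cluster, each phase would run in parallel across many commodity servers.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["Hadoop stores big data", "Hadoop processes big data in parallel"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
```

Because each document (and each key group) is independent, the map and reduce phases can be spread across as many machines as the data demands, which is exactly the scaling property described above.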
DataStorageAsean: Can you tell us about the newest developments in Hadoop and where you think the technology is heading?
Aaron YK Yee: By far the most exciting development around Hadoop is Spark. Spark is a platform that enables in-memory distributed computing for large-scale data processing.
Spark's ease of development, extremely fast in-memory performance and support for combined workflows (batch and interactive) complement Hadoop's strengths beautifully – be it diverse data type support, scale or resilience.
Spark is versatile and flexible, and the processing capabilities of the Spark engine can be exploited from multiple "entry points": SQL, Streaming, Machine Learning and Graph Processing. IBM believes in the strong potential of Spark and has invested heavily, both internally and by contributing to the Apache Spark project itself.
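A defining trait behind Spark's in-memory speed is that it records chained transformations lazily and only runs them when an action asks for a result. The toy class below is a pure-Python sketch of that pattern (it is not the Spark API; the class and method names are illustrative stand-ins for Spark's RDD-style `map`/`filter`/`collect`).

```python
class ToyRDD:
    """A tiny in-memory stand-in for Spark's lazy, chainable datasets."""

    def __init__(self, data, transforms=None):
        self._data = data
        self._transforms = transforms or []  # recorded, not yet executed

    def map(self, fn):
        # Record the transformation; nothing runs yet (lazy evaluation).
        return ToyRDD(self._data, self._transforms + [("map", fn)])

    def filter(self, fn):
        return ToyRDD(self._data, self._transforms + [("filter", fn)])

    def collect(self):
        # An "action" finally pipes the data through every recorded step,
        # entirely in memory, without writing intermediate results to disk.
        items = iter(self._data)
        for kind, fn in self._transforms:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

squares_of_evens = (
    ToyRDD(range(10))
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .collect()
)
```

In real Spark the same chaining style is exposed from each of the entry points mentioned above, with the engine keeping intermediate data in cluster memory rather than spilling between batch jobs.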
DataStorageAsean: Is Hadoop for all companies of any size and can anyone in an organization get access to data held in a Hadoop cluster?
Aaron YK Yee: Companies of any size will benefit from Hadoop when they need to process large volumes and varieties of structured and unstructured data.
There are many open source data access projects (such as Hive, Pig, HBase, etc.) that enable developers to access the data in Hadoop. IBM has also developed many products that enable business analysts and data scientists to productively draw insights from the data in Hadoop.
With technology such as Big SQL (industry-standard SQL), BigSheets (a spreadsheet-style tool), Text Analytics, Big R (R support) and Machine Learning on Big R, business analysts and data scientists can leverage the distributed computing power of Hadoop without the need to write complex MapReduce code.
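To illustrate the productivity gap such tools close: a job that takes explicit MapReduce code can often be expressed as a single industry-standard SQL statement. Big SQL runs such queries directly against data in Hadoop; as a hypothetical stand-in here, the sketch below runs an equivalent GROUP BY word count with Python's built-in sqlite3, purely to show the declarative style an analyst would write.

```python
import sqlite3

# In-memory table standing in for a collection of words stored in Hadoop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("hadoop",), ("spark",), ("hadoop",), ("sql",), ("hadoop",)],
)

# The whole "word count job" as one declarative statement.
counts = dict(
    conn.execute("SELECT word, COUNT(*) FROM words GROUP BY word").fetchall()
)
conn.close()
```

The analyst states *what* result is wanted; the engine (Big SQL on a cluster, sqlite3 in this toy) decides *how* to compute it.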
DataStorageAsean: What are the specific challenges for Hadoop adoption in the ASEAN region?
Aaron YK Yee: The adoption rate in ASEAN varies from country to country. For countries with low adoption rates, the general challenge is a lack of awareness of the benefits and value that Hadoop can offer, and of the power its surrounding technologies can bring.
DataStorageAsean: What’s unique about your Hadoop offering?
Aaron YK Yee: IBM’s Hadoop offering is branded as IBM BigInsights. IBM BigInsights for Apache Hadoop offers a strong suite of products surrounding the open source Hadoop core.
This includes tools that enable administrators to allocate resources, monitor multiple clusters, and optimize workflows to increase performance. IBM's full range of enterprise software for Data Integration, Predictive Analytics, Security and Governance are tightly integrated with BigInsights.
Customers can rest assured that corporate governance compliance and enterprise integration are well taken care of. Furthermore, IBM BigInsights is also available as a cloud offering for customers who want flexible deployment of their Hadoop infrastructure.