With over forty years of experience in Information Science, specializing in large-scale computing infrastructure and computationally intensive projects, Robert Hercus (RH) is arguably a man of interest. In 1977 he set up a software house specializing in application development for government agencies and private companies in areas such as finance, insurance, accounting and land registration systems. He was also responsible for implementing the IT infrastructure and applications for Projek Lebuhraya Utara Selatan Sdn Bhd (PLUS), the owner and operator of toll expressways in Malaysia. He was further involved in setting up two pioneering Malaysian companies under PLUS for the development of automatic toll collection and the Touch ‘n Go systems.
In 2001, he founded Synamatix Sdn Bhd, which specializes in bioinformatics, and a year later, Neuramatix Sdn Bhd, which focuses on the creation of intelligent applications and devices in domains including machine translation, robotic movement, robotic speech and semantic technology. Robert is also the founder and Managing Director of Malaysian Genomics Resource Centre Berhad (MGRC), which specializes in genome sequencing and analysis as well as genetic screening. Founded in 2004, MGRC was listed on Malaysia’s ACE Market in 2010.
Data & Storage Asean (DSA) recently spoke to Mr Hercus on Big Data and how it impacts his work across several disciplines including life sciences, software development and personnel management.
DSA: What is your definition of Big Data?
RH: Some would say that Big Data is data that is beyond the ability of current computing systems to process easily. Computers have been increasing in power every year, so the focus on Big Data today is probably not just on the petabytes of data but on the extraction and analysis of complex patterns, as opposed to data processing per se. Much of the analysis today looks for information such as the kinds of products people buy, who relates to whom, the games people play, what friends like in terms of music or video, and so on. It is looking at relationships between people and objects or actions.
In the past, interest in data was centered on transactions, stock markets, accounting or payroll. The focus was on the data itself. Today, the interest revolves around the relationship between pieces of information.
DSA: How does Big Data get applied to Neuramatix's business?
RH: Our core technology, NeuraBASE, is an ultra high-speed artificial intelligence technology that finds patterns within voluminous data sets and data streams. In general, we offer software and services to help analyse patterns and associations within voluminous data.
For genome analysis projects, NeuraBASE is used to detect and identify known and novel genetic mutations, which is important for understanding the development of diseases. In machine translation, NeuraBASE is used to build words and phrases from a source language, which are then linked to translated words and phrases in another language. We are also developing an interactive speech system, in which NeuraBASE is used to build sequences of phonemes, the fundamental units of human speech that make up words and phrases. To perform recognition, an algorithm is used to find a matching sequence of phonemes.
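To make the idea of "finding a matching sequence of phonemes" concrete, here is a minimal sketch in Python. It is not NeuraBASE and does not claim to reflect how NeuraBASE works internally; it swaps in a plain prefix tree (trie) with greedy longest-match lookup. The class and function names, the phoneme symbols and the tiny lexicon are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple


class PhonemeNode:
    """One node in a prefix tree keyed by phoneme symbols (hypothetical)."""

    def __init__(self) -> None:
        self.children: Dict[str, "PhonemeNode"] = {}
        self.word: Optional[str] = None  # set when a stored sequence ends here


def add_word(root: PhonemeNode, phonemes: List[str], word: str) -> None:
    """Store one phoneme sequence and the word it spells."""
    node = root
    for phoneme in phonemes:
        node = node.children.setdefault(phoneme, PhonemeNode())
    node.word = word


def longest_match(
    root: PhonemeNode, stream: List[str], start: int
) -> Optional[Tuple[str, int]]:
    """Return (word, end_index) for the longest stored sequence at `start`."""
    node = root
    best: Optional[Tuple[str, int]] = None
    for i in range(start, len(stream)):
        node = node.children.get(stream[i])
        if node is None:
            break
        if node.word is not None:
            best = (node.word, i + 1)
    return best


def recognise(root: PhonemeNode, stream: List[str]) -> List[str]:
    """Greedily turn a phoneme stream into words, skipping unknown phonemes."""
    words: List[str] = []
    pos = 0
    while pos < len(stream):
        match = longest_match(root, stream, pos)
        if match is None:
            pos += 1  # no stored sequence starts here; move on
        else:
            word, pos = match
            words.append(word)
    return words


# Illustrative lexicon; the phoneme symbols are made up for this sketch.
lexicon = PhonemeNode()
add_word(lexicon, ["B", "IH", "G"], "big")
add_word(lexicon, ["D", "EY"], "day")
add_word(lexicon, ["D", "EY", "T", "AH"], "data")

print(recognise(lexicon, ["B", "IH", "G", "D", "EY", "T", "AH"]))
```

Greedy longest match is the simplest possible choice here. A real speech system would also have to handle noisy or ambiguous phonemes, which this sketch ignores.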
Each of these examples is an application of Big Data, and there are many others. At any given moment, bots are traversing the Internet looking for opportunities.
DSA: In Malaysia and specific to your business, are you involved in any Big Data projects?
RH: A few years ago, we embarked on a collaborative effort with Sime Darby Technology Centre to analyse the gene expression data of oil palm. The project involved the complete sequencing, assembly and annotation of the oil palm genome at 30 times coverage (repeated sequencing over 30 times) and with 93.8% completeness.
For machine translation, we have opened up a world of information to those who may not be fluent in English, specifically Malaysians and Indonesians. A human translator translates an average of 5-8 pages per day. With an estimated 1.8 million articles published each year in an estimated 28,000 journals, far more information is generated each day than human translators can cope with. Much of this information changes rapidly. Our translation portal translates up to 200,000 words per minute. It takes seconds for us to translate entire websites. Our top users are mainly Malaysian public universities and corporate organisations. In 2013, we received 3.5 million visits from 104 cities worldwide. These include visits from Malaysians who are either studying or working abroad.
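The gap between those two rates can be put into rough numbers. The sketch below uses the figures quoted above (5-8 pages per day, up to 200,000 words per minute) plus one assumption that is not from the interview: a page length of 250 words. The function and constant names are hypothetical.

```python
PORTAL_WORDS_PER_MINUTE = 200_000  # peak rate quoted in the interview
WORDS_PER_PAGE = 250               # assumption, not stated in the interview


def portal_seconds_for_pages(pages: int, words_per_page: int = WORDS_PER_PAGE) -> float:
    """Seconds the portal would need, at its quoted peak rate, for `pages` pages."""
    words = pages * words_per_page
    return words * 60 / PORTAL_WORDS_PER_MINUTE


# A human translator's daily output of 5 to 8 pages, at the assumed page length.
for pages in (5, 8):
    print(pages, "pages:", portal_seconds_for_pages(pages), "seconds")
```

Under that page-length assumption, a full day of human translation corresponds to well under a second of portal time at the quoted peak rate. A different page length scales the result proportionally.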
One of our most interesting projects was a collaborative effort with the International Mesothelioma Program, Brigham and Women's Hospital, Boston, Massachusetts, United States of America. While this may not strictly be a Big Data project initiated in Malaysia, we are proud to say that the bioinformatic analysis was completed here in Malaysia. In 2009, the paradigm for revealing the molecular cause of cancers relied on the interrogation of small numbers of genes, which limited the scope of investigation. The emerging second-generation massively parallel DNA sequencing technologies enabled a more precise definition of the cancer genome on a large scale.
In this project, scientists from the International Mesothelioma Program worked with us to examine the genome of a highly aggressive tumour associated with asbestos exposure. The project successfully uncovered thirty tumour-specific fusions/translocations, which were independently validated. These discoveries help scientists understand how tumours develop and grow at the molecular level. Today, other scientists use genomic research to develop cancer drugs that can disrupt cellular signals for tumour growth. In some cases, tumours stop growing. In other cases, the tumours shrink.
We are also involved in robotics and the autonomous navigation of unmanned aerial vehicles.
DSA: Do companies need a data scientist in order to benefit from Big Data?
RH: If a company currently has an application that does what it wants it to do, it probably doesn’t need a data scientist. However, if you are developing something new that has not been done before, you will definitely need a data scientist to work out the best ways to extract the relationships in your data. For example, take Facebook: a person might have a hundred friends, each of those friends has a hundred other friends, and that one person buys a particular item on Amazon. Is there a way for a business to identify which of that person’s friends, and friends of friends, might be interested in buying the same thing? Is there a way to link up those potential customers? This kind of complexity requires a data scientist or computational algorithms to help extract insightful associations in real time.
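The friends-of-friends question can be sketched in a few lines of Python. This is only an illustration of the kind of relationship query being described, not how Facebook, Amazon or Neuramatix do it. The graph, the names in it and the function `friends_of_friends` are hypothetical, and ranking candidates by their number of mutual friends is an assumed heuristic.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def friends_of_friends(
    graph: Dict[str, Set[str]], buyer: str
) -> List[Tuple[str, int]]:
    """People two hops from `buyer`, ranked by number of mutual friends."""
    direct = graph.get(buyer, set())
    mutual_counts: Counter = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the buyer and anyone who is already a direct friend.
            if candidate != buyer and candidate not in direct:
                mutual_counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically for stable output.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical friendship graph, stored as an adjacency map.
graph = {
    "ali": {"ben", "chen"},
    "ben": {"ali", "devi", "eka"},
    "chen": {"ali", "devi"},
    "devi": {"ben", "chen"},
    "eka": {"ben"},
}

print(friends_of_friends(graph, "ali"))  # [('devi', 2), ('eka', 1)]
```

With a hundred friends each, one buyer already fans out to as many as ten thousand second-degree candidates, which is why doing this for every purchase in real time becomes a data science and systems problem, not a simple lookup.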
DSA: Is it easy to find a data scientist?
RH: It’s actually teamwork. You need a statistician trained in dealing with large, complex data sets. You also need people who can translate that analysis into computer terms and program it. You then need system architects who can optimize the hardware for high performance.
DSA: Where do you see Big Data going in Malaysia?
RH: There is a lot of potential in the government sector for Big Data analysis, even in simple things like processing car registrations and licenses or managing petrol consumption subsidies: are the right people getting the subsidies? Every country needs systems for tracking criminal activities, foreign workers, illegal immigrants and smuggling activities. The UK government, for example, has done an excellent job at predicting the movements of criminal activities.