IoT is no longer the new buzzword, and big data analytics has been a hot topic for a few years. Meanwhile, flash is increasingly dominating the market while spinning disks are in decline, and IDC reports point to more losers than winners. How does all this relate to flash storage?
We met John Hayes, co-founder and chief architect of Pure Storage, in Singapore during his travels around the region to discuss precisely that.
In March, Pure Storage unveiled a new product called FlashBlade. A jovial man, John is as enthusiastic as he is ambitious. Proud of the company's latest offering, he explains that while other products focus on high performance, FlashBlade is the first flash storage designed for capacity. Not only is the price point lower, it's also a scale-out product. With the market growing 40 to 50% year-on-year, Pure Storage wanted to build a more elastic product.
“As a company we focus on making a simple product that has a long lifetime. We want businesses to adapt to changing technology, and it's changing really quickly right now. FlashBlade is closer to how engineers work, rather than how IT works. We don’t want to just make it fast – we also want to make it more efficient. We’re in a time where flash prices are dropping a lot, so we have to do better than any existing flash product that you can buy in the market.”
“What we want to accomplish with FlashBlade is to make the infrastructure simple, so that anyone can sell their products and grow with their data, and we will support them along the way. We can help advance their products by taking a huge risk out of infrastructure that is not well adapted to rapidly growing data.”
FlashBlade is deployed in a 4U chassis that fits up to 15 blades, scaling to a maximum of 1.6PB of data. It is also low power, and its software is easy to use and requires minimal configuration.
With FlashBlade still in pre-release, Pure Storage has invited a number of companies into its alpha program, and it has been seeing flash deployed to solve big data problems.
“In our alpha programs, one of our customers was Mentor Graphics, specialising in chip design, and the way they use it is like a big data problem – where they want to attach thousands of computers to a store of data to perform simulations. It’s similar to applications in plant genomics, even finance – it’s closer to a neural network, where no one knows what it is, but you run simulations with sensors, and simulators are used to learn how to recognise the natural world.”
During his travels, John has been in discussions with several automotive companies, and the general consensus was that most of them are looking towards self-driving cars. It’s estimated that training a car to drive itself will take up to 200PB of data, gathered by sensors on thousands of cars driving around the world and sampling every natural environment.
John explained that structured data living in Tier 1 accounts for only a small percentage of the world’s data and grows slowly. In fact, more than three quarters of the world’s data is unstructured and growing at a rapid pace, which is why he believes flash is essential to keep up with the amount of data being generated.
“One of the difficult bits of big data is the storage administration. We make that problem as a whole go away – so you have a huge data pool, and it’s all available at the speed of flash.”
“Key to making that work is to take advantage of the properties of flash. We want it to be low power, durable, fundable. Flash idles very well compared to disk, which means we can pack flash in tighter than traditional spinning disk. It’s much more reliable, you don’t need a bunch of mechanisms around it and you can connect it directly to the CPU. Even if the media is more expensive than a disk, the system is actually cheaper, because you don’t need layers of mechanics and controls. Metal doesn’t get cheaper over time, but in designing the system we’ve eliminated that as much as possible.”
Pure Storage may not sell analytics products, but businesses are certainly using flash products for analytics. John told us there have been examples of flash being used in log analytics to analyse traffic.
“A lot of large storage, like object storage, focuses on the quantity side but isn't focused on driving the performance of the data, which often means you are copying it out. Hadoop is like a massive ETL machine – it’s reading, transforming and writing it back – you keep making more and more copies of that data as you transform it. But what we hope is that you can do more in the place of the original data.”
Looking back five years, John notes that there were more mechanical engineers, and initial applications leaned towards analysis and engineering. In recent years, however, the number of software engineers has grown, and engineering companies are turning into software and big data companies. It’s no longer possible to look at plant genomics, oil and gas, or finance without making it a big data problem.
“Engineering is trying to understand the natural world, and it takes a lot of data to understand. We’re changing from trying to come up with a theory or formula of how the natural world works to trying to learn how it actually works from all this data. This is a change in how companies operate. A company that doesn’t have a strategy to understand the natural world will have trouble competing with the ones that do.”
The latest financials show the flash industry growing while spinning disks decline. Despite that growth, John believes there is still much to do in developing flash before it can fully compete with its slower counterparts. “Businesses are looking more into the software capability rather than the price. We’ve broken the price barrier, now it really comes down to how do we reproduce the capabilities of disk systems of the last 30 years.”
“One of the interesting initial applications for flash is archiving in the medical field – they need to retrieve information quickly – they don’t know which bits of data, but they will need rapid access to something in there. The economics support using flash as an archive system. This is applicable in the finance industry as well, building the structure around the guarantee that your data will not be modified.”
But as John says, “We will need to write a lot more software.”
Right now, businesses are turning to flash to solve critical business problems. As John observed, people only adopt a brand new product to solve a critical problem. Once those problems are solved, however, businesses will start exploring other capabilities of flash and get more value from it. Everyone wants their data to be more agile, responsive, easy to use and scalable, and flash delivers that by removing the complexity of managing storage.
With the industry debating whether flash will take over the storage market, John is confident that the transition is already under way.
“Flash is better than disk in every way and better than tape in a lot of ways as well. It’s actually online – things that are online and connected are more durable over long periods of time, when the data is alive. It’s safe to ship like tape; if you want to ship it off site, it can be encrypted. It’s smart – it can read itself. You don’t have to worry about having matching mechanisms or connections. The only thing holding flash back is the software interface and the capabilities that would allow it to substitute for other products.”
It’s not all or nothing, though. In John's analysis, cost is not what holds back the adoption of flash for backups; rather, it is the decades of integration built into enterprises' data and IT processes, all of which has to be rebuilt. He believes a time will come for flash when backup can be done natively instead of going through layers of process.
“When we make features in the flash array, we endeavour to rebuild them in a simpler way, trying to take the good properties of flash while trying to reproduce the benefits of other storage. It’s going to take time for that to catch up.”
John has travelled to many countries, and he has observed that people appreciate references to their industry, culture or anything else that brings them closer; John is determined to build that connection. Pure Storage is targeting the people developing the next wave of applications, and it wants to be in the part of storage that is growing. “Operations will always follow where the engineers go.”
“Everyone wants their IT to work. But it’s not the companies that first adopt it, it’s the people. You find a person who wants to change their infrastructure, who wants to make a simplification and make the system better. You find those people in all sorts of industries – from hedge funds to small manufacturers like cheese manufacturers, beer manufacturers – it isn’t what you would consider to be early hi-tech adopters.”
“The question is where do you find these people that are going to take the first step.”
John, who is based in California, has watched industry trends in the Bay Area, where much of machine learning is devoted to advertising. He is excited that these technologies are now being used for applications that actually benefit people through interesting products, instead of filling in a statistic in a database.
He emphasises that data is the key to knowing customers. He notes that companies, even if they have no plans for their data, tend to keep raw data for a long time. That record of customer interactions is, in his view, one of the most important assets a company has and the closest thing to building a personal relationship with customers and adding depth to it. “It’s really about the people.”