Big data company
The storage engine, Kudu, is meant as an alternative to the widely used Hadoop Distributed File System and the Hadoop-oriented HBase NoSQL database, borrowing characteristics from both, according to a copy of a slide deck on Kudu’s design goals that VentureBeat has obtained. The technology will be released as Apache-licensed open-source software, the slides show.
One of Cloudera’s early employees has led a small team working on Kudu for the past two years, and the company has begun pitching the software to customers ahead of an open-source release at the end of this month, a source familiar with the matter told VentureBeat.
That source and others believe Kudu could present a new threat to data warehouses from Teradata, IBM (PureData, formerly Netezza), and other vendors. It may also be used as a highly scalable in-memory database that can handle massively parallel processing (MPP) workloads, not unlike HP’s Vertica and VoltDB, the sources say. And one day Kudu, which works across multiple data centers with RAM and fast solid-state drives (SSDs), could even play a part in backup and disaster recovery.
Cloudera declined to comment.
However Cloudera chooses to market Kudu, it’s clear that the software is a big step forward for the company, not only in its efforts to outdo other Hadoop vendors but also in its quest to become a prominent player in enterprise software.
Not that Cloudera is a nobody. It’s
So what is Kudu, then?
It’s “nearly as fast as raw HDFS for scans” and, at the same time, “nearly as fast as HBase for random access,” according to one slide from a presentation on Kudu’s design goals. But Kudu is not meant to be a drop-in substitute for HDFS or HBase. “There are still places where these systems will be optimal, and Cloudera will continue to support and invest in them,” a slide said.
Kudu could be used for time-series data, or real-time reporting, or model building, according to another slide.
And it’s important to note that Kudu isn’t a SQL query engine for pulling up specific data. Cloudera has Impala for that, and others use Hive. Kudu has an “early integration” with Impala, and Spark support is coming, according to a slide.
The Kudu application programming interface (API) works with Java — the common language of Hadoop — as well as C++. Kudu’s architecture allows for operation across sites, according to one slide. That makes it comparable to Google’s Spanner and
Is Kudu well adopted, though? No, not yet.
“Looking for beta customers,” a slide said.