Spark is on the ascent in the big data world, and rightfully so. It's far faster than MapReduce, and with its SQL interface, it's faster than Hive. Though operationally different from both, Spark can replace either in many instances.
The company behind Spark, Databricks, hopes to carve out a niche for itself in the big data world. Yet all of the major Hadoop vendors have announced support for Spark as well. At the recent Spark Summit East, I asked Databricks’ head of customer engagement, Arsalan Tavakoli, how the company plans to compete:
“It is really two different segments. I think the Hadoop ecosystem is alive and kicking. Hortonworks, MapR, Cloudera are all very focused in the on-premise world. We don’t have a distribution of Spark in the on-premise world. Actually, all of those guys leverage Databricks for their L2, L3 support for Spark. When they go to a customer and sell Spark support, they rely on our expertise because we have the core braintrust around that.”
This is a rosy, if not well-rehearsed, answer to the question, but the truth is more complicated. Paco Nathan, Databricks' director of community engagement, made several unfavorable references to Hadoop during a Databricks cloud training session at Spark Summit East. He said he had seen several companies “jumping over Hadoop” and “skipping the big YARN deploy” to go straight to Spark. He went on to say that Hadoop would be over in a few years.