The Data Deluge
An ever-growing deluge of data floods the world, much of it unstructured text, metadata, and sensor readings. ‘Data lakes’ have emerged as a storage repository for it: unlike a data warehouse, a data lake stores nearly any type of raw data and applies structure only when the data is retrieved for analysis.
However, the ETL (Extract, Transform, Load) process presents a serious challenge: how do you get the data into the lake? Cask, a Palo Alto startup, engaged me to help them solve this problem.
Challenge and Opportunity
Cask had developed a middleware platform that allows enterprises to store, manage, and analyze Big Data. Their product, the Cask Data Application Platform (CDAP), integrates the components of the Hadoop open-source ecosystem and allows developers to build data analysis applications.
Cask wanted to reach professionals who lacked the coding skills that data ingestion in CDAP, in its then-current form, required. We would design and build a consumer-friendly ETL app to premiere at the upcoming 2015 Strata + Hadoop conference.
The launch would bolster Cask’s status as an emerging force in the Big Data community. More critically, it needed to help secure B-round funding: venture capital in the Valley was slowly drying up, and the company desperately needed to add engineers and a sales and marketing team.
We had 8 weeks.
Hydrating the Lake
CEO Jon Gray dubbed the new tool ‘Hydrator.’ It would enable users to easily ingest data and then build and maintain data pipelines through a drag-and-drop interface.
We moved quickly. Between the endless demands of running a 3-year-old startup, the CEO and CTO would huddle with me in the conference room, whiteboarding and debating features and requirements. I translated these into wireframes, and we refined and iterated quickly.
I was new to Big Data and received a crash-course courtesy of the knowledgeable engineering team and my own research. I collaborated with the engineers, visual designer, and front-end team to refine and implement the designs.
We adapted components from CDAP, such as the existing pipeline canvas and node menu, simplifying and organizing them in the process. Users could now create custom or pre-configured pipelines by selecting from a menu of nodes, each representing a juncture in the data-transfer circuit: they could choose from a variety of sources, apply transforms, and designate sinks to output the data.
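Conceptually, each pipeline the canvas produced was just a small graph of named stages and the connections between them. A minimal sketch in Python illustrates the source → transform → sink structure; the stage and plugin names here are hypothetical illustrations, not CDAP’s actual API:

```python
# Sketch of a Hydrator-style pipeline as plain data.
# Stage and plugin names are hypothetical, not real CDAP plugins.
pipeline = {
    "name": "csv-to-lake",
    "stages": [
        {"name": "read-files",  "type": "source",    "plugin": "File"},
        {"name": "parse-csv",   "type": "transform", "plugin": "CSVParser"},
        {"name": "write-table", "type": "sink",      "plugin": "Table"},
    ],
    # Connections mirror the drag-and-drop canvas: edges between nodes.
    "connections": [
        ("read-files", "parse-csv"),
        ("parse-csv", "write-table"),
    ],
}

def validate(p):
    """Check that the pipeline has at least one source and one sink,
    and that every connection refers to a declared stage."""
    names = {s["name"] for s in p["stages"]}
    kinds = {s["type"] for s in p["stages"]}
    edges_ok = all(a in names and b in names for a, b in p["connections"])
    return {"source", "sink"} <= kinds and edges_ok

print(validate(pipeline))  # → True
```

Representing the pipeline as data rather than code is what made a drag-and-drop interface feasible: the canvas only had to emit and edit a structure like this.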
Version 1. The application and design evolved with each new release.
We made the deadline. The head of engineering said it was the smoothest launch they’d had in 3 years. Conference attendees responded favorably. Most importantly, Cask received $25 million in B-round funding, crossing the ‘valley of death’ that claims so many startups. I can’t take all the credit; engineering provided many assists.
Cask engaged me again for the next release and eventually brought me on full-time. I led UX design, assumed product manager duties, and introduced and modeled fundamental principles of design thinking. We continued to evolve Hydrator and to develop additional extensions like Tracker, a deep metadata search tool.
One of my proudest moments came when Jon announced to the team that Tom Reilly, CEO of Cloudera, had told him how impressed he was with the maturity of the design.