When Owen O’Malley was picking graduate schools, he chose UC Irvine’s Information & Computer Science because of its leading-edge software group and strong ties to industry. In software, it is critical to understand your customers and their use cases; research is exciting, but the theory must be combined with the experience of industry professionals. At UCI, with the guidance of his advisor, Prof. Debra J. Richardson, he explored many different topics that proved invaluable in his career, including program analysis, graph algorithms, distributed computing, and software architectures.
The Monday after turning in his dissertation, O’Malley started at Reasoning, a ten-year-old startup that had just pivoted to building a tool for finding and fixing Y2K problems in legacy COBOL applications. “The experience I had at UCI in static analysis and developing data flow tools was critical in that job. But I never expected to build a data analysis tool for COBOL!” said O’Malley.
After saving the world from Y2K disasters, O’Malley moved to Sun, where he worked on a team that developed the revision control system managing the data used to design Sun’s CPUs. “The project was challenging because we had three sites working on the CPU designs and the data was too big to access from a single central server. The data had to be distributed and kept in sync between the three sites, even when they were generating 100,000 new file versions in a single day,” said O’Malley.
After Sun, O’Malley moved to NASA. His parents met while working as programmers on NASA’s Apollo project, so not only is he a second-generation programmer, he has always been interested in NASA. While there, he worked on many exciting projects. For example, in 2002 he had the opportunity to work at NASA Ames Research Center on a software model checker named Java PathFinder. Software model checking is a form of static analysis that explores the possible states of a program looking for failures. While O’Malley was analyzing software that suggested actions during space shuttle emergencies, he got to pilot a three-engine-out scenario on one of the simulators the shuttle pilots use for training, and he managed to land the simulated shuttle on the first try. O’Malley also helped out on the Mars Exploration Rover project. When the rover had just landed on Mars, its planning software was crashing occasionally. As an expert in C++, O’Malley was called in to figure out the problem, and he discovered that they were hitting a bug in the C++ runtime, which had a simple fix.
O’Malley joined Yahoo Search’s WebMap team after NASA. WebMap built and analyzed a graph of the known web, with a node for each URL and an edge for each link. The graph had 100 billion nodes and a trillion edges, and even compressed it was 100 terabytes, so it took a lot of computers to build in a timely manner. Although WebMap had been scaled up to run on 800 computers, its framework needed to be replaced to support an even larger scale. The team started designing and prototyping a new C++ framework based on the GFS and MapReduce papers from Google, but discovered a similar distributed file system and MapReduce implementation in the Lucene project at the Apache Software Foundation. It only ran on 20 machines, but it had the huge advantage of already being open source. O’Malley’s team had planned to open source their new framework, and starting from an existing open source project made that easier. That code quickly became the Apache Hadoop project, which became the de facto standard for big data processing. Two other UCI alumni, Hairong Kuang and Koji Noguchi, were also early, critical members of the Hadoop team. Yahoo now has over 45,000 computers running Hadoop; the largest clusters contain 4,500 computers and are critical to Yahoo’s business. Because the project is open source and unique in the scale of data it can handle, it has been adopted by numerous companies including Facebook, LinkedIn, eBay, Apple, and Twitter.
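Hadoop follows the MapReduce model described in those Google papers: a map function emits key–value pairs, a reduce function combines all the values for a key, and the framework handles distributing the work across machines. As a minimal, hypothetical sketch (not WebMap’s actual code), the Hadoop Java job below counts outgoing links per URL from tab-separated “source page, target page” edge records; the class names and input format are assumptions for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical example: count outgoing links per URL from "source<TAB>target" edge lines.
public class OutLinkCount {

  public static class EdgeMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text source = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t");
      if (parts.length == 2) {
        source.set(parts[0]);          // the page the link comes from
        context.write(source, ONE);    // emit (source URL, 1) for each edge
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text url, Iterable<LongWritable> counts, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable c : counts) {  // all counts for one URL arrive together
        sum += c.get();
      }
      context.write(url, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "out-link count");
    job.setJarByClass(OutLinkCount.class);
    job.setMapperClass(EdgeMapper.class);
    job.setCombinerClass(SumReducer.class);  // pre-aggregate on the map side
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The framework runs many copies of the mapper across the cluster, groups the intermediate (URL, 1) pairs by key before the reducers run, and re-executes any task whose machine fails, which is what makes this style of processing practical on thousands of unreliable computers.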
In 2009 O’Malley and his colleagues used Hadoop to set the world record for big data sorting. They used 1,406 machines to sort a terabyte of data (10 billion records) in just 62 seconds and 3,658 machines to sort a petabyte of data (10 trillion records) in only 16.25 hours.
The power of Hadoop is that it lets you use many computers together, but the problem with using many computers is that they are always breaking. Computers typically last about three years, which means that for every 1,000 computers you will lose roughly one a day; if you use the computers heavily, you will lose several each day. To keep Hadoop usable, it handles these failures automatically in software. A story O’Malley shared demonstrates how this works. The operator of the Hadoop clusters at LinkedIn was trying to convince his management that they didn’t need to buy computers with redundant power supplies, even though that’s what they typically did. In the meeting, the operator pulled up the list of machines in their production cluster and asked the managers to select one. The operator logged in to the machine they chose and halted it with no warning or shutdown. Hadoop keeps three replicas of each piece of data, so when a computer goes down the missing data is copied from one of the remaining replicas. Similarly, the compute jobs running on that computer are automatically reassigned to other computers. As a result, operators don’t need to fix problems immediately, and one operator can run 3,000 computers. So in the operator’s ‘experiment’, the production cluster continued to function successfully.
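That replication lives in HDFS, Hadoop’s distributed file system, and applications can inspect and adjust it through the FileSystem API. The short sketch below is illustrative only; the file path and the replication factor of four are assumptions, and the default of three copies comes from the dfs.replication setting.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical example: inspect and adjust HDFS block replication for a file.
public class ReplicationCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // reads core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/data/webmap/part-00000");  // illustrative path
    FileStatus status = fs.getFileStatus(file);

    // HDFS stores several copies of every block (three by default, controlled by
    // dfs.replication); losing one machine just triggers re-replication from the
    // surviving copies.
    System.out.println(file + " is stored with replication factor "
        + status.getReplication());

    // Especially critical data can be kept at a higher replication factor.
    fs.setReplication(file, (short) 4);

    fs.close();
  }
}
```

Because the file system notices under-replicated blocks and schedules new copies on its own, halting one machine, as in the LinkedIn demonstration, leaves every file fully readable.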
Last year, Benchmark Capital approached Yahoo, saying there was a great business opportunity in spinning the Hadoop group out into a separate company. As a result, in July 2011, O’Malley and his co-founders started Hortonworks with 25 people, and it has quickly grown over the past year to 80. O’Malley said, “It’s really exciting to be paid to work on open source full time. Further, Hortonworks’ commitment to open source means that all our work gets released as open source, which is rewarding.” Since the software is available for free, Hortonworks generates its revenue through training and support.