All in a Day’s Work

How faculty from across campus at Purdue partnered to build a supercomputer

This summer, in less than a day, a team at Purdue University (Ind.) built one of the largest supercomputers in the world. Designed for 10,000 cores, “Coates” was running research jobs before the end of its installation day. Together with its 6,500-core predecessor, Steele, it makes Purdue one of the nation’s leaders in campus supercomputing resources, according to the TOP500 (www.top500.org), a twice-yearly, industry-approved ranking of the world’s top supercomputers.

Coates and Steele are “community clusters,” which means that many faculty members contributed research dollars to buy part of the supercomputers. Building consensus among the faculty was the first step in creating a world-class supercomputer, a feat that is well within the reach of any university.

In 2008, faculty from different departments approached the central information technology group at Purdue, “ITaP,” to help them build several computer clusters. It’s fairly common for small numbers of researchers to pool resources for a shared cluster, but we realized this: If we combined research funds with additional investment from ITaP, we could create a much larger, shared computer.

After receiving preliminary bids for equipment, we made a formal proposal to the academic departments: If faculty would commit to buying the actual computer nodes (the CPUs), ITaP would cover the cost of racking, storage, networking, systems administration, and maintenance.

Some faculty members signed on right away, but others were more skeptical. Could the deal be this good? Could ITaP really build and maintain a supercomputer of that size?

More than 40 faculty eventually took part, raising $2.2 million. We prepared the bids on a spreadsheet so the faculty investors could compare apples to apples, and they made a vendor choice in just 40 minutes.

The outcome was Steele, a 6,500-core supercomputer and the largest Big Ten supercomputer at the time. Built in a morning by 200 faculty, staff, and students, it was running jobs by the afternoon.

This year, when we approached faculty with a proposal to build a second supercomputer, our credibility was established. The previous success had made a strong impression, showing the faculty how Purdue’s IT organizations could mobilize effectively and quickly.

Once again, with the pooled funds of many faculty members, we provided CPUs at a significant discount compared with what faculty could obtain on their own. Using the same vendor selection process for Coates that had been used for Steele, the faculty chose HP, along with vendor partner Matrix Integration, for the compute nodes. The university, with cooperation from Cisco, added 10GbE networking and network interface cards, ultimately creating an industry-leading 10GbE supercomputer.

Coates was built on July 21, 2009. It is currently the largest Big Ten supercomputer and is expected to appear among the 50 largest supercomputers in the world in the fall’s rankings. Purdue University faculty and staff are using the computer today to tackle large-scale science and engineering problems.

The most obvious reward of the community clusters is that faculty can use much more computing power than they purchased, because the machines are scheduled far more efficiently. A local cluster typically is used at only 35 to 40 percent of its capacity, so a faculty member who purchased only 10 nodes can use many times that number whenever other owners’ nodes are idle. And because of this opportunistic scheduling, Steele and Coates have an extremely high utilization rate of over 95 percent, further improving the return on investment. One unexpected gain was the new trust faculty showed in ITaP’s provisioning capabilities, which have now been extended to high-performance storage.
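
The arithmetic behind opportunistic scheduling can be illustrated with a toy model. The sketch below is not how Steele or Coates are actually scheduled. It assumes three hypothetical labs that each own 10 nodes and need 30 nodes in different time slots, and it ignores owner priority, queue waits, and job sizes. The lab names, demand figures, and function names are all invented for illustration.

```python
from typing import Dict, List

# Hypothetical ownership: nodes purchased by each lab (illustrative numbers only).
OWNED: Dict[str, int] = {"lab_a": 10, "lab_b": 10, "lab_c": 10}

# Hypothetical demand: nodes each lab would like to use in each of four time slots.
DEMAND: Dict[str, List[int]] = {
    "lab_a": [30, 0, 0, 30],
    "lab_b": [0, 30, 0, 0],
    "lab_c": [0, 0, 30, 0],
}


def dedicated_utilization(owned: Dict[str, int], demand: Dict[str, List[int]]) -> float:
    """Fraction of node-slots used when each lab may run only on its own nodes."""
    slots = len(next(iter(demand.values())))
    capacity = sum(owned.values()) * slots
    used = sum(min(d, owned[lab]) for lab, series in demand.items() for d in series)
    return used / capacity


def shared_utilization(owned: Dict[str, int], demand: Dict[str, List[int]]) -> float:
    """Fraction of node-slots used when idle nodes are lent to whoever has work."""
    slots = len(next(iter(demand.values())))
    total_nodes = sum(owned.values())
    used = 0
    for t in range(slots):
        wanted = sum(series[t] for series in demand.values())
        used += min(wanted, total_nodes)  # the pool cannot exceed its physical size
    return used / (total_nodes * slots)


if __name__ == "__main__":
    print(f"separate clusters: {dedicated_utilization(OWNED, DEMAND):.0%}")
    print(f"community cluster: {shared_utilization(OWNED, DEMAND):.0%}")
```

With these made-up numbers, the separate clusters sit idle two-thirds of the time while the pooled machine stays full. Real workloads overlap more than this, so the gap in practice depends on how bursty and how staggered each group’s demand is.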

When there are so many advantages to building a community cluster, why wouldn’t any university do it? Credibility is key. If a campus IT organization can’t organize orders, build quickly, and reliably maintain a large machine, then that university probably shouldn’t start down this path. But if a university IT department has the operational capability and management competence, we welcome this challenge: Try to surpass Purdue’s achievements in this space.

Gerry McCartney is the Oesterle Professor of Information Technology and vice president for information technology and CIO at Purdue University (Ind.).

