iSGTW - International Science Grid This Week

iSGTW, 4 November 2009

Feature - Getting GPUs on the grid

Russ Miller, principal investigator at CI Lab, stands in front of the server rack that holds Magic, a synchronous supercomputer that can achieve up to 50 Teraflops. Image courtesy of CI Lab.

Enhancing the performance of computer clusters and supercomputers using graphical processing units is all the rage. But what happens when you put these chips on a full-fledged grid?

Meet “Magic,” a supercomputing cluster based at the University at Buffalo’s CyberInfrastructure Laboratory (CI Lab). On the surface, Magic is like any other cluster of Dell nodes. “But then attached to each Dell node is an nVidia node, and each of these nVidia nodes has roughly 1000 graphical processing units,” said Russ Miller, the principal investigator for CI Lab. “Those GPUs are the same as the graphical processing unit in many laptops and desktops.”

That’s the charm of these chips: because they are mass-manufactured for use in your average, run-of-the-mill computer, they are an extremely inexpensive way of boosting computational power. That boost comes at a price, however.

“These roughly 1000 processors on each nVidia node are programmed in a synchronous process, basically bringing us back to programming methods of the 1960s,” said Miller.

The parallel programs modern supercomputers run are already quite difficult to write. Synchronous programming is a more limited form of parallel programming. “Parallel means doing multiple things at the same time,” explained Miller. “Synchronous means doing the exact same thing at the same time.”

Synchronous computations could be processing very different sets of data, as long as the algorithm used is identical. For example, an algorithm could instruct two people to kick the object in front of them. If one is playing soccer while the other is learning self-defense, the instruction may be identical, but the context, meaning and effects are quite different. “The job becomes very demanding for a programmer to be able to exploit these roughly 13 000 processors that we have in one rack. But if they can, the returns are huge,” said Miller. “We can get roughly 50 teraflops of computing out of one rack of systems.”
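The distinction can be made concrete with a small sketch. This is not CI Lab's code, just an illustration of the synchronous (SIMD-style) model Miller describes: every processing element executes the identical instruction at each step, each on its own piece of data. A NumPy vector operation behaves this way, with each array element standing in for one "lane" of the machine.

```python
import numpy as np

def saxpy(a, x, y):
    """One instruction, many data: every lane computes a*x[i] + y[i].
    The operation is identical across lanes; only the data differs."""
    return a * x + y  # NumPy applies the multiply-add to all lanes in lockstep

# Each lane holds different data, but all lanes run the same algorithm.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])
result = saxpy(2.0, x, y)
print(result)  # -> [12. 24. 36. 48.]
```

The programmer's difficulty Miller refers to is that any step where lanes would need to do *different* things (branches, irregular data access) breaks this lockstep model and must be restructured, which is what makes exploiting all 13 000 processors demanding.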

In a perfect world, scientists could submit their computational jobs to a scheduling application, and the scheduler would take care of finding computing resources. “I want to be able to be lying on a beach in Cancun with my iPhone, and hit a button, and not have to worry about where my data is, or what resources it’s using,” said Miller. “But there’s no way you could submit to the grid and have it assign a GPU for you. That logic is not built into the software stack just yet.”

To do so, resource providers would need the ability to specify that they can only handle synchronous computations, and users would need to be able to specify what sorts of resources their computations can exploit.
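What such matchmaking might look like can be sketched in a few lines. This is a hypothetical toy, not the Open Science Grid software stack: providers advertise the execution modes they support, jobs declare the modes they can exploit, and a scheduler pairs them up. All the names (`modes`, `match`, the resource entries) are illustrative only.

```python
# Hypothetical matchmaking sketch: providers advertise capabilities,
# jobs declare requirements, and the scheduler intersects the two.
resources = [
    {"name": "dell-cluster", "modes": {"parallel"}},      # ordinary CPU nodes
    {"name": "magic-gpu",    "modes": {"synchronous"}},   # GPU nodes, SIMD only
]

def match(job_modes, resources):
    """Return the names of resources whose advertised modes
    overlap the modes the job says it can exploit."""
    return [r["name"] for r in resources if r["modes"] & job_modes]

print(match({"synchronous"}, resources))              # -> ['magic-gpu']
print(match({"parallel", "synchronous"}, resources))  # -> ['dell-cluster', 'magic-gpu']
```

The point of the sketch is the missing piece Miller describes: until both sides of this exchange exist in the grid middleware, a GPU cluster cannot be assigned automatically.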

In the meantime, Magic has been hooked up to Open Science Grid and the New York State Grid since February. And instead of relying on a high-tech scheduler to assign jobs to the cluster, CI Lab has been relying on much older ‘technology’ – word of mouth.

“One of the biggest sources of users we have seen so far is just word of mouth,” said Kevin Cleary, the system administrator for Magic. “So getting the word out that on these nodes we do have these massive amounts of power available.”

Once a researcher is aware that Magic is available, he or she can tell the scheduler to submit the job directly to the GPU cluster.

As other GPU clusters come online, word of mouth may become an impractical solution. In the meantime, however, it is working well for Magic. Said Cleary, “In the past week, nearly 2500 jobs have been run on this cluster with a 98 per cent success rate.”

Miriam Boon, iSGTW

