iSGTW - International Science Grid This Week

iSGTW, 17 March 2010

Feature - Case Study: Einstein@OSG


A screenshot of the Einstein@Home screensaver. Image courtesy of Einstein@Home.

For over five years, volunteers have been lending their computers’ spare cycles to the Laser Interferometer Gravitational-Wave Observatory (LIGO) and GEO600 projects via the BOINC application Einstein@Home. Now a new application wrapper, dubbed “Einstein@OSG,” brings the application to the Open Science Grid.

Today, although Einstein@OSG has been running for only six months, it is already the top contributor to Einstein@Home, processing about 10 percent of jobs.

“The Grid was perfectly suitable to run an application of this type,” said Robert Engel, lead developer and production coordinator for the Einstein@OSG project. “BOINC would benefit from every single CPU that we would provide for it. Increasing the number of CPUs by 1000 really results in 1000 times more science getting done.”

Getting Einstein@Home to run on a grid was not without difficulties. Normally, a volunteer would download and install the application. The application would constantly download data, analyze it, and then return the results. In short, each instance of Einstein@Home has a permanent home on a volunteer’s computer.

The same process would not work on the Grid. Grid jobs cannot run indefinitely, so each instance of Einstein@OSG was given a time limit.

“Once the time limit is up, the Einstein@Home application exits, followed by the Einstein@OSG application, which will save all results to an external storage location,” Engel explained. “The next time Einstein@OSG starts, it likely starts on a different cluster node which may use a different architecture.”

Next, the Einstein@OSG application detects changes in the environment, such as the architecture, location, version of software, or network connectivity, and then compiles any missing software ‘on-the-fly.’ After a final check to verify that all requirements for Einstein@Home are met, it starts up. The results from the previous run are loaded from the remote storage location, and Einstein@Home picks up where it left off.
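The stop-and-resume cycle described above can be illustrated with a minimal Python sketch. It is not the Einstein@OSG code: the function name, the JSON checkpoint file, and the injectable clock are all assumptions made for illustration, and a local file stands in for the external storage location.

```python
import json
import time
from pathlib import Path


def run_with_time_limit(work_items, checkpoint_path, time_limit_s, process,
                        clock=time.monotonic):
    """Process items until the time limit expires, then save progress.

    Hypothetical helper: returns True if every item has been processed,
    False if the time limit cut the run short.
    """
    checkpoint = Path(checkpoint_path)
    # Pick up where the previous run left off, if it saved any state.
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {"next": 0, "results": []}

    deadline = clock() + time_limit_s
    while state["next"] < len(work_items) and clock() < deadline:
        state["results"].append(process(work_items[state["next"]]))
        state["next"] += 1

    # Save results before exiting so a later run, possibly on a
    # different node, can load them and continue.
    checkpoint.write_text(json.dumps(state))
    return state["next"] == len(work_items)
```

Calling the function repeatedly with the same checkpoint path continues from the saved position each time, which mirrors how each new grid job resumes the work of the one before it.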

An application on a grid will encounter software and hardware issues much more frequently than a desktop application such as Einstein@Home, according to Engel. This is because grids are much more complex, and deal with an extremely high volume of jobs.

Because the average Einstein@Home user will only encounter an error every couple of months, it’s practical for her to handle the error manually. With Einstein@OSG running on up to 10,000 cores, however, there are errors every couple of minutes. Fixing these manually simply isn’t practical, so the Einstein@OSG team eventually automated the process.

“It was only because of that mechanism that we were able to scale up,” Engel said. “A computer never gets tired looking for errors and fixing them, unlike me, who likes to sleep at night and spend time with his family.”
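The article does not describe how the automated mechanism works internally. As a rough illustration of the general idea only, the Python sketch below pairs known error types with automated fixes and retries the job; the function name, the handler table, and the retry limit are hypothetical.

```python
def run_with_auto_recovery(job, handlers, max_attempts=3):
    """Run job(); on a known error, apply its fix and retry.

    Hypothetical sketch: `handlers` maps an exception class to a
    function that attempts an automated remedy. Only exact class
    matches are looked up.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as err:
            fix = handlers.get(type(err))
            if fix is None or attempt == max_attempts:
                raise  # Unknown error or out of retries: escalate.
            fix(err)  # Apply the automated remedy, then retry.
```

Errors with no registered fix are re-raised, so a person only has to look at failures the system has not seen before.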

The number of clusters running Einstein@OSG is plotted on the horizontal axis; the total number of CPU cores across all clusters is plotted on the vertical axis. The rectangles each represent one week between June 2009 and February 2010. The color indicates how much work was accomplished that week, ranging from blue (the least) to red (the most). Note that the dates of three arbitrarily chosen weeks are written in white to illustrate how, over time, the amount of work as well as the number of clusters and cores have increased.

Image courtesy of Einstein@OSG.

Before Engel began work on Einstein@OSG, he was a member of a team led by Thomas Radke at the Max Planck Institute for Gravitational Physics. In 2006, Radke’s team created a wrapper that made Einstein@Home compatible with the German Grid Initiative (D-Grid). Part of Engel’s contribution was the design of a user interface that allows one person to effectively monitor and control thousands of Einstein@Home applications.

“Back then it consisted of a command line tool that would summarize all activities on the Grid on a single terminal page,” Engel said. Now the tool records activities and uses that historical data to create error statistics. Those and other statistics are displayed on an internal webpage.
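To give a sense of what turning recorded activity into error statistics might involve, here is a small Python sketch. The record format (pairs of site name and status) and the function name are assumptions; the article does not say what the real tool stores or reports.

```python
from collections import Counter


def error_statistics(records):
    """Summarize recorded job events into per-site error statistics.

    Hypothetical format: `records` is an iterable of (site, status)
    pairs, where status is either "ok" or "error".
    """
    total = Counter()
    errors = Counter()
    for site, status in records:
        total[site] += 1
        if status == "error":
            errors[site] += 1
    return {
        site: {
            "jobs": total[site],
            "errors": errors[site],
            "error_rate": errors[site] / total[site],
        }
        for site in total
    }
```

A summary like this could feed either a single-page terminal view or a web page, since it reduces thousands of job records to one row per site.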

The wrapper created by Radke’s team could not simply be repurposed to run on OSG, unfortunately.

“OSG and the German grid are different,” Engel said. For example, “in Germany the entire grid depends on Globus.”

Engel and his team examined their options for getting Einstein@Home onto OSG, and concluded that the best option was Condor-G, a sort of hybrid of Condor and Globus. But implementing Condor-G would have required a great deal of work, delaying the launch of Einstein@Home on OSG.

That’s why Engel’s team opted to implement Globus’ GRAM, which took only two weeks of work, before they began work on a Condor-G solution. It’s a good thing too, because they soon discovered a serious issue with GRAM.

“It doesn’t go up in scale very well,” Engel said. “If you try to run more than 100 jobs on a given resource, you’ll bring down that resource.”

Even given a chance to do things differently, Engel said, he would still have implemented GRAM first. “It meant that for a year, we could run jobs on OSG.”

The Condor-G version went live in September 2009, and it has rapidly picked up steam. “On a typical day, we are running between 5000 and 8000 jobs at any time,” Engel said. “Before that we were running approximately 500.”

Watch this video to learn more about LIGO and GEO600, the experiments that are supported by Einstein@Home!

Video courtesy of the American Museum of Natural History.

—Miriam Boon, iSGTW
