iSGTW - International Science Grid This Week

iSGTW, 15 April 2009

Feature - PanDA makes huge job sets more bearable


Simplified view of core PanDA architecture. Image courtesy of BNL.

Keeping track of huge job sets processed on hundreds of compute clusters around the world through the LHC Computing Grid might send the most organized of logical thinkers into a tizzy. The PanDA (Production and Distributed Analysis) system, developed for the ATLAS collaboration at the Large Hadron Collider, lets scientists stay cool while it takes charge of distributing jobs, collecting results and managing workflow.

An important feature of PanDA is that it allows the user to submit one job, called a pilot job, which coordinates a series of jobs that the user has put together and configured. When launched, the pilot job contacts the PanDA server, which in turn locates available resources and sends the collected jobs to run based on their relative priorities. The pilot system manages the workflow efficiently, providing a quick response time. And it frees users from tedious decision-making, said Kaushik De, a PanDA developer and University of Texas physicist.
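The priority-based dispatch described above can be pictured as a pilot that keeps asking a server for the most urgent waiting job. The following is only a toy sketch under that assumption, not PanDA's actual code or API: `ToyServer`, `submit`, `get_job` and `run_pilot` are hypothetical names invented for illustration, and the job names and priority values are made up.

```python
import heapq
import itertools


class ToyServer:
    """Hypothetical stand-in for a job server; this is not the PanDA API."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker that preserves submission order

    def submit(self, name, priority):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), name))

    def get_job(self):
        """Return the name of the highest-priority waiting job, or None if idle."""
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name


def run_pilot(server, execute):
    """Toy pilot: keep requesting work from the server until none is left."""
    finished = []
    while (job := server.get_job()) is not None:
        execute(job)
        finished.append(job)
    return finished


server = ToyServer()
server.submit("user-analysis", priority=5)
server.submit("simulation", priority=1)
server.submit("reprocessing", priority=9)

# The executor is a no-op here; a real pilot would run the payload on a worker node.
order = run_pilot(server, execute=lambda job: None)
print(order)  # ['reprocessing', 'user-analysis', 'simulation']
```

In this sketch the user never chooses where or when a job runs; the server's queue makes that decision, which is the "tedious decision-making" the pilot approach is said to remove.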

PanDA was initially developed in 2005 for U.S.-based ATLAS production and analysis on the Open Science Grid, but it has since been adopted by the global ATLAS collaboration as its primary system for distributed processing. ATLAS runs on three different grids (OSG, EGEE and NorduGrid), and PanDA is the interface to them all.

ATLAS has also developed a separate data management system, variously called DQ2 or DDM (for “Distributed Data Management”), that catalogs the tens of millions of ATLAS files distributed worldwide at hundreds of storage locations. PanDA works seamlessly with DQ2/DDM to match user jobs to the input data required, either by sending the job where the data already resides or vice-versa.
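The choice between moving the job to the data and moving the data to the job can be illustrated with a small site-selection function. This is a hedged sketch, not how PanDA or DQ2/DDM is implemented: the `broker` function, its cost rule (fewest missing inputs first, then most free slots), and the site and dataset names are all assumptions made for the example.

```python
def broker(job_inputs, replicas, free_slots):
    """Pick a site for a job (hypothetical rule, for illustration only).

    job_inputs: set of dataset names the job needs
    replicas:   dict mapping site -> set of dataset names stored there
    free_slots: dict mapping site -> number of free CPU slots
    Returns (site, datasets_to_transfer), or (None, set()) if no site is free.
    """
    candidates = [site for site, slots in free_slots.items() if slots > 0]
    if not candidates:
        return None, set()

    def cost(site):
        missing = job_inputs - replicas.get(site, set())
        # Fewest missing inputs wins; ties go to the site with more free slots.
        return (len(missing), -free_slots[site])

    best = min(candidates, key=cost)
    return best, job_inputs - replicas.get(best, set())


# Made-up catalog: which site holds which dataset, and how busy each site is.
replicas = {
    "site_a": {"dataset_1", "dataset_2"},
    "site_b": {"dataset_1"},
    "site_c": set(),
}
free_slots = {"site_a": 0, "site_b": 12, "site_c": 40}

# The data is already at a free site, so the job goes to the data.
print(broker({"dataset_1"}, replicas, free_slots))  # ('site_b', set())

# The only site holding the data is full, so the data must go to the job.
print(broker({"dataset_2"}, replicas, free_slots))  # ('site_c', {'dataset_2'})
```

The second call shows the "vice-versa" case: when the site that holds the input has no free slots, the sketch picks the emptiest site and reports which dataset would need to be copied there.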

This panda may not be so productive or widely distributed, but he knows his priorities. Image courtesy of sxc.hu.  

At the moment, PanDA’s jobs produce and analyze simulated data, which physicists can use to fine-tune their analyses in preparation for real data once the LHC is operational.

As of January 2009, PanDA had processed more than 25 million simulated data jobs. Its current daily load is about 50,000 data production jobs plus between 3,000 and 5,000 analysis jobs. Once real data starts coming in, scientists expect the total to approach 500,000 jobs a day.

“PanDA makes it possible to use huge amounts of computing resources distributed all over the world,” Kaushik De said. “Without a system like PanDA, it would be almost impossible for physicists to do the type of large-scale processing necessary to analyze their data and quickly get results.”

Amelia Williamson, for iSGTW
