iSGTW - International Science Grid This Week

iSGTW 23 January 2008

 

Opinion - Avalanche warning: the new challenges of the data grid


Available disk space at Canada’s TRIUMF Tier-1 center was rapidly consumed, with users filling one year of space in just three months. More storage was again made available in November to meet demand.
Image courtesy of TRIUMF and WLCG

The Large Hadron Collider at CERN will produce many petabytes of data every year; applications in other fields, such as astronomy and genomics, will also generate and rely on massive amounts of data.

How will we manage these data? Where will we store them? The new “data grid” brings with it certain responsibilities that must be borne by the users.

Managing storage for a large user base is not a new problem, but grids greatly amplify the scale.

Allocating storage resources is more complicated than managing compute clusters. Upon completion of a compute job, the CPUs are simply returned to the common pool; the job's output, however, must be stored and cannot be deleted indiscriminately.

Laws, limits, allocations and restraint 

Even the allocation of small storage quotas to thousands of users will add up over time to cause serious problems.

One solution is to put time limits on storage allocation, but this does not help users who want to store data indefinitely.

Some of the staff at the TRIUMF Tier-1; the author is second from the right.
Image courtesy of TRIUMF

Another solution is to charge for storage space, but this is not in the spirit of grids. Note that this is more a management issue (of people, not of systems) than a technical one.

Take a look at your desktop

Many tools exist to manage large pools of data, but these do not indicate when the data can be safely deleted. What is a facility manager to do when faced with hundreds of thousands (if not millions) of files owned by thousands of users, to whom he or she has no immediate access? Users are much less likely to worry about disk usage at a site halfway around the world than they are at their university IT center. We have all been guilty of not cleaning up our desktop; we just go out and buy a larger disk!

Policies must be put in place to keep the explosion of data in check. I fear that this can only be done by making most storage space of the “scratch” or “temporary” variety, which will put restrictions on users. Exceptions can be made for large projects such as the LHC, which aggregate the needs of a multitude of users, but the number of these exceptions must be kept reasonable, or system administrators risk spending all of their time contacting users to clear space.
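
A "scratch" policy of the kind described above could be supported by a periodic scan for stale files. What follows is a minimal sketch in Python, not a tool used at TRIUMF or at any other grid site: the function names find_expired_files and reclaimable_bytes, the 30-day retention window, and the choice of last-modification time as the sign of staleness are all assumptions made for illustration. The sketch only reports candidates; it deletes nothing.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_expired_files(scratch_root, max_age_days, now=None):
    """Return a list of (path, size_in_bytes) for files under scratch_root
    whose last modification is older than max_age_days.

    Hypothetical helper: the retention rule (modification time only) is an
    assumption, not a policy taken from the article.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = []
    for dirpath, _dirnames, filenames in os.walk(scratch_root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                # The file vanished or cannot be read; skip it.
                continue
            if info.st_mtime < cutoff:
                expired.append((path, info.st_size))
    return expired


def reclaimable_bytes(expired):
    """Total size of the files returned by find_expired_files."""
    return sum(size for _path, size in expired)
```

A site could, for instance, call find_expired_files("/scratch", 30) (the path is hypothetical) and send each owner the resulting list before anything is removed, which keeps the decision with the user rather than the administrator.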

One year filled in three months

An example of the problem can be found at TRIUMF, Canada’s national laboratory for particle and nuclear physics, which houses one of ten Tier-1 centers for the ATLAS experiment at the LHC. After an expansion of the center in August 2007, one year’s storage allocation was filled in about three months!

The main point of this comment is to convince users that it is their responsibility to control their use of storage on the grid. Resource providers and system administrators supply the tools for data management—directory structures, file movement and so on—but users must act responsibly when managing their storage allocation. It is not reasonable for users to wait for an email asking them to delete files because the center is running out of space.

Again, the scale of the problem is amplified by the grid. Whatever solution is adopted, it will inevitably require all of us to exercise restraint.

- Michel C. Vetterli, Simon Fraser University/TRIUMF
 
