iSGTW - International Science Grid This Week

iSGTW 25 July 2007

 

Feature - Data for the people: how to fast-track your network and share the data love


“You don’t have to have a PhD to be interested in data and to want to analyze it. ” Bob Grossman is working to make large data sets easily accessible and publicly available, even to children.
Stock image from sxc.hu

So you’ve used a grid to split up your job, process it faster, then return your results. You now have a nice chunky terabyte of data. What do you do with it?

Bob Grossman, Director of the National Center for Data Mining at the University of Illinois at Chicago, U.S., says the answer is share, share, share.

“In terms of impact on society, the ability to use transparently other people’s data is going to be transforming,” Grossman says.

“It is about ‘network effects’,” he continues. “In the same way that a network becomes more interesting as more people join it, you can draw more interesting conclusions about your own data if you put it into the context of other people’s data.”

A fine notion in principle

But how can you get these network-busting bundles of new data to the people who need them?

Simple, says Grossman. You just send them, to everyone and anyone who might like to take a look.

“Our motivation for the last ten years has been to create a web for data, so it’s easy to browse, explore and download it. The system we built, called DataSpace, still controls who can write data, but we encourage anyone in the world to read it.”
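Grossman's description of DataSpace amounts to a simple access rule: writes are controlled, and reads are open to anyone. The sketch below expresses that rule in Python. The `DatasetPolicy` class, its methods and the user names are hypothetical illustrations, not DataSpace's actual interface, which the article does not describe.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class DatasetPolicy:
    """Hypothetical open-read, restricted-write policy for one dataset."""

    # Identities allowed to add or modify data (hypothetical names).
    writers: Set[str] = field(default_factory=set)

    def can_read(self, user: Optional[str]) -> bool:
        # Reads are open to everyone, including anonymous users (None).
        return True

    def can_write(self, user: Optional[str]) -> bool:
        # Writes require an identified user who is on the writer list.
        return user is not None and user in self.writers


policy = DatasetPolicy(writers={"survey-team"})
print(policy.can_read(None))            # True: anonymous readers are allowed
print(policy.can_write("survey-team"))  # True: listed writer
print(policy.can_write("visitor"))      # False: not on the writer list
```

Keeping the read check unconditional mirrors the "anyone in the world" goal. Only the write path carries any policy logic.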

Driven by this ultimate goal, Grossman turned his eye to the networks: could they distribute large sets of data across thousands of miles, and all without wasting a second? No, not really, not at all.

The network effect: the more telephones you have access to, the more useful your telephone becomes. The same can be said of data sets, says Bob Grossman.
Image courtesy of Derrick Coetzee

Grossman describes TCP, the Internet’s old faithful transport protocol, still going strong after nearly 25 years, as “a huge success story.” But, he says, new versions of TCP able to make full use of fast, long-distance links just weren’t coming out quickly enough to solve his problem in good time.

“It was clear the network would change, but we didn’t want to wait ten years for that to happen. So we built our own infrastructure instead.”

Enter the fast lane

UDT, or User Datagram Protocol (UDP)-based Data Transfer, is the result. Able to shoot data around the world at 10 gigabits per second, UDT far outpaces the three or four megabits per second that standard TCP, as it was usually deployed, was achieving. “And if you’re impatient like me…” jokes Grossman, “…I know which one I’d prefer.”
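To show what that gap means in practice, the sketch below computes idealized transfer times for one terabyte at the two rates quoted above. It uses the upper TCP figure of four megabits per second. It assumes decimal units (1 TB = 10^12 bytes) and a constant rate with no protocol or disk overhead, so real transfers would take longer.

```python
def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Idealized time to move size_bytes at a constant rate (no overhead)."""
    return size_bytes * 8 / rate_bits_per_s


TERABYTE = 1e12  # decimal terabyte, an assumption for this example

udt = transfer_seconds(TERABYTE, 10e9)  # 10 gigabits per second
tcp = transfer_seconds(TERABYTE, 4e6)   # 4 megabits per second

print(f"UDT at 10 Gb/s: {udt / 60:.1f} minutes")
print(f"TCP at 4 Mb/s:  {tcp / 86400:.1f} days")
print(f"Speed-up: {tcp / udt:,.0f}x")
```

Under those assumptions, the terabyte takes roughly 13 minutes at 10 Gb/s and about 23 days at 4 Mb/s, a factor of 2,500.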
 
UDT has enjoyed much initial success. It won the annual Bandwidth Challenge at the SC06 supercomputing conference last November by transporting 1.3 terabytes of Sloan Digital Sky Survey (SDSS) data from Chicago to Florida at a sustained rate of eight gigabits per second.
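As a rough consistency check, assume decimal terabytes (10^12 bytes) and that the whole 1.3 terabytes moved at the sustained eight gigabits per second. The article does not report the actual duration, so this is an idealized estimate rather than a measured figure:

```latex
t = \frac{1.3 \times 10^{12}\,\text{bytes} \times 8\,\text{bits/byte}}{8 \times 10^{9}\,\text{bits/s}}
  = \frac{1.04 \times 10^{13}\,\text{bits}}{8 \times 10^{9}\,\text{bits/s}}
  = 1300\,\text{s} \approx 22\ \text{minutes}
```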

For those keen on a more global challenge, UDT was used just last month to move 1.4 terabytes of SDSS data from Chicago all the way to Moscow. The transfer took about 4.5 hours over a one gigabit per second link.
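The Moscow figures also give a sense of how fully UDT used the available link. The sketch below derives the average rate from the reported size and duration. It assumes decimal terabytes and takes "about 4.5 hours" at face value, so the result is approximate.

```python
def average_rate_bps(size_bytes: float, seconds: float) -> float:
    """Average throughput in bits per second over the whole transfer."""
    return size_bytes * 8 / seconds


size = 1.4e12          # 1.4 TB, decimal units assumed
duration = 4.5 * 3600  # "about 4.5 hours", converted to seconds
link = 1e9             # one gigabit per second link

rate = average_rate_bps(size, duration)
print(f"Average rate: {rate / 1e6:.0f} Mb/s")
print(f"Link utilization: {rate / link:.0%}")
```

This works out to roughly 690 megabits per second, or about 69% of the one gigabit per second link.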

Even more exciting, UDT is now an option for GridFTP.

This progress points in some interesting directions for Grossman and his team.

“We want to lower the cost of getting hold of other people’s terabytes,” he says. “I want to be able to find out, in just a few minutes, whether someone’s data is going to be useful for my research.”

When asked about some collaborations’ policies restricting who can access their data, Grossman replied: “You don’t have to have a PhD to be interested in data and to want to analyze it. And if you want to analyze it, you have to be able to touch it. We’re building that infrastructure.”

- Cristy Burne, iSGTW

 
