iSGTW - International Science Grid This Week

iSGTW 30 September 2009

Q & A - Smart data handling: An interview with Tevfik Kosar

Image courtesy Tevfik Kosar

In e-science, we are constantly striving to improve performance and speed so that we can complete a larger number of more complex computations faster. Tevfik Kosar, a researcher at Louisiana State University, is working on two intertwined projects that could together lead to the sorts of improvements we hope for. Read on to find out what he had to say.

iSGTW: You just received a National Science Foundation grant to work on the Stork Data Scheduler. Can you tell us a little about that project?

Kosar: The funding for the development of Stork Data Scheduler comes from NSF's Strategic Technologies for CyberInfrastructure program. The STCI program funds innovative cyberinfrastructure services which have the potential to significantly advance research capabilities in multiple areas of science and engineering. The grant will provide three years of funding for the enhancement of the Stork Data Scheduler, making it production-level software and distributing it to the community as free, open-source software. Some of the enhanced functionalities of the Stork Data Scheduler will include:

  • data aggregation and caching for improved performance
  • early error detection, classification, and recovery for improved reliability
  • job delegation and distributed data scheduling for increased capability
  • integration with workflow planning and management for end-to-end data-flow automation
  • optimal protocol tuning, and end-to-end performance prediction services for best utilization of available network capacity.
We hope that the resulting Stork Data Scheduler will help to mitigate the end-to-end data handling bottleneck in petascale distributed computing systems for a wide range of applications from different domains.

iSGTW: What distinguishes the Stork Data Scheduler from other similar software currently used or under development?

Kosar: The Stork Data Scheduler, so named because it delivers data, is one of the first examples of a batch scheduler specializing in data movement. Stork aims to make it easier for researchers to share, store and deliver data across distributed computing and storage systems. The Stork Data Scheduler makes a distinctive contribution to the computational research community because it focuses on the planning, scheduling, monitoring and management of data. Unlike existing approaches, Stork treats data resources and their related tasks as primary components of computational resources, not simply as side effects. This will lead to quicker and more effective collaboration among researchers.

Using Stork, researchers can transfer very large data sets with only a single command, making it one of the most powerful data transfer tools available. Stork is compatible with advanced high-performance computing toolkits (e.g., Condor, DAGMan, Pegasus), so researchers can use the software to access the power of these large systems and use them more effectively. Researchers consider the Stork Data Scheduler a highly transformative project because of its potential to dramatically change how scientists perform their research and to facilitate the rapid sharing of experience, raw data, and results. Future applications could rely on Stork to manage storage and data movement reliably and transparently across many systems, eliminating unnecessary failures of distributed tasks.
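As a rough illustration of that single-command workflow: in published descriptions of Stork, a transfer job is expressed as a small ClassAd-style description and handed off to the scheduler. The host and paths below are hypothetical:

```
[
  dap_type = "transfer";
  src_url  = "gsiftp://storage.example.org/data/input.dat";
  dest_url = "file:///scratch/user/input.dat";
]
```

Submitting a description like this (for example, with a command such as `stork_submit`) leaves the queuing, monitoring, and retrying of the transfer to Stork rather than to the researcher.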

The Stork Data Scheduler's Logo. Image courtesy Tevfik Kosar

iSGTW: Now, you also recently received an award from the NSF for your work on a project entitled "Data-aware Distributed Computing for Enabling Large-scale Collaborative Science." What is that project about, and how does it differ from other similar research efforts?

Kosar: In January 2009, I received an NSF CAREER Award for the development of a new computing paradigm called "data-aware distributed computing." Through this CAREER grant, we will develop the theory, algorithms and models for new computing systems that manage data more effectively through automated processes, enabling scientists to spend more time focusing on their research and less time dealing with data.
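To make the idea concrete, here is a minimal, purely illustrative sketch of the kind of decision a data-aware scheduler makes. This is not Kosar's actual model or Stork's implementation; all task, file, and node names are hypothetical:

```python
# Toy "data-aware" placement heuristic: put each task where its input
# data already lives, so the scheduler moves as few bytes as possible.

def place_tasks(tasks, replicas):
    """Assign each task to the node holding the largest share of its inputs.

    tasks:    {task_name: {input_file: size_in_mb}}
    replicas: {input_file: set of nodes holding a copy}
    Returns   {task_name: chosen node, or None if no input has a replica}
    """
    placement = {}
    for task, inputs in tasks.items():
        local_bytes = {}  # node -> megabytes of this task's input already there
        for f, size in inputs.items():
            for node in replicas.get(f, ()):
                local_bytes[node] = local_bytes.get(node, 0) + size
        # A data-unaware scheduler would ignore local_bytes entirely;
        # here we pick the node that avoids the most data movement.
        placement[task] = max(local_bytes, key=local_bytes.get) if local_bytes else None
    return placement

tasks = {"t1": {"a.dat": 500, "b.dat": 100}, "t2": {"b.dat": 100}}
replicas = {"a.dat": {"nodeA"}, "b.dat": {"nodeB"}}
print(place_tasks(tasks, replicas))  # → {'t1': 'nodeA', 't2': 'nodeB'}
```

A conventional scheduler would place t1 wherever a CPU happened to be free and then pay to move 500 MB; weighting placement by data locality avoids that transfer entirely, which is the essence of treating data as a first-class resource.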

iSGTW: Does your work on data-aware distributed computing connect with your work on the Stork Data Scheduler? If so, could you explain how?

Kosar: Yes, these two projects are indeed very closely related. As part of my CAREER project, we will perform research and training on the underlying theory of data-aware distributed computing, and develop models and algorithms for data scheduling. As part of the recent STCI project, we will implement some of the models and algorithms developed under the CAREER Award in the production-level Stork software. In that sense, the two projects complement each other perfectly.

iSGTW: What's next for you?

Kosar: Data management will continue to be a major challenge for next generation petascale distributed computing systems. I want to continue providing novel solutions in this area which will make the life of a domain scientist easier and will lead to high-impact discoveries in science and engineering.

—Interview by Miriam Boon for iSGTW

