iSGTW - International Science Grid This Week

17 June 2009

Feature - Grid-enabled virus hunting

3D replica of senecavirus, a pathogen discovered several years ago by researchers in Pennsylvania. UC San Francisco researcher Eric Delwart and his colleague Chunlin Wang of Stanford University use the RENCI-developed TeraGrid Science Gateway and the Open Science Grid to access grid computing resources in their search for new viruses.

Image courtesy of the Institute for Animal Health, UK

Researchers use distributed computing to make sequence analysis faster and easier.

DNA sequencing and sequence analysis happen daily in many biological sciences laboratories, but analyzing large sets of genetic data increasingly requires computing resources beyond the capabilities of most labs.

The search for the best hardware and software led Eric Delwart, a professor of laboratory medicine at the University of California, San Francisco, and a senior investigator at the Blood Systems Research Institute, and Chunlin Wang, a research associate at the Stanford University Genome Technology Center, to the Renaissance Computing Institute's (RENCI) Engagement Team and then to the distributed computing resources of the TeraGrid and the Open Science Grid (OSG).

Delwart works with Wang to identify new viruses. The team uses a technique called massively parallel pyrosequencing, which can determine sequences for millions of DNA fragments using high-throughput computing. The resulting DNA sequences are then compared to all the sequences in public sequence databases to identify viral fingerprint sequences. A single sequencing reaction generates massive volumes of data that can take months, even years, to analyze on a small-scale computing cluster.

In an effort to analyze more data more efficiently, Delwart and Wang turned to the RENCI Engagement Team, which participates in the TeraGrid Science Gateways program and leads the OSG Engagement program. TeraGrid’s Science Gateways aim to bring new communities of users to TeraGrid resources by providing easy access to the TeraGrid’s distributed computing resources. The OSG Engagement program recruits new users from a wide range of disciplines and helps them become users of the distributed computing systems operated and maintained by OSG members.

The effort to find more computing power paid off for Delwart and Wang: in early May, they used 70,000 CPU hours on TeraGrid and OSG resources to complete in a week a DNA sequence analysis that would have taken over three months on their own lab cluster. The team submitted its jobs to the national resources using a RENCI-developed, Web services-based computational science platform.
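To put those figures in perspective, the following back-of-the-envelope sketch converts CPU hours into an average number of busy cores. It assumes "a week" means exactly 7 days and "over three months" means at least 90 days. It also assumes the 96,866 BLAST jobs reported for the 29 April to 7 May run account for the same 70,000 CPU hours, which the article does not state explicitly. The function and variable names are illustrative only.

```python
# Rough arithmetic on the figures quoted in the article.
CPU_HOURS = 70_000   # CPU hours used on TeraGrid and OSG (from the article)
GRID_DAYS = 7        # assumption: "a week" taken as exactly 7 days
LAB_DAYS = 90        # assumption: "over three months" taken as at least 90 days
JOBS = 96_866        # BLAST jobs in the run (assumed to be the same workload)


def average_cores(cpu_hours, wall_days):
    """Average number of cores kept busy to burn cpu_hours in wall_days."""
    return cpu_hours / (wall_days * 24)


grid_cores = average_cores(CPU_HOURS, GRID_DAYS)
lab_cores = average_cores(CPU_HOURS, LAB_DAYS)   # upper bound, since LAB_DAYS is a lower bound
minutes_per_job = CPU_HOURS * 60 / JOBS

print(f"Average cores busy on the grid: {grid_cores:.0f}")
print(f"Implied lab cluster size (at most): {lab_cores:.0f} cores")
print(f"Average CPU time per job: {minutes_per_job:.1f} minutes")
```

Under these assumptions the grid run kept roughly 400 cores busy on average, while the lab cluster would have had to sustain only a few dozen, which is consistent with the reported speedup from months to a week.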

Distribution of BLAST jobs from one pyrosequencing run (96,866 jobs total, 29 April to 7 May 2009) with the glide-in factory configured for three TeraGrid resources, one OSG resource, one RENCI resource, and one UNC-CH resource.

Image courtesy of John McGee, Jason Reilly, and Mats Rynge (RENCI)

“We created an application that communicates with RENCI's TeraGrid Science Gateway,” said Jason Reilly, a RENCI senior research software developer. “For the user, it’s very simple — just log in and the application maps the data input to specific tasks to be done. The beauty is you don’t have to submit commands over and over again. You can run hundreds or even thousands of operations and you only have to submit the command once.”

The custom application created by Reilly was dubbed BLASTMaster because it builds on the Basic Local Alignment Search Tool (BLAST) used to search sequence databases. BLASTMaster divides commands into tasks and pushes the work to RENCI's TeraGrid Science Gateway, which submits, monitors, and manages the compute workload on systems that are part of TeraGrid’s nationwide network of high performance machines, and to OSG machines. After entering the initial commands, the researchers merely had to wait for their results.
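The article does not describe how BLASTMaster divides its work, so the following is only a minimal sketch of the general idea: a large set of reads is split into fixed-size batches, and each batch becomes one independent task that could be handed to a separate BLAST job. The FASTA input format, the function names `read_fasta` and `split_into_tasks`, and the `reads_per_task` parameter are all assumptions made for illustration, not part of BLASTMaster.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_into_tasks(records, reads_per_task):
    """Group records into batches; each batch is one independent task."""
    if reads_per_task < 1:
        raise ValueError("reads_per_task must be at least 1")
    it = iter(records)
    while True:
        batch = list(islice(it, reads_per_task))
        if not batch:
            return
        yield batch


# Usage: five made-up reads split into tasks of two reads each.
example = """>read1
ACGT
>read2
GGCA
TTAA
>read3
CCGG
>read4
ATAT
>read5
GCGC
""".splitlines()

tasks = list(split_into_tasks(read_fasta(example), reads_per_task=2))
for number, task in enumerate(tasks, start=1):
    print(f"task {number}: {[header for header, _ in task]}")
```

Because each batch is independent of the others, the tasks can run on whichever resource has free capacity, which is what makes this kind of workload a good fit for distributed computing.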

“Large computer farms that we might use are often composed of heterogeneous smaller clusters,” said Wang. “The BLASTMaster tool and a Web services environment is particularly useful to those of us without much experience using compute clusters. It gives us a uniform interface to submit jobs, which greatly enhances our productiveness.”

The sequence analysis work used TeraGrid resources at Purdue University (West Lafayette, IN), OSG resources at RENCI (Chapel Hill, NC) and a cluster in the University of North Carolina at Chapel Hill computer science department supported by the National Institutes of Health. The work has real-world value taken straight from recent headlines about the H1N1 virus.

“Knowing the genomic sequence of a human virus allows for quicker diagnostics to identify infections,” said Delwart. “Quicker diagnostics can lead to more informed decisions on how an emerging virus is spread and how to control it. Knowing the sequence can also help make vaccines or anti-virals against that virus.”

Karen Green, RENCI

