iSGTW - International Science Grid This Week

iSGTW, 16 December 2009

Feature - GRAPPAling with evolutionary history

This figure illustrates how gene order changes among the eight species. Each thin line represents a single gene and its position in the different species. Most genes are conserved on the same chromosomal arm, or Muller element, but gene order is shuffled between species. This figure appeared in the July 2008 issue of Genetics. Image courtesy of Arjun Bhutkar, Stephen Schaeffer et al., with permission from The Genetics Society of America.

We’ve known for several years now that chimpanzees share 96 percent of our DNA. Our technology tells us how closely humans and chimps are related. But it doesn’t tell us how we’re related. We need new technology for that.

Enter GRAPPA – or Genome Rearrangements Analysis under Parsimony and other Phylogenetic Algorithms if you want a mouthful. GRAPPA has already been used to analyze the evolution of organelles such as chloroplasts and mitochondria, running on cluster computers with upwards of 500 processors. To analyze more complex organisms, however, the team that develops GRAPPA will have to take the code to an entirely new level – the petascale level.
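To make the idea of comparing gene orders concrete, here is a minimal Python sketch of breakpoint distance, one simple measure of how far two gene orders have drifted apart through rearrangement. It is an illustration only and is not taken from GRAPPA's code: the function names and the toy gene orders are invented for this example, and genes are assumed to be numbered, with a minus sign marking a gene that sits on the opposite strand.

```python
def adjacencies(order):
    """Return the set of signed adjacencies in a linear gene order.

    Each neighbouring pair (a, b) is stored together with (-b, -a),
    because reading the chromosome from the other strand yields the
    same pair of neighbours.
    """
    pairs = set()
    for a, b in zip(order, order[1:]):
        pairs.add((a, b))
        pairs.add((-b, -a))
    return pairs


def breakpoint_distance(order_a, order_b):
    """Count neighbouring pairs in order_a that are not kept in order_b."""
    if sorted(map(abs, order_a)) != sorted(map(abs, order_b)):
        raise ValueError("both orders must contain the same genes")
    conserved = adjacencies(order_b)
    return sum((a, b) not in conserved for a, b in zip(order_a, order_a[1:]))


# Hypothetical toy data: the second species carries an inversion of genes 2-4.
species_1 = [1, 2, 3, 4, 5, 6]
species_2 = [1, -4, -3, -2, 5, 6]

print(breakpoint_distance(species_1, species_2))  # prints 2
```

A single inversion breaks the gene order at its two ends, which is why the toy example reports two breakpoints. Reconstructing an evolutionary tree from such comparisons is far harder, since the number of candidate trees grows explosively with the number of species.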

With the help of a $1 million grant from the National Science Foundation, computational science researchers David Bader of Georgia Tech and Jijun Tang of the University of South Carolina have joined forces with genetics researcher Stephen Schaeffer, from Pennsylvania State University, to do just that.

“One of the first open science petascale systems will be the IBM BlueWaters resource supercomputer,” Bader said. “Our goal is to scale GRAPPA up to use that magnificent computer.”

The trick is in the algorithms, according to Bader. “As an example, our first version of GRAPPA eight years ago took an hour and a half to run a problem involving the chloroplasts of a dozen species of bluebell flowers on a 512-processor Linux cluster,” Bader said.

A “Tree of Life” diagram. Courtesy of David Bader.

Since then, GRAPPA has undergone a variety of refinements. It shows. “Today that same problem solved in a more biologically meaningful way takes less than five minutes on my laptop,” Bader said. “The biggest speed improvement comes from better algorithms.”

Refining GRAPPA will require a better take on both the science and the programming, which makes the interdisciplinary team a perfect fit.

Over the next four years, the grant – which comes via the American Recovery and Reinvestment Act – will pay for two graduate students at each of the three participating schools. Not only will they be creating invaluable tools for advancing science, but they’ll also be learning very specialized skills in the process.

“I think we’re the pioneers in large-scale evolutionary or reconstruction of evolutionary trees,” Bader said. “We’re breaking new ground with each new algorithm and implementation, where no one has studied before.”

It isn’t clear how far these funds will take the team. Over the course of the grant’s lifetime, they hope to continue to refine the code, doing test runs on prototype petascale systems. Perhaps they will be able to study fish. Or mammals. Or even humans.

Either way, a new and improved GRAPPA will be valuable in innumerable ways. Already, it has been used to develop biochemical products, identify target drug receptors, create safer pesticides, and study how viruses evolve in order to make better vaccines.

“For instance, if you know that there’s a specific plant that has a property but that plant is rare or difficult to find, you may be interested in what plants are phylogenetically close or closely related in the family tree,” Bader said. “They may be more abundant or easier to use to produce the right biochemical substance.”

It is clear that Bader believes this is only the beginning of a new era. “If I look at other communities in scientific computing, they’ve matured their methods and techniques over the course of decades to centuries,” Bader said. “We’ve only known about the structure of DNA for fifty years, and we’ve only had the ability to sequence full genomes in the last several years. So we’re really at the infancy of what we see in the area of understanding biological sciences through computational methods.”

Miriam Boon, iSGTW

