iSGTW - International Science Grid This Week

15 September 2010

Feature - Deciphering the tree of life

Image courtesy of Miriam Boon.

What’s a bee without its honey, a butterfly without a flower’s nectar? It’s a pretty puzzle posed by the fossil record, which suggests that insects evolved long before flowering plants did.

With the rise of genetics, a new window has opened onto evolution – one that could provide a fresh perspective on old problems such as the disparity between insect and angiosperm (flowering plant) evolution.

Computational phylogenetics is the development of computational and mathematical techniques that aid in the estimation of evolutionary history, using molecular data such as protein and DNA sequences to construct a “tree of life.” To calibrate their molecular results, evolutionary biologists add the fossil record to the mix, and assume that a new species will evolve at the same rate as its ancestors.
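The constant-rate assumption described above can be illustrated with the simplest "strict clock" arithmetic: calibrate a substitution rate from one fossil-dated split, then use that rate to date another. This is only a minimal sketch. The function names and all of the numbers are hypothetical, and it is not the method used in the study discussed in this article.

```python
def calibrate_rate(pairwise_distance: float, fossil_age_mya: float) -> float:
    """Estimate a substitution rate from one fossil-dated split.

    pairwise_distance: substitutions per site separating two living species.
    fossil_age_mya: age of their split, in millions of years, from a fossil.
    The factor of 2 reflects that both lineages accumulate changes
    independently after the split.
    """
    return pairwise_distance / (2.0 * fossil_age_mya)


def divergence_time(pairwise_distance: float, rate: float) -> float:
    """Date another split, assuming the same constant rate applies."""
    return pairwise_distance / (2.0 * rate)


# Made-up illustration values, not data from any study.
rate = calibrate_rate(pairwise_distance=0.20, fossil_age_mya=100.0)
estimated_age = divergence_time(pairwise_distance=0.30, rate=rate)
print(f"rate = {rate:.4f} substitutions/site/Myr")
print(f"estimated split = {estimated_age:.1f} million years ago")
```

If the true rate varied between lineages, the second date would be off, which is one reason the calibration step matters so much.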

“Often they [other researchers] would force the date of angiosperms to correspond with the fossil records,” said Stephen Smith, a computational evolutionary biologist at Brown University. (Smith was at NESCent, the National Evolutionary Synthesis Center, when this interview was conducted.)

Despite the need for calibration, constraining models to closely match the fossil record is a solution that, by its very nature, cannot resolve problems with the fossil record itself.

By including species closely related to the flowering plants for which there exist less controversial fossil records, Smith and his colleagues (Michael Donoghue and Jeremy Beaulieu of Yale University) hoped to avoid these issues.

“We wanted to include a broader set of species outside of angiosperms, so that we could put the age of flowering plants in a wider perspective,” Smith explained.

Smith and his colleagues assumed that if a 130 million year old fossil of a species has been found, then that species must be at least that old. Conversely, they did not assume that the absence of older fossils means that the species did not yet exist.
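The asymmetry described above, where a fossil sets a minimum age but never a maximum, can be sketched as a one-sided constraint. The class and field names below are hypothetical and chosen only for illustration. Real dating software typically expresses calibrations as probability distributions rather than hard cutoffs, so treat this as a simplified sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FossilCalibration:
    clade: str
    min_age_mya: float                   # oldest fossil found: the group is at least this old
    max_age_mya: Optional[float] = None  # None: missing older fossils prove nothing

    def allows(self, estimated_age_mya: float) -> bool:
        """Return True if an estimated age is compatible with this calibration."""
        if estimated_age_mya < self.min_age_mya:
            return False
        if self.max_age_mya is not None and estimated_age_mya > self.max_age_mya:
            return False
        return True


# A minimum-only constraint, using the 130-million-year example from the text.
minimum_only = FossilCalibration("example clade", min_age_mya=130.0)
# A constraint forced to stay close to the fossil record (the 140 figure is illustrative).
forced = FossilCalibration("example clade", min_age_mya=130.0, max_age_mya=140.0)

print(minimum_only.allows(215.0))  # an older estimate is permitted
print(forced.allows(215.0))        # the same estimate is ruled out in advance
```

With the upper bound in place, any estimate older than the fossils is excluded before the analysis begins, which is the kind of forcing Smith describes.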

Angiosperms begin to show up in the fossil record between 130 and 140 million years ago, according to Smith, while more traditional phylogenetic computations have found slightly older dates. But fossils of the wasps, bees, butterflies, and moths that make up the modern pollination ecosystem began to show up earlier still, sometime in the Upper Triassic and Jurassic (145-230 million years ago).

With their new approach to mapping out the angiosperm tree of life, Beaulieu, Donoghue and Smith calculated that angiosperms evolved approximately 215 million years ago; the new results match closely with the evolution of the associated insects.

If other experiments support their results, their work could have a variety of implications.

“We have to explore the role that flowering plants played in the evolution of insects,” Smith said. “We thought it was somewhat small, but this might help us re-evaluate what that relationship would be.”

Of course, these results leave some lingering questions to answer, not the least of which is why these plants did not show up in the fossil record for 75 million years.

“Some people would suggest it’s because they weren’t there; others would suggest that we found a lot of fossils that were older than that that were ambiguous – we don’t know where they go, and so maybe those were angiosperms and we just didn’t recognize them,” Smith said. Or, he suggested, angiosperms may have first evolved in an environment that is less conducive to fossilization.

This example of a phylogenetic tree was generated using data from a class on entomology.

Image courtesy of Robert Svensson under Creative Commons 2.0 Attribution-Non-Commercial Generic License.

Scaling to the super-tree

Smith’s work on angiosperms used many open source programs, including phyutility and BEAST, on a run-of-the-mill computer cluster. But at about 150 species, the angiosperm data set covers only a tiny fraction of the entire tree of life. And this is one type of calculation that does not scale nicely.

“We need to think about algorithm design where the key insight for getting a speed up is not just parallelism,” said Tandy Warnow, principal investigator for CIPRES (Cyberinfrastructure for Phylogenetic Research). “There has to be a different algorithm design to get that improvement in running time.”

It’s not that parallelism isn’t helpful. In a recent analysis of 28,000 sequences Warnow conducted at the Texas Advanced Computing Center, parallelism brought the computational time down from five years to one month. Yet that time scales exponentially with the number of sequences, and if they are to meet the grand challenges of phylogenetics, researchers will need algorithms that can handle upwards of 500,000 sequences in a reasonable amount of time.
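One way to see why adding processors alone cannot keep up is to count the candidate trees. For n sequences, the number of distinct unrooted, fully branching tree shapes is (2n-5)!!, a standard result in phylogenetics. The sketch below computes that count. It measures the size of the search space, not the running time of any particular program mentioned in this article, and the function names are our own.

```python
import math


def num_unrooted_trees(n_taxa: int) -> int:
    """Count unrooted binary tree shapes on n_taxa labeled tips: (2n-5)!!."""
    count = 1
    # The k-th tip can be attached to any of the 2k-5 branches of the
    # tree already built from the first k-1 tips.
    for k in range(4, n_taxa + 1):
        count *= 2 * k - 5
    return count


def log10_num_unrooted_trees(n_taxa: int) -> float:
    """Same count on a log10 scale, usable for very large n_taxa."""
    return sum(math.log10(2 * k - 5) for k in range(4, n_taxa + 1))


for n in (4, 5, 10):
    print(n, num_unrooted_trees(n))  # 3, 15 and 2027025 trees respectively

for n in (150, 28_000):
    print(n, f"about 10^{log10_num_unrooted_trees(n):.0f} trees")
```

Because the count grows faster than any fixed multiple of processors can offset, practical methods have to search cleverly rather than exhaustively, which is the algorithm-design point Warnow makes.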

Accuracy is another problem researchers must surmount. The most accurate methods of deducing a phylogeny through analysis of genetic sequences do not scale to large data sets. To successfully analyze larger sections of the “super-tree,” researchers will need to find algorithms that scale well while remaining sufficiently accurate.

Assembling those large sections into a single super tree of life will pose yet another challenge. Under the United States National Science Foundation’s Assembling the Tree of Life funding program alone, there is a plethora of tree projects. Some, such as CIPRES, are generally helpful. Others, like iPlant, are tackling a fairly large field. But most are much smaller. There’s GRAToL, for green algae, and RedToL, for (you guessed it) red algae. PorToL for porifera (sponges), EToL for euteleost (ray-finned fish), and WormNet II for, well, worms. The list goes on.

“What has not been happening there is getting some top level coordination,” Warnow said. “Each one is going to give you a tree, and each of these trees are going to have some error in it. How do you put all of those trees together?”

This isn’t just, as it might appear at first glance, a social problem of getting scientists to play nice together. It would clearly be helpful for research groups to be thinking about that future time when they will try to put their puzzle pieces together. But there is presently no satisfactory solution for combining those trees when the time does come. Methods for combining trees do exist. But so far, they have combined trees far too small to serve as proof of concept for the eventual super-tree.

Despite the many challenges that face the field, Warnow expressed hopes that a rough super-tree could be assembled in the next decade. Said Warnow, “I think we’re going to make tremendous breakthroughs in terms of accuracy and scalability in the next five years or so.”

—Miriam Boon, iSGTW
