Of bantam brains and fancy footwork: bioinformatics tools help reveal complexity of avian evolution

Biosciences, Biotechnology

03.04.2024

In 2014, the journal Science featured an article on the bird tree of life that highlighted the essential role of the algorithms and supercomputers enabling modern research in evolutionary biology for all types of living beings. Now, a decade and a giant leap in tool development later, part of the team that coordinated the computer analyses at that time has co-authored another paper, published in Nature, on the complexity of avian evolution.

Phylogenetic relationships are key to understanding the evolution of species. These relationships are typically identified by comparing, among other things, similarities in DNA or anatomical features. An international team of researchers from the “Bird 10,000 Genomes Project” (B10K) has now analyzed the genomes of 363 bird species, using their intergenic regions and a plethora of computational methods. The result is a well-supported tree that nevertheless exhibits a stunning degree of discordance.

Resolving such discrepancies, which can stem from the diversity of species sampled, the phylogenetic method used, and the choice of genomic regions, requires large amounts of data. Some of the most essential tools for processing these data were developed by the Computational Molecular Evolution group (CME) at the Heidelberg Institute for Theoretical Studies (HITS), together with scientists from its sister group, the Biodiversity Computing Group (BCG) at the Institute of Computer Science (ICS) of the Foundation for Research and Technology Hellas (FORTH) in Heraklion, Greece.

Enabling research in evolutionary biology

"The new computational approaches allowed us to reconstruct over 150,000 local phylogenies across the whole genome, each of which provides a small window into the evolutionary history of birds," says Josefin Stiller, one of the lead authors of the study and a former visitor to the CME group at HITS.

Photo: Dr. Alexandros Stamatakis

“What we mainly do is to enable research in evolutionary biology via software, algorithms and model development,” says CME group leader Alexandros Stamatakis, who also holds an EU-funded ERA chair at FORTH. “The ParGenes software, for example, which is very central for the paper, can efficiently schedule the inference of a huge number of per-gene phylogenetic trees on distinct input gene datasets on a large compute cluster. This is classic fundamental computer science as it focuses on efficient job scheduling.”
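To give a feel for the kind of scheduling problem Stamatakis describes, here is a minimal sketch in Python. It is not ParGenes' actual algorithm: it uses a generic greedy "longest job first" heuristic, and the `GeneJob` type, the gene names, and the assumption that run time grows with the number of taxa times the number of sites are all hypothetical and chosen only for illustration.

```python
import heapq
from typing import NamedTuple


class GeneJob(NamedTuple):
    """One per-gene tree inference (hypothetical description of a job)."""
    name: str   # made-up gene identifier
    taxa: int   # number of sequences in the alignment
    sites: int  # alignment length


def estimated_cost(job: GeneJob) -> int:
    # Assumption for this sketch: run time grows roughly with taxa * sites.
    return job.taxa * job.sites


def schedule(jobs: list[GeneJob], workers: int) -> list[list[GeneJob]]:
    """Assign jobs to workers, largest job first, always to the least-loaded worker."""
    if workers < 1:
        raise ValueError("need at least one worker")
    # Min-heap of (current load, worker index); the least-loaded worker is on top.
    heap = [(0, w) for w in range(workers)]
    heapq.heapify(heap)
    plan: list[list[GeneJob]] = [[] for _ in range(workers)]
    for job in sorted(jobs, key=estimated_cost, reverse=True):
        load, w = heapq.heappop(heap)
        plan[w].append(job)
        heapq.heappush(heap, (load + estimated_cost(job), w))
    return plan


def makespan(plan: list[list[GeneJob]]) -> int:
    """Estimated finishing time of the busiest worker."""
    return max(sum(estimated_cost(j) for j in worker_jobs) for worker_jobs in plan)


if __name__ == "__main__":
    jobs = [
        GeneJob("gene_a", 10, 100),
        GeneJob("gene_b", 10, 80),
        GeneJob("gene_c", 10, 60),
        GeneJob("gene_d", 10, 40),
        GeneJob("gene_e", 10, 20),
    ]
    for w, worker_jobs in enumerate(schedule(jobs, workers=2)):
        print(w, [j.name for j in worker_jobs])
```

Sorting the largest jobs first keeps a single big dataset from being handed out last and leaving the other workers idle, which is the basic concern when many datasets of very different sizes share one cluster.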

Image: The diversity of birds. Paintings: Jon Fjeldså. Design: Josefin Stiller.

ParGenes is based on RAxML-NG, the group’s flagship phylogenetic inference tool, and ModelTest-NG, a tool for selecting the best-fit statistical model of evolution for a given dataset. The NG in the tool names stands for Next Generation and denotes a set of existing tools (mainly the group’s own) that have been completely redesigned and rewritten since 2014 to make them more maintainable, versatile, and scalable. RAxML-NG in particular is flexible in that it scales seamlessly from a laptop to a supercomputer. For this paper, it was also used as a stand-alone tool on a supercomputer to infer a tree from the dataset comprising the entire genomes.

“Pythia”: Machine learning to predict phylogenetic difficulty

“A late addition to this paper was the Pythia difficulty prediction developed by Julia Haag, a PhD student in my group. Given an input dataset it predicts how difficult a phylogenetic inference on that dataset will be, that is, how much signal for a single tree there is in the data, using machine learning techniques,” says Stamatakis. “As our Nature paper focuses a lot on assessing the phylogenetic and evolutionary signal in different genomic regions of the bird genome, it was a very useful addition to the paper as we can now also provide phylogenetic difficulty scores for the distinct genomic regions assessed.”
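As a rough intuition for what “signal for a single tree” can mean, the following Python sketch measures how strongly several candidate trees inferred from the same dataset disagree with one another. This is not how Pythia works: Pythia relies on machine learning, and its features and model are not described in this article. The sketch instead uses a plain average of normalized Robinson-Foulds distances, and the taxon labels and the representation of a tree as a set of splits are assumptions made for illustration.

```python
from itertools import combinations

# Assumption for this sketch: each tree is given as a set of its non-trivial
# splits, and each split is stored as the frozenset of taxa on the side that
# does NOT contain a fixed reference taxon (here "E").
Tree = set[frozenset[str]]


def normalized_rf(a: Tree, b: Tree) -> float:
    """Robinson-Foulds distance (splits found in only one tree), scaled to [0, 1]."""
    total_splits = len(a) + len(b)
    if total_splits == 0:
        return 0.0
    return len(a ^ b) / total_splits


def mean_disagreement(trees: list[Tree]) -> float:
    """Average normalized RF distance over all pairs of trees."""
    pairs = list(combinations(trees, 2))
    if not pairs:
        return 0.0
    return sum(normalized_rf(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Three hypothetical trees on taxa A..E; the first and third are identical.
    tree_1 = {frozenset("AB"), frozenset("ABC")}
    tree_2 = {frozenset("AC"), frozenset("ABC")}
    tree_3 = {frozenset("AB"), frozenset("ABC")}
    print(round(mean_disagreement([tree_1, tree_2, tree_3]), 3))
```

A value near 0 means the candidate trees largely agree, so the data point toward one tree; a value near 1 means they share almost no splits, which would suggest a hard dataset.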

A versatile and flexible tool for researchers

The tools that the CME group has developed and that were used in this paper are all open source and highly cited. RAxML-NG in particular regularly enables research across different disciplines of the life sciences. During the pandemic, for example, it was used to analyze how the distinct viral strains evolved.

“In this project, we provide the basic toolbox for our fellow scientists that actually enables them to do their science,” says Stamatakis. “I personally find this very gratifying.”

You can read the publication: https://www.nature.com/articles/s41586-024-07323-1