New Phylogenetic Tool Can Handle the SARS-CoV-2 Data Load


An example phylogenetic tree (left) and its corresponding index tree.

Researchers at UC San Diego, in collaboration with UC Santa Cruz, have developed a new software tool for tracing and mapping the evolution of the SARS-CoV-2 virus that is capable of handling the unprecedented amount of genetic data being generated by the rapidly evolving pathogen. The tool is used to efficiently and accurately track new variants of the virus on what is called a phylogenetic tree: a visual history or map of an organism’s genetic changes and variations over time and geography. Using this new optimization tool, called matOptimize, researchers are now able to more accurately track the viral genome of SARS-CoV-2, mapping new variants onto the phylogenetic tree as they develop and monitoring the evolutionary and transmission dynamics of the virus.

The tool was described in the journal Bioinformatics, with UC San Diego undergraduate computer engineering student Cheng Ye as first author. Hear more about Ye’s journey to research as an undergraduate, and his experience working on such a timely project, in this Q&A.

“With over 10 million SARS-CoV-2 genome sequences now available, maintaining an accurate, comprehensive phylogenetic tree of all available SARS-CoV-2 sequences is becoming computationally infeasible with existing software, but is essential for getting a detailed picture of the virus’ evolution and transmission,” the researchers, under the direction of UC San Diego Electrical and Computer Engineering Professor Yatish Turakhia, write in the paper.

Currently, the program used for SARS-CoV-2 phylogeny is called UShER: Ultrafast Sample placement on Existing tRee. UShER was developed by Turakhia as a postdoctoral researcher at UC Santa Cruz, and is used by UC Santa Cruz to maintain the SARS-CoV-2 phylogeny. It’s publicly viewable at –

A few months into the pandemic, UShER faced a challenge with adding new genetic sequences onto the tree: the team would add sequences step-wise, one at a time, but when the genetic sequence input was incorrect or ambiguous, the system would lose accuracy.

“UShER would make a guess: an educated guess, but still a guess,” said Turakhia.

Thus, these sequences would often be sub-optimally placed on the tree, producing false mutations. In order to refine these placements, a tree optimization method was needed. However, existing tree optimizers were unable to keep up with the amount of SARS-CoV-2 genetic data being generated, with 10 million sequences currently mapped and up to 100,000 sequences added each day.
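One common way to quantify a "sub-optimal" placement is parsimony: counting the minimum number of mutations a tree needs in order to explain the observed sequences. The article does not say which criterion matOptimize uses, so the following is only an illustrative sketch, assuming a parsimony criterion and using Fitch's algorithm on a toy four-sample tree. The sample names, sequences, and function names are hypothetical and are not taken from UShER or matOptimize.

```python
# Illustrative sketch only: Fitch parsimony scoring on a toy binary tree.
# Sample names and sequences are made up; this is not matOptimize code.

def fitch_site(tree, states):
    """Return (possible states, mutation count) for one site of a binary tree.

    A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra mutation needed here.
        return common, left_cost + right_cost
    # Children disagree: one mutation is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony(tree, sequences):
    """Sum the Fitch mutation counts over every site of the alignment."""
    n_sites = len(next(iter(sequences.values())))
    total = 0
    for site in range(n_sites):
        states = {name: seq[site] for name, seq in sequences.items()}
        total += fitch_site(tree, states)[1]
    return total


sequences = {"s1": "AC", "s2": "AC", "s3": "GT", "s4": "GT"}

well_placed = (("s1", "s2"), ("s3", "s4"))  # identical samples grouped together
misplaced = (("s1", "s3"), ("s2", "s4"))    # s2 and s3 swapped

print(parsimony(well_placed, sequences))  # 2
print(parsimony(misplaced, sequences))    # 4
```

In this toy case the misplaced tree needs four mutations to explain data that the well-placed tree explains with two; the extra two are the kind of false mutations a tree optimizer would try to remove by rearranging the tree.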


Cheng Ye, left, was awarded the Electrical and Computer Engineering Best Undergraduate Research Award for his work on matOptimize. His advisor, Professor Yatish Turakhia, is pictured at right.

That’s when Turakhia worked with Ye and other students in his lab on the challenge of creating a better tree optimizer. Ye had joined Turakhia’s lab through the Electrical and Computer Engineering Summer Research Internship Program (SRIP) in January 2021. When it became clear to Turakhia that Ye’s fundamentals in data structures, parallel algorithms, programming, and bioinformatics were quite strong, he entrusted him with taking a leading role on this task.

“I was originally assigned to work on accelerating sequence alignment on graphics processing units, but I thought the SARS-CoV-2 phylogeny project might be more exciting, and it indeed was,” said Ye.

“In those days [Cheng] became an expert in tree-optimization,” said Turakhia.

Many of the existing tree optimizers were closed source, so Ye was forced to work with what was available in the literature to devise a solution to the data challenge. After several months of research, Ye developed matOptimize, currently the only tool capable of keeping up with the volume of rapidly evolving SARS-CoV-2 genetic data.

In order to achieve this, Ye created truly parallel software, with processing distributed over multiple CPUs and a significantly lower memory requirement. This allows it to be scaled to the level of data required for the SARS-CoV-2 phylogeny.

Today, UShER as the phylogenetic tree software and matOptimize as the tree optimization method are being used together to characterize the SARS-CoV-2 phylogeny. There is now a complete catalog of genetic sequences, and UC San Diego and UC Santa Cruz scientists continue to track those that phylogenetic inference highlights as more dangerous or transmissible.

Moving forward, Turakhia’s team is using this information to study the recombination of SARS-CoV-2, a phenomenon that may lead to newer, dangerous variants.

“In collaboration with Professor Russell Corbett-Detig’s team at UC Santa Cruz, Cheng and I developed a software called RIPPLES that can sensitively detect recombinants in 1000x larger datasets,” said Turakhia. “This software will help monitor the emergence of new SARS-CoV-2 recombinants and is likely to be applied to other pathogens as well in the future.”
