Phylogenetics - Genomics - Systems Biology - BIO-PROTOCOL

Classification of a Massive Number of Viral Genomes and Estimation of Time of Most Recent Common Ancestor (tMRCA) of SARS-CoV-2 Using Phylodynamic Analysis

Xiaowen Hu, Siqin Guan, Yiliang He, Guohui Yi, Lei Yao*, Jiaming Zhang*

0 Q&A 1072 Views Mar 20, 2024

Estimating the time of the most recent common ancestor (tMRCA) is important for tracing the origin of pathogenic viruses. The estimate is based on the genetic diversity accumulated over a given time period. Thousands of mutation sites have arisen in SARS-CoV-2 genomes since the COVID-19 pandemic started; six highly linked mutation sites arose before the start of the pandemic and can be used to classify the genomes into three main haplotypes. Tracing the origin of those three haplotypes may help to understand the origin of SARS-CoV-2. In this article, we present a complete protocol for classifying SARS-CoV-2 genomes and calculating tMRCA using a Bayesian phylodynamic method. This protocol may also be used to analyze other viral genomes.

Key features

• Filtering and alignment of a massive number of viral genomes using custom scripts and ViralMSA.

• Classification of genomes based on highly linked sites using custom scripts.

• Phylodynamic analysis of viral genomes using Bayesian evolutionary analysis sampling trees (BEAST).

• Visualization of the posterior distribution of tMRCA using Tracer v1.7.2.

• Optimized for SARS-CoV-2.
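
The classification step in the key features above, assigning each aligned genome to a haplotype from its alleles at a few highly linked sites, can be sketched in Python. This is a minimal illustration, not the protocol's custom script: the abstract does not list the six sites or the alleles that define the three haplotypes, so the column indices in `LINKED_SITES`, the patterns in `HAPLOTYPES`, and the toy sequences are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical 0-based alignment columns of the linked sites. The source does
# not give the six real positions, so these are placeholders.
LINKED_SITES = [2, 5, 7]

# Hypothetical allele patterns at those columns that define each haplotype.
HAPLOTYPES = {
    "ACG": "H1",
    "TCG": "H2",
    "TTA": "H3",
}


def classify(aligned_seq, sites=LINKED_SITES, table=HAPLOTYPES):
    """Return the haplotype label for one aligned genome, or 'other'."""
    pattern = "".join(aligned_seq[i].upper() for i in sites)
    return table.get(pattern, "other")


def classify_all(records):
    """Map {genome_id: aligned_sequence} to {genome_id: haplotype}."""
    return {name: classify(seq) for name, seq in records.items()}


# Toy aligned sequences (all the same length, as after alignment).
records = {
    "g1": "GGAGGCGGGG",
    "g2": "GGTGGCGGGG",
    "g3": "GGTGGTGAGG",
    "g4": "GGNGGCGGGG",  # ambiguous base at a linked site
}

labels = classify_all(records)
counts = Counter(labels.values())
print(labels)
print(counts)
```

Genomes with an ambiguous or unexpected allele at any linked site fall into `other` here; whether to discard or rescue such genomes is a design choice the abstract does not address.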

Graphical overview

Graphical workflow of time of most recent common ancestor (tMRCA) estimation process

Phylogenetic Inference of Homologous/Orthologous Genes among Distantly Related Plants

Zilong Xu, Wenyan Sun, Ziqiang Zhu, Bojian Zhong, Zhenhua Zhang*

0 Q&A 577 Views Dec 5, 2023

The recent surge in plant genomic and transcriptomic data has laid a foundation for reconstructing evolutionary scenarios and inferring potential functions of key genes related to plant development and stress responses. The classical scheme for identifying homologous genes is sequence similarity-based searching, under the crucial assumption that homologous sequences are more similar to each other than they are to any non-homologous sequence. Advances in plant phylogenomics and computational algorithms have enabled us to systematically identify homologs/orthologs and reconstruct their evolutionary histories among distantly related lineages. Here, we present a comprehensive pipeline for homologous sequence identification, phylogenetic relationship inference, and potential functional profiling of genes in plants.

Key features

• Identification of orthologs using large-scale genomic and transcriptomic data.

• This protocol is generalized for analyzing the evolution of plant genes.

Computational Analysis and Phylogenetic Clustering of SARS-CoV-2 Genomes

Bani Jolly, Vinod Scaria*

1 Q&A 5074 Views Apr 20, 2021

COVID-19, the disease caused by the novel SARS-CoV-2 coronavirus, originated as an isolated outbreak in the Hubei province of China but soon created a global pandemic and is now a major threat to healthcare systems worldwide. Following the rapid human-to-human transmission of the infection, institutes around the world have made efforts to generate genome sequence data for the virus. With thousands of genome sequences for SARS-CoV-2 now available in the public domain, it is possible to analyze the sequences and gain a deeper understanding of the disease, its origin, and its epidemiology. Phylogenetic analysis is a potentially powerful tool for tracking the transmission pattern of the virus with a view to aiding identification of potential interventions. Toward this goal, we have created a comprehensive protocol for the analysis and phylogenetic clustering of SARS-CoV-2 genomes using Nextstrain, a powerful open-source tool for the real-time interactive visualization of genome sequencing data. Approaches to focus the phylogenetic clustering analysis on a particular region of interest are detailed in this protocol.

HoSeIn: A Workflow for Integrating Various Homology Search Results from Metagenomic and Metatranscriptomic Sequence Datasets

Gaston Rozadilla, Jorgelina Moreiras Clemente, Christina B. McCarthy*

0 Q&A 3331 Views Jul 20, 2020

Data generated by metagenomic and metatranscriptomic experiments are both enormous and inherently noisy. When using taxonomy-dependent, alignment-based methods to classify and label reads, the first step is to perform homology searches against sequence databases. To obtain the most information from the samples, nucleotide sequences are usually compared to various databases (nucleotide and protein) using local sequence aligners such as BLASTN and BLASTX. Nevertheless, analyzing and integrating these results can be problematic because the outputs of these searches usually show inconsistencies, which can be especially pronounced when working with RNA-seq. Moreover, to the best of our knowledge, existing tools do not cross-reference and integrate information from the different homology searches, but report the results of each analysis separately. We developed the HoSeIn workflow to intersect the information from these homology searches, and then determine the taxonomic and functional profile of the sample using this integrated information. The workflow is based on the assumption that the sequences that correspond to a certain taxon are composed of:
1) sequences that were assigned to the same taxon by both homology searches;
2) sequences that were assigned to that taxon by one of the homology searches but returned no hits in the other one.
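
The two rules above can be sketched as a small Python function. This is an illustrative sketch under stated assumptions, not the HoSeIn implementation: the function name `integrate`, the `{read_id: taxon}` input format, and the choice to set conflicting reads aside are all hypothetical, since the abstract does not say how reads assigned to different taxa by the two searches are handled.

```python
def integrate(blastn, blastx):
    """Combine two {read_id: taxon} mappings under the two rules above.

    A read missing from a mapping is treated as having returned no hits
    in that search (an assumption about the input format).
    """
    assigned = {}
    conflicts = []
    for read in sorted(set(blastn) | set(blastx)):
        n = blastn.get(read)
        x = blastx.get(read)
        if n is not None and x is not None:
            if n == x:
                assigned[read] = n      # rule 1: both searches agree
            else:
                conflicts.append(read)  # assumption: disagreements set aside
        else:
            # rule 2: one search assigned a taxon, the other had no hits
            assigned[read] = n if n is not None else x
    return assigned, conflicts


# Toy search results (hypothetical read IDs and taxa).
blastn_hits = {"r1": "Bacteria", "r2": "Fungi", "r3": "Bacteria"}
blastx_hits = {"r1": "Bacteria", "r3": "Archaea", "r4": "Viruses"}

assigned, conflicts = integrate(blastn_hits, blastx_hits)
print(assigned)
print(conflicts)
```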

Genome-wide Estimation of Evolutionary Distance and Phylogenetic Analysis of Homologous Genes

Meixia Zhao*, Biao Zhang, Jianxin Ma, Damon Lisch

0 Q&A 6734 Views Dec 5, 2018

Homologous genes, including paralogs and orthologs, are genes that share sequence homology within or between species. They descend from a common ancestral gene through speciation, gene duplication, or horizontal gene transfer. Estimating the sequence divergence of homologous genes helps us to understand divergence time, which makes it possible to understand the evolutionary patterns of speciation, gene duplication, and gene transfer events. This protocol provides a detailed bioinformatics pipeline for identifying homologous genes and comparing their sequence divergence and phylogenetic relationships, focusing on homologous genes that show syntenic relationships, using soybean (Glycine max) and common bean (Phaseolus vulgaris) as example species.
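
As a rough illustration of what estimating sequence divergence involves, the sketch below computes the proportion of differing sites (p-distance) between two aligned sequences and applies the Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - 4p/3). The abstract does not state which distance measure the protocol uses, so JC69 is a stand-in chosen for simplicity, and the function names and toy sequences are hypothetical.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69(p):
    """Jukes-Cantor corrected distance for an observed p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy aligned pair differing at one of ten sites.
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
d = jc69(p)
print(p, round(d, 4))
```

The corrected distance is slightly larger than the raw proportion because it accounts for multiple substitutions at the same site.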

Extraction of DNA from Murine Fecal Pellets for Downstream Phylogenetic Microbiota Analysis by Next-generation Sequencing

Elien Eeckhout, Andy Wullaert*

2 Q&A 10925 Views Feb 5, 2018

Mouse models are widely used to evaluate the potential impact of gut microbial composition on health and disease. Standardized protocols for sampling and storing murine feces, as well as for extracting DNA from these fecal pellets, are needed to limit experimental variation between studies. Both efficient lysis of the microbiota and the quality of the obtained fecal DNA are important for allowing downstream next-generation sequencing to cover the phylogenetic diversity of both Gram-negative and Gram-positive bacteria living in the mouse gut. Here we present a detailed protocol for fecal sample collection and DNA extraction that we validated in a study on the impact of inflammasomes on the murine gut microbiota. This protocol for DNA extraction from murine fecal pellets uses a combination of mechanical and chemical lysis, which aligns with the procedure that was recently recommended as a benchmark protocol for DNA extraction from human feces.

Character-State Reconstruction to Infer Ancestral Protein-Protein Interaction Patterns

Florian Rümpler, Günter Theißen, Rainer Melzer*

0 Q&A 12278 Views Aug 20, 2015

Protein-protein interactions are at the core of a plethora of developmental, physiological and biochemical processes. Consequently, insights into the origin and evolutionary dynamics of protein-protein interactions may provide information on the constraints and dynamics of specific biomolecular circuits and their impact on the organismal phenotype.

This protocol describes how ancestral protein-protein interaction patterns can be inferred using a set of known protein interactions from phylogenetically informative species. Although this protocol focuses on protein-protein interaction data, character-state reconstructions can in general be performed with other kinds of binary data in the same way.
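
To make character-state reconstruction with binary data concrete, here is a minimal sketch of the bottom-up pass of Fitch parsimony on a rooted binary tree. The abstract does not name the reconstruction method or software the protocol uses, so Fitch parsimony is an assumption chosen for illustration; the nested-tuple tree format, the species names, and the interaction states (1 = proteins interact, 0 = they do not) are hypothetical.

```python
def fitch(tree, tip_states):
    """Bottom-up Fitch pass on a rooted binary tree.

    tree: nested 2-tuples whose leaves are tip names (strings).
    tip_states: {tip_name: 0 or 1}.
    Returns (set of most-parsimonious root states, minimum number of changes).
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {tip_states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        changes += 1  # children disagree: one state change is required
        return left | right

    root_states = visit(tree)
    return root_states, changes


# Toy example: the interaction is present only in the clade (sp_A, sp_B).
tree = ((("sp_A", "sp_B"), "sp_C"), "sp_D")
states = {"sp_A": 1, "sp_B": 1, "sp_C": 0, "sp_D": 0}

root_states, n_changes = fitch(tree, states)
print(root_states, n_changes)
```

In this toy tree the interaction is inferred to be absent at the root and gained once, on the branch leading to `sp_A` and `sp_B`. A full reconstruction would add a top-down pass to resolve the states at internal nodes.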