A Comprehensive Protocol for Bayesian Phylogenetic Analysis Using MrBayes: From Sequence Alignment to Model Selection and Phylogenetic Inference

Jinxing Wang; Fangmin Chen; Xu Xiao; Xinyao Yang; Wanting Xia

doi:10.21769/BioProtoc.5276


Peer-reviewed


Published: Vol 15, Iss 8, Apr 20, 2025. DOI: 10.21769/BioProtoc.5276

Reviewed by: Prashanth N Suravajhala, Suresh Panthee, Anonymous reviewer(s)


Abstract

Bayesian phylogenetic analysis is essential for elucidating evolutionary relationships among organisms. Traditional methods often rely on fixed models and manual parameter settings, which can limit accuracy and efficiency. This protocol presents an integrated workflow that leverages GUIDANCE2 for rigorous sequence alignment, ProtTest and MrModeltest for robust model selection, and MrBayes for phylogenetic tree estimation through Bayesian inference. By automating key steps and providing detailed command-line instructions, this protocol enhances the reliability and reproducibility of phylogenetic studies.

Key features

• Robust sequence alignment: Combines GUIDANCE2 and MAFFT to handle complex evolutionary events.

• Automated model selection: Utilizes ProtTest and MrModeltest for protein evolution models and nucleotide substitution models, respectively.

• Streamlined workflow: Provides step-by-step instructions from sequence alignment to phylogenetic tree estimation through Bayesian inference.

Keywords: Bayesian phylogenetic analysis; Evolutionary model selection

Background

Phylogenetic analysis plays a critical role in understanding the evolutionary relationships among species, informing diverse fields such as evolutionary biology, epidemiology, and conservation genetics. The process of generating a phylogenetic tree typically involves key steps including sequence alignment, model selection, and tree inference, each of which is essential for deriving reliable evolutionary conclusions. However, traditional phylogenetic workflows often involve manual sequence alignment and model selection, introducing potential biases and inefficiencies.

To address these challenges, numerous computational tools have been developed. For example, GUIDANCE2 enhances sequence alignment by accounting for alignment uncertainty and evolutionary events such as insertions and deletions [1]. Model selection tools such as ProtTest [2] and MrModeltest2 [3] automate the identification of optimal evolutionary models using statistical criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), thereby improving the reliability of downstream phylogenetic inferences. In addition, tools such as PAUP* [4] enable comprehensive phylogenetic analysis of nucleotide sequences, while MEGA X [5] facilitates sequence format conversion and preliminary analyses.
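As an illustration of how such criteria rank candidate models, the following minimal Python sketch computes AIC (2k − 2 ln L) and BIC (k ln n − 2 ln L) for three nucleotide models. The log-likelihoods and alignment length are hypothetical values chosen for illustration, not output from ProtTest or MrModeltest2, and the parameter counts cover substitution-model parameters only (branch lengths are ignored, which does not change the ranking when all models share one tree).

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Hypothetical (log-likelihood, free substitution-model parameters) pairs.
candidates = {
    "JC69": (-5230.4, 0),
    "HKY85": (-5105.9, 4),
    "GTR+G": (-5098.2, 9),
}
n_sites = 1200  # hypothetical alignment length used as the sample size

scores = {
    name: (aic(lnl, k), bic(lnl, k, n_sites))
    for name, (lnl, k) in candidates.items()
}
best_aic = min(scores, key=lambda m: scores[m][0])
best_bic = min(scores, key=lambda m: scores[m][1])

for name, (a, b) in scores.items():
    print(f"{name:6s} AIC={a:.1f} BIC={b:.1f}")
print("Best by AIC:", best_aic)
print("Best by BIC:", best_bic)
```

With these made-up numbers the two criteria disagree: BIC penalizes extra parameters more heavily than AIC once ln n exceeds 2, so it can prefer a simpler model.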

Beyond these tools, several non-Bayesian phylogenetic inference methods offer powerful alternatives with distinct advantages. The PHYLIP package [6] provides a comprehensive suite of programs implementing distance matrix, maximum parsimony, and maximum likelihood methods, making it a versatile choice for diverse phylogenetic analyses. Maximum likelihood-based programs like RAxML [7] and IQ-TREE [8] have revolutionized the field with their computational efficiency and accuracy, especially for large datasets. FastTree [9] employs heuristic approaches to construct approximately maximum-likelihood phylogenetic trees with remarkable speed while maintaining reasonable accuracy. PhyML [10] offers robust algorithms for maximum likelihood tree estimation with extensive substitution model options and branch support assessment. These non-Bayesian tools provide complementary strengths to Bayesian methods, often excelling in computational efficiency while still delivering statistically sound phylogenetic inferences.

Bayesian methods, particularly those implemented in MrBayes [11], provide a robust probabilistic framework for estimating phylogenetic trees and evolutionary parameters by incorporating uncertainty and prior knowledge. However, integrating these tools into a cohesive and reproducible workflow remains challenging due to differing format requirements between tools. For example, GUIDANCE2 accepts FASTA/PHYLIP inputs, MrBayes requires NEXUS format [12], and PAUP* demands non-interleaved NEXUS [4,13] for its analyses. These diverging specifications create hidden technical barriers. Our protocol addresses these challenges by presenting a seamless, step-by-step guide that integrates sequence alignment, model selection, and Bayesian inference using MrBayes. It automates critical steps, minimizes manual intervention, reduces potential errors, and ensures reproducibility. Custom Python scripts are included to streamline the parsing of model selection outputs, enhancing data handling efficiency. This structured protocol simplifies the phylogenetic analysis process, improves the accuracy and reliability of results, and is applicable to diverse datasets, including both protein and nucleotide sequences.
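To make the format mismatch concrete, the sketch below converts an aligned FASTA string into a non-interleaved NEXUS data block. It is only an illustration under stated assumptions: the function names `parse_fasta` and `to_nexus` are hypothetical, taxon names are taken as the first whitespace-delimited token of each FASTA header, and the sequences are assumed to be already aligned. It is not the conversion route used in this protocol, which relies on MEGA and PAUP*.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Read FASTA text into {taxon_name: sequence}, joining wrapped lines."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without a name")
            name = header[0]  # assumption: first token is the taxon name
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def to_nexus(records: dict[str, str], datatype: str = "DNA") -> str:
    """Write aligned sequences as a non-interleaved NEXUS DATA block."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    nchar = lengths.pop()
    width = max(len(name) for name in records) + 2
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(records)} NCHAR={nchar};",
        f"  FORMAT DATATYPE={datatype} MISSING=? GAP=-;",
        "  MATRIX",
    ]
    # One line per taxon keeps the matrix non-interleaved.
    lines += [f"  {name.ljust(width)}{seq}" for name, seq in records.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


example_fasta = ">taxonA\nACGT-ACG\n>taxonB sample\nACGT\nTACG\n"
nexus_text = to_nexus(parse_fasta(example_fasta))
print(nexus_text)
```

For protein alignments, `datatype="PROTEIN"` would be passed instead. Taxon names containing spaces or punctuation would need quoting, which this sketch does not handle.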

The NEXUS format is a common data format for phylogenetic analysis that allows different programs to exchange data for analysis and visualization. PAUP* reads data in NEXUS format, and all NEXUS files must begin with the declaration "#NEXUS". The Newick format [14] is another widely used format for representing phylogenetic trees and is supported by many phylogenetic analysis tools. The protocol uses MEGA X for initial format conversions and PAUP* for format refinement, ensuring clean data handoffs between tools and preventing pipeline failures caused by format mismatches. This approach systematically addresses integration challenges and provides a versatile and reliable resource for researchers conducting rigorous evolutionary studies.
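A quick sanity check on these two formats can be sketched in Python. The helper names `has_nexus_header` and `newick_leaves` are hypothetical; the header check assumes a case-insensitive comparison after skipping leading whitespace, and the leaf extraction assumes a simple Newick string with unquoted labels and no bracketed comments.

```python
import re


def has_nexus_header(text: str) -> bool:
    """Return True if the text starts with the '#NEXUS' declaration."""
    return text.lstrip().upper().startswith("#NEXUS")


def newick_leaves(newick: str) -> list[str]:
    """List leaf labels of a simple Newick tree, in order of appearance.

    A leaf label directly follows '(' or ','; internal-node labels follow ')'
    and are therefore skipped. Quoted labels are not supported.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


tree = "((A:0.1,B:0.2):0.3,C:0.4);"
print(newick_leaves(tree))
print(has_nexus_header("#NEXUS\nBEGIN DATA;\nEND;"))
print(has_nexus_header(">seq1\nACGT"))
```

Comparing the extracted leaf labels against the taxon names in the alignment is one simple way to catch names that were altered or truncated during format conversion.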

Software and datasets

All procedures in this protocol were developed and tested on Windows 10.

1. Python (Version: 3.13.1)

a. Homepage: https://www.python.org/

b. Downloads: https://www.python.org/ftp/python/3.13.1/python-3.13.1-amd64.exe

c. Platform: Windows