
Character-State Reconstruction to Infer Ancestral Protein-Protein Interaction Patterns

Abstract

Protein-protein interactions are at the core of a plethora of developmental, physiological and biochemical processes. Consequently, insights into the origin and evolutionary dynamics of protein-protein interactions may provide information on the constraints and dynamics of specific biomolecular circuits and their impact on the organismal phenotype.
This protocol describes how ancestral protein-protein interaction patterns can be inferred using a set of known protein interactions from phylogenetically informative species. Although this protocol focuses on protein-protein interaction data, character-state reconstructions can in general be performed with other kinds of binary data in the same way.

Keywords: Protein-protein interaction, Phylogeny, Character-state reconstruction, Mesquite, Markov model

Data

  1. Protein-protein interaction data
    A comprehensive list of interactions for the protein family under study should be compiled. As interaction data are typically generated only for proteins whose sequences have been deposited in databases, a recently published comprehensive phylogeny of the protein family under study may yield an upper estimate of the number and phylogenetic breadth of interaction data to be expected. In many cases recently published phylogenetic relationships need to be extracted from the publications themselves; however, a growing number of phylogenies are being uploaded to online databases such as TreeBASE (http://treebase.org/treebase-web/home.html) or Dryad (http://datadryad.org/).
    1. Databases can be used to obtain data on protein-protein interactions. Prominent examples include BioGRID (http://thebiogrid.org/) (Chatr-Aryamontri et al., 2013), the Database of Interacting Proteins (http://dip.doe-mbi.ucla.edu/) (Salwinski et al., 2004), IntAct (http://www.ebi.ac.uk/intact/) (Orchard et al., 2014) and STRING (http://string-db.org/) (Franceschini et al., 2013), to mention but a few. In addition, UniProt (http://www.uniprot.org/) (UniProt Consortium, 2015) provides cross-references to a number of these databases, thus facilitating searches for potential interaction partners.
    2. Whereas database searches provide a good starting point, they very often do not capture all of the information available. It is therefore advisable to undertake a literature search. Special emphasis should be put on obtaining information from phylogenetically informative proteins, i.e. from proteins that occupy a position in the phylogeny that is critical for resolving the state of a particular trait (i.e. the character-state). Very often these are the early-diverging lineages, as their inclusion (together with more derived taxa) ensures that the whole phylogenetic breadth of a taxonomic group is captured. It might prove useful to obtain new experimental data for proteins that are phylogenetically especially informative. Indeed, generation of new protein-protein interaction data is often combined with character-state reconstruction to better understand the evolution of protein-protein interactions (Liu et al., 2010; Melzer et al., 2014; Li et al., 2015).

  2. Sequence retrieval
    Protein or nucleotide sequences for phylogenetic reconstructions can be retrieved from the NCBI nucleotide collection (http://www.ncbi.nlm.nih.gov/nuccore) or the NCBI protein collection (http://www.ncbi.nlm.nih.gov/protein).

Software

  1. For sequence alignment and subsequent phylogenetic reconstructions one or several of the following programs may be used:

    Table 1. Programs for sequence alignments and phylogenetic reconstructions

    | Program | Purpose | Reference | URL |
    |---|---|---|---|
    | ExPASy translate | Translation of nucleotide sequences into amino acid sequences. | (Artimo et al., 2012) | http://web.expasy.org/translate/ |
    | Clustal 2 | Sequence alignment. Suited especially for closely related sequences. | (Larkin et al., 2007) | http://www.clustal.org/clustal2/ |
    | MAFFT7 | Sequence alignment. Suited for closely as well as more distantly related sequences. | (Katoh and Standley, 2013) | http://mafft.cbrc.jp/alignment/software/ |
    | RevTrans 1.4 | Converting amino acid alignment into codon alignment. | (Wernersson and Pedersen, 2003) | http://www.cbs.dtu.dk/services/RevTrans/ |
    | MEGA 6 | Sequence alignment and phylogenetic reconstruction. | (Tamura et al., 2013) | http://www.megasoftware.net/ |
    | MrBayes 3 | Phylogenetic reconstruction. | (Ronquist and Huelsenbeck, 2003) | http://mrbayes.sourceforge.net/index.php |

  2. To collate the character matrix
    Microsoft Excel or a similar spreadsheet application
  3. For character-state reconstruction:
    Mesquite 3.02 (Maddison and Maddison, 2015) (http://mesquiteproject.org/)
    Mesquite also provides extensive documentation: (http://mesquiteproject.wikispaces.com/)

Procedure

  1. Compilation of the character matrix
    A character matrix is constructed that contains the names of the proteins and their interaction properties. This can be done using an Excel spreadsheet. Alternatively, data may be entered directly in Mesquite (Figure 1). It is possible to collate information for several interacting partners in separate columns. To conduct a likelihood character-state reconstruction with Mesquite (see below), data have to be coded categorically, i.e. ‘0’ for no interaction and ‘1’ for an interaction. Combinations for which the interaction status is unknown are left blank. Theoretically, one may also introduce three or more categories, e.g. ‘0’: no interaction; ‘1’: weak interaction; ‘2’: strong interaction. However, one needs to be aware that the categories are still discrete and do not follow a hierarchy (e.g. there is no constraint such that evolution has to proceed from ‘no’ to ‘weak’ to ‘strong’ interactions).
    Coding of interactions can be complicated by the phylogenetic history of the interaction partner. Consider an example in which protein A interacts with protein B in a certain model organism. In another organism, one ortholog of A, termed A’ here, may exist, but two co-orthologs of B (B’ and B’’) occur. If A’ interacts with B’ but not with B’’, it is difficult to assign an interaction status to A’ (Figure 2). One compromise is to designate A’ as interacting as long as an interaction with either B’ or B’’ is observed (Melzer et al., 2014). The situation gets more complicated if only incomplete datasets are available. Assume, for example, that A’ does not interact with B’’, but information on the interaction between A’ and B’ is not available. In this case, one may designate the interaction status of A’ as unknown to avoid the inclusion of false negatives in the dataset (Melzer et al., 2014). It is difficult to estimate how frequently these problems will appear in a particular dataset. It is therefore important to consider the phylogenetic history of the interaction partner in character-state reconstructions.
    If the interaction data gathered rely on different methods it is helpful to also collect data for each method separately (Figure 1). This will later reveal whether the results of the character-state reconstruction depend on the method used to obtain the interaction data.


    Figure 1. Screenshot from a character matrix in Mesquite. Protein names are listed in the second (coloured) column. Interaction characteristics are listed in subsequent columns. Data on homodimerization as well as on heterodimerization with other proteins and data obtained with different techniques (Y2H: yeast two-hybrid; EMSA: electrophoretic mobility shift assay) are collated in separate columns.


    Figure 2. Duplications can complicate coding interactions. Proteins A and B interact in species 1 (as indicated by the double arrow). In species 2, A’ interacts with B’ but not with B’’. This raises the question of how to code the interaction status of A’.
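
The coding compromise described for co-orthologous partners can be written down as a small rule. The following is a minimal Python sketch, not part of the original protocol: the function name `code_interaction` and the use of `?` for an unknown status are illustrative assumptions (the protocol itself leaves unknown cells blank).

```python
from typing import Iterable, Optional


def code_interaction(results: Iterable[Optional[bool]]) -> str:
    """Collapse results against several co-orthologous partners into one state.

    Each entry is True (interaction observed), False (tested, no interaction)
    or None (not tested). Returns '1', '0' or '?' (unknown; hypothetical symbol).
    """
    results = list(results)
    # Interacting as soon as an interaction with any co-ortholog is observed.
    if any(r is True for r in results):
        return "1"
    # No positive result and at least one untested partner: code as unknown
    # to avoid introducing a false negative.
    if not results or any(r is None for r in results):
        return "?"
    # Every partner was tested and none interacted.
    return "0"


print(code_interaction([True, False]))   # A' binds B' but not B''
print(code_interaction([None, False]))   # B' untested, B'' negative
print(code_interaction([False, False]))  # both partners tested negative
```

Applying one explicit rule to every row keeps the matrix consistent when several partners or assay methods are collated in separate columns.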

  2. Phylogenetic reconstruction
    A phylogeny covering all of the proteins under study needs to be constructed using one of the many software tools available (e.g. MrBayes, MEGA 6, Bali-Phy, PhyML, see also Table 1). For an overview of basic concepts and methods in phylogeny reconstruction see De Bruyn et al. (2014). The phylogeny can be constructed using the sequences of the proteins under study. However, in principle any tree can be used as long as each protein is assigned to a specific position in the tree. Protein names in the character matrix described above and in the phylogenetic tree have to be identical so that the two datasets can later be connected. Mesquite also offers the possibility to draw trees manually; this may be used for cases in which a computational phylogenetic reconstruction is not feasible.
    The phylogeny may contain proteins for which interaction data are not available. These will later be ignored by the character-state analysis.
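
Because protein names must match exactly between the character matrix and the tree, it can help to compare the two name sets before loading anything into Mesquite. The sketch below is an illustration only: it assumes a simple Newick string with unquoted labels, the protein names are hypothetical, and the rule of flagging every name that is not purely alphanumeric or underscore is a deliberately conservative assumption rather than a documented Mesquite restriction.

```python
import re


def newick_tip_labels(newick: str) -> set[str]:
    """Extract unquoted tip labels from a simple Newick string."""
    # A tip label directly follows '(' or ',' and ends at ':', ',', ')' or ';'.
    return set(re.findall(r"[(,]\s*([^\s():,;]+)", newick))


def compare_names(newick: str, matrix_names: list[str]) -> dict[str, list[str]]:
    """Report names that would prevent linking the matrix to the tree."""
    tips = newick_tip_labels(newick)
    names = set(matrix_names)
    return {
        # Proteins with interaction data but no position in the tree.
        "only_in_matrix": sorted(names - tips),
        # Proteins in the tree without data (these are simply ignored later).
        "only_in_tree": sorted(tips - names),
        # Names containing characters such as dashes that may cause trouble.
        "non_alphanumeric": sorted(
            n for n in names | tips if not re.fullmatch(r"\w+", n)
        ),
    }


tree = "((ProtA:1,ProtB:1):1,Prot-C:2);"  # hypothetical tree
report = compare_names(tree, ["ProtA", "ProtB", "ProtD"])
print(report)
```

Entries under `only_in_matrix` need attention, since those proteins cannot be placed on the tree; entries under `only_in_tree` are harmless.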

  3. Character-state reconstruction
    The character-state reconstruction is done using Mesquite. For general instructions on how to handle Mesquite one may visit the ‘Mesquite ProjectTeam’ YouTube channel (https://www.youtube.com/channel/UCfSmgC0O_dWLI0PEoXZbS4Q).
    1. Import/generate tree:
      Mesquite allows trees to be imported from other files in several ways (http://mesquiteproject.wikispaces.com/Trees). If trees are read from NEXUS files, note that Mesquite cannot handle some special characters (e.g. dashes) in protein names. When a phylogenetic tree is imported, its branch lengths will later be taken into consideration for the character-state reconstruction. If a manually drawn tree is used, all branch lengths will by default be set to 1. This may work well in a number of cases, but it should be kept in mind that proteins from early diverging taxa may possess artificially short branches under this setting (Figure 3). However, Mesquite also allows branch lengths to be edited (https://mesquiteproject.wikispaces.com/Trees).
      Figure 3. Screenshot of a manually drawn tree in Mesquite. A. Branches as displayed by default when the tree is drawn: all tips reach the same level. B. Branches of the same tree displayed proportional to length (see the scale on the right side of the tree). Because branch lengths are set to one by default, panel B reveals that some branches (e.g. the one leading to Amborella trichopoda AmAP3) might be unreasonably short.
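
To see why branch lengths matter for a likelihood reconstruction, consider the textbook two-state, one-rate continuous-time Markov model, in which the probability that a branch ends in a different state than it started is 0.5 · (1 − exp(−2qt)) for rate q and branch length t. The sketch below only illustrates that formula; the rate value is arbitrary, and Mesquite's internal parameterisation may differ.

```python
import math


def change_probability(rate: float, branch_length: float) -> float:
    """P(end state differs from start state) on one branch.

    Symmetric two-state Markov model with a single rate of change.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * rate * branch_length))


# Arbitrary illustrative rate; compare short, unit and long branches.
for t in (0.1, 1.0, 5.0):
    print(t, round(change_probability(0.5, t), 3))
```

A branch that is artificially short therefore makes a change of interaction status along it appear less probable than it would on a realistically scaled branch.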

    2. Import/generate data matrix:
      There are several ways to generate or import a data matrix implemented in Mesquite (http://mesquiteproject.wikispaces.com/Characters+%26+Matrices). A straightforward approach is to generate a new blank data matrix with the required number of characters and copy/paste the interaction data from the original data source (i.e. from the Excel spreadsheet). The matrix needs to be specified as categorical to be used for the character state reconstruction.
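
As an alternative to copy/paste, the coded matrix could be written to a NEXUS file. The following is a minimal sketch under stated assumptions: the protein names are hypothetical, blank cells are mapped to `?` as the missing-data symbol, and the output follows the general NEXUS TAXA/CHARACTERS block layout. Whether Mesquite opens this exact file without adjustments has not been verified here.

```python
def to_nexus(matrix: dict[str, list[str]]) -> str:
    """Render {protein name: [state per character]} as a minimal NEXUS file."""
    n_char = len(next(iter(matrix.values())))
    if any(len(states) != n_char for states in matrix.values()):
        raise ValueError("all proteins need the same number of characters")
    width = max(len(name) for name in matrix)
    lines = [
        "#NEXUS",
        "BEGIN TAXA;",
        f"  DIMENSIONS NTAX={len(matrix)};",
        "  TAXLABELS " + " ".join(matrix) + ";",
        "END;",
        "BEGIN CHARACTERS;",
        f"  DIMENSIONS NCHAR={n_char};",
        '  FORMAT DATATYPE=STANDARD MISSING=? SYMBOLS="01";',
        "  MATRIX",
    ]
    for name, states in matrix.items():
        # Anything other than '0' or '1' (e.g. a blank cell) becomes missing data.
        row = "".join(s if s in ("0", "1") else "?" for s in states)
        lines.append(f"  {name.ljust(width)}  {row}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


# Two hypothetical proteins, two characters (e.g. homo- and heterodimerization).
print(to_nexus({"ProtA": ["1", ""], "ProtB": ["0", "1"]}))
```

Names containing spaces or special characters would need quoting or renaming before being written this way.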
    3. Model specification and character reconstruction:
      Mesquite provides extensive documentation on the different settings for character-state reconstruction: http://mesquiteproject.wikispaces.com/Ancestral+States. In our analyses we employed likelihood reconstruction methods (Melzer et al., 2014), but parsimony reconstructions are also available (Li et al., 2015). Two general models can be used for likelihood reconstructions: the ‘Markov k-state 1 parameter model’ (Mk1) and the ‘Asymmetrical Markov k-state 2 parameter model’ (AsymmMk). The principal difference between these two models is that the 2 parameter model allows the ‘forward’ and ‘backward’ rates to differ, i.e. the probabilities of gaining and losing an interaction can be different. In the 1 parameter model, gaining and losing an interaction are equally probable. Biologically, it would in most cases make more sense to apply the 2 parameter model, as one may assume that an interaction is more likely to be lost than gained. However, several reports have shown that 2 parameter models can lead to implausible results if small to medium-sized datasets (data on fewer than 100 protein-protein interactions) are used (Mooers and Schluter, 1999; Pagel, 1999). A likelihood ratio test can be used to infer whether the 2 parameter model significantly improves the fit of the model to the data as compared to the 1 parameter model (Pagel, 1999; Ree and Donoghue, 1999). The test statistic is obtained by subtracting the negative log-likelihood (-log L) values derived from the two models and multiplying the absolute value of the result by 2: $2 \cdot \lvert (-\log L_{\mathrm{Mk1}}) - (-\log L_{\mathrm{AsymmMk}}) \rvert$. The resulting number is compared against a chi-square distribution with one degree of freedom. The test is also integrated in Mesquite and can be conducted via Analysis: Tree > Values for Current Tree > Asymmetry Likelihood Ratio Test.
      In the Mk1 and AsymmMk models, the rate of a character’s evolution is estimated by Mesquite (http://mesquiteproject.wikispaces.com/Processes+of+Character+Evolution#param). However, it is also possible to create custom models with specific parameters (http://mesquiteproject.wikispaces.com/Ancestral+States#editingModels). This can be useful if, for example, the probability of gaining vs. losing an interaction is known from prior experimental evidence.
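
The likelihood ratio test between the 1 parameter and 2 parameter models can also be reproduced by hand from the -log likelihood values that Mesquite reports. The sketch below is illustrative: the function name and the two input values are made up, and the p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def asymmetry_lrt(neg_log_l_mk1: float, neg_log_l_asymm: float) -> tuple[float, float]:
    """Return (test statistic, p-value) for Mk1 vs. AsymmMk.

    Inputs are the negative log-likelihoods of the two fitted models.
    """
    statistic = 2.0 * abs(neg_log_l_mk1 - neg_log_l_asymm)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Made-up example values, not taken from any real analysis.
stat, p = asymmetry_lrt(20.0, 18.0)
print(f"statistic = {stat:.2f}, p = {p:.4f}")
```

A p-value below the chosen significance level would indicate that allowing different gain and loss rates improves the fit significantly; otherwise the simpler Mk1 model is retained.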
    4. Evaluation of the results:
      Results are best visualized using pie charts at the internal nodes of the tree (Figure 4). Mesquite offers the possibility to conduct the character-state reconstruction simultaneously over different phylogenetic trees. Also, several characters can be traced at once. This facilitates comparison of character-state reconstructions of one protein with different partners or comparison of character-state reconstructions based on different methods used to assay protein-protein interactions.
    5. Export options:
      Mesquite can export trees and character matrices in numerous ways (http://mesquiteproject.wikispaces.com/Interactions+with+Other+Programs). For a graphical representation of the character-state reconstruction results we recommend exporting the tree as a PDF and using this file for further post-processing with graphics software such as Adobe Illustrator. For a direct comparison of the character-state evolution of two different traits one may use the mirror tree function (Figure 4).

Representative data



Figure 4. Mirror tree comparing results of character-state reconstruction for homodimerization (left) and heterodimerization (right) of a subfamily of plant transcription factors. Pie charts at internal nodes indicate the probability of the presence (yellow) or absence (black) of an interaction. Hatched circles at terminal positions (e.g. Petunia hybrida PHTM6 on the left tree) and grey circles at internal nodes designate an unknown interaction status.

Acknowledgments

This protocol was adapted from a previously published study (Melzer et al., 2014). This research was supported by a DFG grant to G. T. and R. M. (TH417/5–2). R. M. was supported by a post-doctoral fellowship of the Carl-Zeiss-Foundation.

References

  1. Artimo, P., Jonnalagedda, M., Arnold, K., Baratin, D., Csardi, G., de Castro, E., Duvaud, S., Flegel, V., Fortier, A., Gasteiger, E., Grosdidier, A., Hernandez, C., Ioannidis, V., Kuznetsov, D., Liechti, R., Moretti, S., Mostaguir, K., Redaschi, N., Rossier, G., Xenarios, I. and Stockinger, H. (2012). ExPASy: SIB bioinformatics resource portal. Nucleic Acids Res 40(Web Server issue): W597-603.
  2. Chatr-Aryamontri, A., Breitkreutz, B. J., Heinicke, S., Boucher, L., Winter, A., Stark, C., Nixon, J., Ramage, L., Kolas, N., O'Donnell, L., Reguly, T., Breitkreutz, A., Sellam, A., Chen, D., Chang, C., Rust, J., Livstone, M., Oughtred, R., Dolinski, K. and Tyers, M. (2013). The BioGRID interaction database: 2013 update. Nucleic Acids Res 41(Database issue): D816-823.
  3. De Bruyn, A., Martin, D. P. and Lefeuvre, P. (2014). Phylogenetic reconstruction methods: an overview. Methods Mol Biol 1115: 257-277.
  4. Franceschini, A., Szklarczyk, D., Frankild, S., Kuhn, M., Simonovic, M., Roth, A., Lin, J., Minguez, P., Bork, P., von Mering, C. and Jensen, L. J. (2013). STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 41(Database issue): D808-815.
  5. Katoh, K. and Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30(4): 772-780.
  6. Larkin, M. A., Blackshields, G., Brown, N. P., Chenna, R., McGettigan, P. A., McWilliam, H., Valentin, F., Wallace, I. M., Wilm, A., Lopez, R., Thompson, J. D., Gibson, T. J. and Higgins, D. G. (2007). Clustal W and Clustal X version 2.0. Bioinformatics 23(21): 2947-2948.
  7. Li, L., Yu, X. X., Guo, C. C., Duan, X. S., Shan, H. Y., Zhang, R., Xu, G. X. and Kong, H. Z. (2015). Interactions among proteins of floral MADS-box genes in Nuphar pumila (Nymphaeaceae) and the most recent common ancestor of extant angiosperms help understand the underlying mechanisms of the origin of the flower. Journal of Systematics and Evolution: n/a-n/a.
  8. Liu, C., Zhang, J., Zhang, N., Shan, H., Su, K., Zhang, J., Meng, Z., Kong, H. and Chen, Z. (2010). Interactions among proteins of floral MADS-box genes in basal eudicots: implications for evolution of the regulatory network for flower development. Mol Biol Evol 27(7): 1598-1611.
  9. Maddison, W. P. and Maddison, D. R. (2015). Mesquite: a modular system for evolutionary analysis. Version 3.02.
  10. Melzer, R., Härter, A., Rumpler, F., Kim, S., Soltis, P. S., Soltis, D. E. and Theissen, G. (2014). DEF- and GLO-like proteins may have lost most of their interaction partners during angiosperm evolution. Ann Bot 114(7): 1431-1443.
  11. Mooers, A. O. and Schluter, D. (1999). Reconstructing ancestor states with maximum likelihood: Support for one- and two-rate models. Syst Biol 48: 623-633.
  12. Orchard, S., Ammari, M., Aranda, B., Breuza, L., Briganti, L., Broackes-Carter, F., Campbell, N. H., Chavali, G., Chen, C., del-Toro, N., Duesbury, M., Dumousseau, M., Galeota, E., Hinz, U., Iannuccelli, M., Jagannathan, S., Jimenez, R., Khadake, J., Lagreid, A., Licata, L., Lovering, R. C., Meldal, B., Melidoni, A. N., Milagros, M., Peluso, D., Perfetto, L., Porras, P., Raghunath, A., Ricard-Blum, S., Roechert, B., Stutz, A., Tognolli, M., van Roey, K., Cesareni, G. and Hermjakob, H. (2014). The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res 42(Database issue): D358-363.
  13. Pagel, M. (1999). The maximum likelihood approach to reconstructing ancestral character states of discrete characters on phylogenies. Syst Biol 48: 612-622.
  14. Ree, R. H. and Donoghue, M. J. (1999). Inferring rates of change in flower symmetry in asterid angiosperms. Syst Biol 48: 633-641.
  15. Ronquist, F. and Huelsenbeck, J. P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19(12): 1572-1574.
  16. Salwinski, L., Miller, C. S., Smith, A. J., Pettit, F. K., Bowie, J. U. and Eisenberg, D. (2004). The database of interacting proteins: 2004 update. Nucleic Acids Res 32(Database issue): D449-451.
  17. Tamura, K., Stecher, G., Peterson, D., Filipski, A. and Kumar, S. (2013). MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30(12): 2725-2729.
  18. UniProt Consortium (2015). UniProt: a hub for protein information. Nucleic Acids Res 43(Database issue): D204-212.
  19. Wernersson, R. and Pedersen, A. G. (2003). RevTrans: Multiple alignment of coding DNA from aligned amino acid sequences. Nucleic Acids Res 31(13): 3537-3539.

Copyright: © 2015 The Authors; exclusive licensee Bio-protocol LLC.
Citation: Rümpler, F., Theißen, G. and Melzer, R. (2015). Character-State Reconstruction to Infer Ancestral Protein-Protein Interaction Patterns. Bio-protocol 5(16): e1566. DOI: 10.21769/BioProtoc.1566.