A Conceptual Outline for Omics Experiments Using Bioinformatics Analogies

Abstract

Hypothetical proteins (HPs) are proteins that have not been characterized in the laboratory and so remain “orphaned” in genomic databases. Recently there has been considerable progress in characterizing HPs in the laboratory. Methods such as sequence capture and Next Generation Sequencing (NGS) have been used to rapidly identify HP functions and the genes that encode them. Applications such as the isolation of single genes are greatly facilitated by pull-down assays used to characterize proteins. There are also methods to extract proteins from either the whole cell or a subcellular fraction. The weakness is that some assays are fairly expensive and laborious, and the characterization of HP function remains imperfect. In the recent past, statistical interpretation of in silico selection strategies has improved the identification of the most promising candidates, including those from annotation methods such as protein interaction networks (PIN). Given that improvements in technology have permitted a substantial increase in computational annotation, we ask whether the in silico prediction of HP function (validation of models through algorithms and data subsets) could likewise be improved. In this work, we apply a bioinformatics analogy to each step of a wet lab experiment performed to predict or confirm aspects of protein function. Although it may be a less bona fide approach, assigning a putative function from conservation observed in homologous protein sequences might be worth considering prior to a wet lab experiment.

Keywords: Hypothetical proteins, Omics, Systems biology, Functional genomics, Annotation

Procedure

Experiment steps and bioinformatics analogies

  1. Immunoblotting by a Coomassie-stained gel and patterns of selection for the total protein or cytoplasmic, nuclear, membrane or cytoskeletal fractions.
    Analogy: The data can be log-transformed under an error model so that the noise becomes normally distributed, and the statistical procedures can be carried out in MATLAB and R (Kreutz et al., 2007). The model is applicable to simulation studies and parameter estimation in systems biology for predicting functional candidates. A bioinformatics tool named aLFQ supports this kind of analogy: it estimates protein quantities from label-free LC-MS/MS proteomics data and further enables error estimation (Rosenberger et al., 2014). Availability: through R/CRAN (http://www.cran.r-project.org). The raw data for such analyses can be obtained from UniProt or Protein Atlas.
  2. Genomic shotgun DNA fragments hybridized to the exome library; PCR amplification and streptavidin beads. For example, the PCR amplification step involves finding several polymorphisms along the genome (SNP genotyping, etc.). In particular, a biotinylated primer is essential for each SNP in order to capture the single-stranded DNA (ssDNA) template and complete the assay. Although alternative strategies have steadily been developed for pyrosequencing, the method covers all phases from PCR amplification to ssDNA template capture within pyrosequencing (Royo et al., 2007).
    Analogy: Next Generation Sequencing (NGS) based annotation using HiSeq or MiSeq Illumina systems and associated materials could be used for thorough predictions (Liu et al., 2012). Galaxy frameworks can be used as an extension, with machine learning based tools for sequence and tiling array data analysis. Software: HiSeq or MiSeq Illumina systems. For a case study of HiSeq/MiSeq-based high-throughput analysis of NGS data using the Galaxy system, please follow the help pages (see Galaxy reference).
  3. Immunoproteomics: Kinetic analysis of antibody-peptide binding by surface plasmon resonance (SPR) is used to find high-affinity antibodies against hypothetical protein candidates. Further, a method called MALDI immunoscreening (MiSCREEN) is now used to screen high-affinity anti-peptide antibodies (Razavi et al., 2011).
    Analogy: Genetic algorithms (GAs) and swarm intelligence (SI) methods could serve as analogues, since they perform feature selection through high-dimensional searches. Further, ant colony optimization (ACO) is used to integrate features selected on the basis of significance and application criteria (Ressom et al., 2006). T-Coffee is a multiple sequence alignment package that can be used to align sequences or to combine the output of favorite alignment methods into one unique alignment (Notredame et al., 2000).
  4. Genome-wide analysis of the chromosomal distribution of co-expressed genes where one of the candidates is uncharacterized. Specificity of genes in chromosomal regions could be determined by qPCR (Boutanaev et al., 2002).
    Analogy: Gene expression programming (GEP) can be used to code complex programs, which are then included in linear chromosomes of fixed length (Ferreira, 2001). These in turn could later be expressed as expression trees (ET). These ETs may further undergo mutation and recombination in predicting the function of HP candidates. One example using MATLAB demonstrates a way to find patterns in gene expression profiles, e.g. finding expressed patterns along a genome sequence (see URL: http://se.mathworks.com/help/bioinfo/ug/example-analyzing-gene-expression-profiles.html).
  5. A major challenge in handling large scale applications for characterizing proteins using mass spectrometry, etc. is how to integrate and model the surplus of data that is produced.
    Analogy: “On the fly” virtual screening, where analysis is done using assembled, project-specific workflows to guide the next stages of experimentation (Pasculescu et al., 2014). The scripts can be made open-source and editable so that researchers can rapidly make enhancements in their projects. The MaxQuant software package fits this analogy (Cox et al., 2009). For example, one can analyze large MS data sets and then narrow down complex experimental designs by following characterized proteins over a time series and collating them with, for instance, drug-response data.
  6. In vivo and in vitro experimentation of cellular signaling domains.
    Analogy: Engineering simulations for in vivo and in vitro experimentation might be used to enable low-cost hypothesis generation and experimental design. Furthermore, in silico models can be used to develop a framework of simulations for paradigm domains, such as cancer systems biology (Bown et al., 2012). An in silico docking experiment can be used to identify the binding residues of proteins in the open and closed conformations, which also gives a molecular view of the system (Degryse et al., 2008).
  7. Many uncharacterized or HP data ultimately remain unannotated in the sequencing/biochemical information deposited from time to time.
    Analogy: Aggregate different structural and functional evidence with GO relationships based on similactors (Benso et al., 2013). Further, exploit community annotation using a “wiki of uncharacterized proteins.” Please refer to Benso et al. (2013).
  8. Antibodies vs. Aptamers. Are aptamers cost-effective when compared to antibodies for characterizing proteins (see references Aptagen and Basepairbio)? Only a few analytical techniques are known to be capable of detecting minute changes with a sensitivity matching that of antibodies. The targeting of whole proteins and the selection of specific residue sequences as epitopes are needed for the functional characterization of HPs. For example, a protein such as Twinkle helicase, also known as Progressive External Ophthalmoplegia (PEO) in humans, is encoded by the gene C10orf2, which is similar to the GP4 helicase structure and is an interacting partner of the DNA mismatch repair protein, MLH1. A pull-down assay would serve this purpose.
    Analogy: The potential role of aptamers in elucidating the function of HPs can be combined with the possibilities provided by bioinformatics to establish a benchmark for aptamer-protein prediction methods. With these future perspectives, the role of hypothetical proteins as target molecules for diagnostics and therapies could prove to be very useful in the development of medical technology. For example, we could develop an aptamer prediction webserver, which in turn could be used for pull-down assays or label-free detection to ascertain the function of some classes of proteins, such as HPs (Suravajhala et al., 2014). Please refer to Suravajhala et al. (2014), and see the Aptamer Analogy below.
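As an illustration of the analogy for step 1, the following minimal sketch shows why a log-transformation is useful when measurement noise is multiplicative. It is not the error model of Kreutz et al. (2007) itself; it assumes simple log-normal noise, and the function names, signal value and noise level are hypothetical choices made for this example.

```python
import math
import random
import statistics


def simulate_intensities(true_signal, sigma, n, seed=0):
    """Simulate n replicate intensities with multiplicative log-normal noise (assumed model)."""
    rng = random.Random(seed)
    return [true_signal * math.exp(rng.gauss(0.0, sigma)) for _ in range(n)]


def log_transform(values):
    """Natural-log transform; multiplicative noise becomes additive noise."""
    return [math.log(v) for v in values]


def skewness(values):
    """Sample skewness; values near 0 indicate a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical signal and noise level, chosen only for illustration.
raw = simulate_intensities(true_signal=1000.0, sigma=0.5, n=5000)
logged = log_transform(raw)

print("skewness of raw intensities:   ", round(skewness(raw), 3))
print("skewness of log-transformed:   ", round(skewness(logged), 3))
print("estimated log-scale noise (sd):", round(statistics.pstdev(logged), 3))
```

On the log scale the noise is approximately symmetric, so standard statistics that assume normally distributed errors become easier to justify.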

    Aptamer Analogy
    Purpose: Detailed how-to guide for implementing the bioinformatics analogy for step 8, where the role of HPs as target molecules for diagnostics and therapies could prove to be very useful in the development of medical technology. Here we use the analogy of finding better candidates (as seen pictographically in Figure 1), which could then be applied to infer function for a class of HPs.
    Overview: A pull-down assay is a small-scale affinity-based technique similar to immunoprecipitation. For proteins whose existence, function or even interacting peers are known theoretically but have seldom been established experimentally, pull-down assays can play a significant role. But can bioinformatics play a major role in lessening the scale of experimentation? The use of gene ontology functional data specific to organelles could play a major role in inferring the functions of uncharacterized proteins. For such HPs, the interacting partners remain uncharacterized as well, owing to the lack of feasible screening methods. Although methods to identify the functional context of an interacting protein's activity have been presented, the experimental framework needed to characterize these proteins explicitly does not exist. Therefore, we envisage a better predictive approach for the use of aptamers in pull-down assays or label-free detection. The application of aptamers in this research area would have immense potential, as only a few analytical techniques are known to be capable of detecting minute changes with a sensitivity matching that of antibodies. Targeting whole proteins and selecting specific residue sequences as epitopes are needed for the functional characterization of HPs, such as Twinkle helicase, also known as Progressive External Ophthalmoplegia (PEO) in humans, encoded by the gene C10orf2, which is similar to the GP4 helicase structure and is an interacting partner of the DNA mismatch repair protein, MLH1. We present here a step-by-step methodology so that a biologist with little experience in bioinformatics can implement this analogy.
    Resources: Excel worksheet for transferring the annotation or even further extending the database to SQL or CSV format, and drawing software such as MS Draw or MS Publisher [for methods and software, please refer to Suravajhala and Sundararajan (2012)].
    Steps
    1. Take the HP accession in question from GenBank. Check how bona fide the accession is by identifying its related sequences, the start sites of the protein-coding regions, and whether or not it is a pseudogene. Transfer the sequence information to an Excel worksheet by employing a six-point classification scoring schema as described earlier (Suravajhala and Sundararajan, 2012).
    2. Find the candidate proteins that are localized to the same organelle as the HP by examining its interaction peers; this makes it possible to set aside those HPs that form an interacting pool. The first half of Figure 1 shows how the HP in question relates to its interaction peers.
    3. The annotation would then be transferred to the similactors approach, which will involve filtering and enrichment of PPI networks.
    4. Use a concrete database of aptamers that are available (Aptagen/Basepairbio). Target specifically known unknown (KU) regions and use them as putative biomarkers.
    5. Simulate the above list of HPs and candidate proteins from step 3 for identifying better targets from step 4.
    6. Analyze the results, and make a database.
    7. (Optional) Develop a predictive webserver based on machine learning approaches, thereby training a network of proteins and aptamers for possible and easy identification of targets.
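The filtering and collation in steps 2 and 4 to 6 can be sketched roughly as follows. This is only an illustrative outline under stated assumptions: the accessions, organelle labels, interaction peers and aptamer targets are hypothetical placeholders rather than data from GenBank, Aptagen or Basepairbio, and the six-point scoring schema of step 1 is not reproduced here.

```python
import csv
import io

# Hypothetical toy records: each HP has an organelle and a list of interaction peers.
HPS = {
    "HP_A": {"organelle": "mitochondrion", "peers": ["P1", "P2", "P3"]},
    "HP_B": {"organelle": "nucleus", "peers": ["P2", "P4"]},
}
# Hypothetical localization of the peer proteins.
PROTEIN_ORGANELLE = {
    "P1": "mitochondrion",
    "P2": "nucleus",
    "P3": "mitochondrion",
    "P4": "nucleus",
}
# Hypothetical set of proteins for which an aptamer is catalogued.
APTAMER_TARGETS = {"P1", "P4"}

FIELDS = ["hp", "peer", "organelle", "aptamer_available"]


def colocalized_peers(hp):
    """Step 2: keep only the peers localized to the same organelle as the HP."""
    record = HPS[hp]
    return [p for p in record["peers"] if PROTEIN_ORGANELLE.get(p) == record["organelle"]]


def candidate_rows():
    """Steps 4-5: flag each co-localized peer that has a known aptamer."""
    rows = []
    for hp in sorted(HPS):
        for peer in colocalized_peers(hp):
            rows.append({
                "hp": hp,
                "peer": peer,
                "organelle": HPS[hp]["organelle"],
                "aptamer_available": peer in APTAMER_TARGETS,
            })
    return rows


def to_csv(rows):
    """Step 6: serialize the results as CSV text, ready for a worksheet or database."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(candidate_rows()))
```

A flat CSV table is used here because the Resources entry mentions Excel, SQL and CSV formats; a relational database would serve equally well once the list of candidates grows.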

Representative data (example)

In a framework for functional prediction (Figure 1), experimentally determined characteristics of the putative interaction partners are used to make an interactome of hypothetical proteins, the hypothome (Desler et al., 2014). In this process, we suggest a role for the predicted protein in a biological context, thus complementing an interactome with the interactions of predicted proteins while retaining information on whether each interaction is predicted or experimentally verified (left panel in Figure 1). This strategy is essential for the characterization of predicted proteins and their interactions with existing biological pathways.
Furthermore, the electronic annotation using methods [described in Benso et al. (2013)] containing similar, yet non-interacting proteins (similactors) (right panel in Figure 1), along with the hypothome data, can be used in training datasets. However, a simulation followed by machine learning predictions can also be applied on a wide number of proteins not specific to HPs alone, thereby drawing an inference for an analogy to functional prediction.
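One possible way to organize such data is sketched below. It is a minimal illustration, not the method of Desler et al. (2014) or Benso et al. (2013): the protein identifiers are hypothetical, and labeling similactor pairs as negative examples is an assumption made here for the purpose of building a training table.

```python
from collections import defaultdict

# Hypothetical interactions: (protein_a, protein_b, evidence type).
INTERACTIONS = [
    ("HP1", "P1", "experimental"),
    ("HP1", "P2", "predicted"),
    ("HP2", "P2", "experimental"),
]
# Hypothetical similactor pairs: similar, yet non-interacting proteins.
SIMILACTORS = [("HP1", "P3"), ("HP2", "P4")]


def build_hypothome(interactions):
    """Undirected interaction map that retains the evidence type of every edge."""
    graph = defaultdict(dict)
    for a, b, evidence in interactions:
        graph[a][b] = evidence
        graph[b][a] = evidence
    return dict(graph)


def training_rows(interactions, similactors):
    """Combine interacting pairs (label 1) and similactor pairs (label 0) into one table."""
    rows = [(a, b, evidence, 1) for a, b, evidence in interactions]
    rows += [(a, b, "similactor", 0) for a, b in similactors]
    return rows


hypothome = build_hypothome(INTERACTIONS)
for row in training_rows(INTERACTIONS, SIMILACTORS):
    print(row)
```

Keeping the evidence type on every edge preserves the distinction between predicted and experimentally verified interactions, so a downstream learner could weight or filter them differently.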


Figure 1. A framework for functional prediction. Experimentally determined characteristics of the putative interaction partners are used to make an interactome of hypothetical proteins. Left panel: methods for making an interactome of hypothetical proteins as described by Desler et al. (2014). Right panel: electronic annotation methods described by Benso et al. (2013).

Acknowledgments

We gratefully acknowledge Alfredo Benso and his colleagues for proposing the similactors approach alongside the hypothome. The authors received no funding. PS would like to thank Arsalan Daudi and Fanglian He for the invitation to write this manuscript.

References

  1. Aptagen: http://www.aptagen.com/
  2. Basepairbio.com: Aptamers and Their Potential Applications at Base Pair Biotechnologies.
  3. Benso, A., Di Carlo, S., Ur Rehman, H., Politano, G., Savino, A. and Suravajhala, P. (2013). A combined approach for genome wide protein function annotation/prediction. Proteome Sci 11(Suppl 1): S1.    
  4. Boutanaev, A. M., Kalmykova, A. I., Shevelyov, Y. Y. and Nurminsky, D. I. (2002). Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916): 666-669.        
  5. Bown, J., Andrews, P. S., Deeni, Y., Goltsov, A., Idowu, M., Polack, F. A., Sampson, A. T., Shovman, M. and Stepney, S. (2012). Engineering simulations for cancer systems biology. Curr Drug Targets 13(12): 1560-1574.
  6. Cox, J., Matic, I., Hilger, M., Nagaraj, N., Selbach, M., Olsen, J. V. and Mann, M. (2009). A practical guide to the MaxQuant computational platform for SILAC-based quantitative proteomics. Nat Protoc 4(5): 698-705.    
  7. Desler, C., Zambach, S., Suravajhala, P. and Rasmussen, L. J. (2014). Introducing the hypothome: a way to integrate predicted proteins in interactomes. Int J Bioinform Res Appl 10(6): 647-652.    
  8. Degryse, B., Fernandez-Recio, J., Citro, V., Blasi, F. and Cubellis, M. V. (2008). In silico docking of urokinase plasminogen activator and integrins. BMC Bioinformatics 9 Suppl 2: S8.    
  9. Ferreira, C. (2001). Gene expression programming: a new adaptive algorithm for solving problems. Complex Systems 13(2):87-129.    
  10. Galaxy web URL: https://galaxy.cbio.mskcc.org/.
  11. Ressom, H. W., Varghese, R. S. and Goldman, R. (2009). Computational methods for analysis of MALDI-TOF spectra to discover peptide serum biomarkers. In: The Protein Protocols Handbook. Springer, 1175-1183.
  12. Heyer, L. J., Kruglyak, S. and Yooseph, S. (1999). Exploring expression data: identification and analysis of coexpressed genes. Genome Res 9(11): 1106-1115.    
  13. Kreutz, C., Bartolome Rodriguez, M. M., Maiwald, T., Seidl, M., Blum, H. E., Mohr, L. and Timmer, J. (2007). An error model for protein quantification. Bioinformatics 23(20): 2747-2753.    
  14. Liu, L., Li, Y., Li, S., Hu, N., He, Y., Pong, R., Lin, D., Lu, L. and Law, M. (2012). Comparison of next-generation sequencing systems. J Biomed Biotechnol 2012: 251364.    
  15. Notredame, C., Higgins, D. G. and Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 302(1): 205-217.    
  16. Pasculescu, A., Schoof, E. M., Creixell, P., Zheng, Y., Olhovsky, M., Tian, R., So, J., Vanderlaan, R. D., Pawson, T., Linding, R. and Colwill, K. (2014). CoreFlow: a computational platform for integration, analysis and modeling of complex biological data. J Proteomics 100: 167-173.
  17. Razavi, M., Pope, M. E., Soste, M. V., Eyford, B. A., Jackson, A. M., Anderson, N. L. and Pearson, T. W. (2011). MALDI immunoscreening (MiSCREEN): a method for selection of anti-peptide monoclonal antibodies for use in immunoproteomics. J Immunol Methods 364(1-2): 50-64.    
  18. Ressom, H. W., Varghese, R. S., Orvisky, E., Drake, S. K., Hortin, G. L., Abdel-Hamid, M., Loffredo, C. A. and Goldman, R. (2006). Ant colony optimization for biomarker identification from MALDI-TOF mass spectra. Conf Proc IEEE Eng Med Biol Soc 1: 4560-4563.    
  19. Rosenberger, G., Ludwig, C., Rost, H. L., Aebersold, R. and Malmstrom, L. (2014). aLFQ: an R-package for estimating absolute protein quantities from label-free LC-MS/MS proteomics data. Bioinformatics 30(17): 2511-2513.    
  20. Royo, J. L., Hidalgo, M. and Ruiz, A. (2007). Pyrosequencing protocol using a universal biotinylated primer for mutation detection and SNP genotyping. Nat Protoc 2(7): 1734-1739.    
  21. Suravajhala, P., reddy Burri, H. V. and Heiskanen, A. (2014). Combining aptamers and in silico interaction studies to decipher the function of hypothetical proteins. Eur Chem Bull 3(8): 809-810.
  22. Suravajhala, P. and Sundararajan, V. S. (2012). A classification scoring schema to validate protein interactors. Bioinformation 8(1): 34-39.

Copyright: © 2015 The Authors; exclusive licensee Bio-protocol LLC.
Citation: Suravajhala, P. and Bizzaro, J. W. (2015). A Conceptual Outline for Omics Experiments Using Bioinformatics Analogies. Bio-protocol 5(3): e1387. DOI: 10.21769/BioProtoc.1387.