
Using CRISPR-ERA Webserver for sgRNA Design

Abstract

The CRISPR-Cas9 system is emerging as a powerful technology for gene editing (modifying the genome sequence) and gene regulation (changing gene expression without modifying the genome sequence). Designing sgRNAs for specific genes or regions of interest is indispensable to CRISPR-based applications. CRISPR-ERA (http://crispr-era.stanford.edu/) is one of the state-of-the-art sgRNA design webservers, developed for both gene editing and gene regulation. This protocol describes how to design sgRNA sequences and build genome-wide sgRNA libraries using CRISPR-ERA.

Keywords: sgRNA design, CRISPR-Cas9 system, sgRNA library, Gene editing, Gene regulation

Background

Genome engineering is essential to the study of biology, and the field has seen several technological breakthroughs (Doudna and Charpentier, 2014). CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats-CRISPR associated protein 9) technology has proven highly efficient and generalizable in both gene editing and gene regulation (Qi et al., 2013; La Russa and Qi, 2015). The CRISPR-Cas9 system consists of the Cas9 endonuclease and a target-identifying CRISPR RNA duplex (crRNA and trans-activating crRNA (tracrRNA)) that can be simplified into a single guide RNA (sgRNA). The sgRNA pairs with and targets an 18- to 25-bp DNA sequence, which must lie next to a DNA motif termed the protospacer-adjacent motif (PAM). The most commonly used Cas9 is derived from Streptococcus pyogenes; its PAM sequence is NGG (N represents any nucleotide), while NAG is recognized occasionally with lower efficiency.

In the CRISPR-Cas9 system, the sgRNA, which typically carries a 20-bp custom-designed sequence, determines target specificity and efficiency. Designing sgRNAs is an indispensable part of CRISPR-related projects. Among the published tools for automated sgRNA design, CRISPR-ERA supports sgRNA searching for both gene editing and gene regulation applications, and additionally provides a protocol for building genome-wide sgRNA libraries (Liu et al., 2015). Currently, CRISPR-ERA supports sgRNA design for nine organisms with different kinds of manipulations. It provides a user-friendly webserver for searching sgRNAs in preassembled databases. The preassembled genome-wide sgRNA databases are built by finding all targetable sites that match the pattern N20NGG. To evaluate the efficiency and specificity of each sgRNA, CRISPR-ERA applies criteria summarized from published data and computes an efficacy score (E-score) and a specificity score (S-score). The criteria vary slightly between manipulation types and organisms.
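
The N20NGG search described above can be illustrated with a minimal Python sketch. This is not the authors' Perl script; the function names (`find_sites`, `reverse_complement`) and the coordinate convention (0-based position of the leftmost base of the 23-nt site on the forward strand) are assumptions made for illustration only.

```python
import re

# Lookahead pattern so that overlapping N20NGG sites are all reported.
# Only unambiguous bases are accepted; sites containing N are skipped.
SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq):
    """Yield (start, strand, protospacer, pam) for every N20NGG site.

    start is the 0-based position of the leftmost base of the 23-nt
    site on the forward strand (an illustrative convention).
    """
    seq = seq.upper()
    n = len(seq)
    # Forward strand: report match positions directly.
    for m in SITE.finditer(seq):
        yield m.start(), "+", m.group(1), m.group(2)
    # Reverse strand: scan the reverse complement and map back.
    for m in SITE.finditer(reverse_complement(seq)):
        yield n - m.start() - 23, "-", m.group(1), m.group(2)
```

For example, `list(find_sites("A" * 20 + "TGG"))` yields a single forward-strand site starting at position 0. A genome-scale run would stream each chromosome from the FASTA file rather than hold the whole genome in one string.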

Equipment

  1. Personal computer for CRISPR-ERA website searching
  2. High-performance computing cluster for building the genome-wide sgRNA library. Taking genome version hg19 as an example, the minimum storage space is 500 GB

Software

  1. CRISPR-ERA (http://crispr-era.stanford.edu/)
  2. UCSC Genome Browser (Kent et al., 2002; http://genome.ucsc.edu/)
  3. Bowtie2 (Langmead et al., 2009; http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
  4. NCBI (https://www.ncbi.nlm.nih.gov/)
  5. Perl scripts (Programming language, https://www.perl.org/)
  6. Shell scripts (Programming language, Command Line Interface shell, https://www.linux.org/)

Procedure

  1. Using CRISPR-ERA webserver for sgRNA searching
    1. CRISPR-ERA webserver input (Figure 1)
      1. Choose the type of objective gene manipulation: gene editing using nuclease, gene editing using nickase, gene repression, or gene activation.
      2. Choose the host organism: Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, Rattus norvegicus, Mus musculus, or Homo sapiens. The organisms available depend on the manipulation type chosen in step A1a.
      3. Choose the input format: official gene name, gene location (target region for gene editing, or transcriptional start site (TSS) location for gene regulation), or gene sequence in FASTA format (pasted into a textbox or uploaded as a file).
      Note: Clicking the ‘?’ in the choice box of each step shows detailed instructions.


      Figure 1. CRISPR-ERA input process

    2. CRISPR-ERA webserver output (Figure 2)


      Figure 2. CRISPR-ERA output webpage

      The output webpage contains two parts, ‘See results in UCSC Genome Browser’ and ‘Results’.
      1. Clicking ‘click here to see result in UCSC Genome Browser’ displays all sgRNA sequences in the UCSC Genome Browser. Each sgRNA is identified by its ‘ID’. The sum of the E-score and S-score is represented by a color shade, as indicated by the color bar.
      2. The result table contains the sgRNA sequences and their properties, such as target gene, transcript ID, distance to TSS, location, and strand. sgRNAs starting with ‘G’ can be filtered for, since these are suitable for CRISPR systems that use the U6 promoter. When a targetable region belongs to more than one transcript, the result table shows the information for all of the transcripts, as shown in Figure 2.
      3. The E-score and S-score columns list the features that affect sgRNA efficiency and specificity. Both scores are computed from criteria summarized from published data. The E-score represents sgRNA efficacy and reflects GC content, poly-T presence, and other sequence features. The S-score represents the specificity of the sgRNA sequence and is based on genome-wide off-target information. All sgRNA sequences can be downloaded.

  2. Genome-wide sgRNA library building pipeline
    1. Download genome sequence files in FASTA format and genome annotation files in RefFlat or GFF format from the UCSC Genome Browser or the NCBI website. Taking genome version hg19 as an example, genome sequence and annotation files can be downloaded from http://hgdownload.soe.ucsc.edu/downloads.html.
    2. The Perl scripts are provided after the material transfer form is submitted. They search for 20-bp sgRNAs with the default PAM sequence (NGG) and pattern (N20NGG). During the search, the location and strand of every potential sgRNA target site are recorded.
      Run Perl program:

      perl find_all_sgRNA_z_f_c_y.pl hg19_dna.fa out_sgRNA.txt out_sgRNA_fasta.txt out_sgRNA_gc_t.txt out_nag_fasta.txt out_no_sgRNA.txt
      # out_sgRNA.txt: all potential sgRNA sequences
      # out_sgRNA_fasta.txt: all potential sgRNA sequences in FASTA format for the next Bowtie step (PAM sequence NGG)
      # out_sgRNA_gc_t.txt: all sgRNA sequences with GC content and poly-T information
      # out_nag_fasta.txt: all potential sgRNA sequences in FASTA format for the next Bowtie step (unlike out_sgRNA_fasta.txt, the PAM sequence here is NAG)
      # out_no_sgRNA.txt: number of sgRNA sequences in each chromosome

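      Note: Help information for Perl can be found with the commands ‘perl -h’ or ‘perldoc perl’, or at http://learn.perl.org/.
    3. Run Bowtie to find all possible off-target sequences for each sgRNA (both PAM = NGG and PAM = NAG are considered). The protocol allows up to 3-bp mismatches; the commands below pass -v 2, which reports alignments with at most 2 mismatches, so set -v 3 to allow 3. The -k 100 option reports up to 100 alignments per sgRNA.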

      bowtie -v 2 -k 100 ./hg19 -f out_sgRNA_fasta.txt sgRNA_bowtie_fasta.txt
      bowtie -v 2 -k 100 ./hg19 -f out_nag_fasta.txt sgRNA_nag_bowtie_fasta.txt

      Note: Parameter setting of Bowtie can be found in http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml.
    4. Compute the E-score and S-score by analyzing the sgRNA sequence features. The E-score is computed from GC content and poly-T presence (mammalian only), and the S-score is computed from the off-target information derived in step B3. The criteria can be customized and differ between organisms and gene manipulations (Figure 3).


      Figure 3. An example of E-score and S-score computation. Sequence: GGTGAATGAGGGCTTGCGA.

    5. Extract the TSS location and coding region of each gene from the genome annotation files. For gene editing, the sgRNA target region is the coding region. For gene repression, sgRNAs target the region from 1.5 kbp upstream to 1.5 kbp downstream of the TSS, while for gene activation the target region is the 1.5 kbp upstream of the TSS. Hash searching the genome-wide sgRNA library derived in step B4 for eligible sgRNAs in these regions yields the sgRNA details for all genes. Then update the E-score and S-score according to the additional target location information. Figure 3 shows an example E-score and S-score computation for one sgRNA for Pou5f1 repression. Integrating the information above forms the sgRNA database for each type of gene manipulation.
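
The target windows described in step B5 can be sketched as follows. This is an illustrative Python sketch, not the authors' pipeline: the function names, the dictionary layout of each sgRNA record, and the strand handling (upstream means lower coordinates on the plus strand and higher coordinates on the minus strand) are assumptions. A simple linear filter stands in for the hash search mentioned in the protocol.

```python
def target_window(tss, strand, mode, flank=1500):
    """Return an inclusive (start, end) genomic interval around a TSS.

    mode is "repression" (flank bp on both sides of the TSS) or
    "activation" (flank bp upstream of the TSS only).
    """
    if mode == "repression":
        return max(0, tss - flank), tss + flank
    if mode == "activation":
        if strand == "+":
            # Upstream of a plus-strand gene lies at lower coordinates.
            return max(0, tss - flank), tss
        # Upstream of a minus-strand gene lies at higher coordinates.
        return tss, tss + flank
    raise ValueError("mode must be 'repression' or 'activation'")


def sgrnas_in_window(sgrnas, chrom, window):
    """Keep sgRNA records (dicts with 'chrom' and 'pos') inside window."""
    start, end = window
    return [s for s in sgrnas if s["chrom"] == chrom and start <= s["pos"] <= end]
```

For a plus-strand gene with its TSS at position 10000, `target_window(10000, "+", "repression")` gives `(8500, 11500)` and `target_window(10000, "+", "activation")` gives `(8500, 10000)`. For a genome-wide library, indexing sgRNAs by chromosome and position bin would avoid scanning every record per gene.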

Data analysis

After finding the candidate sgRNA sequences, the essential step is to evaluate the efficiency and specificity of each one. In this protocol, we provide a general method to compute the E-score and S-score when building genome-wide sgRNA libraries. For sgRNA databases built for specific gene manipulations, criteria beyond those used for genome-wide libraries should be included, such as exon location for gene editing and distance to the TSS for gene regulation. For example, in gene regulation, efficiency decreases as the distance from the TSS increases. A more detailed description of the E-score and S-score can be found on the ‘Help’ webpage of the CRISPR-ERA webserver (http://crispr.stanford.edu/help.jsp).
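
The raw features behind the two scores can be extracted with a short Python sketch. It does not reproduce the CRISPR-ERA scoring criteria, which are documented on the Help page. The function names are hypothetical, the poly-T threshold of four consecutive T bases is an assumption, and the parser assumes Bowtie's default tab-separated output, in which the first column is the read name and the eighth column lists comma-separated mismatch descriptors (empty for a perfect match).

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_poly_t(seq, min_run=4):
    """True if seq contains min_run or more consecutive T bases.

    min_run=4 is an assumed threshold, not a value from the protocol.
    """
    return "T" * min_run in seq.upper()


def mismatch_tally(bowtie_lines):
    """Count alignments per sgRNA, grouped by number of mismatches.

    Returns {read_name: Counter({n_mismatches: n_alignments})}.
    Assumes Bowtie's default tab-separated output format.
    """
    tally = {}
    for line in bowtie_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        descriptors = fields[7] if len(fields) > 7 else ""
        n_mismatches = len(descriptors.split(",")) if descriptors else 0
        tally.setdefault(fields[0], Counter())[n_mismatches] += 1
    return tally
```

Note that the zero-mismatch count for an sgRNA includes its own on-target site, so a specificity criterion would typically treat one perfect hit as expected and any further hits as off-targets.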

Acknowledgments

This work was supported by National Natural Science Foundation of China (No. 31371341), Tsinghua University Initiative Scientific Research Program (No. 20141081175), and the Open Research Fund of State Key Laboratory of Bioelectronics, Southeast University.

References

  1. Doudna, J. A. and Charpentier, E. (2014). Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346(6213): 1258096.
  2. Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M. and Haussler, D. (2002). The human genome browser at UCSC. Genome Res 12(6): 996-1006.
  3. Langmead, B., Trapnell, C., Pop, M. and Salzberg, S. L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3): R25.
  4. La Russa, M. F. and Qi, L. S. (2015). The new state of the art: Cas9 for gene activation and repression. Mol Cell Biol 35(22): 3800-3809.
  5. Liu, H., Wei, Z., Dominguez, A., Li, Y., Wang, X. and Qi, L. S. (2015). CRISPR-ERA: a comprehensive design tool for CRISPR-mediated gene editing, repression and activation. Bioinformatics 31(22): 3676-3678.
  6. Qi, L. S., Larson, M. H., Gilbert, L. A., Doudna, J. A., Weissman, J. S., Arkin, A. P. and Lim, W. A. (2013). Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152(5): 1173-1183.

Copyright: © 2017 The Authors; exclusive licensee Bio-protocol LLC.
Citation: Liu, H., Wang, X. and Qi, L. S. (2017). Using CRISPR-ERA Webserver for sgRNA Design. Bio-protocol 7(17): e2522. DOI: 10.21769/BioProtoc.2522.