Chromosome Dosage Analysis in Plants Using Whole Genome Sequencing

Abstract

Relative chromosome dosage, i.e., an increase or decrease in the number of copies of specific chromosome regions in one sample versus another, can be determined using aligned read counts from Illumina sequencing (Henry et al., 2010). The following protocol was used to identify the different classes of aneuploids that result from uniparental genome elimination in Arabidopsis thaliana, including chromosomes that have undergone chromothripsis (Tan et al., 2015). Uniparental genome elimination results in the production of haploid progeny from crosses to specific strains called “haploid inducers” (Ravi et al., 2014). Chromothripsis, which was first discovered in cancer genomes, is a phenomenon that results in clustered, highly rearranged chromosomes. In plants, chromothripsis has been observed as a result of genome elimination (Tan et al., 2015). Detecting variation in chromosome dosage has multiple applications besides those linked to genome elimination. For example, a dosage variant population of poplar hybrids was created by gamma-irradiation of pollen grains. Hundreds of dosage lesions (insertions and deletions) were identified using this technique, providing a way to associate loci with the phenotypic consequences observed in this population (Henry et al., 2015).

This method has been successfully used to detect changes in chromosome dosage in many different species, including Arabidopsis thaliana (Tan et al., 2015), Arabidopsis suecica (Ravi et al., 2014), rice (Henry et al., 2010) and poplar (Henry et al., 2015). It is important to note that dosage plots always indicate dosage variation relative to the control sample used (Note 1). Therefore, this approach is not suitable to detect ploidy variants (diploid vs triploid, for example). Similarly, this technique does not allow the detection of balanced chromosomal rearrangements such as reciprocal translocations.

Keywords: Dosage Analysis, Whole Genome Sequencing, Chromothripsis, Genome Elimination, Aneuploidy

Materials and Reagents

  1. 96 microTUBE plate (Covaris Inc., catalog number: 520078)
  2. Genomic DNA
  3. Illustra Nucleon Phytopure kit (GE Healthcare, catalog number: RPN8511)
  4. KAPA Hyper Prep kit (KAPA Biosystems, catalog number: KK8504)
  5. NextFlex-96 adapters (Bioo Scientific, catalog number: 514106)
  6. Agencourt AMPure XP (Beckman Coulter, catalog number: A63882)
  7. Fresh 80% ethanol (Sigma-Aldrich, catalog number: E7023)
  8. DEPC water (BioExpress, catalog number: G-3223-1L)
  9. Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, catalog number: Q32854)

Equipment

  1. Ultrasonicator (Covaris, model: E220 Focused-ultrasonicator)
  2. PCR cycler with 96-well plate capacity
  3. Magnetic plate (Thermo Fisher Scientific, model: 12331D)
  4. Qubit 2.0 (Thermo Fisher Scientific, model: Q32866)
  5. Illumina sequencing platform
  6. Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific, model: ND-2000C)

Software

  1. Burrows-Wheeler aligner (Li and Durbin, 2009)
  2. bwa (http://bio-bwa.sourceforge.net/)
  3. bin-by-sam.py (http://comailab.genomecenter.ucdavis.edu/index.php/Bin-by-sam)
  4. allprep (http://comailab.genomecenter.ucdavis.edu/index.php/Barcoded_data_preparation_tools)
  5. bwa-doall (http://comailab.genomecenter.ucdavis.edu/index.php/Bwa-doall)
  6. Python2.6 or Python2.7

Procedure

  1. Genomic DNA fragmentation by Covaris
    1. Isolate high-quality genomic DNA and determine its concentration. A minimum yield of 1 μg at a concentration of at least 20 ng/μl is required. Please see Note 2 for further details.
    2. Pipet 500 ng of DNA input into each well of the 96 microTUBE Plate and add nuclease-free water to reach 27 μl per well.
    3. Shear DNA in the E220 Covaris sonicator with the following settings:
      Peak Incident Power (W) 175
      Duty Factor 5%
      Cycles per Burst 200
      Treatment time 60 sec
      Proceed directly to the KAPA PCR-free Hyper-Prep.
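
The input volumes for the shearing step follow from simple arithmetic: 500 ng of DNA brought to 27 μl with water. The sketch below illustrates that calculation; the function name and the example stock concentrations are hypothetical and not part of the original protocol.

```python
def shearing_mix(stock_ng_per_ul, input_ng=500.0, final_ul=27.0):
    """Return (DNA volume, water volume) in microliters for one microTUBE well.

    input_ng and final_ul default to the values given in the protocol;
    stock_ng_per_ul is the measured concentration of the genomic DNA stock.
    """
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    dna_ul = input_ng / stock_ng_per_ul
    if dna_ul > final_ul:
        # The stock is too dilute to deliver the input mass in the well volume.
        raise ValueError("stock too dilute for the requested input and volume")
    return round(dna_ul, 2), round(final_ul - dna_ul, 2)


# Hypothetical stock concentrations in ng/ul.
print(shearing_mix(20.0))  # (25.0, 2.0)
print(shearing_mix(50.0))  # (10.0, 17.0)
```

A stock at the recommended minimum of 20 ng/μl leaves only 2 μl of room for water, which is consistent with that minimum being close to the practical lower limit for this step.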

  2. KAPA PCR-free Hyper-Prep (Note 3)
    1. End repair and A-tailing
      Fragmented DNA 25 μl
      End repair & A-tailing buffer (Hyper-Prep kit) 3.5 μl
      End repair & A-tailing enzyme mix (Hyper-Prep kit) 1.5 μl
      Total volume 30 μl
      a. Incubate in a Thermal Cycler at 20 °C for 30 min followed by 65 °C for 30 min.
      b. Proceed to the ligation step immediately.
    2. Adapter ligation
      End repair & A-tailing reaction 30 μl
      DEPC water 3 μl
      Ligation buffer (Hyper-Prep kit) 15 μl
      DNA ligase (Hyper-Prep kit) 5 μl
      Adapter stock (2.5 μM) 2 μl
      Total volume 55 μl
      a. Incubate at 20 °C for 15 min.
      b. Proceed to post-ligation cleanup step immediately.
    3. Post-ligation cleanup
      Adapter ligation reaction product 54 μl
      Ampure (0.8x vol) 43 μl
      Total volume 97 μl
      1. Mix gently and incubate at room temperature for 15 min to allow the DNA to bind to the beads.
      2. Place reactions on a magnetic plate to separate beads from the solution. When the liquid is clear, remove the supernatant and wash the beads twice with 200 μl 80% ethanol.
      3. Let the beads dry (as recommended by the manufacturer) before eluting with 25 μl DEPC water.
      4. Pool the samples. A typical pooling strategy for 96 samples, representing 96 individual libraries for dosage analysis, is to combine 3 μl from each library in groups of 12, giving 8 subpools. Determine the concentration of each subpool using the Qubit dsDNA HS Assay Kit, then combine equal amounts of DNA from each subpool into a master pool. Determine the concentration of the master pool using the Qubit dsDNA HS Assay Kit; the pool can be submitted for PCR-free Illumina sequencing if its concentration is at least 20 ng/μl and its total DNA content is ~1 μg. An additional Ampure (0.8x) purification followed by elution in a lower volume may be required to increase the final concentration of the master pool. Irrespective of the pooling strategy, sequencing reads from pooled samples can later be reassigned to their respective samples based on the index sequence present in the adapters, because the libraries are pooled after adapter ligation.
      5. Submit the PCR-free libraries for sequencing (see Note 3 for other options). Sequencing is typically performed as 50 bp single reads (Note 4). The number of reads needed per individual depends on the scope of the experiment and the genome size of the organism at hand (see Notes 4 and 5 as well as Figures 2 and 3).
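
The equal-mass pooling of subpools described above can be planned with a short calculation. The following is a minimal sketch, assuming that the most dilute subpool limits how much DNA each subpool contributes; the function name, the usable volume per subpool and the Qubit readings are hypothetical.

```python
def plan_master_pool(subpool_ng_per_ul, usable_ul):
    """Plan an equal-mass master pool from Qubit readings of the subpools.

    usable_ul is the volume left in each subpool after quantification
    (a hypothetical input; measure it for your own plate).
    """
    if not subpool_ng_per_ul or min(subpool_ng_per_ul) <= 0:
        raise ValueError("every subpool needs a positive concentration")
    # The most dilute subpool limits how much DNA each subpool can contribute.
    ng_each = min(subpool_ng_per_ul) * usable_ul
    volumes_ul = [ng_each / conc for conc in subpool_ng_per_ul]
    total_ng = ng_each * len(subpool_ng_per_ul)
    pool_conc = total_ng / sum(volumes_ul)
    # Submission thresholds from the protocol: >= 20 ng/ul and about 1 ug total.
    ready = pool_conc >= 20.0 and total_ng >= 1000.0
    return volumes_ul, total_ng, pool_conc, ready


# Made-up Qubit readings (ng/ul) for the 8 subpools.
readings = [10.0, 20.0, 10.0, 20.0, 10.0, 20.0, 10.0, 20.0]
volumes, total_ng, conc, ready = plan_master_pool(readings, usable_ul=30.0)
print(volumes)
print(total_ng, round(conc, 1), ready)
```

With these made-up readings the master pool contains enough DNA but is too dilute, which is the situation in which the protocol suggests an additional Ampure purification with elution in a lower volume.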

  3. Mapping and Dosage Plot
    1. Trim the reads for quality and align them onto the TAIR10 genome using the Burrows-Wheeler aligner (Li and Durbin, 2009). We have provided scripts called allprep.py as well as bwa-doall.py for this purpose. Detailed documentation of these scripts is available from the links provided.
    2. Convert the .sai files to .sam files after alignment (if you are using the bwa-doall.py script, this step is performed automatically by the script).
      bwa samse [database] [filename]_aln.sai [filename].fq > [filename]_aln.sam
    3. Run bin-by-sam.py in a folder that contains the .sam files to generate dosage plots. Detailed directions and examples for this script are available on the main documentation page or can be accessed directly here: http://comailab.genomecenter.ucdavis.edu/images/3/30/README-bin-by-sam.pdf.
      bin-by-sam.py -o output-bin-file.txt -s size-of-bins [-c control.sam file] [-u] [-m number of max snps, default is 5] [-b] [-r] [-p ploidy for relative percent calculation] [-C]
      For help on the meaning of the different parameters: bin-by-sam.py -h
      Input:
      Run the script in a directory with the input _aln.sam files.
      Output:
      One file with a line per non-overlapping, consecutive bin along each of the reference sequences and two columns for each input .sam file: one indicating the number of reads mapping to each bin and the other indicating the corresponding dosage relative to the control.
      Specific example: as a starting point, run an initial dosage plot analysis with 1 Mb bins in a folder containing a group of .sam files.
      bin-by-sam.py -o 1Mb_bin.txt -s 1000000
      After running this initial analysis, the read counts obtained can be used to choose an appropriate minimum bin size. As a rule of thumb, bins should contain an average of at least 100 reads (see Figures 2 and 3).
    4. Parameters
      Required:
      -o, output file name (for example “-o Dosage_100kb_control2.txt”)
      -s, bin size in bps (for example “-s 100000” for 100 kb bins)
      Optional:
      -c, to use a control sample for relative percent coverage calculations, specify the file name here. If no file is specified, the mean of all samples is used as control value for each bin (Note 1).
      -u, to use only reads flagged as unique in the .sam file (XT:A:U), i.e., reads that map to only one location in the genome.
      -m, to specify the maximum number of mapping mismatches allowed for a read to be used. This looks at .sam field 15. The default is 5. This value can be increased if reads are longer or if a high number of polymorphisms is expected between the reference genome and the aligned reads. Most importantly, ensure that the same criteria are used for all samples.
      -b, inserts empty lines between reference sequences in the result table for easier JMP parsing (do not use if the reference sequence contains more than a few major chromosomes or contigs).
      -r, “remove file”, a file containing a list of reference sequences to ignore, in the sam header format. There is an included example file Remove-Sample.txt in the archive. This option can be useful if the organelle sequences are included in the genomic sequence for example (Note 6).
      -p, ploidy, default is 2 (diploid), this is used as the multiplier in the relative dosage calculation.
      -C, coverage only mode, which only outputs the read counts columns for each library, but not the relative dosage columns. This option cannot be used when a control library is specified.
    5. Data analysis
      The [sample]/control columns are plotted as an Overlay Plot in JMP for visualization (Figure 1). Other software with graphing functions, such as R, can be used instead of JMP to generate the overlay plots for each [sample]/control column.
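
To make the idea of relative dosage concrete, the sketch below bins read start positions and scales each sample bin against a control. It is not the bin-by-sam.py implementation: the normalization (fraction of reads per bin in the sample divided by the same fraction in the control, multiplied by the ploidy) is an assumption chosen for illustration, and all names and toy data are hypothetical.

```python
from collections import Counter


def bin_counts(positions, bin_size):
    """Count reads per non-overlapping bin.

    positions is an iterable of (chromosome, 1-based start) tuples.
    """
    return Counter((chrom, (pos - 1) // bin_size) for chrom, pos in positions)


def relative_dosage(sample, control, ploidy=2):
    """Scale each sample bin by the control bin, using per-library read fractions."""
    sample_total = sum(sample.values())
    control_total = sum(control.values())
    dosage = {}
    for bin_key, control_count in control.items():
        if control_count == 0:
            continue  # no information in the control for this bin
        sample_fraction = sample.get(bin_key, 0) / sample_total
        control_fraction = control_count / control_total
        dosage[bin_key] = ploidy * sample_fraction / control_fraction
    return dosage


# Toy data: 200 reads per chromosome in the control, and 1.5x as many
# Chr3 reads in the sample, mimicking a Chr3 trisomic.
control_reads = [(c, p) for c in ("Chr1", "Chr3") for p in range(1, 201)]
sample_reads = control_reads + [("Chr3", p) for p in range(1, 201, 2)]

control = bin_counts(control_reads, bin_size=100)
sample = bin_counts(sample_reads, bin_size=100)
for bin_key, value in sorted(relative_dosage(sample, control).items()):
    print(bin_key, round(value, 2))
```

In this toy genome of two chromosomes, the extra Chr3 reads raise the sample's total read count, so the Chr1 bins fall below 2 while the Chr3 bins rise above it; the ratio between them remains 1.5. This illustrates why dosage plots are always interpreted relative to the control and to the rest of the genome.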

Representative data


Figure 1. Example dosage plot of a diploid (Sample 1) and a primary Chr3 trisomic (2n + 1) aneuploid (Sample 2) from Arabidopsis thaliana, based on a 100 kb bin size. Relative centromere positions are indicated by red diamonds. The noisy area around the pericentromeric regions of the trisomic Chr3 is due to increased Col-0:Ler ratio (2:1) of the trisomic chromosome when normalized to a diploid control that contains 50% Col-0 (or 1:1 Col-0:Ler). This variability is absent from the 50% Col-0:Ler diploid individual (Sample 1). Reviewing multiple individuals from the same dataset can identify regions with such variation.


Figure 2. Dosage plot analysis on a shattered aneuploid Chr1 from Arabidopsis thaliana divided in 50 kb bins (each dot represents a bin) and using variable number of reads as input: 250,000 reads (top), 1 million reads (middle) and 4 million reads (bottom). This data illustrates how increased read count is necessary for the detection of smaller dosage variations. For species with bigger genome sizes, the number of reads necessary to obtain a similar level of detection increases accordingly. Similarly, for polyploid genomes, read coverage has to be higher to compensate for the relatively smaller increase or decrease in copy number in a higher ploidy background (Figure 3).


Figure 3. Effect of polyploidy on dosage variation detection. Using data from gamma-irradiated poplar, we created an in silico "dilution series" of the signal originating from a deletion event. Reads from a diploid individual carrying a heterozygous deletion were pooled with increasing numbers of reads from a control diploid individual, to model the decrease in coverage expected from the loss of one copy out of a starting ploidy-level ranging from 2 to 12 (y-axis).
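
The shrinking signal modeled in Figure 3 follows from simple arithmetic: losing one copy out of p leaves (p - 1)/p of the control coverage. The sketch below tabulates this under the assumption that coverage is strictly proportional to copy number; the function name is hypothetical.

```python
from fractions import Fraction


def expected_relative_dosage(ploidy, copy_change=-1):
    """Expected coverage relative to a euploid control of the same ploidy.

    copy_change is the number of copies gained (positive) or lost (negative).
    Assumes read coverage is strictly proportional to copy number.
    """
    if ploidy < 1 or ploidy + copy_change < 0:
        raise ValueError("invalid ploidy or copy number change")
    return Fraction(ploidy + copy_change, ploidy)


for ploidy in (2, 4, 6, 12):
    ratio = expected_relative_dosage(ploidy)
    print(ploidy, ratio, f"{float(1 - ratio):.1%} drop")
```

A single-copy deletion halves coverage in a diploid but reduces it by only about 8% in a dodecaploid, which is why higher read coverage is needed in polyploid backgrounds.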

Notes

  1. Controls for dosage plot analyses. A control euploid sequence of at least equal coverage is highly recommended for each analysis especially if there are only a few samples (less than 10). If a larger population is used or no control is available, the mean of all samples can be used as the control.
  2. For example, for the work in Tan et al. (2015), genomic DNA was isolated from 2-3 medium-sized Arabidopsis leaves using the Illustra Nucleon Phytopure Kit. The resulting DNA was analyzed on a Nanodrop 2000 to determine its concentration, with a 260/230 absorbance ratio of around 2. For reliable results, the DNA must be free of RNA, nucleotides, or other compounds that have a spectral light absorbance similar to that of DNA. If a different DNA isolation protocol is used, running 100 ng of the resulting DNA on an electrophoresis gel should show high molecular weight bands devoid of RNA, with no smearing, along with the expected 260/230 absorbance readings from the Nanodrop 2000.
  3. Although a PCR-free method is described here, amplified genomic libraries as well as exon capture libraries have been used successfully for this analysis.
  4. Because of the technology used by Illumina’s sequencing platform, each sequencing read represents a single data point such that single 50 bp reads, single 100 bp reads or 100 bp paired-end reads each account for one data point in this analysis. The more expensive paired-end reads should therefore only be used if the additional sequence data is needed for mapping or other purposes, such as SNP analysis.
  5. The depth of sequencing determines the sensitivity of the analysis. For Arabidopsis thaliana, a read count of one hundred thousand is adequate to detect primary aneuploidies at a bin size of 150-200 kb. Finer dosage changes (10-50 kb) will require around 1 million reads (Figure 2).
  6. During the analysis, it is important to compare samples. In our experience, there are regions of the genome that exhibit variability in the dosage plots even in control samples, such as, for example, pericentromeric regions or other repeated regions (Figure 1). This is particularly relevant when mapping reads from one species or variety to a reference sequence from a closely-related yet different species. Additionally, in some species, regions similar to organellar sequences are sometimes included in the genomic reference sequence. Because variable amounts of organellar DNA are often co-purified with the genomic DNA, such regions exhibit wide variation in coverage. These types of variable regions are normally easy to identify as they vary in opposite directions in different samples and should be discarded from analysis. If the reference sequence fasta file contains one or two organellar genome sequences, these can be removed using the -r option, or can be omitted when plotting relative dosage.
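
The read depths in Note 5 can be related to the rule of thumb of at least 100 reads per bin with a quick estimate. The sketch below assumes that reads are spread evenly over the genome and that all of them map; the genome size of roughly 119 Mb for the Arabidopsis thaliana nuclear reference is an approximation supplied here, not a value from the protocol, and the function names are hypothetical.

```python
ARABIDOPSIS_GENOME_BP = 119_000_000  # approximate nuclear genome size (assumption)


def mean_reads_per_bin(total_reads, genome_bp, bin_bp):
    """Average number of reads per bin, assuming uniform coverage."""
    return total_reads * bin_bp / genome_bp


def min_bin_size(total_reads, genome_bp, min_reads=100):
    """Smallest bin size (bp) that averages at least min_reads reads per bin."""
    # Ceiling division using integers only.
    return -(-min_reads * genome_bp // total_reads)


print(round(mean_reads_per_bin(100_000, ARABIDOPSIS_GENOME_BP, 150_000)))
print(min_bin_size(100_000, ARABIDOPSIS_GENOME_BP))
print(min_bin_size(1_000_000, ARABIDOPSIS_GENOME_BP))
```

Under these assumptions, one hundred thousand reads give an average of about 126 reads per 150 kb bin, and one million reads support bins down to roughly 12 kb, in line with the ranges given in Note 5. Repetitive or poorly mappable regions will contain fewer reads than this average.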

Acknowledgments

This work was funded by HHMI and the Gordon and Betty Moore Foundation (GBMF) through grant GBMF3068 to L.C. and by the DOE Office of Science, Office of Biological and Environmental Research (BER), grant no. DE-SC0007183 (to L.C. and I.M.H.). This protocol was adapted from Tan et al. (2015).

References

  1. Henry, I. M., Dilkes, B. P., Miller, E. S., Burkart-Waco, D. and Comai, L. (2010). Phenotypic consequences of aneuploidy in Arabidopsis thaliana. Genetics 186(4): 1231-1245.
  2. Henry, I. M., Nagalakshmi, U., Lieberman, M. C., Ngo, K. J., Krasileva, K. V., Vasquez-Gross, H., Akhunova, A., Akhunov, E., Dubcovsky, J., Tai, T. H. and Comai, L. (2014). Efficient genome-wide detection and cataloging of EMS-induced mutations using exome capture and next-generation sequencing. Plant Cell 26(4): 1382-1397.
  3. Henry, I. M., Zinkgraf, M. S., Groover, A. T. and Comai, L. (2015). A System for Dosage-Based Functional Genomics in Poplar. Plant Cell 27(9): 2370-2383.
  4. Li, H. and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14): 1754-1760.
  5. Ravi, M., Marimuthu, M. P., Tan, E. H., Maheshwari, S., Henry, I. M., Marin-Rodriguez, B., Urtecho, G., Tan, J., Thornhill, K., Zhu, F., Panoli, A., Sundaresan, V., Britt, A. B., Comai, L. and Chan, S. W. (2014). A haploid genetics toolbox for Arabidopsis thaliana. Nat Commun 5: 5334.
  6. Tan, E. H., Henry, I. M., Ravi, M., Bradnam, K. R., Mandakova, T., Marimuthu, M. P., Korf, I., Lysak, M. A., Comai, L. and Chan, S. W. (2015). Catastrophic chromosomal restructuring during genome elimination in plants. Elife 4: e06516.

Copyright Tan et al. This article is distributed under the terms of the Creative Commons Attribution License (CC BY 4.0).
Citation: Readers should cite both the Bio-protocol article and the original research article where this protocol was used:
  1. Tan, E. H., Comai, L. and Henry, I. M. (2016). Chromosome Dosage Analysis in Plants Using Whole Genome Sequencing. Bio-protocol 6(13): e1854. DOI: 10.21769/BioProtoc.1854.
  2. Tan, E. H., Henry, I. M., Ravi, M., Bradnam, K. R., Mandakova, T., Marimuthu, M. P., Korf, I., Lysak, M. A., Comai, L. and Chan, S. W. (2015). Catastrophic chromosomal restructuring during genome elimination in plants. Elife 4: e06516.