搜索

Whole Genome Bisulfite Sequencing and DNA Methylation Analysis from Plant Tissue
植物组织全基因组的亚硫酸盐测序和DNA甲基化分析

下载 PDF 引用 收藏 提问与回复 分享您的反馈

本文章节

Abstract

This protocol describes whole genome bisulfite-sequencing library preparation from plant tissue and subsequent data analysis. Allele-specific methylation analysis and genome-wide identification of differentially methylated regions are additional features of the analysis procedure.

Part I. Whole genome bisulfite sequencing library preparation

Materials and Reagents

  1. RNAse-treated DNA (at least 1 µg, volume should not exceed 130 µl)
  2. AMPure XP beads (Beckman Coulter, catalog number: A63880 )
  3. 200 proof Ethyl alcohol (Sigma-Aldrich, catalog number: 459844-500ML )
  4. Illumina TruSeq kit (Illumina, catalog number: FC-121-2001 )
  5. Methylcode Invitrogen kit (Life Technologies, InvitrogenTM, catalog number: MECOV50 )
    Note: Other bisulfite conversion kits can be used, but were not tested in this protocol.
  6. T4 DNA ligase (NEB, catalog number: M0202T )
  7. Pfu Cx Hotstart DNA polymerase (Agilent, catalog number: 600412 )
  8. TOPO Blunt cloning kit (Life Technologies, InvitrogenTM, catalog number: 450245 )
  9. TOPO TA cloning kit (Life Technologies, InvitrogenTM, catalog number: 450030 )
  10. Ex Taq DNA polymerase (TaKaRa, Clontech, catalog number: RR001B )
  11. 1 Kb-plus DNA ladder (Life Technologies, InvitrogenTM, catalog number: 10787-018 )
  12. Agarose
  13. 1x TAE buffer (see Recipes)

Equipment

  1. Agarose gel electrophoresis apparatus
  2. Covaris microTUBE prep station (Covaris, part number: 500142 )
  3. Covaris microTUBE AFA Fiber Pre-Slit Snap-Cap (6 x 16 mm) (Covaris, part number: 520045 )
  4. Low-retention 1.5 ml tubes (Thermo Fisher Scientific, catalog number: 02-681-320 )
  5. TempAssure PCR 8-Strips (USA Scientific, catalog number: 1402-4700 )
  6. Magnetic stand for 1.5 ml tubes (DynaMagTM-2 magnet, Life Technologies, InvitrogenTM, catalog number: 12321D )
  7. Focused-ultrasonicator (Covaris, S-Series, model: S220 )
  8. Microcentrifuge (VWR International, catalog number: 93000-204 )
  9. Thermocycler

Procedure

  1. Prepare reagents before starting
    1. Make fresh 80% ethanol.
      Tip: It is important to use fresh 80% ethanol for Ampure bead purification.
    2. Shake the Agencourt AMPure XP bottle to resuspend beads. 180 µl of beads will be needed per sample (ratio of 1.4:1 beads to DNA). Keep at room temperature.

  2. Shear DNA with Covaris sonicator
    1. Fill tank with deionized water to 12.5 on the graduate fill line label.
    2. Turn on the machine, the chiller (set temperature at +3 °C), and the computer.
    3. Launch the application software (Covaris SonoLab 7). Push the degas button. Degas the instrument for at least 30 min before use.
    4. Settings: Peak power 175 W, duty factor 10, cycles/burst 200, time 6 min, temperature 6 °C. Note that the temperature rises while the machine is in use, which could interfere with the shearing cycle. Thus, the machine is initially set to 3 °C as a precaution.
    5. Transfer 130 µl of DNA sample (~10 ng/µl) into the Covaris microTUBE, spin down briefly and proceed with shearing.
    6. Run 1.2% gel with sheared DNA (250 ng) and un-sheared DNA side by side to evaluate shearing. Samples can also be run on a Bioanalyzer. Expected size range is 100-300 bp. See representative data Figure 1.

  3. DNA purification (AMPure beads)
    Refer to Agencourt AMPure XP PCR Purification - Instructions For Use (B37419AA)
    (https://www.beckmancoulter.com/wsrportal/techdocs?docname=B37419AA).
    1. Place the tube into Covaris microTUBE Prep Station and remove the cap. Transfer the 130 µl of sample to a low-DNA binding microfuge tube. Add 180 µl of resuspended beads and mix well by pipetting.
    2. Incubate for 5 min at room temperature.
    3. Place the tube onto the magnetic stand for 2 min to separate beads from the solution.
    4. Aspirate the cleared solution from the tube and discard. Do not disturb the beads.
    5. Dispense 250 μl of 80% ethanol to the tube and incubate for 30 sec at room temperature.
    6. Aspirate the ethanol and discard. Repeat for a total of two washes.
    7. Let the beads air dry for 5 min.
      Tip: Be careful not to over-dry the beads (bead pellet appears cracked if over-dried) as this will significantly decrease elution efficiency.
    8. Remove the microfuge tube from the magnetic stand. Add 52 μl of water to the beads and resuspend the beads by pipetting 10 times. Incubate for 2 min at room temperature.
    9. Place the tube back on the magnetic stand. Transfer 50 µl of the clear supernatant to a new PCR tube without disturbing the beads.

  4. End Repair of the purified sheared DNA (Illumina TruSeq kit)
    1. Preheat thermocycler to 30 °C.
    2. Add 10 µl of Resuspension Buffer and 40 µl End Repair Mix to the sheared DNA from step C9.
    3. Adjust pipette to 50 µl then gently pipette the entire volume up and down 10 times.
    4. Incubate for 30 min at 30 °C.

  5. DNA Purification (AMPure beads)
    1. Vortex the AMPure beads until they are well dispersed.
    2. Prepare a diluted bead mixture by combining 125 µl beads with 35 µl of PCR grade water.
    3. Add the entire volume from step D4 (100 µl) to 160 µl of the diluted beads.
    4. Adjust pipette to 200 µl then gently pipette the entire volume up and down 10 times.
    5. Incubate at RT for 10 min.
    6. Place the tube on the magnetic stand for 5 min.
    7. Remove and discard the supernatant from each tube without disturbing the beads.
    8. With the tube on the magnetic stand, add 200 µl of freshly prepared 80% ethanol without disturbing the beads. Incubate 30 sec.
    9. Remove and discard all the supernatant from each tube without disturbing the beads. Repeat ethanol wash once.
    10. Let the tube stand at RT to dry for 5-8 min. Remove the tube from the stand and resuspend the dried pellet with 17.5 µl Resuspension Buffer from the Illumina TruSeq kit. Gently pipette the entire volume up and down 10 times to mix thoroughly.
    11. Incubate the tube at RT for 2 min.
    12. Place the tube on the magnetic stand at RT for 3 min.
    13. Transfer 15 µl of the clear supernatant to a new PCR tube without disturbing the beads.
      SAFE STOPPING POINT DNA can be stored at -20 °C for up to 7 days.

  6. Adenylate 3’ Ends (Illumina TruSeq kit)
    1. Remove the A tailing mix from -20 °C and thaw at RT.
    2. Preheat thermal cycler at 37 °C.
    3. Add 2.5 µl of Resuspension buffer and 12.5 µl of A tailing mix to the tube from step E13.
    4. Adjust pipette to 30 µl, then gently pipette the entire volume up and down 10 times.
    5. Incubate at 37 °C for 30 min.
    6. Immediately proceed to adapter ligation.

  7. Ligate Adapters (Illumina TruSeq kit)
    Illumina TruSeq DNA adapters, which contain 5-methylcytosines instead of cytosines, are ligated in a 50 μl reaction.
    1. Remove the Adapter Index tubes from -20 °C and thaw at RT.
    2. Combine sample and reagents as indicated.
      DNA sample (from step F6)
      30 µl
      Adapters
      2.5 µl
      T4 DNA Ligase (2,000,000 U/ml)
      2.5 µl
      10x T4 Ligase buffer with ATP
      5 µl
      Water
      10 µl
    3. Incubate overnight at 16 °C.

  8. DNA Purification (AMPure beads)
    1. Vortex the AMPure beads until they are well dispersed.
    2. Add 42.5 µl of the beads to the sample.
    3. Adjust pipette to 85 µl then gently pipette the entire volume up and down 10 times.
    4. Incubate at RT for 10 min.
    5. Place the tube on the magnetic stand for 5 min.
    6. Remove and discard 80 µl of supernatant from each tube.
    7. With the tube on the magnetic stand, add 200 µl of freshly prepared 80% ethanol without disturbing the beads. Incubate 30 sec.
    8. Remove and discard all the supernatant from each tube without disturbing the beads. Repeat ethanol wash once.
    9. Let the tube stand at RT to dry for 8 min. Remove the tube from the stand and resuspend the dried pellet with 22.5 µl Resuspension Buffer. Gently pipette the entire volume up and down 10 times to mix thoroughly. Incubate at RT for 2 min.
    10. Place the tube on the magnetic stand at RT for 5 min.
    11. Transfer 20 of µl of the clear supernatant to new tube without disturbing the beads.
      SAFE STOPPING POINT. DNA can be stored at -20 °C for up to 7 days.

  9. Bisulfite treatment
    1. Use the MethylCode™ Bisulfite Conversion Kit Invitrogen following the manufacturer’s protocol.
      http://tools.lifetechnologies.com/content/sfs/manuals/methylcode_bisulfite_man.pdf
    2. Elute bisulfite-treated DNA in 10 µl.

  10. Library enrichment PCR
    Use 3 µl (from step I2) as a template in each of two PCR reactions.
    Thermal cycler program:
    95 °C for 2 min
    12-15 cycles*:
    95 °C for 20 sec
    60 °C for 30 sec
    72 °C for 1 min
    72 °C for 7 min
    *Keep the number of cycles as low as possible to reduce PCR duplicates.
    PCR master mix (50 µl reaction)
    10x Pfu polymerase buffer
    5 µl
    10 mM dNTPs
    1 µl
    PCR primer cocktail (Illumina TruSeq kit)
    5 µl
    Pfu Cx Hotstart DNA polymerase
    1 µl
    Water
    35 µl

  11. Library purification
    1. Make fresh 80% ethanol.
    2. Vortex the AMPure beads until they are well dispersed.
    3. Add 50 µl of beads to the sample.
    4. Adjust pipette to 85 µl then gently pipette the entire volume up and down 10 times.
    5. Incubate at RT for 10 min.
    6. Place the tube on the magnetic stand for 5 min.
    7. Remove and discard 80 µl of supernatant from each tube.
    8. With the tube on the magnetic stand, add 200 µl of freshly prepared 80% ethanol without disturbing the beads. Incubate 30 sec.
    9. Remove and discard all the supernatant from each tube without disturbing the beads. Repeat ethanol wash once.
    10. Let the tube stand at RT to dry for 8 min. Remove the tube from the stand and resuspend the dried pellet with 16 µl water. Gently pipette the entire volume up and down 10 times to mix thoroughly. Incubate at RT for 2 min.
    11. Place the tube on the magnetic stand at RT for 5 min.
    12. Transfer 15 of µl of the clear supernatant to new tube without disturbing the beads.
      SAFE STOPPING POINT. DNA can be stored at -20 °C for up to 7 days.

  12. Library validation
    1. Subject libraries to quality control on a Bioanalyzer before sequencing. Libraries should have a size range between 250 and 400 bp and the adapter dimer peak, if present, should be less than 10% of the library. See Figure 2 in representative data.
    2. Clone 1-2 µl into TOPO Blunt and sequence a few clones to check that the adapters are ligated as expected.
    3. Bisulfite conversion checkpoint: In plants, the chloroplast genome is expected to be unmethylated. Amplify chloroplast DNA from the library by PCR. Clone into TOPO TA cloning kit following manufacturer’s protocol and sequence by standard Sanger sequencing. Each C in the amplified PCR product should be converted and sequenced as a T.
      Template: 2 µl library
      10x Ex Taq buffer (Mg2+ plus)
      5 µl
      10 mM dNTPs
      1 µl
      10 µM For Oligo
      2.5 µl
      10 µM Rev Oligo
      2.5 µl
      TaKaRA Ex taq (5 units/µl)
      0.5 µl
      Water
      36.5 µl
      Oligos for chloroplast DNA (Groszmann et al., 2011):
      For: 5’-ATGATGTTGTTAGAATTTYATATAGG-3’
      Rev: 5’-CATCATTTARCTATCRCAATTCTTT-3’
      Thermal cycler program:
      95 °C for 2 min
      40 cycles:
      95 °C 15 sec
      52 °C 30 sec
      72 °C 2 min
      72 °C 10 min

  13. High throughput sequencing
    Sequence library (~10 pM) on Illumina HiSeq 2500 machine. Paired end or 80 bp single end reads allow for better mapping to the genome, but standard 40 bp single end reads are also acceptable. Sequencing depth needed depends on the size of the genome and can be calculated using the Lander/Waterman equation.
    The general equation is:
    C = LN / G
    • C stands for coverage
    • G is the haploid genome length
    • L is the read length
    • N is the number of reads
    Please refer to http://support.illumina.com/downloads/sequencing_coverage_calculator.html for more details.

Part II. Whole genome bisulfite sequencing data analysis

Equipment

  1. Linux computer with standard utility applications (including Perl)
    R (http://www.r-project.org/; base installation only is needed)
    Bismark (http://www.bioinformatics.babraham.ac.uk/projects/bismark/)
    FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
    FASTX Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)
    SAMtools (http://samtools.sourceforge.net/)
    bedtools (https://github.com/arq5x/bedtools2)

Procedure

Note: For all the scripts used in the Procedure, please download here.

  1. Quality control of sequenced reads
    1. Copy sequencing reads to desired location on local computer or server.
    2. Unzip and untar sequence files if needed.
      gunzip file_name.txt.tar.gz
      tar xvfp file_name.txt.tar
    3. Run quality control of sequencing reads with fastqc (http://hannonlab.cshl.edu/fastx_toolkit/).
      Usage: fastqc file_name.txt
    4. Examine the fastqc report to determine if/how to filter and/or trim reads. Discard adapters and low quality reads (less than 75% quality scores above 25) if necessary.
      Usage: fastq_quality_filter –q25 –p 75 –i file_name. txt –o file_name_trimmed.fastq
    5. Re-run quality control to check final quality of the reads.
      Usage: fastqc file_name_trimmed.fastq

  2. Align bisulfite reads with Bismark
    Align the reads to the reference genome of your choice using Bismark (Krueger and Andrews, 2011).
    We strongly encourage the reader to consult the review written by the authors of Bismark (http://www.ncbi.nlm.nih.gov/pubmed/22290186) before starting the data analysis. Here, we describe alignment of single end reads, but alignment of paired end reads is also possible.
    1. Prepare bisulfite-treated genome for future mapping (only needs to be done once).
      Usage: bismark_genome_preparation --path_to_bowtie reference_genome/fasta/
    2. Map BS-reads to converted genome (map forward and reverse reads separately).
      Usage: bismark --non_directional -o bismark_file_name_trimmed.n1l50 -n 1 -l 50 reference_genome file_name.trimmed.fastq
      Options are as follows:
      -non_directional: reads are not strand-specific (a strand-specific option is also available)
      -o output directory
      -n max number of mismatches
      -l length of seed (first number of nt that are mapped with <n mismatches)
      *fastq: short reads in fastq format

    From this point onwards the analysis follows different steps based on the genotype of the sample used to prepare the library.
    Option #1: Libraries from a single inbred strain

  3. Convert Bismark aligned reads from SAM to BAM, sort, and index
    cd bismark_file_name_trimmed.n1l50
    Usage: SAM_to_BAM_sort_index.pl file_name.fastq_bismark.sam
    Expected output file: file_name.fastq_bismark.sorted.bam

  4. Convert sorted BAM file to SAM format
    Usage: samtools view -h file_name.fastq_bismark.sorted.bam > file_name_sorted.sam Samtools (Li et al., 2009)

  5. Eliminate redundant reads
    Using a sorted SAM file, keep only one sequence per strand that maps to the same start and end position. This eliminates PCR duplicates. A log file (SAM_redundancy_stats.log.txt) of counts for each read is created.
    Usage: ./make_sorted_SAM_non-redundant_fewest_MM.pl file_name_sorted.sam > file_name_sorted_nr.sam
    This is how it works:
    The script reads a sorted SAM file and gets all the reads that map to the same position and strand and then
    1. Sorts them by decreasing prevalence and then by increasing number of mismatches (not counting bisulfite conversions) to the genome.
    2. The most prevalent read with the total highest quality string is kept.
    3. In the case of a tie, the read with the fewest number of mismatches is retained.
    4. In the case of another tie, it prints out every unique read.
      The output SAM file also has 3 new fields added on the end:
      CT == count = number of times this read was found in the input file
      MM == mismatches = number of mismatches compared to the genome sequence (not counting conversions)
      QS == quality score = total quality score for this read

  6. Calculate a methylation value for each cytosine
    Run Bismark’s supplementary script “bismark_methylation_extractor” on sorted SAM files.
    Usage: bismark_methylation_extractor –s file_name_sorted_nr.sam -o bismark_output
    This script creates 12 output files with methylation status of each C in three different contexts (CpG, CHG, CHH) with 4 mappings:
    OT - original top strand
    CTOT - complementary to original top strand
    OB - original bottom strand
    CTOB - complementary to original bottom strand
    Shorthand for methylation call according to context:
    z: unmethylated C in CpG context
    Z: methylated C in CpG context
    x: unmethylated C in CHG context
    X: methylated C in CHG context
    h: unmethylated C in CHH context
    H: methylated C in CHH context

  7. Calculate bisulfite conversion efficiency
    The mean bisulfite conversion rate for each library is calculated based on the methylation status of each cytosine from reads mapping to the chloroplast genome, which is expected to be unmethylated.
    1. Run Usage: ./methylation_summarizer_1.pl bismark_output > bismark_SampleName.summary_step1.txt
    2. Get info about methylation status of each nucleotide from each read mapping to the chloroplast.
      Usage: grep ChrC bismark_*_trim3.n1l50.summary_step_1.txt > bismark_*_trim3.n1l50.summary_step_1.chloroplast.txt./methylation_summarizer_2_conversion.pl
      bismark_*_trim3.n1l50.summary_step_1.chloroplast.txt >bismark_*_trim3.n1l50.conversion.txt
      The output file contains the conversion at each position, and the mean C conversion across the chloroplast is printed to the screen.

    Option #2: Libraries from F1 crosses
  8. Map reads from hybrid libraries
    In the case of F1 hybrid libraries, two parental genomes may be available as reference. You may decide to map reads to both the parental genomes to maximize the number of mapped reads using Bismark as described above. To do that, first map all the reads to the best reference genome, then align the “unmapped” reads to the other genome.
    Make file of unmapped reads from the fastq file (e.g., reads not in SAM file).
    Usage: ./get_fastq_reads_not_in_SAM.pl fastqFile samFile > reads.unmapped.fastq
    To assign reads to a particular strain and to retain as many unique reads as possible, separate the reads by strand.
    Script: split_bismark_SAM_by_read_strand.pl
    Usage: split_bismark_SAM_by_read_strand.pl file_name_bismark.sam

  9. Classify reads by parent-of-origin (allele specific DNA methylation analysis)
    Classify the forward reads according to their parent-of-origin with a bedfile where the C>T SNPs between the two genomes of interest are ignored but all other SNPs are retained, and classify the reverse reads with a bedfile where the G>A SNPs between the two genomes of interest are ignored but all other SNPs are retained. There is only one bedfile, but read strand has to be specified (pos or neg).
    Script: split_bismark_SAM_by_read_strand.pl
    Usage: split_bismark_SAM_by_read_strand.pl file_name_bismark.sam
    Expected output files: file_name_bismark.pos.sam, file_name_bismark.neg.sam
    Script: classify_bismark_reads_by_parent.one_strand.pl
    Usage: classify_bismark_reads_by_parent.one_strand.pl file_name_bismark.pos.sam SNPs.bed pos > file_name_bismark.pos_class.sam"
    Usage: classify_bismark_reads_by_parent.one_strand.pl file_name_bismark.neg.sam SNPs.bed neg > file_name_bismark.neg_class.sam"
    Reads are classified based on their sequence at known SNP positions. After classification, redundant reads from each class are eliminated.
    To differentiate between 4 cases:
    1. ST:Z: maternal
    2. ST:Z: paternal
    3. ST:Z: NE means no evidence for either genome
    4. ST:Z: both means evidence for both (conflicting data)
      awk -F"\t" '$19 == "ST:Z: maternal " {print $0}' file_name_bismark.pos_class.sam > maternal.sam
      awk -F"\t" '$19 == "ST:Z: paternal " {print $0}' file_name_bismark.pos_class.sam > paternal.sam
      awk -F"\t" '$19 == "ST:Z:NE" {print $0}' file_name_bismark.pos_class.sam > NE.sam
      awk -F"\t" '$19 == "ST:Z:both" {print $0}' file_name_bismark.pos_class.sam > both.sam
      Run similar commands with file_name_bismark.neg_class.sam

  10. Combine the SAM files
    Example: cat file_name_neg.sam file_name_pos.sam > file_name_pos_neg.sam
    You may delete the individual pos and neg files after checking the size of the combined files.

  11. Prefix the header to the sam file
    cat header file_name_pos_neg.sam > file_name_pos_neg.header.sam

  12. Convert SAM to BAM, sort, and index
    SAM_to_BAM_sort_index.pl file_name_pos_neg.header.sam
    Expected outputfile: file_name_pos_neg.sorted.bam

  13. Convert sorted BAM file to SAM format
    samtools view -h file_name_pos_neg.sorted.bam > file_name_pos_neg.sorted.sam

  14. Eliminate redundant reads (see step 5)
    Script: make_sorted_SAM_non-redundant_fewest_MM.pl
    Usage: make_sorted_SAM_non-redundant_fewest_MM.pl file_name_pos_neg.sorted.sam > file_name_pos_neg.sorted_nr.sam

  15. Calculate a methylation value for each cytosine (see step 6)
    Run methylation extractor for each set of classified reads as well as for all reads combined.
    Usage: methylation_extractor -s file_name_pos_neg.sorted_nr.sam
    Combine all the nr.sam files into an “allreads_nr.sam” file with cat command and run methylation_extractor again.

  16. Summarize the methylation status across genomic windows
    For each class, organize the 12 methylation extractor files in the output files folder. Divide the genome into 300 nt (windowWidth) windows, overlapping by 100 nt (windowOverlap).

    Script: make_genome_windows_bed.pl (requiring a file of each chromosome and its length)
    Usage: make_genome_windows_bed.pl chromInfo.txt windowWidth windowOverlap > genome_300nt_100_windows.bed

    Script: Summarize_by_window.sh
    Usage: bash Summarize_by_window.sh methylation_outputfiles genome_300nt_100_windows.bed

    Summarize_by_window.sh features: This script summarizes methylation status across overlapping genomic windows of defined size by converting the processed Bismark methylation extractor output files into a set of bed files and determining weighted methylation values as described in Schultz et al. (2012). Bedgraph files for viewing in a genome browser are created in the bedgraph folder. Bismark's methylation extractor output by chr position (after sorting) is summarized by converting the methylation string into ummethylated counts, methylated counts, and percent methylation, producing output in BED-like format (scorePerPos folder). The folder weighted_summaries_by_window has three bed files, which merge sites across each window. For each window it provides counts and weighted percent methylation.

  17. Identify differentially methylated regions (DMRs)
    At least 5-read coverage at each site is required. Differential methylation is assayed by calculating the difference between samples (sample A-sample B of weighted methylation fractions), and confidence (p-value from Fisher's exact test) for each window in all sequence contexts is assigned. P values are corrected with the Benjamini and Hochberg False Discovery Rate (FDR). CG and CHG DMRs were defined as regions with a minimum overlap of 3 informative Cs between windows and, for example, a weighted methylation difference of at least 35 and a corrected p value < 0.01. CHH DMRs were defined as windows with a minimum 10 overlapping informative cytosines and, for example, a weighted methylation difference of at least 10 and a p value < 0.01. The user can use their own criteria for defining DMRs.
    To compare mC window counts in two samples the following scripts are needed:
    merge_bedgraph_data_counts_etc.pl
    compare_methylation_counts_by_window.R
    1. Make matrix of counts for 2 samples, each in 2 contexts using files in the 'weighted_summaries_by_window' folders.
    2. Example:
      ./merge_bedgraph_data_counts_etc.pl genome_300nt_100_windows.bed
      sample_A_reads/weighted_summaries_by_window/CpG.bed
      sample_B_reads/weighted_summaries_by_window/CpG.bed
      sample_A_reads/weighted_summaries_by_window/CHG.bed
      sample_B_reads/weighted_summaries_by_window/CHG.bed
      sample_A_reads/weighted_summaries_by_window/CHH.bed
      sample_B_reads/weighted_summaries_by_window/CHH.bed >
      sample_A_vs_sample_B_reads.300nt_100_windows.txt

      The resulting file should have window counts in this order:
      sample_A/weighted_summaries_by_window/CpG.bed
      sample_B/weighted_summaries_by_window/CpG.bed
      sample_A/weighted_summaries_by_window/CHG.bed
      sample_B/weighted_summaries_by_window/CHG.bed
      sample_A/weighted_summaries_by_window/CHH.bed
      sample_B/weighted_summaries_by_window/CHH.bed
    3. Compare counts across two samples in the same context.
      compare_methylation_counts_by_window.R inputCountsFile outputStatsFile
      Usage:
      compare_methylation_counts_by_window.R
      sample_A_vs_sample_B_reads.300nt_100_windows.txt
      sample_A_vs_sample_B_reads.300nt_100_windows.stats.txt"
      For each window, the output file will have (a) the raw Fisher's exact test p-value reflecting whether the fraction of meth/unmeth counts is the same for both samples, and (b) the difference (sample A-sample B) in weighted methylation fraction. Methylation fractions appear next to the results of each statistical calculation.

  18. Overlap genomic features (genes, transposable elements, etc.) with the windows
    1. Open the stats file in Excel and sort based on p value.
    2. Copy the rows representing selected DMRs [(with a FDR below your threshold (0.01) and
      with a methylation difference above your threshold (e.g. 35%)] into another file and delete all but the desired 5 columns (chr, start, end, difference, FDR value).
    3. Save the file with the extension “bed” and intersect with genomic features using Bedtools (Quinlan and Hall, 2010).
      Run:
      bedtools intersect -wao -a sample_A_vs_sample_B.CpG.bed -b genome_GFF3_genes.gff > intersect_ sample_A_vs_sample_B_genes.txt
      bedtools intersect -wao -a sample_A_vs_sample_B.CpG.bed –b genome_GFF3_transposable_element.gff > intersect_ sample_A_vs_sample_B.CpG.bed _TE.txt

Representative data



Figure 1. 1.2% gel with sheared DNA (250 ng). Expected size range is 100-300 bp.


Figure 2. Bioanalyzer results. Libraries with higher concentration perform better in Illumina sequencing. Bioanalyzer analysis of libraries before sequencing is an important quality control step. Both libraries were in the expected size range (as indicated by blue bracket in panel A) and therefore suitable for sequencing. Library 1 (with higher molarity) was subjected to an additional clean up step using AMPure XP beads in order to reduce the amount of residual adapters (indicated by a red arrow in panel A, and panel B). Library 2 (panel C) was not cleaned up because it was at a low concentration and additional clean up might have caused loss of enough library such that sequencing would not be possible. Sequencing resulted in 12,489,378 and 3,824,500 total non-redundant reads for library 1 and 2, respectively.

Recipes

  1. 50x TAE (1 L)
    242 g Tris base
    57.1 ml glacial acetic acid
    100 ml 0.5M EDTA (pH 8)

Acknowledgements

This work was supported by the NSF (MCB 1121952) and an award to MG from The Pew Charitable Trust’s Pew Scholars Program in the Biomedical Sciences.
This protocol was adapted from Pignatta et al. (2014).

References

  1. Groszmann, M., Greaves, I. K., Albertyn, Z. I., Scofield, G. N., Peacock, W. J. and Dennis, E. S. (2011). Changes in 24-nt siRNA levels in Arabidopsis hybrids suggest an epigenetic contribution to hybrid vigor. Proc Natl Acad Sci U S A 108(6): 2617-2622.
  2. Krueger, F. and Andrews, S. R. (2011). Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27(11): 1571-1572.
  3. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G. and Durbin, R. (2009). The sequence alignment/map format and SAMtools. Bioinformatics 25(16): 2078-2079.
  4. Quinlan, A. R. and Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6): 841-842.
  5. Schultz, M. D., Schmitz, R. J. and Ecker, J. R. (2012). 'Leveling' the playing field for analyses of single-base resolution DNA methylomes. Trends Genet 28(12): 583-585.
  6. Pignatta, D., Erdmann, R. M., Scheer, E., Picard, C. L., Bell, G. W. and Gehring, M. (2014). Natural epigenetic polymorphisms lead to intraspecific variation in Arabidopsis gene imprinting. Elife 3: e03198.

简介

该协议描述了植物组织的全基因组亚硫酸氢盐测序文库制备和随后的数据分析。 等位基因特异性甲基化分析和差异甲基化区域的全基因组鉴定是分析程序的附加特征。

第I部分。全基因组亚硫酸氢盐测序文库制备

材料和试剂

  1. RNA酶处理的DNA(至少1μg,体积不应超过130μl)
  2. AMPure XP珠(Beckman Coulter,目录号:A63880)
  3. 200标准乙醇(Sigma-Aldrich,目录号:459844-500ML)
  4. Illumina TruSeq试剂盒(Illumina,目录号:FC-121-2001)
  5. Methylcode Invitrogen试剂盒(Life Technologies,Invitrogen TM ,目录号:MECOV50)
    注意:可以使用其他亚硫酸氢盐转换套件,但未在此协议中测试。
  6. T4 DNA连接酶(NEB,目录号:M0202T)
  7. Pfu Cx Hotstart DNA聚合酶(Agilent,目录号:600412)
  8. TOPO Blunt克隆试剂盒(Life Technologies,Invitrogen TM ,目录号:450245)
  9. TOPO TA克隆试剂盒(Life Technologies,Invitrogen TM ,目录号:450030)
  10. Ex Taq DNA聚合酶(TaKaRa,Clontech,目录号:RR001B)
  11. 1 Kb-plus DNA ladder(Life Technologies,Invitrogen TM,目录号:10787-018)
  12. 琼脂糖
  13. 1x TAE缓冲液(见配方)

设备

  1. 琼脂糖凝胶电泳仪
  2. Covaris microTUBE预备站(Covaris,部件号:500142)
  3. Covaris microTUBE AFA光纤预裂缝快速盖(6 x 16毫米)(Covaris,部件号:520045)
  4. 低保留1.5ml管(Thermo Fisher Scientific,目录号:02-681-320)
  5. TempAssure PCR 8-Strips(USA Scientific,目录号:1402-4700)
  6. 用于1.5ml管的磁力架(DynaMag TM sup-2磁体,Life Technologies,Invitrogen TM ,目录号:12321D)
  7. 聚焦超声波发生器(Covaris,S系列,型号:S220)
  8. 微量离心机(VWR International,目录号:93000-204)
  9. 热循环仪

程序

  1. 开始前准备试剂
    1. 制作新鲜80%乙醇。
      提示:使用新鲜的80%乙醇进行Ampure珠纯化很重要。
    2. 摇动Agencourt AMPure XP瓶重悬珠。 180μl 每个样品需要珠子(1.4:1珠子与DNA的比率)。 放在 室内温度。

  2. 用Covaris超声仪剪切DNA
    1. 填充容器用去离子水到12.5在毕业填充线标签
    2. 打开机器,冷却器(设置温度为+3°C)和计算机。
    3. 启动应用软件(Covaris SonoLab 7)。 推动脱气 按钮。 在使用仪器前至少吹扫30分钟。
    4. 设置:峰值功率175 W,占空比10,周期/突发200,时间6 min,温度6℃。 注意机器温度上升   在使用中,这可能干扰剪切循环。 因此, 机器初始设置为3°C,作为预防措施
    5. 将130μlDNA样品(约10 ng /μl)转移到Covaris microTUBE中,简单旋转并进行剪切。
    6. 用剪切的DNA(250ng)和未剪切的DNA旁边运行1.2%凝胶 侧评价剪切。 样品也可以在Bioanalyzer上运行。 预期大小范围为100-300 bp。 查看代表性数据图1.

  3. DNA纯化(AMPURE珠)
    请参阅Agencourt AMPure XP PCR纯化 - 使用说明(B37419AA)
    https://www.beckmancoulter.com/wsrportal/techdocs?docname=B37419AA)。
    1. 将管放入Covaris microTUBE准备站,并取下盖。 将130μl样品转移到低DNA结合微量离心管中。 加 180μl重悬的珠子,通过吸移混匀。
    2. 在室温下孵育5分钟。
    3. 将管放置在磁力架上2分钟,以从溶液中分离珠子
    4. 从管中吸出清除的溶液并丢弃。 不要打扰珠子。
    5. 将250μl的80%乙醇分配到试管中,并在室温下孵育30秒。
    6. 吸出乙醇并弃去。 重复进行两次洗涤。
    7. 让珠子空气干燥5分钟。
      提示:小心不要过度干燥珠粒(珠粒似乎破裂 如果过干),因为这将显着降低洗脱效率。
    8. 从磁性支架上取下微量离心管。 加入52μl 水至珠子,并通过吸移重悬10次。 在室温下孵育2分钟。
    9. 将管放回 磁性支架。 转移50微升的透明上清液到新的PCR   管而不干扰珠子

  4. 纯化剪切的DNA的末端修复(Illumina TruSeq试剂盒)
    1. 将热水器预热至30°C
    2. 向步骤C9的剪切的DNA中加入10μl重悬缓冲液和40μl末端修复混合物
    3. 调整移液器至50微升,然后轻轻地吸取整个音量上下10次
    4. 在30℃孵育30分钟。

  5. DNA纯化(AMPure珠)
    1. 涡旋AMPure珠,直到它们分散良好
    2. 通过将125μl珠子与35μlPCR级水混合,制备稀释的珠子混合物
    3. 将步骤D4(100μl)的整个体积加入到160μl稀释的珠子中
    4. 调整移液器至200微升,然后轻轻地吸取整个音量上下10次
    5. 在RT孵育10分钟。
    6. 将管在磁力架上放置5分钟。
    7. 从每个管除去和丢弃上清液,而不打扰珠。
    8. 用磁力架上的管,加入200μl新鲜制备的 80%乙醇而不扰乱珠。 孵化30秒。
    9. 从每个管移除并丢弃所有上清液,而不干扰珠。 重复乙醇洗涤一次。
    10. 让管在室温下干燥5-8分钟。 从中取出管 并用17.5μl重悬浮液再悬浮干燥的沉淀 来自Illumina TruSeq试剂盒的缓冲液。 轻轻地吸取整个音量   并向下10次以彻底混合
    11. 在室温下孵育管2分钟。
    12. 将管在磁力架上在室温下放置3分钟。
    13. 将15μl清澈的上清液转移到新的PCR管中,不会影响珠子 安全停止点DNA可以在-20°C下储存长达7天。

  6. 腺苷酸3'末端(Illumina TruSeq kit)
    1. 从-20°C取出拖尾混合物,并在RT下解冻
    2. 在37°C预热热循环仪。
    3. 向步骤E13的管中加入2.5μl重悬缓冲液和12.5μlA尾混合物
    4. 调整移液器至30微升,然后轻轻地吸取整个音量上下10次
    5. 在37℃孵育30分钟。
    6. 立即进行适配器连接。

  7. 管道适配器(Illumina TruSeq套件)
    Illumina TruSeq DNA衔接子,其含有5-甲基胞嘧啶而不是胞嘧啶,在50μl反应中连接。
    1. 从-20°C取出适配器索引管并在RT解冻
    2. 根据指示合并样品和试剂。
      DNA样品(来自步骤F6)
      30微升
      适配器
      2.5μl
      T4 DNA连接酶(2,000,000U/ml)
      2.5μl
      10x T4连接酶缓冲液与ATP
      5微升

      10微升
    3. 在16℃孵育过夜。

  8. DNA纯化(AMPure珠)
    1. 涡旋AMPure珠,直到它们分散良好
    2. 向样品中加入42.5μl的珠子
    3. 调整移液器至85微升,然后轻轻地吸取整个音量上下10次
    4. 在RT孵育10分钟。
    5. 将管在磁力架上放置5分钟。
    6. 从每个管中取出并弃去80μl上清液
    7. 用磁力架上的管,加入200μl新鲜制备的 80%乙醇而不扰乱珠。 孵化30秒。
    8. 从每个管移除并丢弃所有上清液,而不干扰珠。 重复乙醇洗涤一次。
    9. 让试管在室温下干燥8分钟。 从管中取出管 并用22.5μl重悬缓冲液重悬干燥的沉淀。 轻轻吸取整个音量上下10次,彻底混合。   在室温下孵育2分钟。
    10. 将管在磁力架上在室温下放置5分钟。
    11. 将20μl的澄清上清液转移到新试管中,不会影响珠子 安全停止点。 DNA可以在-20°C储存长达7天。

  9. 亚硫酸氢盐处理
    1. 使用MethylCode™Bisulfite Conversion Kit Invitrogen 制造商的协议。
      http://tools.lifetechnologies.com/content/sfs/manuals/methylcode_bisulfite_man.pdf
    2. 洗脱亚硫酸氢盐处理的DNA在10μl。

  10. 文库富集PCR
    在两个PCR反应的每一个中使用3μl(来自步骤I2)作为模板 热循环程序:
    95℃2分钟
    12-15个循环*:
    95℃20秒
    60℃,30秒
    72℃1分钟
    72℃7分钟
    *保持循环数尽可能低,以减少PCR重复。
    PCR主混合物(50μl反应)
    10x Pfu聚合酶缓冲液 5微升
    10 mM dNTPs
    1微升
    PCR引物混合物(Illumina TruSeq试剂盒)
    5微升
    Pfu Cx Hotstart DNA聚合酶
    1微升

    35μl

  11. 图书馆净化
    1. 新鲜80%乙醇。
    2. 涡旋AMPure珠,直到它们分散良好
    3. 向样品中加入50μl珠子。
    4. 调整移液器至85微升,然后轻轻地吸取整个音量上下10次
    5. 在RT孵育10分钟。
    6. 将管在磁力架上放置5分钟。
    7. 从每个管中取出并弃去80μl上清液
    8. 用磁力架上的管,加入200μl新鲜制备的 80%乙醇而不扰乱珠。 孵化30秒。
    9. 从每个管移除并丢弃所有上清液,而不干扰珠。 重复乙醇洗涤一次。
    10. 让试管在室温下干燥8分钟。 从管中取出管   并用16μl水再悬浮干燥的沉淀。 轻轻吸 整个音量上下10次,彻底混合。 在室温孵育 2分钟。
    11. 将管在磁力架上在室温下放置5分钟。
    12. 将15μl的清澈上清液转移到新试管中,不影响珠子 安全停止点。 DNA可以在-20°C储存长达7天。

  12. 库验证
    1. 主题库在测序前在Bioanalyzer上进行质量控制。   文库应该有250到400 bp的大小范围 衔接子二聚体峰(如果存在)应小于文库的10%。 参见图2中的代表性数据
    2. 将1-2μl克隆到TOPO Blunt中并测序几个克隆,以检查衔接子是否如预期的那样连接
    3. 亚硫酸氢盐转化检查点:在植物中,叶绿体基因组 预期是未甲基化的。 从文库扩增叶绿体DNA   通过PCR。 按照制造商的说明克隆到TOPO TA克隆试剂盒中 方案和序列通过标准Sanger测序。 每个C在 扩增的PCR产物应当作为T转换和测序 模板:2μl文库
      10x Ex Taq缓冲液(Mg <2> + +) 5微升
      10 mM dNTPs
      1微升
      10μMfor Oligo
      2.5μl
      10μMRev Oligo
      2.5μl
      TaKaRA Ex taq(5单位/μl)
      0.5μl

      36.5微升
      叶绿体DNA的寡聚体(Groszmann等人,2011):
      对于:5'-ATGATGTTGTTAGAATTTYATATAGG-3'
      Rev:5'-CATCATTTARCTATCRCAATTCTTT-3'
      热循环程序:
      95℃2分钟
      40个周期:
      95°C 15秒
      52°C 30秒
      72℃2分钟
      72℃10分钟

  13. 高通量测序
    在Illumina HiSeq 2500机器上的序列文库(〜10pM)。 配对末端或80bp单端读数允许更好地映射到基因组,但标准的40bp单端读数也是可接受的。 需要的测序深度取决于基因组的大小,可以使用Lander/Waterman方程计算。
    一般公式为:
    C = LN/G
    •C代表覆盖范围
    •G是单倍体基因组长度
    •L是读取长度
    •N是读取次数
    请参阅 http://support.illumina.com/downloads/sequencing_coverage_calculator.html 更多详情。

第二部分。 全基因组亚硫酸氢盐测序数据分析

设备

  1. 具有标准实用程序(包括Perl)的Linux计算机
    R( http://www.r-project.org/;仅需要基本安装)< br /> 俾斯麦( http://www.bioinformatics.babraham.ac.uk/projects/bismark/
    FastQC( http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
    FASTX工具包( http://hannonlab.cshl.edu/fastx_toolkit/
    SAMtools( http://samtools.sourceforge.net/
    bedtools( https://github.com/arq5x/bedtools2

程序

注意:对于过程中使用的所有脚本,请下载 此处

  1. 顺序读取的质量控制
    1. 将测序读数复制到本地计算机或服务器上的所需位置。
    2. 如果需要,解压缩和解压序列文件。
      gunzip file_name.txt.tar.gz
      tar xvfp file_name.txt.tar
    3. 使用fastqc(http://hannonlab.cshl.edu/fastx_toolkit/)对测序读取进行质量控制。
      用法:fastqc file_name.txt
    4. 检查fastqc报告以确定是否/如何过滤和/或修剪 读取。 丢弃适配器和低质量读取(低于75%的质量 分数高于25)。
      用法:fastq_quality_filter -q25 -p 75 -i file_name。 txt -o file_name_trimmed.fastq
    5. 重新运行质量控制以检查最终读取质量。
      用法:fastqc file_name_trimmed.fastq

  2. 将亚硫酸氢盐读数与Bismark对齐
    使用Bismark将读数与您选择的参考基因组对齐(Krueger和Andrews,2011) 我们强烈建议读者查阅Bismark作者的评论( http://www .ncbi.nlm.nih.gov/pubmed/22290186 ),然后开始数据分析。 在这里,我们描述单端读取的对齐,但成对末端读取的对齐也是可能的
    1. 准备亚硫酸氢盐处理的基因组以供将来绘制(只需要做一次) 用法:bismark_genome_preparation --path_to_bowtie reference_genome/fasta/
    2. 将BS-阅读映射到转化的基因组(分别映射正向和反向阅读) 用法:bismark --non_directional -o bismark_file_name_trimmed.n1l50 -n 1 -l 50 reference_genome file_name.trimmed.fastq
      选项如下:
      -non_directional:读取不是线程特定的(一个线程特定的选项也可用)
      -o输出目录
      -n最大错配数
      -l种子长度(与< n个不匹配映射的nt的第一个数)
      * fastq:fastq格式的短读取

    从这一点开始,分析会根据用于准备库的样本的基因型,执行不同的步骤。
    选项#1:来自单一近交系的文库

  3. 将Bismark对齐的读取从SAM转换为BAM,排序和索引
    cd bismark_file_name_trimmed.n1l50
    用法:SAM_to_BAM_sort_index.pl file_name.fastq_bismark.sam
    预期的输出文件:file_name.fastq_bismark.sorted.bam

  4. 将已排序的BAM文件转换为SAM格式
    用法:samtools view -h file_name.fastq_bismark.sorted.bam> file_name_sorted.sam Samtools(Li ,,2009)

  5. 消除冗余读取
    使用排序的SAM文件,每个链只保留一个序列映射到相同的开始和结束位置。 这消除了PCR重复。 将创建每个读取的计数的日志文件(SAM_redundancy_stats.log.txt)。
    用法:./make_sorted_SAM_non-redundant_fewest_MM.pl file_name_sorted.sam> file_name_sorted_nr.sam
    这是它的工作原理:
    该脚本读取排序的SAM文件,并获取映射到相同位置和链的所有读取,然后
    1. 通过减少流行率,然后通过增加的数量对它们进行排序 错配(不计亚硫酸氢盐转化)到基因组
    2. 最流行的读取与总最高质量字符串保持。
    3. 在绑定的情况下,保留具有最少数量的不匹配的读取
    4. 在另一个领带的情况下,它打印每个独特的读取 输出的SAM文件还在末尾添加了3个新字段:
      CT == count =在输入文件
      中找到此读取的次数 MM == mismatches =与基因组序列相比的错配数(不计算转换)
      QS ==质量得分=此读取的总质量得分

  6. 计算每个胞嘧啶的甲基化值
    在排序的SAM文件上运行Bismark的补充脚本"bismark_methylation_extractor"。
    用法:bismark_methylation_extractor -s file_name_sorted_nr.sam -o bismark_output
    此脚本创建12个输出文件,在三个不同的上下文(CpG,CHG,CHH)中具有每个C的甲基化状态,具有4个映射:
    OT - 原始顶部股线
    CTOT - 与原顶链互补
    OB - 原底线
    CTOB - 与原底部链互补
    根据上下文调用甲基化的速度:
    z:CpG语境中的未甲基化C/
    Z:CpG语境中的甲基化C/
    x:CHG语境中的未甲基化C/
    X:CHG语境中的甲基化C/
    h:CHH语境中的未甲基化C H:CHH语境中的甲基化C/

  7. 计算亚硫酸氢盐转化效率
    基于每个胞嘧啶的甲基化状态计算每个文库的平均亚硫酸氢盐转化率,所述甲基化状态来自映射到叶绿体基因组的读数,其预期是未甲基化的。
    1. 运行用法:./methylation_summarizer_1.pl bismark_output> bismark_SampleName.summary_step1.txt
    2. 获得关于从每个读取映射到叶绿体的每个核苷酸的甲基化状态的信息 用法:grep ChrC bismark _ * _ trim3.n1l50.summary_step_1.txt> bismark _ * _ trim3.n1l50.summary_step_1.chloroplast.txt./methylation_summarizer_2_conversion.pl
      bismark _ * _ trim3.n1l50.summary_step_1.chloroplast.txt> bismark _ * _ trim3.n1l50.conversion.txt
      输出文件包含每个位置的转换,平均值为C  跨叶绿体的转化被印刷到屏幕上。

    选项#2:来自F
  8. 从混合库读取地图
    在F sub混合文库的情况下,两个亲本基因组可用作参考。您可以决定将读取映射到父母基因组,以使使用Bismark的映射读取的数量最大化,如上所述。为此,首先将所有读数映射到最佳参考基因组,然后将"未映射"读数与其他基因组比对。
    将来自fastq文件的未映射读取的文件(例如,不在SAM文件中读取)。
    用法:./get_fastq_reads_not_in_SAM.pl fastqFile samFile> reads.unmapped.fastq
    为了将读取分配给特定的菌株并保留尽可能多的独特读取,用链分开读取 脚本:split_bismark_SAM_by_read_strand.pl
    用法:split_bismark_SAM_by_read_strand.pl file_name_bismark.sam

  9. 通过父源(等位基因特异性DNA甲基化分析)对读取进行分类
    根据它们的原始亲本用其中忽略两个感兴趣的基因组之间的C> T SNP,而保留所有其它SNP的床型,对正向读取进行分类,并且用床型分类反向读取,其中G> A SNP之间的两个感兴趣的基因组被忽略,但所有其他单核苷酸多态性保留。只有一个bedfile,但是必须指定read strand(pos或neg)。
    脚本:split_bismark_SAM_by_read_strand.pl
    用法:split_bismark_SAM_by_read_strand.pl file_name_bismark.sam
    预期的输出文件:file_name_bismark.pos.sam,file_name_bismark.neg.sam
    脚本:classify_bismark_reads_by_parent.one_strand.pl
    用法:classify_bismark_reads_by_parent.one_strand.pl file_name_bismark.pos.sam SNPs.bed pos> file_name_bismark.pos_class.sam"
    用法:classify_bismark_reads_by_parent.one_strand.pl file_name_bismark.neg.sam SNPs.bed neg> file_name_bismark.neg_class.sam"
    读数基于其在已知SNP位置的序列分类。 分类后,每个类的冗余读取被消除。
    要区分4种情况:
    1. ST:Z:产妇
    2. ST:Z:父亲
    3. ST:Z:NE表示没有任何基因的证据
    4. ST:Z:两者都是(冲突数据)的证据
      awk -F"\ t"'$ 19 =="ST:Z:maternal"{print $ 0}'file_name_bismark.pos_class.sam> motheral.sam
      awk -F"\ t"'$ 19 =="ST:Z:paternal"{print $ 0}'file_name_bismark.pos_class.sam> paternal.sam
      awk -F"\ t"'$ 19 =="ST:Z:NE"{print $ 0}'file_name_bismark.pos_class.sam> NE.sam
      awk -F"\ t"'$ 19 =="ST:Z:both"{print $ 0}'file_name_bismark.pos_class.sam> both.sam
      使用file_name_bismark.neg_class.sam
      运行类似的命令

  10. 合并SAM文件
    示例:cat file_name_neg.sam file_name_pos.sam> file_name_pos_neg.sam
    在检查合并文件的大小后,您可以删除个别的pos和neg文件。

  11. 将标题前缀到sam文件
    cat header file_name_pos_neg.sam> file_name_pos_neg.header.sam

  12. 将SAM转换为BAM,排序和索引
    SAM_to_BAM_sort_index.pl file_name_pos_neg.header.sam
    预期的输出文件:file_name_pos_neg.sorted.bam

  13. 将已排序的BAM文件转换为SAM格式
    samtools view -h file_name_pos_neg.sorted.bam> file_name_pos_neg.sorted.sam

  14. 消除冗余读取(请参阅步骤5)
    脚本:make_sorted_SAM_non-redundant_fewest_MM.pl
    用法:make_sorted_SAM_non-redundant_fewest_MM.pl file_name_pos_neg.sorted.sam> file_name_pos_neg.sorted_nr.sam

  15. 计算每个胞嘧啶的甲基化值(参见步骤6)
    运行甲基化提取器为每组分类读取以及所有读取组合。
    用法:methylation_extractor -s file_name_pos_neg.sorted_nr.sam
    使用cat命令将所有nr.sam文件组合成一个"allreads_nr.sam"文件,然后再次运行methylation_extractor。

  16. 总结基因组窗口的甲基化状态
    对于每个类,在输出文件文件夹中组织12个甲基化提取器文件。将基因组划分为300 nt(windowWidth)窗口,重叠100 nt(windowOverlap)。

    脚本:make_genome_windows_bed.pl(需要每个染色体及其长度的文件)
    用法:make_genome_windows_bed.pl chromInfo.txt windowWidth windowOverlap> genome_300nt_100_windows.bed

    脚本:Summarize_by_window.sh
    用法:bash Summarize_by_window.sh methylation_outputfiles genome_300nt_100_windows.bed

    Summarize_by_window.sh特征:该脚本通过将处理的Bismark甲基化提取器输出文件转换成一组床文件并确定加权甲基化值,来概述在定义大小的重叠基因组窗口中的甲基化状态,如Schultz等人(2012)。在基因组浏览器中查看的Bedgraph文件在bedgraph文件夹中创建。通过将甲基化串转化为曱基化计数,甲基化计数和甲基化百分比,以BED样形式(scorePerPos文件夹)产生输出,总结由chr位置(排序后)输出的Bismark甲基化提取器。文件夹weighted_summaries_by_window有三个bed文件,它合并每个窗口的站点。对于每个窗口,它提供计数和加权百分比甲基化
  17. 识别差异甲基化区域(DMR)
    每个站点至少需要5读权限。通过计算所有序列上下文中的每个窗口的样品之间的差异(加权甲基化部分的样品A-样品B)和置信度(来自Fisher's精确检验的p值)来分析差异甲基化。 P值用Benjamini和Hochberg假发现率(FDR)校正。 CG和CHG DMRs被定义为在窗口之间具有3个信息性Cs的最小重叠的区域,例如,至少35的加权甲基化差异和校正的p值< 0.01。 CHH DMRs被定义为具有最少10个重叠的信息胞嘧啶的窗口,并且例如,加权甲基化差异为至少10, 0.01。用户可以使用自己的标准定义DMR 要比较两个样本中的mC窗口计数,需要以下脚本:
    merge_bedgraph_data_counts_etc.pl
    compare_methylation_counts_by_window.R
    1. 在2个上下文中使用'weighted_summaries_by_window'文件夹中的文件创建2个样本的计数矩阵。
    2. 示例:
      ./merge_bedgraph_data_counts_etc.pl genome_300nt_100_windows.bed
      sample_A_reads/weighted_summaries_by_window/CpG.bed
      sample_B_reads/weighted_summaries_by_window/CpG.bed
      sample_A_reads/weighted_summaries_by_window/CHG.bed
      sample_B_reads/weighted_summaries_by_window/CHG.bed
      sample_A_reads/weighted_summaries_by_window/CHH.bed
      sample_B_reads/weighted_summaries_by_window/CHH.bed>
      sample_A_vs_sample_B_reads.300nt_100_windows.txt

      生成的文件应按以下顺序显示窗口计数:
      sample_A/weighted_summaries_by_window/CpG.bed
      sample_B/weighted_summaries_by_window/CpG.bed
      sample_A/weighted_summaries_by_window/CHG.bed
      sample_B/weighted_summaries_by_window/CHG.bed
      sample_A/weighted_summaries_by_window/CHH.bed
      sample_B/weighted_summaries_by_window/CHH.bed
    3. 比较同一上下文中两个样品的计数 compare_methylation_counts_by_window.R inputCountsFile outputStatsFile
      用法:
      compare_methylation_counts_by_window.R
      sample_A_vs_sample_B_reads.300nt_100_windows.txt
      sample_A_vs_sample_B_reads.300nt_100_windows.stats.txt"
      对于每个窗口,输出文件将具有(a)原始Fisher精确 测试p值反映甲基/未甲基化计数的分数是否 两个样品相同,和(b)差异(样品A-样品B)   加权甲基化分数。 甲基化级分出现在旁边   每个统计计算的结果。

  18. 用窗口
    重叠基因组特征(基因,可转座元素,等)
    1. 在Excel中打开stats文件,并根据p值排序
    2. 复制表示所选DMR的行[(FDR低于阈值(0.01)和
      甲基化差异高于您的阈值(例如 35%)] 另一个文件,并删除除所需的5列(chr,start,end, 差值,FDR值)。
    3. 使用扩展"bed"保存文件,并使用Bedtools与基因组特征相交(Quinlan和Hall,2010)。
      运行:
      bedtools intersect -wao -a sample_A_vs_sample_B.CpG.bed -b gene_GFF3_genes.gff> intersect_ sample_A_vs_sample_B_genes.txt
      bedtools intersect -wao -a sample_A_vs_sample_B.CpG.bed -b gene_GFF3_transposable_element.gff> 相交_ sample_A_vs_sample_B.CpG.bed _TE.txt

代表数据



图1. 1.2%凝胶与剪切的DNA(250ng)。 预期大小范围为100-300 bp。


图2.生物分析仪结果。在Illumina测序中,具有更高浓度的文库表现更好。生物分析仪在测序之前对文库进行分析是一个重要的质量控制步骤。两个文库都处于预期的大小范围(如图A中的蓝色括号所示),因此适合测序。使用AMPure XP珠对文库1(具有较高的摩尔浓度)进行额外的清除步骤,以减少残余衔接子的量(由图A中的红色箭头和图B表示)。文库2(图C)未被清除,因为其处于低浓度,并且额外的清除可能导致损失足够的文库,使得测序不可能。测序分别得到文库1和2的总共12,489,378和3,824,500个非冗余读数。

食谱

  1. 50x TAE(1 L)
    242克三碱基
    57.1ml冰醋酸 100ml 0.5M EDTA(pH8)

致谢

这项工作得到了NSF(MCB 1121952)的支持,并获得了皮尤慈善信托基金会在生物医学科学奖学金计划的MG奖项。
该协议改编自Pignatta等人(2014)。

参考文献

  1. Groszmann,M.,Greaves,I.K.,Albertyn,Z.I.,Scofield,G.N.,Peacock,W.J.and Dennis,E.S.(2011)。 拟南芥杂交体中24-nt siRNA水平的变化表明表观遗传贡献 到杂交活力。 Proc Natl Acad Sci USA 108(6):2617-2622。
  2. Krueger,F。和Andrews,S.R。(2011)。 Bismark:用于Bisulfite-Seq应用程序的灵活对齐器和甲基化调用者。 Bioinformatics 27(11):1571-1572。
  3. Li,H.,Handsaker,B.,Wysoker,A.,Fennell,T.,Ruan,J.,Homer,N.,Marth,G.,Abecasis,G.and Durbin,R。 序列比对/地图格式和SAMtools 生物信息学< em> 25(16):2078-2079。
  4. Quinlan,A.R。和Hall,I.M。(2010)。 BEDTools:用于比较基因组特征的灵活的实用程序套件。生物信息学 26(6):841-842。
  5. Schultz,M.D.,Schmitz,R.J.and Ecker,J.R。(2012)。 "调平"单碱基分辨DNA甲基化分析的竞争环境。 em> Trends Genet 28(12):583-585。
  6. Pignatta,D.,Erdmann,R.M.,Scheer,E.,Picard,C.L.,Bell,G.W.and Gehring,M.(2014)。 天然表观遗传多态性导致拟南芥基因印记中的种内变异。 3:e03198。
  • English
  • 中文翻译
免责声明 × 为了向广大用户提供经翻译的内容,www.bio-protocol.org 采用人工翻译与计算机翻译结合的技术翻译了本文章。基于计算机的翻译质量再高,也不及 100% 的人工翻译的质量。为此,我们始终建议用户参考原始英文版本。 Bio-protocol., LLC对翻译版本的准确性不承担任何责任。
Copyright: © 2015,  Pignatta et al. This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
引用: Readers should cite both the Bio-protocol article and the original research article where this protocol was used:
  1. Pignatta, D., Bell, G. W. and Gehring, M. (2015). Whole Genome Bisulfite Sequencing and DNA Methylation Analysis from Plant Tissue. Bio-protocol 5(4): e1407. DOI: 10.21769/BioProtoc.1407.
  2. Pignatta, D., Erdmann, R. M., Scheer, E., Picard, C. L., Bell, G. W. and Gehring, M. (2014). Natural epigenetic polymorphisms lead to intraspecific variation in Arabidopsis gene imprinting. Elife 3: e03198.
提问与回复

(提问前,请先登录)bio-protocol作为媒介平台,会将您的问题转发给作者,并将作者的回复发送至您的邮箱(在bio-protocol注册时所用的邮箱)。为了作者与用户间沟通流畅(作者能准确理解您所遇到的问题并给与正确的建议),我们鼓励用户用图片或者视频的形式来说明遇到的问题。由于本平台用Youtube储存、播放视频,作者需要google 账户来上传视频。

当遇到任务问题时,强烈推荐您提交相关数据(如截屏或视频)。由于Bio-protocol使用Youtube存储、播放视频,如需上传视频,您可能需要一个谷歌账号。