SNP mining in crops

SNP Mining in Cereal Crops
Presented By:
SAURABH PANDEY
PALB-3252
Sr. M.Sc.(Ag.)

Seminar flow
• Introduction
• What are SNPs?
- Detection methods
- Techniques used for detection
• Different software used
• Crop lists where SNPs were detected
• Improvement works done
• Case Study
09-05-2015 Department of Plant Biotechnology 2

Introduction

Scenario of Molecular Markers

Molecular markers
• First generation
- RFLP (hybridization based)
- RAPD (PCR based)
• Second generation
- AFLPs
- SSRs
• Third generation
- ESTs (used in functional genomics)
- SNPs

Single Nucleotide Polymorphisms (SNPs)
DNA sequence variations that occur when a single nucleotide (A, T, C or G) in the genome sequence is altered.
• SNP: single DNA base variation found in more than 1% of the population
• Mutation: single DNA base variation found in less than 1% of the population

SNP functional classes (Mooney et al., 2005)
• Coding SNPs (cSNP): positions that fall within the coding regions of genes
• Regulatory SNPs (rSNP): positions that fall in regulatory regions of genes
• Synonymous SNPs (sSNP): positions in exons that change the codon without substituting an amino acid
• Non-synonymous SNPs (nsSNP): positions that cause an amino acid substitution
• Intronic SNPs (iSNP): positions that fall within introns
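The difference between synonymous and non-synonymous SNPs can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a single in-frame codon; the function name `classify_coding_snp` is hypothetical and not taken from any cited tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a SNP inside one codon (position is 0, 1 or 2)."""
    codon = codon.upper()
    alt_codon = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        label = "synonymous (sSNP)"
    else:
        label = "non-synonymous (nsSNP)"
    return label, ref_aa, alt_aa


# GAA -> GAG keeps glutamate (E); GAA -> AAA changes it to lysine (K).
print(classify_coding_snp("GAA", 2, "G"))
print(classify_coding_snp("GAA", 0, "A"))
```

A real annotation pipeline would also need the gene model (strand, exon boundaries, reading frame), which this sketch leaves out.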

Importance of SNPs
• SNPs are abundant in the genome and occur at high frequency, e.g.
- Maize has 1 SNP/31 bp in non-coding regions and 1 SNP/124 bp in coding regions (Ching et al., 2002)
- In rice, whole-genome shotgun sequences of japonica and indica show 3-27 SNPs/kb
- In soybean, 280 SNPs were found in 76.3 kb of genomic DNA
• Automation is easy because these markers are biallelic and are based directly on DNA sequence.
• Millions of sequences are available in public databases for several crops, e.g. wheat has 132,000 ESTs.
• Large numbers of SNPs have already been discovered, e.g. 29,447 SNPs in barley and 64,837 SNPs in rice.
(Yu et al., 2005)
(Zhu et al., 2003)
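The frequencies above are quoted in different units (SNPs per bp, SNPs per kb, SNPs per region). A small Python sketch can put them on a common scale; the helper names `snps_per_kb` and `bp_per_snp` are illustrative only.

```python
def snps_per_kb(n_snps, region_bp):
    """SNP density expressed as SNPs per 1000 bp."""
    return n_snps / region_bp * 1000


def bp_per_snp(n_snps, region_bp):
    """Average spacing between SNPs in bp."""
    return region_bp / n_snps


# Figures quoted on the slide.
print(round(snps_per_kb(1, 31), 1))        # maize, non-coding
print(round(snps_per_kb(1, 124), 1))       # maize, coding
print(round(snps_per_kb(280, 76_300), 2))  # soybean
print(bp_per_snp(280, 76_300))             # soybean, bp between SNPs
```

On this scale the soybean figure works out to roughly 3.7 SNPs/kb, which falls inside the 3-27 SNPs/kb range quoted for rice.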

Applications of SNPs
• Comparative mapping using databases.
• Linkage disequilibrium based association studies, e.g. in Arabidopsis thaliana.
• Marker-assisted selection (MAS): used in soybean through identification of SNP markers linked to the GmNARK gene, which is associated with a hypernodulating mutation (Kim et al., 2005)
• Genetic diversity studies: in maize, using SNPs at 21 loci on chromosome 1 (Tenaillon et al., 2005)
• High-resolution genetic mapping: 5 SNPs have been genetically mapped in melon (Morales et al., 2004)
• Mapping of EST sequences with SNPs (Davis G et al., 2001)
• Identification of cloned genes with SNPs, a secondary application of SNPs (Jehan and Lakhanpaul, 2006)

Crop | Gene | Function | Trait associated with SNP | Utility of SNP
Barley (Hordeum vulgare) | β-amylase gene | Degradation of starch | Enzyme thermostability | To select barley seedlings carrying the superior β-amylase allele
Wild barley (H. spontaneum) | Dhn1 and Dhn5 (dehydrin) | Adaptive response of plant to environmental stress | Resistance to water stress | For water stress adaptation
Rice (Oryza sativa) | 1) Wx (waxy) gene | Controls amylose synthesis by coding starch synthase enzyme | Amylose content | For development of new cultivars
Rice (Oryza sativa) | 2) sd-1 (semi-dwarfing) gene | Dwarfism | Dwarfism | Selection of sd-1 in breeding programmes
Wheat (Triticum aestivum) | 1) Pinb (puroindoline b) | Thickens the coat | Grain hardness | Breeding programme
Wheat (Triticum aestivum) | 2) Rht1 and Rht2 genes | Dwarfism | Dwarfism | Breeding programme
Soybean (Glycine max) | Rhg1 and Rhg4 | Soybean cyst nematode resistance allele | SCN resistance | Breeding programme
Onion (Allium cepa) | SNP allele in plastome | Responsible for CMS | Cytoplasmic male sterility and fertility | For development of CMS lines
Mustard (Brassica juncea) | FAE1 gene | Fatty acid elongase | Erucic acid content | Breeding programme

SNP identification in polyploids
Crop | Citation | No. of SNPs detected
Alfalfa | Li et al., 2012 | 872,384
Cotton | Byers et al., 2012 | 151,712
Cotton | Zhu et al., 2014 | 40,503
Peanut | Khera et al., 2013 | 8,486
Peanut | Zhou et al., 2014 | 1,765
Potato | Uitdewilligen et al., 2013 | 42,625
Rapeseed | Trick et al., 2009 | 41,593
Rapeseed | Hu et al., 2012 | 655
Rapeseed | Huang et al., 2013 | 892,803
Wheat | Allen, 2013 | 10,251
Wheat | Cavanagh et al., 2013 | 25,454
White clover | Nagy et al., 2013 | 208,854

• Approximately 18.9 million single nucleotide polymorphisms (SNPs) in rice were discovered when sequences were aligned to the reference genome of the temperate japonica variety Nipponbare.
• Phylogenetic analyses based on SNP data confirmed differentiation of the O. sativa gene pool into 5 varietal groups: indica, aus/boro, basmati/sadri, tropical japonica and temperate japonica.

The trends of crop sequencing toward breeding practice
Yang et al., 2015

SNP Discovery
• Two approaches have been adopted for the discovery of novel SNPs:
- In vitro discovery (new sequence data are generated)
- In silico methods (analysis of available sequence data)

In vitro approaches
• Non-sequencing based methods
• Sequencing based methods
• Resequencing based methods

Non-sequencing methods
• Restriction based techniques
- RFLP (Restriction Fragment Length Polymorphism)
- CAPS (Cleaved Amplified Polymorphic Sequence)
- dCAPS (Derived Cleaved Amplified Polymorphic Sequence)
• DNA conformation techniques
- SSCP (Single Strand Conformational Polymorphism)
- DGGE (Denaturing Gradient Gel Electrophoresis)
- TGGE (Temperature Gradient Gel Electrophoresis)
- Heteroduplex analysis
• Chip based methods
- Use of probes for hybridization of whole genome
• TILLING (Targeting Induced Local Lesions IN Genomes)
- Cel I endonuclease
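Restriction-based detection such as CAPS works when a SNP creates or destroys a restriction site, so the two alleles give different digestion patterns. The Python sketch below checks this for the two short example sequences shown on the sequence-overlap slide, using the BamHI recognition site GGATCC. The function names are hypothetical, and only the presence of the site is modelled, not the resulting fragment sizes.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def cut_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def is_caps_candidate(allele_a, allele_b, site):
    """True when the two alleles differ in their restriction-site positions."""
    return cut_sites(allele_a, site) != cut_sites(allele_b, site)


allele_1 = "GTTACGCCAATACAGGATCCAGGAGATTACC"
allele_2 = "GTTACGCCAATACAGCATCCAGGAGATTACC"

print(cut_sites(allele_1, BAMHI_SITE))   # site present in allele 1
print(cut_sites(allele_2, BAMHI_SITE))   # site lost in allele 2
print(is_caps_candidate(allele_1, allele_2, BAMHI_SITE))
```

Here the G/C difference removes the BamHI site from the second allele, so digestion of the PCR product would distinguish the two alleles on a gel.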

Sequencing based methods
• Locus-specific PCR amplification
• Alignment of available genomic sequences
• Whole-genome shotgun method
• Overlapping regions of BACs and PACs
• Reduced Representation Shotgun (RRS)

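Finding SNPs: Sequence-based SNP Mining
Random sequence overlap SNP discovery
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
(the two overlapping sequences differ at a single base, G/C)
[Figure: sources of overlapping sequence for SNP discovery, all passing through DNA sequencing]
• Genomic DNA: RRS library, shotgun overlap
• BAC library: BAC overlap
• mRNA: cDNA library, EST overlap
• Random shotgun reads: align to reference
More than 11 million SNPs discovered; 5.6 million SNPs validated.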
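The overlap idea on this slide can be sketched in a few lines of Python: two already-aligned sequences of equal length are compared base by base, and every mismatch is reported as a candidate SNP. The function name `find_mismatches` is illustrative, and the sketch assumes no insertions or deletions.

```python
def find_mismatches(seq_a, seq_b):
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]


read_1 = "GTTACGCCAATACAGGATCCAGGAGATTACC"
read_2 = "GTTACGCCAATACAGCATCCAGGAGATTACC"

print(find_mismatches(read_1, read_2))  # [(16, 'G', 'C')]
```

A single mismatch between two reads may equally be a sequencing error, which is why real pipelines require several supporting reads per allele.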

Resequencing methods
• Pyrosequencing
- Sequencing by synthesis
- Primer + dNTP → Primer+1N + PPi
- APS (adenosine 5'-phosphosulfate) + PPi → ATP
- Luciferin + ATP → oxyluciferin + light
• MassARRAY
- Based on MALDI-TOF MS (Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry)
- Based on variation in the molecular weights of the 4 nucleotides

In silico methods
• Software is used to discover SNPs from available sequence databases.
• Finding single-nucleotide differences manually across a large number of sequences is not feasible.
• Hence various software tools are used, e.g. Phred, PolyBayes, autoSNPdb, SNPServer, etc.
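What such software does can be sketched, in a very simplified form, as a redundancy filter over aligned reads: a column is called a SNP only if the minor allele is seen in at least a minimum number of reads, so that isolated sequencing errors are ignored. This is a generic illustration in Python, not the algorithm of Phred, PolyBayes, autoSNPdb or SNPServer; the function name, the threshold and the example reads are all assumptions.

```python
from collections import Counter


def call_snps(aligned_reads, min_minor_count=2):
    """Call candidate SNPs from equal-length aligned reads ('-' = no base).

    Returns (1-based position, major allele, minor allele,
    major count, minor count) for each column that passes the filter.
    """
    snps = []
    for pos, column in enumerate(zip(*aligned_reads), start=1):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue  # column is monomorphic or uncovered
        (major, major_n), (minor, minor_n) = counts.most_common(2)
        if minor_n >= min_minor_count:
            snps.append((pos, major, minor, major_n, minor_n))
    return snps


# Hypothetical aligned reads covering the same locus.
reads = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACGATAGC",
    "ACGATAGC",
    "ACGTTAGG",
]
print(call_snps(reads))  # [(4, 'T', 'A', 3, 2)]
```

The lone G in the last column is supported by a single read and is therefore dropped as a likely sequencing error, while position 4 is retained.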

Seal et al., 2014

Methods for SNP identification (Ganal et al., 2009)
• EST sequence data
- Prerequisite: large number of available EST sequences
- Current false discovery rate: 15-50%
- Specifics, limitations: dependent on the expression level or need for normalized libraries, difficulties in the discrimination of orthologous from paralogous sequences, low sequence quality
• Array analysis
- Prerequisite: unigene sets based on EST sequences, array technology
- Current false discovery rate: >20%
- Specifics, limitations: not all SNPs identified, large genomes require complexity reduction
• Amplicon resequencing
- Prerequisite: unigene sets based on EST sequences, amplification primers for many individual genes
- Current false discovery rate: <5%
- Specifics, limitations: high reliability but costly, detailed haplotype analysis possible, many lines can be compared, allele frequency data with pools of DNA
• No genomic sequence, next-generation sequencing technologies
- Prerequisite: novel sequencing technologies, complexity reduction methods, bioinformatic tools
- Current false discovery rate: 15-25%
- Specifics, limitations: generates large amounts of data, costly bioinformatics, false discovery rate for genomes without full sequence is relatively high
• Genomic sequence available, either through conventional sequencing or next-generation sequencing
- Prerequisite: reference genome, bioinformatic tools
- Current false discovery rate: <5-10%
- Specifics, limitations: small genomes can be fully sequenced and compared for SNPs, for large genomes targeted approaches will be necessary (e.g. exon capture and multiplex amplification)

SNP Genotyping
• More than 30 techniques are available for SNP detection, e.g. molecular beacons, padlock probes, Invader assay, etc.
• All are based on a few basic principles:
- Hybridization
- Direct sequencing
- Allele-specific primer extension
- Single base extension
- Endonuclease cleavage / ligation

SNP genotyping platforms
• Illumina GoldenGate Assay
• Sequenom
• Affymetrix/GeneChip
• SNaPshot
• SNPlex
• TaqMan
• Dye terminator sequencing
• Pyrosequencing (454)
• Reversible-terminator sequencing (Solexa)
• Sequencing by ligation (ABI)

[Flow chart: from SNP discovery to application]
• Crop-specific SNP marker system development
- Non-sequencing methods
- Resequencing methods
- Sequencing methods
- In silico methods
• Crop-specific SNP genotyping platform development
- Oligonucleotide ligation assay
- Invader assay
- Molecular beacons
- Padlock probes
• Application of SNP genotyping
- Marker-assisted selection
- Gene tagging
- Fine mapping
- Association studies

Case study

Introduction
• Genome analysis in bread wheat poses substantial challenges
• Single nucleotide polymorphisms (SNPs) represent the most frequent
type of genetic polymorphism and can therefore allow the development
of the highest density of molecular markers.
• Whole-genome Illumina paired read sequence data were generated from
16 Australian bread wheat varieties.
• After filtering to remove poor quality and clonal reads, a total of 13 642
million read pairs remained.
• Alignment of these read pairs to the wheat group 7 and 4AL chromosome
assemblies using strict parameters resulted in 3.05%, 3.76% and 3.43% of
read pairs mapping uniquely to chromosomes 7A, 7B and 7D, respectively.
• SNP calling using the SGSautoSNP pipeline predicted a total of 4 018 311
intervarietal SNPs.

Materials and Methods
• SNP prediction
SNP prediction was performed using SGSautoSNP (Lorenc et al., 2012), with output in snp format for subsequent analysis and gff format for presentation on a GBrowse genome viewer at www.wheatgenome.info.
• SNP matrix production and transition/transversion ratio analysis
The snp files generated by SGSautoSNP were parsed using a custom Python script to generate the SNP matrix file.
The transition/transversion ratio for each chromosome was calculated based on bins of 500 SNPs using VCFtools.
• SNP density and gene analysis
The SNP density plots for each chromosome were generated using a custom Python script that calculates relative density based on a window size of 50 000 bp.
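The authors' density script is not shown in the slides. As a rough sketch of a windowed density calculation, the Python below counts SNP positions in fixed windows of 50 000 bp and scales each count by the largest window count. Treating "relative density" as count divided by the maximum count is an assumption made here, and the function name and example positions are hypothetical.

```python
from collections import Counter


def snp_density(positions, window=50_000):
    """Map each window start (1-based) to its SNP count relative to the peak window."""
    counts = Counter((pos - 1) // window for pos in positions)
    if not counts:
        return {}
    peak = max(counts.values())
    last_window = max(counts)
    # Windows without SNPs are kept with density 0.0 so the plot has no gaps.
    return {
        idx * window + 1: counts[idx] / peak
        for idx in range(last_window + 1)
    }


# Hypothetical 1-based SNP positions on one chromosome.
positions = [10, 20_000, 49_999, 50_001, 120_000]
for start, density in snp_density(positions).items():
    print(start, round(density, 2))
```

Each key is the first base of a window, so the output can be plotted directly against chromosome position.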

Genes identified as being in low-SNP-density regions were compared with the Swissprot database using BLASTX (BLASTALL 2.2.6) with an E-value cut-off of 1e-5. For genes in low- and high-SNP-density regions, the hit with the minimum E-value was recorded with its UniProtKB entry ID and protein name.
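Selecting the hit with the minimum E-value per gene can be sketched in Python as follows. The sketch assumes BLAST tabular output with the standard 12 columns (E-value in column 11) and assumes that hits with an E-value equal to the cut-off are kept; the gene and subject identifiers are made up for illustration.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query: (subject, evalue)} keeping the lowest E-value hit per query."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > max_evalue:
            continue  # fails the E-value cut-off
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical tabular BLASTX output (tab-separated, 12 columns).
example = (
    "gene1\tsubjA\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t120\n"
    "gene1\tsubjB\t85.0\t100\t15\t0\t1\t300\t1\t100\t1e-40\t150\n"
    "gene2\tsubjC\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.01\t30\n"
)
print(best_hits(example))
```

In this example gene2 has no hit below the cut-off and is therefore absent from the result.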
• Validation
A total of 22 SNPs were selected from the three group 7 reference genomes for validation.
PCR amplification of the 22 loci was performed using primers designed to bind to conserved sequence surrounding the SNPs.
The purified PCR products were Sanger-sequenced using Big-Dye 3.1 (PerkinElmer, Waltham, MA) with forward and reverse PCR primers, and analysed using an ABI3730xl.

Results
• Whole-genome Illumina paired read sequence data: after filtering, 13 642 million read pairs remained.
• Alignment to the wheat group 7 and 4AL chromosome assemblies: 3.05%, 3.76% and 3.43% of read pairs mapped uniquely to chromosomes 7A, 7B and 7D, respectively.
• SNP calling using the SGSautoSNP pipeline: a total of 4 018 311 intervarietal SNPs were predicted.

• The majority of SNPs were identified on contigs which do not
form part of the syntenic builds and are predominantly within
intergenic regions.

Figure 2: Ts/Tv ratio across the 7A, 7B and 7D syntenic builds
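The study computed Ts/Tv in bins of 500 SNPs with VCFtools. The calculation itself can be sketched in plain Python as a stand-in: transitions are purine-purine (A/G) or pyrimidine-pyrimidine (C/T) changes, and everything else is a transversion. The function names and the tiny example are illustrative, and each SNP is assumed to have two different alleles.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T changes (assumes ref != alt)."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_by_bin(snps, bin_size=500):
    """Ts/Tv ratio for consecutive bins of `bin_size` SNPs in chromosome order."""
    ratios = []
    for start in range(0, len(snps), bin_size):
        chunk = snps[start:start + bin_size]
        ts = sum(is_transition(ref, alt) for ref, alt in chunk)
        tv = len(chunk) - ts
        # A bin with no transversions has an undefined ratio; report infinity.
        ratios.append(ts / tv if tv else float("inf"))
    return ratios


# Hypothetical (ref, alt) pairs; a bin size of 4 keeps the example short.
snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
print(ts_tv_by_bin(snps, bin_size=4))  # [1.0]
```

The final bin may hold fewer SNPs than `bin_size`, which is worth keeping in mind when the ratios are plotted.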

Figure: Phylogenetic relationships of 16 Australian wheat varieties based on SNP data obtained in this study.
Tree annotations: SNP variation 146 171; SNP variation 968 088; average number of SNPs between varieties 465 278.

Figure 3: SNP density across the 7A, 7B and 7D syntenic builds.

• A total of 146 genes were predicted to be in low-SNP-density regions, representing 40, 27 and 79 genes on the A, B and D genomes, respectively.
- These genes include MADS box and Myb transcription factors, signal transduction pathway genes, a sodium transporter, an iron-responsive transcription factor, a potassium transporter, callose synthase, sucrose synthase and sugar transporters.
• A total of 14 genes were predicted to be in high-SNP-density regions, representing 10, 3 and 1 gene(s) on the A, B and D genomes, respectively.
- These genes include cellulose synthase, argonaute and ethylene response factors.

The SNPs from the recently published wheat Infinium array (Wang et al., 2014) were compared to those predicted by SGSautoSNP. A total of 850 SNPs were identified as having a match on the group 7 chromosomes at the same position as predicted in this study. Of these, 482 (57%) were classified as polymorphic single locus, 316 (37%) as polymorphic multilocus, while only 52 (6%) were monomorphic.

Conclusion
• This study has revealed a vast number of polymorphisms
occurring within the chromosome 7 homoeologues of
hexaploid wheat among elite Australian varieties.
• This resource is publicly available to assist additional genetic analysis and breeding.
• Furthermore, observed patterns of SNPs across the
homoeologous group 7 chromosomes have provided insight
into the molecular consequences of the evolution and
selection that resulted in modern hexaploid wheat.

Summary
• SNPs are the markers of the future, occurring at high density in the genome.
• Although thousands of SNP markers are widely used in animal and human genome analysis, their use in plants is still in its infancy.
• SNP mining can provide a better understanding of crops at the gene level, for the detailed analysis of germplasm and ultimately for the efficient management of genetic diversity within plant breeding on a whole-genome level.

Thank you
