Allele identification for transcriptome-based population genomics in the invasive plant Centaurea solstitialis

Katrina M. Dlugosch, Zhao Lai, Aurélie Bonin, José Hierro, Loren H. Rieseberg

Research output: Contribution to journal › Article

26 Citations (Scopus)

Abstract

Transcriptome sequences are becoming more broadly available for multiple individuals of the same species, providing opportunities to derive population genomic information from these datasets. Using the 454 Life Science Genome Sequencer FLX and FLX-Titanium next-generation platforms, we generated 11-430 Mbp of sequence for normalized cDNA for 40 wild genotypes of the invasive plant Centaurea solstitialis, yellow starthistle, from across its worldwide distribution. We examined the impact of sequencing effort on transcriptome recovery and overlap among individuals. To do this, we developed two novel publicly available software pipelines: SnoWhite for read cleaning before assembly, and AllelePipe for clustering of loci and allele identification in assembled datasets with or without a reference genome. AllelePipe is designed specifically for cases in which read depth information is not appropriate or available to assist with disentangling closely related paralogs from allelic variation, as in transcriptome or previously assembled libraries. We find that modest applications of sequencing effort recover most of the novel sequences present in the transcriptome of this species, including single-copy loci and a representative distribution of functional groups. In contrast, the coverage of variable sites, observation of heterozygosity, and overlap among different libraries are all highly dependent on sequencing effort. Nevertheless, the information gained from overlapping regions was informative regarding coarse population structure and variation across our small number of population samples, providing the first genetic evidence in support of hypothesized invasion scenarios.

Original language: English (US)
Pages (from-to): 359-367
Number of pages: 9
Journal: G3: Genes, Genomes, Genetics
Volume: 3
Issue number: 2
DOIs: 10.1534/g3.112.003871
State: Published - 2013

Fingerprint

Centaurea
Metagenomics
Transcriptome
Alleles
Libraries
Genome
Biological Science Disciplines
Titanium
Population
Cluster Analysis
Software
Complementary DNA
Genotype
Observation

Keywords

  • 454 GS FLX
  • Allele clustering
  • Invasive species
  • Normalized ESTs
  • Titanium
  • Yellow starthistle

ASJC Scopus subject areas

  • Genetics
  • Molecular Biology
  • Genetics (clinical)

Cite this

Allele identification for transcriptome-based population genomics in the invasive plant Centaurea solstitialis. / Dlugosch, Katrina M; Lai, Zhao; Bonin, Aurélie; Hierro, José; Rieseberg, Loren H.

In: G3: Genes, Genomes, Genetics, Vol. 3, No. 2, 2013, p. 359-367.

Dlugosch, Katrina M ; Lai, Zhao ; Bonin, Aurélie ; Hierro, José ; Rieseberg, Loren H. / Allele identification for transcriptome-based population genomics in the invasive plant Centaurea solstitialis. In: G3: Genes, Genomes, Genetics. 2013 ; Vol. 3, No. 2. pp. 359-367.
@article{3f56d9af646d4ddcac09a4d25133a623,
title = "Allele identification for transcriptome-based population genomics in the invasive plant Centaurea solstitialis",
abstract = "Transcriptome sequences are becoming more broadly available for multiple individuals of the same species, providing opportunities to derive population genomic information from these datasets. Using the 454 Life Science Genome Sequencer FLX and FLX-Titanium next-generation platforms, we generated 11-430 Mbp of sequence for normalized cDNA for 40 wild genotypes of the invasive plant Centaurea solstitialis, yellow starthistle, from across its worldwide distribution. We examined the impact of sequencing effort on transcriptome recovery and overlap among individuals. To do this, we developed two novel publicly available software pipelines: SnoWhite for read cleaning before assembly, and AllelePipe for clustering of loci and allele identification in assembled datasets with or without a reference genome. AllelePipe is designed specifically for cases in which read depth information is not appropriate or available to assist with disentangling closely related paralogs from allelic variation, as in transcriptome or previously assembled libraries. We find that modest applications of sequencing effort recover most of the novel sequences present in the transcriptome of this species, including single-copy loci and a representative distribution of functional groups. In contrast, the coverage of variable sites, observation of heterozygosity, and overlap among different libraries are all highly dependent on sequencing effort. Nevertheless, the information gained from overlapping regions was informative regarding coarse population structure and variation across our small number of population samples, providing the first genetic evidence in support of hypothesized invasion scenarios.",
keywords = "454 GS FLX, Allele clustering, Invasive species, Normalized ESTs, Titanium, Yellow starthistle",
author = "Dlugosch, {Katrina M} and Lai, Zhao and Bonin, Aur{\'e}lie and Hierro, Jos{\'e} and Rieseberg, {Loren H.}",
year = "2013",
doi = "10.1534/g3.112.003871",
language = "English (US)",
volume = "3",
pages = "359--367",
journal = "G3: Genes, Genomes, Genetics",
issn = "2160-1836",
publisher = "Genetics Society of America",
number = "2",

}

TY - JOUR

T1 - Allele identification for transcriptome-based population genomics in the invasive plant Centaurea solstitialis

AU - Dlugosch, Katrina M

AU - Lai, Zhao

AU - Bonin, Aurélie

AU - Hierro, José

AU - Rieseberg, Loren H.

PY - 2013

Y1 - 2013

N2 - Transcriptome sequences are becoming more broadly available for multiple individuals of the same species, providing opportunities to derive population genomic information from these datasets. Using the 454 Life Science Genome Sequencer FLX and FLX-Titanium next-generation platforms, we generated 11-430 Mbp of sequence for normalized cDNA for 40 wild genotypes of the invasive plant Centaurea solstitialis, yellow starthistle, from across its worldwide distribution. We examined the impact of sequencing effort on transcriptome recovery and overlap among individuals. To do this, we developed two novel publicly available software pipelines: SnoWhite for read cleaning before assembly, and AllelePipe for clustering of loci and allele identification in assembled datasets with or without a reference genome. AllelePipe is designed specifically for cases in which read depth information is not appropriate or available to assist with disentangling closely related paralogs from allelic variation, as in transcriptome or previously assembled libraries. We find that modest applications of sequencing effort recover most of the novel sequences present in the transcriptome of this species, including single-copy loci and a representative distribution of functional groups. In contrast, the coverage of variable sites, observation of heterozygosity, and overlap among different libraries are all highly dependent on sequencing effort. Nevertheless, the information gained from overlapping regions was informative regarding coarse population structure and variation across our small number of population samples, providing the first genetic evidence in support of hypothesized invasion scenarios.

AB - Transcriptome sequences are becoming more broadly available for multiple individuals of the same species, providing opportunities to derive population genomic information from these datasets. Using the 454 Life Science Genome Sequencer FLX and FLX-Titanium next-generation platforms, we generated 11-430 Mbp of sequence for normalized cDNA for 40 wild genotypes of the invasive plant Centaurea solstitialis, yellow starthistle, from across its worldwide distribution. We examined the impact of sequencing effort on transcriptome recovery and overlap among individuals. To do this, we developed two novel publicly available software pipelines: SnoWhite for read cleaning before assembly, and AllelePipe for clustering of loci and allele identification in assembled datasets with or without a reference genome. AllelePipe is designed specifically for cases in which read depth information is not appropriate or available to assist with disentangling closely related paralogs from allelic variation, as in transcriptome or previously assembled libraries. We find that modest applications of sequencing effort recover most of the novel sequences present in the transcriptome of this species, including single-copy loci and a representative distribution of functional groups. In contrast, the coverage of variable sites, observation of heterozygosity, and overlap among different libraries are all highly dependent on sequencing effort. Nevertheless, the information gained from overlapping regions was informative regarding coarse population structure and variation across our small number of population samples, providing the first genetic evidence in support of hypothesized invasion scenarios.

KW - 454 GS FLX

KW - Allele clustering

KW - Invasive species

KW - Normalized ESTs

KW - Titanium

KW - Yellow starthistle

UR - http://www.scopus.com/inward/record.url?scp=84883182719&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84883182719&partnerID=8YFLogxK

U2 - 10.1534/g3.112.003871

DO - 10.1534/g3.112.003871

M3 - Article

C2 - 23390612

AN - SCOPUS:84883182719

VL - 3

SP - 359

EP - 367

JO - G3 (Bethesda, Md.)

JF - G3: Genes, Genomes, Genetics

SN - 2160-1836

IS - 2

ER -