Big data of tree species distributions: how big and how good?

Josep M. Serra-Diaz, Brian Enquist, Brian Maitner, Cory Merow, Jens C. Svenning

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Background: Trees play crucial roles in the biosphere and in societies worldwide, and a total of 60,065 tree species are currently identified. A growing volume of data on tree species occurrences is being generated worldwide, from inventories to pressed plants. While many of these data are available in large databases, several challenges hamper their use, notably geolocation problems and taxonomic uncertainty. Further, we lack a complete picture of data coverage and quality for open/public databases of tree occurrences.
Methods: We combined data from five major aggregators of occurrence data (Global Biodiversity Information Facility, Botanical Information and Ecological Network v.3, DRYFLOR, RAINBIO and Atlas of Living Australia) by creating a workflow to integrate, assess and control the data quality of tree species occurrences for species distribution modeling. We further assessed the coverage, that is, the extent of geographical data, of five economically important tree families (Arecaceae, Dipterocarpaceae, Fagaceae, Myrtaceae, Pinaceae).
Results: Globally, we identified 49,206 tree species (84.69% of total tree species pool) with occurrence records. The total number of occurrence records was 36.69 M, of which 6.40 M could be considered high-quality records for species distribution modeling. Europe, North America and Australia have considerable spatial coverage of tree occurrence data. Conversely, key biodiverse regions such as South-East Asia, central Africa and parts of the Amazon are still characterized by gaps in open, public geographical data. Such gaps are found even for economically important tree families, although their overall ranges are covered. Only 15,140 species (26.05%) had at least 20 high-quality records.
Conclusions: Our geographical coverage analysis shows that a wealth of easily accessible data exists on tree species occurrences worldwide, but regional gaps and coordinate errors are abundant. Thus, assessment of tree distributions will need accurate quality control protocols for occurrence records, as well as key collaborations and data aggregation, especially from national forest inventory programs, to improve the currently publicly available data.
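
The abstract does not spell out the quality control criteria, so the following is only a minimal sketch of what such a filtering step could look like, not the authors' workflow. The `Occurrence` type, the coordinate checks in `is_plausible`, the rounding used for deduplication, and the `summarize` helper are all assumptions made for illustration; only the threshold of 20 high-quality records per species comes from the abstract.

```python
from collections import Counter
from typing import NamedTuple, Optional

MIN_RECORDS = 20  # threshold quoted in the abstract


class Occurrence(NamedTuple):
    """One occurrence record (hypothetical minimal schema)."""
    species: str
    lat: Optional[float]
    lon: Optional[float]
    source: str  # free-text aggregator label, e.g. "GBIF"


def is_plausible(rec: Occurrence) -> bool:
    """Assumed coordinate checks; the real protocol may differ."""
    if rec.lat is None or rec.lon is None:
        return False  # missing coordinates
    if not (-90 <= rec.lat <= 90 and -180 <= rec.lon <= 180):
        return False  # outside the valid latitude/longitude range
    if rec.lat == 0 and rec.lon == 0:
        return False  # (0, 0) is a common placeholder for unknown location
    return True


def deduplicate(records):
    """Drop records repeating the same species at the same rounded point."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec.species, round(rec.lat, 4), round(rec.lon, 4))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def summarize(records, species_pool_size):
    """Filter, deduplicate, and count species meeting MIN_RECORDS."""
    clean = deduplicate([r for r in records if is_plausible(r)])
    per_species = Counter(r.species for r in clean)
    well_sampled = sorted(s for s, n in per_species.items() if n >= MIN_RECORDS)
    return {
        "total_records": len(records),
        "clean_records": len(clean),
        "species_with_records": len(per_species),
        "well_sampled_species": well_sampled,
        "well_sampled_share": len(well_sampled) / species_pool_size,
    }
```

Calling `summarize(records, species_pool_size)` on a merged list of records from several aggregators returns the raw and filtered record counts together with the share of the species pool that reaches the 20-record threshold, which mirrors the kind of summary figures reported in the Results.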

Original language: English (US)
Article number: 30
Journal: Forest Ecosystems
Volume: 4
Issue number: 1
DOI: 10.1186/s40663-017-0120-0
State: Published - Dec 1 2017

Fingerprint

biogeography
species occurrence
distribution
Fagaceae
Dipterocarpaceae
Pinaceae
species pool
Arecaceae
Central Africa
national forests
Myrtaceae
forest inventory
South East Asia
data quality
atlas
quality control
biosphere
modeling
uncertainty
biodiversity

Keywords

  • Big data
  • Occurrence data
  • Quality control and assessment
  • Tree distributions

ASJC Scopus subject areas

  • Forestry
  • Ecology, Evolution, Behavior and Systematics
  • Ecology
  • Nature and Landscape Conservation

Cite this

Big data of tree species distributions: how big and how good? / Serra-Diaz, Josep M.; Enquist, Brian; Maitner, Brian; Merow, Cory; Svenning, Jens C.

In: Forest Ecosystems, Vol. 4, No. 1, 30, 01.12.2017.

@article{1ccdd209d2d64a4988a577734bf49747,
title = "Big data of tree species distributions: how big and how good?",
keywords = "Big data, Occurrence data, Quality control and assessment, Tree distributions",
author = "Serra-Diaz, {Josep M.} and Brian Enquist and Brian Maitner and Cory Merow and Svenning, {Jens C.}",
year = "2017",
month = "12",
day = "1",
doi = "10.1186/s40663-017-0120-0",
language = "English (US)",
volume = "4",
journal = "Forest Ecosystems",
issn = "2095-6355",
publisher = "SpringerOpen",
number = "1",

}

TY - JOUR

T1 - Big data of tree species distributions

T2 - how big and how good?

AU - Serra-Diaz, Josep M.

AU - Enquist, Brian

AU - Maitner, Brian

AU - Merow, Cory

AU - Svenning, Jens C.

PY - 2017/12/1

Y1 - 2017/12/1

KW - Big data

KW - Occurrence data

KW - Quality control and assessment

KW - Tree distributions

UR - http://www.scopus.com/inward/record.url?scp=85050374354&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050374354&partnerID=8YFLogxK

U2 - 10.1186/s40663-017-0120-0

DO - 10.1186/s40663-017-0120-0

M3 - Article

AN - SCOPUS:85050374354

VL - 4

JO - Forest Ecosystems

JF - Forest Ecosystems

SN - 2095-6355

IS - 1

M1 - 30

ER -