SlideShare a Scribd company logo
1 of 11
Phyteuma vagneri
(
Phyteuma confusum
MODELING PRESENT AND PROSPECTIVE DISTRIBUTION OF PHYTEUMA GENUS IN
CARPATHIAN REGION WITH MACHINE LEARNING TECHNIQUES USING OPEN CLIMATIC AND
SOIL DATA
Assoc. Prof. Dr. Alexander Mkrtchian
Main problems in species distribution modeling
Problem Solution
Data of different formats, different
quality, duplicated data
Data preprocessing
Presence-only data availale Introducing randomly distributed
background data as a surrogate for
absence data
Data distribution violates normality
assumption
Using non-parametric methods (machine
learning)
Spatially autocorrelated data Explicit methods to account for spatial
autocorrelation; rigorous testing of
modeling results on test sets
• Mackey B. G., Nix H. A., Hutchinson M. F., Macmahon J. P., Fleming P. M. (1988). Assessing
representativeness of places for conservation reservation and heritage listing // Environmental
Management, 12 (4), pp. 501–514
• Franklin J. (1995). Predictive vegetation mapping: geographic modelling of biospatial patterns in
relation to environmental gradients // Progress in Physical Geography, 1995, 19, pp. 474–499
• Guisan A.; Zimmermann N. E. (2000). Predictive habitat distribution models in ecology // Ecological
Modelling, 135, pp. 147–186
• Elith J., Graham K. H., Anderson R. P. et al. (2006). Novel methods improve prediction of species’
distributions from occurrence data // Ecography, 29, pp. 129–151
• Elith J., Leathwick J. R., Hastie T. A. (2008). Working guide to boosted regression trees // Journal of
Animal Ecology, 2008, 77, 802–813
• Elith J., Phillips S. J., Hastie T., Dudı´k M., Chee Y. En, Yates C. J. (2011). A statistical explanation of
MaxEnt for ecologists // Diversity and Distributions, 17, pp. 43–57
• https://rspatial.org/raster/sdm/index.html
Main theoretical works
• BIOCLIM – based on «ecological envelope» principle, calculating the
probability of species in the attributes hyperspace
• Maxent – based on minimizing relative entropy of probability
densities in the attributes hyperspace
• General linear models (GLM)
• General additive models (GAM)
• Artificial neural networks (ANN)
• Random forest (RF)
• Bootstrap aggregation (bagging)
• Boosted regression trees (BRT)
• Support vectors machines (SWM)
Main types of models for species distribution modeling
From Elith J., Graham K. H., Anderson R. P. et al.
(2006). Novel methods improve prediction of
species’ distributions from occurrence data //
Ecography, 29, pp. 129–151
 Model specifics and features selected have to be
meaningful from an ecological viewpoint
 Predicted spatial pattern should look plausible from
an expert viewpoint
 Residuals on test set has to be small and random
 Quality, size and spatial extent of data, as well as
appropriate features selected are mostly more
important than the choice of the concrete model and
tuning of its parameters
Data obtained from GBIF for
the Carpathian region
contained 148 records of
Phyteuma genus in total.
After removing duplicates, 80
records have been kept. A
double of observation records
(160 points) were thus
designated as background, with
random coverage of geographic
space inside study area. Six
data points (3 from observed
data and another 3 from
simulated background data)
have subsequently been
removed due to omissions in
predictors data.
Data on climatic conditions were derived from WorldClim
database (1 km. sq. spatial resolution)
There are layers of 19 bioclimatic variables, derived from monthly temperature and
precipitation with a consideration to have biological significance.
17 out of 19 bioclimatic variables were taken out as predictive variables for SDM.
For data on soil conditions, SoilGrids digital maps of soil
properties were used (250 m spatial resolution)
From 11 available physical and chemical soil properties, 4 were chosen as the most
suitable predictors: soil acidity, organic carbon stock, cation exchange capacity, and
total nitrogen. Among six standard depth intervals available, 15–30 cm depth interval
was chosen as the most appropriate for the purpose.
• ML methods examined: Maxent, Random
Forest, Artificial Neural Networks (ANN),
Boosted Regression Trees.
• Assessing the performance of the models and
tuning their parameters: AUC and TSS criteria
calculated for testing data with 6-fold cross-
validation
• Tools used: R programming language and
software environment, SDMtune package
Method AUC
(training)
AUC
(testing)
TSS
(training)
TSS
(testing)
Maxent
(ME)
0.8162 0.796
(0.0138)
0.6114 0.5938
(0.0252)
Artificial neural networks
(ANN)
0.9437 0.9369
(0.005)
0.7508 0.7891
(0.018)
Random forest
(RF)
1 0.9321
(0.0076)
1 0.7851
(0.0261)
Boosted regression trees (BRT) 0.964 0.9304
(0.0062)
0.8145 0.775
(0.0162)
Performance metrics of different SDM methods calculated on training dataset and
with aforementioned testing procedure. For testing case, standard deviations are given in
parentheses.
Model AUC
(training)
AUC
(testing)
TSS
(training)
TSS
(testing)
Full set of 21 variables 0.9437 0.9369
(0.005)
0.7508 0.7891
(0.018)
Reduced set of 6 variables 0.94 0.9381
(0.0036)
0.7384 0.7893
(0.0132)
Performance metrics of ANN SDM method calculated with a full and reduced sets
of predictive variables
ANN method appears to work the best for the case
Relative occurrence probabilities (left) and predicted habitat (right) of Phyteuma
Predicted occurrence probabilities (left) and habitat area (right) of Phyteuma
for year 2050 based on RCP 4.5 climate projections

More Related Content

Similar to Modeling present and prospective distribution of Phyteuma genus in Carpathian region with machine learning techniques using open climatic and soil data

Similar to Modeling present and prospective distribution of Phyteuma genus in Carpathian region with machine learning techniques using open climatic and soil data (20)

Updating Ecological Niche Modeling Methodologies
Updating Ecological Niche Modeling MethodologiesUpdating Ecological Niche Modeling Methodologies
Updating Ecological Niche Modeling Methodologies
 
D1T3 enm workflows updated
D1T3 enm workflows updatedD1T3 enm workflows updated
D1T3 enm workflows updated
 
Society for American Archaeology - 2015
Society for American Archaeology - 2015Society for American Archaeology - 2015
Society for American Archaeology - 2015
 
A Machine Learning Framework for Materials Knowledge Systems
A Machine Learning Framework for Materials Knowledge SystemsA Machine Learning Framework for Materials Knowledge Systems
A Machine Learning Framework for Materials Knowledge Systems
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
TERN eMAST : Observations and terrestrial ecosystem models : Terrestrial Ecos...
TERN eMAST : Observations and terrestrial ecosystem models : Terrestrial Ecos...TERN eMAST : Observations and terrestrial ecosystem models : Terrestrial Ecos...
TERN eMAST : Observations and terrestrial ecosystem models : Terrestrial Ecos...
 
Long term socio ecological research sites for crp6
Long term socio ecological research sites for crp6Long term socio ecological research sites for crp6
Long term socio ecological research sites for crp6
 
Predictive association between trait data and eco-geographic data for Nordic ...
Predictive association between trait data and eco-geographic data for Nordic ...Predictive association between trait data and eco-geographic data for Nordic ...
Predictive association between trait data and eco-geographic data for Nordic ...
 
Rethinking data intensive science using scalable analytics systems
 Rethinking data intensive science using scalable analytics systems Rethinking data intensive science using scalable analytics systems
Rethinking data intensive science using scalable analytics systems
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
TERN Surveillance Training 2019 - Day 1, Session 2
TERN Surveillance Training 2019 - Day 1, Session 2TERN Surveillance Training 2019 - Day 1, Session 2
TERN Surveillance Training 2019 - Day 1, Session 2
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learning
 
2013 Poster Session, Geospatial Modeling of Mountain Pine Beetle Mortality by...
2013 Poster Session, Geospatial Modeling of Mountain Pine Beetle Mortality by...2013 Poster Session, Geospatial Modeling of Mountain Pine Beetle Mortality by...
2013 Poster Session, Geospatial Modeling of Mountain Pine Beetle Mortality by...
 
Bayesian network-based predictive analytics applied to invasive species distr...
Bayesian network-based predictive analytics applied to invasive species distr...Bayesian network-based predictive analytics applied to invasive species distr...
Bayesian network-based predictive analytics applied to invasive species distr...
 
Plant and Animal Genome XXIII, Together for better HTP digital phenotyping
Plant and Animal Genome XXIII, Together for better HTP digital phenotypingPlant and Animal Genome XXIII, Together for better HTP digital phenotyping
Plant and Animal Genome XXIII, Together for better HTP digital phenotyping
 
land health surveillance highlights
land health surveillance highlightsland health surveillance highlights
land health surveillance highlights
 
PREDICTIVE DATA MINING ALGORITHMS FOR OPTIMIZED BEST CROP IN SOIL DATA CLASSI...
PREDICTIVE DATA MINING ALGORITHMS FOR OPTIMIZED BEST CROP IN SOIL DATA CLASSI...PREDICTIVE DATA MINING ALGORITHMS FOR OPTIMIZED BEST CROP IN SOIL DATA CLASSI...
PREDICTIVE DATA MINING ALGORITHMS FOR OPTIMIZED BEST CROP IN SOIL DATA CLASSI...
 
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
 
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
DATA MINING ATTRIBUTE SELECTION APPROACH FOR DROUGHT MODELLING: A CASE STUDY ...
 

More from Alexander Mkrtchian

More from Alexander Mkrtchian (9)

Annual precipitation data processing and interpolation for the weather statio...
Annual precipitation data processing and interpolation for the weather statio...Annual precipitation data processing and interpolation for the weather statio...
Annual precipitation data processing and interpolation for the weather statio...
 
Lviv city climate tendencies
Lviv city climate tendenciesLviv city climate tendencies
Lviv city climate tendencies
 
Impact of prospective climate changes on future distribution of eco-climate b...
Impact of prospective climate changes on future distribution of eco-climate b...Impact of prospective climate changes on future distribution of eco-climate b...
Impact of prospective climate changes on future distribution of eco-climate b...
 
Обробка та геостатистична інтерполяція даних щодо кількості опадів метеостанц...
Обробка та геостатистична інтерполяція даних щодо кількості опадів метеостанц...Обробка та геостатистична інтерполяція даних щодо кількості опадів метеостанц...
Обробка та геостатистична інтерполяція даних щодо кількості опадів метеостанц...
 
Quantifying Landscape Changes through Land Cover Transition Potential Analysi...
Quantifying Landscape Changes through Land Cover Transition Potential Analysi...Quantifying Landscape Changes through Land Cover Transition Potential Analysi...
Quantifying Landscape Changes through Land Cover Transition Potential Analysi...
 
Новітні та прогнозні зміни кліматичних умов західних регіонів України: просто...
Новітні та прогнозні зміни кліматичних умов західних регіонів України: просто...Новітні та прогнозні зміни кліматичних умов західних регіонів України: просто...
Новітні та прогнозні зміни кліматичних умов західних регіонів України: просто...
 
The analysis of relations between land surface morphometry and spectral chara...
The analysis of relations between land surface morphometry and spectral chara...The analysis of relations between land surface morphometry and spectral chara...
The analysis of relations between land surface morphometry and spectral chara...
 
Interpolation of meteodata using the method of regression-kriging
Interpolation of meteodata using the method of regression-krigingInterpolation of meteodata using the method of regression-kriging
Interpolation of meteodata using the method of regression-kriging
 
Modeling the location of natural cold-limited treeline and alpine meadow habi...
Modeling the location of natural cold-limited treeline and alpine meadow habi...Modeling the location of natural cold-limited treeline and alpine meadow habi...
Modeling the location of natural cold-limited treeline and alpine meadow habi...
 

Recently uploaded

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
ssuser79fe74
 

Recently uploaded (20)

Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 

Modeling present and prospective distribution of Phyteuma genus in Carpathian region with machine learning techniques using open climatic and soil data

  • 1. Phyteuma vagneri ( Phyteuma confusum MODELING PRESENT AND PROSPECTIVE DISTRIBUTION OF PHYTEUMA GENUS IN CARPATHIAN REGION WITH MACHINE LEARNING TECHNIQUES USING OPEN CLIMATIC AND SOIL DATA Assoc. Prof. Dr. Alexander Mkrtchian
  • 2. Main problems in species distribution modeling Problem Solution Data of different formats, different quality, duplicated data Data preprocessing Presence-only data availale Introducing randomly distributed background data as a surrogate for absence data Data distribution violates normality assumption Using non-parametric methods (machine learning) Spatially autocorrelated data Explicit methods to account for spatial autocorrelation; rigorous testing of modeling results on test sets
  • 3. • Mackey B. G., Nix H. A., Hutchinson M. F., Macmahon J. P., Fleming P. M. (1988). Assessing representativeness of places for conservation reservation and heritage listing // Environmental Management, 12 (4), pp. 501–514 • Franklin J. (1995). Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients // Progress in Physical Geography, 1995, 19, pp. 474–499 • Guisan A.; Zimmermann N. E. (2000). Predictive habitat distribution models in ecology // Ecological Modelling, 135, pp. 147–186 • Elith J., Graham K. H., Anderson R. P. et al. (2006). Novel methods improve prediction of species’ distributions from occurrence data // Ecography, 29, pp. 129–151 • Elith J., Leathwick J. R., Hastie T. A. (2008). Working guide to boosted regression trees // Journal of Animal Ecology, 2008, 77, 802–813 • Elith J., Phillips S. J., Hastie T., Dudı´k M., Chee Y. En, Yates C. J. (2011). A statistical explanation of MaxEnt for ecologists // Diversity and Distributions, 17, pp. 43–57 • https://rspatial.org/raster/sdm/index.html Main theoretical works
  • 4. • BIOCLIM – based on «ecological envelope» principle, calculating the probability of species in the attributes hyperspace • Maxent – based on minimizing relative entropy of probability densities in the attributes hyperspace • General linear models (GLM) • General additive models (GAM) • Artificial neural networks (ANN) • Random forest (RF) • Bootstrap aggregation (bagging) • Boosted regression trees (BRT) • Support vectors machines (SWM) Main types of models for species distribution modeling From Elith J., Graham K. H., Anderson R. P. et al. (2006). Novel methods improve prediction of species’ distributions from occurrence data // Ecography, 29, pp. 129–151
  • 5.  Model specifics and features selected have to be meaningful from an ecological viewpoint  Predicted spatial pattern should look plausible from an expert viewpoint  Residuals on test set has to be small and random  Quality, size and spatial extent of data, as well as appropriate features selected are mostly more important than the choice of the concrete model and tuning of its parameters
  • 6. Data obtained from GBIF for the Carpathian region contained 148 records of Phyteuma genus in total. After removing duplicates, 80 records have been kept. A double of observation records (160 points) were thus designated as background, with random coverage of geographic space inside study area. Six data points (3 from observed data and another 3 from simulated background data) have subsequently been removed due to omissions in predictors data.
  • 7. Data on climatic conditions were derived from WorldClim database (1 km. sq. spatial resolution) There are layers of 19 bioclimatic variables, derived from monthly temperature and precipitation with a consideration to have biological significance. 17 out of 19 bioclimatic variables were taken out as predictive variables for SDM. For data on soil conditions, SoilGrids digital maps of soil properties were used (250 m spatial resolution) From 11 available physical and chemical soil properties, 4 were chosen as the most suitable predictors: soil acidity, organic carbon stock, cation exchange capacity, and total nitrogen. Among six standard depth intervals available, 15–30 cm depth interval was chosen as the most appropriate for the purpose.
  • 8. • ML methods examined: Maxent, Random Forest, Artificial Neural Networks (ANN), Boosted Regression Trees. • Assessing the performance of the models and tuning their parameters: AUC and TSS criteria calculated for testing data with 6-fold cross- validation • Tools used: R programming language and software environment, SDMtune package
  • 9. Method AUC (training) AUC (testing) TSS (training) TSS (testing) Maxent (ME) 0.8162 0.796 (0.0138) 0.6114 0.5938 (0.0252) Artificial neural networks (ANN) 0.9437 0.9369 (0.005) 0.7508 0.7891 (0.018) Random forest (RF) 1 0.9321 (0.0076) 1 0.7851 (0.0261) Boosted regression trees (BRT) 0.964 0.9304 (0.0062) 0.8145 0.775 (0.0162) Performance metrics of different SDM methods calculated on training dataset and with aforementioned testing procedure. For testing case, standard deviations are given in parentheses. Model AUC (training) AUC (testing) TSS (training) TSS (testing) Full set of 21 variables 0.9437 0.9369 (0.005) 0.7508 0.7891 (0.018) Reduced set of 6 variables 0.94 0.9381 (0.0036) 0.7384 0.7893 (0.0132) Performance metrics of ANN SDM method calculated with a full and reduced sets of predictive variables ANN method appears to work the best for the case
  • 10. Relative occurrence probabilities (left) and predicted habitat (right) of Phyteuma
  • 11. Predicted occurrence probabilities (left) and habitat area (right) of Phyteuma for year 2050 based on RCP 4.5 climate projections