class: center, middle, inverse, title-slide # An Open Science Framework for Research on Cyanobacteria in Lakes and Ponds ## US EPA, Region 7 ### Jeff Hollister, Farnaz Nojavan, Betty Kreakie, Stephen Shivers, and Bryan Milstead ###
2017-10-11
Lenexa, KS
--- class: center, middle, inverse # Twitter? ![Yes!!!](http://www.reactiongifs.us/wp-content/uploads/2014/11/thumbs_up_waynes_world.gif) ### hashtag: \#cyanobacteria ### me: @jhollist --- class: center, middle, inverse # Who, what, why, and how? --- # Who are we? .left-column[ - Ecologists - Computational focus - Enough to be dangerous - 3 FTE - Myself - Betty Kreakie - Bryan Milstead - 2 Post-docs - Farnaz Nojavan - Stephen Shivers ] .right-column[ <img src="figure/comp_eco_crew.jpg" style="margin-left: 75px"></img> ] --- # What do we do? - Apply computational approaches to understand water quality impacts in lakes - Open Science ![The workflow](figure/cyano_model_function.jpg) --- # What is open science? - Access to materials - Reproducible/ Repeatable - The Web! - A process, not a state <img src="figure/open_sign_4344960203_821cb56ec9_o_resize.jpg" style="height: 80%; width: 80%; margin-left: 100px"></img> --- # Why open science? - Often required - Government/Funders/Journals - Benefits researchers - [Mciernan et al. (2016) How open science helps researchers succeed](https://elifesciences.org/content/5/e16800) - Improves quality - [The classic example: Reinhart and Rogoff](http://www.newyorker.com/news/john-cassidy/the-reinhart-and-rogoff-controversy-a-summing-up) - Benefits to society - ["Sharing of Data Leads to Progress on Alzheimer’s"](http://www.nytimes.com/2010/08/13/health/research/13alzheimer.html) <img src="https://media.giphy.com/media/1M9fmo1WAFVK0/giphy.gif" style="width: 60%; height: 60%; margin-left: 20%;"></img> --- # How are we open? .left-column[ - R package development - Research compendia - Tooling for common problems - Visualization - Sharing and collaborating - Publishing - Apply to our research efforts ] .right-column[ <img src = "https://media.giphy.com/media/l0MYyv6UK0Bd4DE76/giphy.gif" style="width: 110%; height: 110%; margin-top: 50px;"></img> ] --- class: center, middle, inverse # R Packages --- # Why R Packages - Useful structure - Infrastructure for sharing - GitHub - CRAN - We are an R shop! <img src="https://media.giphy.com/media/l2JhtKtDWYNKdRpoA/giphy.gif" style="width: 80%; height: 80%; margin-left: 10%;"></img> --- # Research Compendia .left-column[ - Define - Origins - [Gentleman and Lang (2004)](http://biostats.bepress.com/bioconductor/paper2) - Part of - Reproducible Research - Literate Programming (ala Donald Knuth) - ROpenSci efforts - [rrrpkg](https://github.com/ropensci/rrrpkg) - [ROpenSci unconf 2017 discussion](https://github.com/ropensci/unconf17/issues/5) ] .right-column[ <img src="http://www.dlib.org/dlib/january17/nuest/nuest-fig1.png" style="margin-left: 75px"></img> from Nüst, Konkol, et al (2017), https://doi.org/10.1045/january2017-nuest ] --- # Packages as Research Compendia - R, Data, and Vignettes folders - Other examples - [Carl Boettiger's template](https://github.com/cboettig/template) - [Ben Marwick](https://github.com/benmarwick/Pleistocene-aged-stone-artefacts-from-Jerimalai--East-Timor) - Our examples - https://github.com/usepa/LakeTrophicModelling - https://github.com/usepa/Microcystinchla) - GitHub and Zenodo (Archive) ![ghz](ghz.jpg) --- # Packages to solve common problems - `lakemorpho` - `elevatr` - `goatscape` (in development) <img src="https://media3.giphy.com/media/ID4NXWnwuLnLq/200.webp#0-grid1" style="width: 80%; height: 80%; margin-left: 10%;"></img> --- # `lakemorpho` .footnote[Package URL: <https://cran.r-project.org/package=lakemorpho>] .left-column[ - Lake morphometry metrics in R - Version 1.0 - August 2014 - Version 1.1.0 - December 2016 - `sf` support to be added - [National Lake Morphometry](https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B495CBAED-9BB9-49B4-80A7-1C91DE5FCA95%7D) - [Hollister and Milstead (2010)](http://dx.doi.org/10.1080/07438141.2010.504321) - [Hollister *et. al.* (2011)](http://dx.doi.org/10.1371/journal.pone.0025764) - [Hollister and Stachelek (2017)](https://f1000research.com/articles/6-1718/v1) ] .right-column[ ![lakemorpho](figure/lakemorpho.png) ] --- class: center, middle background-image: url('figure/lakemorpho_demo.png') background-position: 50% 50% # lakemorpho::demo <!-- # [lakemorpho: Demo](http://server.jwhollister.com:8787) --> --- # `elevatr` .footnote[Package URL: <https://cran.r-project.org/package=elevatr>] .left-column[ - Access elevation data in R - Mapzen - AWS - USGS - Version 0.1.1 - January 2017 - Version 0.1.3 - March 2017 - Will be paired with `lakemorpho` - `sf` support to be added ] .right-column[ ![elevatr](figure/elevatr.png) ] --- class: center, middle background-image: url('figure/elevatr_demo.png') background-position: 50% 50% # elevatr::demo <!-- # [elevatr: Demo](http://server.jwhollister.com:8787) --> --- # `goatscape` .left-column[ - New effort with Bryan Milstead - What's in a name? - Summarizes ancillary data for a user-defined landscape polygon - Census (via `censusapi`) - Landcover - Impervious - Accepts arbitrary spatial data for the landscape - Based on `sf` and tidy by design - <https://github.com/usepa/goatscape> ] .right-column[ <img src="figure/goatscape_logo.jpg" style="width: 90%; height: 90%;"></img> ] --- class: middle, center, inverse # Data Visualization --- # Shiny: Cyanobacteria Monitoring Collaborative .footnote[Project URL: <http://cyanos.org>] .left-column[ - Started in 2013 - New England Region Cyanobacteria Monitoring Workgroup - Three Projects - bloomWatch - cyanoScope - Monitoring - Data Viz with Shiny ] .right-column[ ![cyano web](figure/cyano_web.jpg) ] --- class: middle, center background-image: url("figure/shiny.jpg") # [Shiny: Demo](https://cyanos.shinyapps.io/cyanoviz/) --- class: middle, center, inverse # Sharing and Collaborating --- # GitHub - What is it? - How do we use it? <img src = "http://www.storybench.org/wp-content/uploads/2015/09/github-octocat.jpg" style="margin-left: 5%; margin-top: -20px; width: 700px;"></img> --- class: middle, center background-image: url("figure/github_demo.jpg") # [GitHub: Demo](https://github.com/usepa) --- class: middle, center, inverse # Open Access --- # Publishing - Preprints - [Hollister *el al.* (2016) PeerJ Preprints](http://dx.doi.org/10.7287/peerj.preprints.1319v2) - Open first - [Milstead *et al.* (2013) PLoS One](http://dx.doi.org/10.1371/journal.pone.0081457) - [Hollister and Kreakie (2016) F1000Research](https://dx.doi.org/10.12688/f1000research.7955.2) - Money where our mouth(s) is(are) - [Kreakie *et al.* (2015) LakeLines](http://www.nalms.org/media.acux/beb75c9c-f812-4753-b888-79864899c6d6) <img src = "figure/oa_journals.jpg" style="margin-left: 15%; width: 500px;"></img> --- class: middle, center, inverse # Open Science Research --- # Models and field research - Random forest models of trophic state and chlorophyll *a* - Re-thinking the Lake Trophic State Index - Chlorophyll *a* and microcystin - Temporal and spatial dynamics of cyanobacteria blooms - New work - Lake photic zone temperature - Phytoplankton community analysis <img src="https://media.giphy.com/media/qJsJI0MhazjGw/giphy.gif" style="width: 70%; height: 70%; margin-left: 10%;"></img> --- # Random forest models of trophic state and chlorophyll *a* .left-column[ - National - Data - National Lakes Assessment - Land cover - `randomForest` package - Variable selection - All variables (water quality and GIS) - 68.7% Total Accuracy - GIS only variables - 49% Total Accuracy - But ...] .right-column[ <img src="figure/hollisterES15-00703R_fig11.jpg" style="width: 100%; height: 100%; margin-top: 50px;"></img> ] --- # Random forest models of trophic state and chlorophyll *a* - How is it open and reproducible? - [GitHub](https://github.com/usepa/LakeTrophicModelling) - [10.5281/zenodo.40271](http://dx.doi.org/10.5281/zenodo.40271) - [PeerJ Pre-print](https://peerj.com/preprints/1319/) - [Ecosphere (OA)](http://onlinelibrary.wiley.com/doi/10.1002/ecs2.1321/full) ![ecosphere](figure/ecosphere.jpg) --- # Re-thinking the Lake Trophic State Index .left-column[ - Led by Farnaz Nojavan - Hierarchical model - Nitrogen and Phosphorus - POLR: Revised Trophic State Index - Total Accuracy - 0.6 - Balanced Accuracy - 0.68 to 0.78 ] .right-column[ <img src="figure/dag-0.jpg" style="width: 100%; height: 100%; margin-top: 50px"></img> ] --- # Re-thinking the Lake Trophic State Index .left-column[ - Hierarchical model - Nitrogen and Phosphorus - POLR: Revised Trophic State Index - Total Accuracy - 0.6 - Balanced Accuracy - 0.68 to 0.78 ] .right-column[ ![models](figure/predcit_acc-0.jpg) ] --- # Re-thinking the Lake Trophic State Index - How is it open and reproducible? - [GitHub](https://github.com/usepa/rethinking_tsi) - [10.5281/zenodo.556175](https://doi.org/10.5281/zenodo.556175) - OA (when published) ![ecol_model](figure/ecol_model.jpg) --- # Chlorophyll *a* and microcystin .left-column[ - National - Diagnostic tool - Probability - Exceeding microcystin advisory - Given chlorophyll *a* concentration ] .right-column[ <img src="https://f1000researchdata.s3.amazonaws.com/manuscripts/9675/6edf7059-7b8a-43c8-8488-a06c5f761572_figure1.gif" style="width: 120%; height: 120%; margin-top: 50px;"></img> ] --- # Chlorophyll *a* and microcystin - The numbers! ![mcyst_table](figure/mcys_table.jpg) --- # Chlorophyll *a* and microcystin - How is it open? - [GitHub](https://github.com/usepa/microcystinchla) - [Zenodo](http://dx.doi.org/10.5281/zenodo.55273) - [F1000Research](https://f1000research.com/articles/5-151/v2) - Pre-print and peer-reviewed in one! ![f1000](figure/f1000.jpg) --- # Temporal and spatial dynamics of cyanobacteria blooms - Led by Stephen Shivers - Rhode Island - Field effort - 2 ponds - Yawgoo Pond (the nice wooded site) - Warwick Pond (gritty and (somewhat) urban site) - Twice weekly - Seven sampling locations in each <img src="figure/yawg_warw.png" style="width: 83%; margin-top: -2%; margin-left: 10%;"></img> --- # Temporal and spatial dynamics of cyanobacteria blooms .left-column[ - Measurements - Chlorophyll *a* - Phycocyanin - Microcystin - Turbidity - Physical profiles - Secchi - Plankton - Nutrients ] .right-column[ <img src="figure/chla.png" style="width: 120%; height: 120%; margin-top: 50px;"></img> ] --- # Temporal and spatial dynamics of cyanobacteria blooms - How will it be open? - [Private (for now) GitHub]() - Zenodo - Open Access publications - Data publication? ![cyano_space_time](figure/cyano_space_time.jpg) --- # New work - Hierarchical Bayes models of microcystin - Lake photic zone temperature - Phytoplankton community analysis <img src="figure/Lila1.jpg" style="width: 42%; height: 42%; margin-left:30%"></img> --- # Thanks! .center[ ## Jeff Hollister US EPA </br> Atlantic Ecology Division </br> Narragansett, RI </br> email: [hollister.jeff@epa.gov](mailto:hollister.jeff@epa.gov) </br> twitter: [@jhollist](https://twitter.com/jhollist) </br> github: [jhollist](https://github.com/jhollist) </br> Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan). ]