Water Data Boot Camp - Fall 2019
https://datadevils.github.io/DataBootCamp/
Why this course now?
"The word of the year in the world of water is digital"
- Increased recognition that access to water data and analytics is essential to better inform public policies and business decisions.
- Technological advancements in satellites, drones, sensors and so on create more data that can be transformed into actionable information for water management.
- Cultural, academic, and legislative shift towards open water data and transparency.
Sources: Imagine an Internet of Water, 3 Ways the course of water sustainability changed in 2017
UNIT 1: How has Falls Lake reservoir impacted streamflow?
(Focus: EXCEL)
What you will learn:
The data analysis process.
How to find streamflow data.
Tools and tricks to answer the following questions in Excel:
- How do monthly streamflows compare?
- How has the probability of a 100- or 500-year flood changed?
- How have minimum streamflows changed? Does the river spend more or less time below the 7Q10 threshold?
- How has mean annual streamflow changed over time? Before vs after reservoir construction?
A teaser showing the same analysis in the R and Python scripting environments, where the final trend analysis will be run for all streams in North Carolina and the results displayed on a map.
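As a rough illustration of what the streamflow questions above can look like in Python, here is a minimal sketch using only the standard library. All flow values and the low-flow threshold are invented placeholders, not Falls Lake data, and the function names are hypothetical. The return periods use the empirical Weibull plotting position, which is only one simple option; estimating a true 7Q10 or a 100- or 500-year flood from a real record would normally involve fitting a probability distribution, which this sketch does not attempt.

```python
from statistics import mean


def rolling_mean(values, window=7):
    """Trailing moving average, one value per full window of observations."""
    return [mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


def fraction_below(values, threshold):
    """Share of observations strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


def return_periods(annual_peaks):
    """Empirical return period T = (n + 1) / rank, where rank 1 is the largest peak."""
    n = len(annual_peaks)
    ranked = sorted(annual_peaks, reverse=True)
    return [(peak, (n + 1) / rank) for rank, peak in enumerate(ranked, start=1)]


# Hypothetical daily mean flows in cfs (placeholder values)
daily_flow = [120, 110, 95, 80, 70, 65, 60, 58, 75, 140, 200, 180, 150, 130]
seven_day = rolling_mean(daily_flow)

# Placeholder low-flow threshold standing in for a 7Q10 value
low_flow_threshold = 80.0
share_below = fraction_below(seven_day, low_flow_threshold)

# Hypothetical annual peak flows in cfs (placeholder values)
annual_peaks = [5200, 8100, 4300, 12500, 6700, 3900, 9400, 7100, 5600]
peaks_with_T = return_periods(annual_peaks)
```

Running the two calculations separately on the records before and after reservoir construction would give the before-versus-after comparison the unit asks about.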
Primary focus:
Techniques for finding data and bringing it into your Excel workspace.
Basic data management skills and documentation.
How to manipulate data in Excel:
- Pivot tables
- VLookup functions
- If statements
- SumIf, CountIf, etc.
UNIT 2: Developing Water Balance Sheets from Online Data
(Focus: R or Python)
What you will learn:
- R/Python scripting environments.
- Finding, importing, and "tidying" water use and supply data from on-line repositories.
- Constructing water usage tables from USGS county level water use data.
- Downscaling 1/8° resolution hydrologic models into county level water supply data.
- Combining water use and supply data to construct standard Physical Supply and Use Tables (PSUT).
- Summarizing data and producing formatted tables and plots.
- Generating reproducible workflows and documentation.
Primary focus (R):
- Introducing R and RStudio.
- Introducing R Markdown.
- Basics of syntax and coding in R.
- ...
Primary focus (Python):
Introducing Git and GitHub.
Introducing Jupyter notebooks.
Basics of syntax and coding in Python:
- Data types and object oriented programming
- Variables and collections: Lists, Tuples, Sets, & Dictionaries
Retrieving on-line data using the requests and urllib packages.
Manipulating dimensional data using the NetCDF4, NumPy, and Pandas packages:
- Importing and exporting CSV files
- Filtering/subsetting records; handling missing data
- Grouping and summarizing data
- Reshaping, melting, and transposing tables
- Combining datasets
- Plotting
Spatial analysis with GeoPandas, shapely, and fiona:
- Constructing spatial objects from coordinate data
- Reading in GIS files
- Spatial joins
Plotting and mapping data using matplotlib and basemaps.
Exporting data into Excel using openpyxl.
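To give a flavor of the Pandas topics listed above (importing CSV data, handling missing values, filtering, grouping, reshaping, and combining datasets), here is a small sketch. It assumes pandas is installed. The county names are real, but every number, column name, and the `supply` table are invented for illustration and do not come from USGS data.

```python
import io

import pandas as pd

# Hypothetical county-level water use records (placeholder values)
use_csv = io.StringIO(
    "county,year,sector,withdrawal_mgd\n"
    "Durham,2010,Public Supply,30.0\n"
    "Durham,2010,Irrigation,2.5\n"
    "Durham,2015,Public Supply,28.0\n"
    "Wake,2010,Public Supply,60.0\n"
    "Wake,2010,Irrigation,\n"
    "Wake,2015,Public Supply,65.0\n"
)
use = pd.read_csv(use_csv)

# Handle missing data: drop records with no reported withdrawal
use_clean = use.dropna(subset=["withdrawal_mgd"])

# Filter/subset: keep only the 2010 records
use_2010 = use_clean[use_clean["year"] == 2010]

# Group and summarize: total withdrawal per county and year
totals = use_clean.groupby(["county", "year"], as_index=False)["withdrawal_mgd"].sum()

# Reshape: one row per county, one column per year
wide = totals.pivot(index="county", columns="year", values="withdrawal_mgd")

# Combine datasets: join a hypothetical supply table and compute use as a share of supply
supply = pd.DataFrame({"county": ["Durham", "Wake"], "supply_mgd": [100.0, 150.0]})
balance = totals.merge(supply, on="county", how="left")
balance["use_share"] = balance["withdrawal_mgd"] / balance["supply_mgd"]

# Export: write the wide table back out as CSV text
csv_text = wide.to_csv()
```

The same `wide` table could be passed to a plotting call or written to Excel; those steps are left out here to keep the sketch free of extra dependencies.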
UNIT 3: How often is Jordan Lake exceeding water quality limits?
(Focus: Data Interaction and Visualization)
What you will learn:
- How to find and download water quality data.
- How to assess water quality parameters in context with policy thresholds.
- How to build decision support tools with user inputs in Excel, R, and Python.
Primary focus:
- Building decision-support tools for an end user.
- Communicating results with data visualization tools.
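Assessing a water quality parameter against a policy threshold, as described in this unit, can be sketched as counting how often samples exceed a user-supplied limit. The example below uses only the standard library. The sample values, the limit, and the function name `exceedance_rate_by_year` are all hypothetical; the limit is a placeholder input, not a regulatory value for Jordan Lake.

```python
from collections import defaultdict


def exceedance_rate_by_year(samples, limit):
    """Return {year: fraction of that year's samples above the limit}.

    samples is an iterable of (year, value) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # year -> [exceedances, total samples]
    for year, value in samples:
        counts[year][0] += value > limit
        counts[year][1] += 1
    return {year: exceed / total for year, (exceed, total) in sorted(counts.items())}


# Hypothetical water quality samples as (year, concentration) pairs
samples = [
    (2017, 32.0), (2017, 45.5), (2017, 38.0), (2017, 51.0),
    (2018, 28.0), (2018, 41.0), (2018, 36.5), (2018, 39.0),
]

# Placeholder threshold that an end user would supply
limit = 40.0
rates = exceedance_rate_by_year(samples, limit)
```

Keeping the limit as a function argument is what lets a decision-support tool expose it as a user input, so the same calculation can be rerun under different policy scenarios.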