Background: How has Falls Lake affected streamflow?

Prior to 1978, flooding of the Neuse River caused extensive damage to public and private properties including roadways, railroads, industrial sites, and farmlands. The Falls Lake Project was developed by the U.S. Army Corps of Engineers to control damaging floods and to supply a source of water for surrounding communities. Construction of the dam began in 1978 and was completed in 1981. In addition to recreation opportunities, Falls Lake now provides flood control, water supply, water quality, and fish and wildlife conservation resources (https://www.ncparks.gov/falls-lake-state-recreation-area/history).

Now, some three dozen years after Falls Lake was constructed, we want to evaluate its impact on downstream streamflow, using this as an opportunity to examine some data analytic techniques, specifically ones using R. These techniques will cover ways to get data into R and how to manage, wrangle, and analyze those data.

This document begins with a review of the analytical workflow of most data projects. We apply this workflow to understand how streamflow has changed since Falls Lake was constructed. In doing so, we focus on using R to tackle the assignment; companion documents examine how Excel and Python can accomplish the same set of tasks.




Analytical workflow: A Preview

Data analysis projects follow a similar workflow, one that begins with a general question or objective that guides our process of turning data into actionable information. Let’s begin by examining our workflow, depicted in the diagram below.



[Figure: Analytic Workflow]


Applying the Analytical Workflow: Falls Lake


Obtaining the data

The USGS actively supports R development by building a community of practice. This community has developed several R packages that can be used to load and analyze streamflow data: https://owi.usgs.gov/R/packages.html

The package we will be using to load data into R is the dataRetrieval library. Information on how to use this library can be found here: https://owi.usgs.gov/R/dataRetrieval.html#2



Installing packages and loading libraries

R is open source, meaning anyone can create packages for other people to use. To access these packages you must first install them onto your machine. Installing a package only needs to be done once; however, you may decide to reinstall a package every time you open your script if you think the package is being regularly updated.

There are two primary ways to install a package onto your machine.


Method 1 - Manual

  1. In the RStudio menu, click Tools, then Install Packages.
  2. Select the package(s) you wish to install. Start typing a name and it will provide options.


Method 2 - Code

install.packages("XXX")

You can install multiple packages by typing: install.packages(c("XXX1","XXX2","XXX3"))

The c() function combines the package names into a vector that R reads through.
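
For example, to install all of the packages used in this document in a single call (a sketch; skip it if they are already installed):

pkgs <- c("dataRetrieval", "EGRET", "ggplot2", "dplyr", "magrittr", "lubridate", "leaflet")
install.packages(pkgs)  # c() combines the names into one character vector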

For this project we will load in the following libraries:

  • dataRetrieval - USGS package to download data directly to R
  • EGRET - USGS package that contains some basic statistics for streamflow data
  • ggplot2 - popular plotting library in R
  • dplyr and magrittr - used to link commands together by piping; the pipe operator %>% passes the result of each call in R to the next call. We'll learn more on this later (see the short sketch after this list).
  • lubridate - package used for dates and times
  • leaflet - package used for maps
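
As a quick preview of how pipes work, here is a minimal sketch using toy numbers (not project data):

library(magrittr)
x <- c(5, 12, 20, 8)
x %>% sqrt() %>% round(1)  # equivalent to round(sqrt(x), 1)
## [1] 2.2 3.5 4.5 2.8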

The USGS dataRetrieval package is under active development. In this case you may want to reinstall the package each time you use it to make sure you have all the latest functionality.

Packages are now installed on your machine, but they aren’t loaded into your current session. You have to tell R to load each library for use using library(XXX).


Prep the workspace by installing (if needed) and loading libraries

# install.packages("dataRetrieval", repos=c("http://owi.usgs.gov/R", getOption("repos")))
# install.packages(c('EGRET', 'ggplot2', 'dplyr', 'magrittr', 'lubridate', 'leaflet'))

library(dataRetrieval)
library(EGRET)
library(ggplot2)

# Library for pipes
library(dplyr)
library(magrittr)

# Used for dates
library(lubridate)

# Used for mapping
library(leaflet)

# Access vignette if needed
#vignette("dataRetrieval", package="dataRetrieval")


Download data directly from NWIS

We will use the USGS packages to download data directly from the USGS. To do this we need to identify what we want to download. We need to define:


  • siteNumbers - the USGS gauge containing the data we want to download
    • We will define the variable siteNo and assign it the Clayton gauge number: 02087500
  • start and end dates - the period of record to download. In this case we will define the following:
    • start.date = "1930-10-01"
    • end.date = "2017-09-30"


Identify parameters

# Identify gauge to download
siteNo <- '02087500' #Clayton, don't forget to add the zero if it is missing

#siteNo <- '02089000' #Goldsboro, don't forget to add the zero if it is missing

# Identify parameter of interest: https://help.waterdata.usgs.gov/code/parameter_cd_query?fmt=rdb&inline=true&group_cd=%
pcode = '00060' #discharge (cfs)

# Identify statistic code for daily values: https://help.waterdata.usgs.gov/code/stat_cd_nm_query?stat_nm_cd=%25&fmt=html
scode = "00003"  #mean

# Identify start and end dates
start.date = "1930-10-01"
end.date = "2017-09-30"

# Pick service
serv <- "dv"


We can now put all this information together to request the data from the USGS NWIS website. There are a couple of ways we can do this; here we will use readNWISdata (a related helper, readNWISdv, appears later in this document).


Steps:

  1. Put the data into the variable neuse.
    • neuse <- renameNWISColumns(neuse) provides meaningful column names for the data.

  2. See what the column names are: colnames(neuse)

  3. Access the different attributes associated with this stream gauge.
    • To see what that list is: names(attributes(neuse))

  4. From this list, look at information on the variables and the site:
    • parameterInfo <- attr(neuse, "variableInfo") provides information on the variables
    • siteInfo <- attr(neuse, "siteInfo") provides information on the site


Load and inspect data

# Load in data using the site ID
neuse <- readNWISdata(siteNumbers = siteNo, 
                      parameterCd = pcode, 
                      statCd = scode, 
                      startDate=start.date, 
                      endDate=end.date, 
                      service = serv)

# Summarize the variables in the neuse dataframe
summary(neuse)
##   agency_cd           site_no             dateTime         
##  Length:31777       Length:31777       Min.   :1930-10-01  
##  Class :character   Class :character   1st Qu.:1952-07-01  
##  Mode  :character   Mode  :character   Median :1974-04-01  
##                                        Mean   :1974-04-01  
##                                        3rd Qu.:1995-12-31  
##                                        Max.   :2017-09-30  
##  X_00060_00003   X_00060_00003_cd      tz_cd          
##  Min.   :   45   Length:31777       Length:31777      
##  1st Qu.:  301   Class :character   Class :character  
##  Median :  525   Mode  :character   Mode  :character  
##  Mean   : 1111                                        
##  3rd Qu.: 1200                                        
##  Max.   :22500
# Dimensions of the neuse dataframe (rows, columns)
dim(neuse) 
## [1] 31777     6
# Rename columns to something more useful
neuse <- renameNWISColumns(neuse)
colnames(neuse)
## [1] "agency_cd" "site_no"   "dateTime"  "Flow"      "Flow_cd"   "tz_cd"
# Access names of attributes
attributes(neuse)$names
## [1] "agency_cd" "site_no"   "dateTime"  "Flow"      "Flow_cd"   "tz_cd"
# Store some attributes for later...
parameterInfo <- attr(neuse, "variableInfo")
siteInfo <- attr(neuse, "siteInfo")


Exploratory Data Analysis: Plot data

Now that we have our data, we should explore and inspect the data before we dive into our analysis. This may expose irregularities in the data, stemming either from our import process or in the dataset itself. Plots and summary statistics are a quick way to reveal any data gaps or outliers.

Compute cms values from your cfs ones (hint: to create a new column, use $):

# Create cms column to plot cubic meters per second
neuse$Flow_cms <- neuse$Flow*0.028316847

# Summarize the new variable
summary(neuse$Flow_cms)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.274   8.523  14.866  31.460  33.980 637.129
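
One quick way to check for data gaps is to compare the span of the record (in days) to the number of rows; a minimal sketch:

# Days between the first and last record, inclusive
n.days <- as.numeric(as.Date(max(neuse$dateTime)) - as.Date(min(neuse$dateTime))) + 1
n.days - nrow(neuse)  # 0 indicates a complete daily record
## [1] 0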


Plot streamflow data to look for gaps/outliers

We will use ggplot2 to plot all figures.


  1. Create a base ggplot object (i.e. a plot).
    • Define your data and your aes (aesthetics, i.e. your x and y variables).
    • aes can also include color and shape configurations (e.g. if you want to color lines by month).

  2. Define the type of plot you are creating.
    • geom_line() creates a line plot
    • geom_point() creates a scatterplot
    • geom_bar() creates a bar chart
    • geom_histogram() creates a histogram (note: you only need an x variable)
    • There are many more, and you can combine plot types!

  3. Plot the data… there are a lot of additions you can attach to the plot with a +.
    • labs sets labels
    • theme can adjust fonts, legends, etc.
    • scale_color_brewer and scale_fill_brewer can change your color palette
    • Again, there are many more options!


# Using ggplot2 #
#################
# Create a date variable that is not POSIXct
neuse$Date <- as.Date(neuse$dateTime)

# Plot the basic line plot
plot <- ggplot(data = neuse, 
               aes(x = Date , y = Flow_cms)) +
        geom_line()

# We can add options by simply using "+"
plot <- plot + 
        labs(x = '', y = 'Streamflow (cms)')
plot

# We can color the line by confidence as well
plot2 <- ggplot(data = neuse,
                aes(x = Date, y = Flow_cms, colour = Flow_cd))+
         geom_line() +
         labs(x = '', y = 'Streamflow (cms)', colour = 'Confidence')

plot2

# And we can change color themes (Google RColorBrewer for more palettes)
plot2_Dark2 <- plot2 +
               scale_color_brewer(palette = 'Dark2')

plot2_Dark2


Exploratory Data Analysis: Summary Statistics


Know your data. Whenever you get data, it’s good practice to look at and understand the raw data to make sure it looks reasonable and you understand something about its quality and content. One of the first things to assess in the metadata is how confident you are in the quality of the data being used in your analysis. Is the data fit for purpose?


How confident are we in the data? As you may have noticed in the summary of the data, there is a column with confidence codes on the quality of the data.

  • table creates a table that counts the number of occurrences of each confidence code category.
    • Notice the table is an array. We want to put it into a data frame (similar to the neuse data) so we can manipulate the columns. The command as.data.frame converts the table into a format we can use.


  • colnames allows you to rename the columns.
    • To perform operations on the data you must call the column name. You do this by first giving the name of your data, confidence, then $, and then the name of the column of interest.


  • You can create a pie chart and set the color of the pie using the following command: pie()


  • You are creating the pie chart from the percent column and labeling the chart with the category column. cex refers to the size of the labels.

#How confident are we in the data? #
####################################
# Number of observations
nrow(neuse)
## [1] 31777
# Table of variable values (Flow_cd)
table(neuse$Flow_cd)
## 
##     A   A e 
## 31635   142
# Create dataframe from the table (shows number of observations per Flow_cd value)
confidence <- as.data.frame(table(neuse$Flow_cd))

# Rename columns
colnames(confidence) <- c("Category", "Number")

# Create variable that shows % of observations
confidence$Percent <- (confidence$Number/nrow(neuse))*100
confidence$Percent <- round(confidence$Percent,2) # Round 

confidence
##   Category Number Percent
## 1        A  31635   99.55
## 2      A e    142    0.45
# Plot a pie chart (for fun)
pie(confidence$Percent, 
    labels=confidence$Category, 
    col=c("cornflowerblue", "lightblue", "darkorange", "orange"), 
    cex=0.7)


Using functions


Summary statistics provide another quick way to examine our data for peculiarities. Here, we’ll outline the process for computing the min, max, mean, and percentiles of our data. There are many ways to accomplish this task; we will explore two methods: functions and piping (dplyr).


Functions allow you to program a process you wish to repeat many times. Typically you call on this function to perform some task on some data.


Let’s make a function that:


  1. Creates a dataframe to store the output from our function once it runs.
    • A data frame is a table of rows and columns
    • Here we will create a data frame with 8 rows and 4 columns
  2. Gives the data frame meaningful column headers using colnames
    • We want to fill in the first column of data with the names of the statistics we wish to run.


The form of a function is: function_name <- function(parameter1, parameter2, ...) { tasks to perform }
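
As a minimal sketch of this form, here is a hypothetical helper (not part of the assignment) that inverts the cfs-to-cms conversion used earlier:

cms_to_cfs <- function(flow_cms) {
  flow_cms / 0.028316847  # invert the conversion factor from above
}
cms_to_cfs(1)
## [1] 35.31467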

# Create an empty dataframe
sum.stats <- as.data.frame(matrix(nrow=8, ncol=4))

# Rename columns
colnames(sum.stats) <- c("Statistics","p1930-2017","p1930-1979","p1984-2017")

#Fill in first column
sum.stats$Statistics <- c("Min",
                          "10th percentile",
                          "25th percentile",
                          "Median",
                          "Mean",
                          "75th percentile", 
                          "90th percentile",
                          "Max")

# Function to fill in other columns #
#####################################
# Our function has 2 parameters:
# The first parameter is the data we want to go into the function
# The second is the column we want to fill in our dataframe
gen_stats <- function(data, column.name) {
  
  # Task: Run a stat and place that value into a specific row and column. 
  sum.stats[1,column.name] <- min(data$Flow_cms)            
  sum.stats[2,column.name] <- quantile(data$Flow_cms, 0.10)    
  sum.stats[3,column.name] <- quantile(data$Flow_cms, 0.25)
  sum.stats[4,column.name] <- median(data$Flow_cms)            
  sum.stats[5,column.name] <- mean(data$Flow_cms)
  sum.stats[6,column.name] <- quantile(data$Flow_cms, 0.75)
  sum.stats[7,column.name] <- quantile(data$Flow_cms, 0.90)
  sum.stats[8,column.name] <- max(data$Flow_cms)             
  
  #we return the dataframe. This gives us access to the data frame outside of the function
  return(sum.stats)
}


# Call the function for 1930-2017
sum.stats <- gen_stats(data = neuse, column.name = 'p1930-2017')

# Now we can subset the data and rerun function before and after Falls Lake #
#############################################################################
# Before 1980
neuse.pre1980 <- subset(neuse, Date <= "1979-12-31")
summary(neuse.pre1980$Date)
##         Min.      1st Qu.       Median         Mean      3rd Qu. 
## "1930-10-01" "1943-01-23" "1955-05-17" "1955-05-17" "1967-09-08" 
##         Max. 
## "1979-12-31"
# 1984 and after
neuse.post1984 <- subset(neuse, Date >= "1984-01-01")
summary(neuse.post1984$Date)
##         Min.      1st Qu.       Median         Mean      3rd Qu. 
## "1984-01-01" "1992-06-08" "2000-11-15" "2000-11-15" "2009-04-23" 
##         Max. 
## "2017-09-30"
# Call the function to calculate summary statistics
sum.stats <- gen_stats(data = neuse.pre1980, column.name = 'p1930-1979')
sum.stats <- gen_stats(data = neuse.post1984, column.name = 'p1984-2017')

#round values
sum.stats$`p1930-2017` <- round(sum.stats$`p1930-2017`, 3)
sum.stats$`p1930-1979` <- round(sum.stats$`p1930-1979`, 3)
sum.stats$`p1984-2017` <- round(sum.stats$`p1984-2017`, 3)

sum.stats
##        Statistics p1930-2017 p1930-1979 p1984-2017
## 1             Min      1.274      1.274      2.973
## 2 10th percentile      5.890      4.955      7.249
## 3 25th percentile      8.523      8.523      8.750
## 4          Median     14.866     16.452     13.309
## 5            Mean     31.460     33.085     29.432
## 6 75th percentile     33.980     34.263     32.848
## 7 90th percentile     83.252     84.384     81.553
## 8             Max    637.129    637.129    557.842

Using dplyr and pipes


Pipes create a chain of commands that build on the previous command. There are two libraries you need to do this: dplyr and magrittr. The symbol %>% is the pipe that connects the chain of commands together.

  • In this example we will create a new data frame, sum.stats2 to hold the summary statistics.


  • We will then perform a series of commands on the neuse data. Essentially we are saying: take the neuse data, then calculate the following statistics, and store the result in sum.stats2.


  • Repeat the commands while subsetting the data to different time periods using the filter() command.

# Create dataframe
sum.stats2 <- as.data.frame(matrix(nrow=8, ncol=4))
colnames(sum.stats2) <- c("Statistics","p1930-2017","p1930-1980","p1984-2017")

#Fill in first column
sum.stats2$Statistics <- c("Min","p10","p25","Median","Mean","p75", "p90","Max")

# Note: summarize() returns a one-row dataframe that replaces the sum.stats2
# shell created above; rows 2 and 3 are appended to it below
sum.stats2 <- neuse %>%
              summarize(min = min(Flow_cms, na.rm=TRUE),
                        p10 = quantile(Flow_cms, 0.10, na.rm=TRUE),
                        p25 = quantile(Flow_cms, 0.25, na.rm=TRUE),
                        median = median(Flow_cms, na.rm=TRUE),
                        mean = mean(Flow_cms, na.rm=TRUE),
                        p75 = quantile(Flow_cms, 0.75, na.rm=TRUE),
                        p90 = quantile(Flow_cms, 0.90, na.rm=TRUE),
                        max = max(Flow_cms, na.rm=TRUE))

sum.stats2 <- as.data.frame(sum.stats2)

# Repeat for the other time periods. Use the 'filter' command:
sum.stats2[2,] <- neuse %>%
                  filter(Date<="1979-12-31") %>%
                  summarize(min = min(Flow_cms, na.rm=TRUE),
                            p10 = quantile(Flow_cms, 0.10, na.rm=TRUE),
                            p25 = quantile(Flow_cms, 0.25, na.rm=TRUE),
                            median = median(Flow_cms, na.rm=TRUE),
                            mean = mean(Flow_cms, na.rm=TRUE),
                            p75 = quantile(Flow_cms, 0.75, na.rm=TRUE),
                            p90 = quantile(Flow_cms, 0.90, na.rm=TRUE),
                            max = max(Flow_cms, na.rm=TRUE))

sum.stats2[3,] <- neuse %>%
                  filter(Date>="1984-01-01") %>%
                  summarize(min = min(Flow_cms, na.rm=TRUE),
                            p10 = quantile(Flow_cms, 0.10, na.rm=TRUE),
                            p25 = quantile(Flow_cms, 0.25, na.rm=TRUE),
                            median = median(Flow_cms, na.rm=TRUE),
                            mean = mean(Flow_cms, na.rm=TRUE),
                            p75 = quantile(Flow_cms, 0.75, na.rm=TRUE),
                            p90 = quantile(Flow_cms, 0.90, na.rm=TRUE),
                            max = max(Flow_cms, na.rm=TRUE))

# Reshape and round tables #
############################
# Reshape using 't'
final.stats <- as.data.frame(t(sum.stats2))

# Round numbers
final.stats <- round(final.stats, 3)

# Change row names to column names
final.stats$Statistics <- row.names(final.stats)

# Reorder dataframe
final.stats <- final.stats[,c(4,1,2,3)]
colnames(final.stats) <- c("Statistics", "p1930-2017", "p1930-1980", "p1984-2017")

final.stats  
##        Statistics p1930-2017 p1930-1980 p1984-2017
## min           min      1.274      1.274      2.973
## p10           p10      5.890      4.955      7.249
## p25           p25      8.523      8.523      8.750
## median     median     14.866     16.452     13.309
## mean         mean     31.460     33.085     29.432
## p75           p75     33.980     34.263     32.848
## p90           p90     83.252     84.384     81.553
## max           max    637.129    637.129    557.842


Looking at Seasonal Variation


Here we will look at the seasonal variation in streamflow using two methods: pipes and for loops. for loops are great when you need to repeat a process multiple times.

dplyr and pipes: We will use the new command group_by(). This command allows us to run our summary statistics on particular groups within the data, in this case months.


# Add year and month values
neuse$Year <- year(neuse$Date);  neuse$Month <- month(neuse$Date)
summary(neuse)  
##   agency_cd           site_no             dateTime         
##  Length:31777       Length:31777       Min.   :1930-10-01  
##  Class :character   Class :character   1st Qu.:1952-07-01  
##  Mode  :character   Mode  :character   Median :1974-04-01  
##                                        Mean   :1974-04-01  
##                                        3rd Qu.:1995-12-31  
##                                        Max.   :2017-09-30  
##       Flow         Flow_cd             tz_cd              Flow_cms      
##  Min.   :   45   Length:31777       Length:31777       Min.   :  1.274  
##  1st Qu.:  301   Class :character   Class :character   1st Qu.:  8.523  
##  Median :  525   Mode  :character   Mode  :character   Median : 14.866  
##  Mean   : 1111                                         Mean   : 31.460  
##  3rd Qu.: 1200                                         3rd Qu.: 33.980  
##  Max.   :22500                                         Max.   :637.129  
##       Date                 Year          Month       
##  Min.   :1930-10-01   Min.   :1930   Min.   : 1.000  
##  1st Qu.:1952-07-01   1st Qu.:1952   1st Qu.: 4.000  
##  Median :1974-04-01   Median :1974   Median : 7.000  
##  Mean   :1974-04-01   Mean   :1974   Mean   : 6.523  
##  3rd Qu.:1995-12-31   3rd Qu.:1995   3rd Qu.:10.000  
##  Max.   :2017-09-30   Max.   :2017   Max.   :12.000
# Run dplyr::group_by
month.flow1 <- neuse %>%
               group_by(Month) %>%
               summarise(p1930to2017 = mean(Flow_cms, na.rm=T)) %>%  
               round(3)

month.flow2 <- neuse %>%
               filter(Date<="1979-12-31") %>%
               group_by(Month) %>%
               summarise(p1930to1980 = mean(Flow_cms, na.rm=T)) %>%  
               round(3)

month.flow3 <- neuse %>%
               filter(Date>="1984-01-01") %>%
               group_by(Month) %>%
               summarise(p1984to2017 = mean(Flow_cms, na.rm=T)) %>%  
               round(3)

#create dataframe and bind 3 tables together
month.flow <- as.data.frame(cbind(month.flow1, 
                                  month.flow2[, 2], # only add second column
                                  month.flow3[, 2])) 


for loops: Here we want to loop through each month and calculate the summary statistics. The format of a for loop is: for (j in 1:n) { do something ... }, where j is a count variable: it will be 1 the first time the loop runs, 2 the second time, and so on. n is the number of times you want to run the loop. The task to perform goes between the {}.
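
As a minimal sketch of the format (a toy loop, not part of the analysis):

for (j in 1:3) {
  print(j^2)  # j takes the values 1, 2, 3
}
## [1] 1
## [1] 4
## [1] 9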


# Set up data frame
month.flow <- as.data.frame(matrix(nrow=0,ncol=4));                 
colnames(month.flow) <- c("Month","p1930to2017","p1930to1980","p1984to2017")

# Create month/year variable
neuse$Month <- month(neuse$Date)
neuse$Year <- year(neuse$Date)

# unique() pulls out the unique values in a column
unique.month <- unique(neuse$Month)

# Set up for loop to run for as many unique months as you have
for (j in 1:length(unique.month)) {
  
  # Subset data to month of interest
  zt <- subset(neuse, Month == unique.month[j]) #subset data for month 'j'
  
  # Further subset data by year
  zt.early <- subset(zt, Year <= 1979)            
  zt.late <- subset(zt, Year >= 1984)
  
  # Fill in dataframe
  month.flow[j,1] <- unique.month[j];                                #month
  month.flow[j,2] <- round(mean(zt$Flow_cms, na.rm=TRUE),3)          #period of record
  month.flow[j,3] <- round(mean(zt.early$Flow_cms, na.rm=TRUE),3);   #pre 1979 data
  month.flow[j,4] <- round(mean(zt.late$Flow_cms, na.rm=TRUE),3)     #post 1984 data
}

month.flow
##    Month p1930to2017 p1930to1980 p1984to2017
## 1     10      17.832      16.937      20.176
## 2     11      21.101      22.382      20.613
## 3     12      29.342      30.978      26.724
## 4      1      45.487      50.574      38.709
## 5      2      53.087      60.095      43.548
## 6      3      55.650      56.668      53.067
## 7      4      44.792      46.461      42.960
## 8      5      28.041      28.643      27.113
## 9      6      21.439      20.182      21.309
## 10     7      21.166      23.410      18.881
## 11     8      20.522      23.392      17.240
## 12     9      20.397      19.517      23.180
# Reorder from water year to calendar year
month.flow <- arrange(month.flow, Month) #automatically sorts ascending

# Reshape month flow for ggplot (make long)
month.flow.long <- data.frame() # empty dataframe
for (g in c('p1930to2017', 'p1930to1980', 'p1984to2017')) {
  
  # Keep slice of dataframe
  tdf <- month.flow[c('Month', g)]
  
  # Create label
  tdf$group <- g
  
  # Rename columns
  names(tdf) <- c('Month', 'flow_cms', 'group')
  
  # Bind to month.flow.long
  month.flow.long <- rbind(month.flow.long, tdf)
}
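# For reference: the same long table could be built in one call with the tidyr
# package (assuming it is installed; it is not among the libraries loaded above):
# month.flow.long <- pivot_longer(month.flow, cols = -Month,
#                                 names_to = 'group', values_to = 'flow_cms')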

# Plot results
plot <- 
ggplot(data = month.flow.long,
       aes(x = Month, y = flow_cms, colour = group)) +
  geom_point(size = 2) +
  geom_line() +
  labs(x = '', y = 'Mean streamflow (cms)', colour = 'Time period')

plot



Load multiple sites for analysis and plot on Leaflet

Let’s say you want to know where all stream gauges are in North Carolina. You can use the USGS NWIS website or you can plot the sites in R. Here, we call on all site data in North Carolina and use leaflet() to map sites.

# What sites are available in NC?
nc.sites <- readNWISdata(stateCd="NC", 
                         parameterCd = pcode, 
                         service = "site", 
                         seriesCatalogOutput=TRUE)
length(unique(nc.sites$site_no)) #number of sites in NC
## [1] 740
# Filter data
nc.sites2 <- filter(nc.sites, parm_cd %in% pcode) %>%      # keep sites with discharge data
             filter(stat_cd %in% scode) %>%                # and the daily mean statistic
             filter(begin_date <= "1950-01-01") %>%
             filter(end_date >= "2010-01-01") %>%          # with data spanning these years
             # Create variable period: how many days of data were collected?
             mutate(period = as.Date(end_date) - as.Date(begin_date)) %>%
             filter(period > 30*365)                       # only keep sites with 30+ years of data

length(unique(nc.sites2$site_no))
## [1] 67
# Where are sites located?
# On this data...
leaflet(data = nc.sites2) %>%
  # ...add this map background (can change)
  addProviderTiles("CartoDB.Positron") %>%
  # ...add circle markers based on lat/long
  addCircleMarkers(~dec_long_va, ~dec_lat_va,
                   # ...add attributes
                   color = "red",
                   radius = 3,
                   stroke = FALSE,
                   fillOpacity = 0.8,
                   opacity = 0.8,
                   popup = ~station_nm)


That’s a lot of data. Let’s look at discharge over time for all stations in the Upper Neuse River basin upstream of Falls Lake. You know the HUC code is 03020201.

# Let's focus on the upper Neuse upstream of Falls Lake
upper.neuse <- subset(nc.sites2, huc_cd=="03020201")

# How many unique sites are there?
unique.sites <- unique(upper.neuse$site_no) 
length(unique.sites)
## [1] 6
# Load in and plot data #
#########################
# Create a function to load data
load_NWIS_data <- function(site_number) {
  
  print(paste0('Loading data for site: ', site_number))
  
  df <- readNWISdv(siteNumbers = site_number, 
                   parameterCd = pcode, 
                   statCd = scode)
  
  df <- renameNWISColumns(df)
  df$Flow_cms <- df$Flow*0.028316847 # flow in cms
  
  return(df)
}

# Set dataframe of site names (kind of like a dictionary)
site_names <- unique(upper.neuse[c('site_no', 'station_nm')])

# Create a function to plot the data
plot_by_site <- function(input.data, site_number) {
  
  print(paste0('Plotting for site: ', site_number))
  
  plot <-
    ggplot(data = input.data,
           aes(x = Date, y = Flow_cms)) +
    geom_line() +
    labs(x = '', 
         y = 'Stream flow (cms)',
         title = site_names$station_nm[site_names$site_no == site_number])
  
  return(plot)
    
}

# Run plot in for loop
for (i in unique.sites) {
  
  plot <- 
    load_NWIS_data(site_number = i) %>%
    plot_by_site(site_number = i)
  
  # assign() saves each plot as a named object (e.g. plot_02085000) in the
  # calling environment so it can be displayed after the loop finishes
  assign(paste0('plot_', i), plot, envir = parent.frame())
  
}
## [1] "Loading data for site: 02085000"
## [1] "Plotting for site: 02085000"
## [1] "Loading data for site: 02085500"
## [1] "Plotting for site: 02085500"
## [1] "Loading data for site: 02086500"
## [1] "Plotting for site: 02086500"
## [1] "Loading data for site: 02087500"
## [1] "Plotting for site: 02087500"
## [1] "Loading data for site: 02088000"
## [1] "Plotting for site: 02088000"
## [1] "Loading data for site: 02088500"
## [1] "Plotting for site: 02088500"
# We now have a lot of plots!
# Try calling them...
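# For example, typing the object name displays the Clayton gauge plot:
plot_02087500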