Exploring trends in streamflow over time

Background

Water security is becoming increasingly important as population and water demand continue to grow. This is especially true with changing climate conditions that introduce new variability into our expectations of water supply. Briefly, we want to know whether the average annual streamflow has changed over time.

Prep the workspace by installing (if needed) and loading libraries

The method for installing and loading libraries, as well as downloading data from the USGS, are explained in the LoadStreamflowDescription.Rmd file.

#install.packages("dataRetrieval", repos=c("http://owi.usgs.gov/R", getOption("repos")))
library(dataRetrieval); library (ggplot2); library(EGRET)
library(dplyr); library(magrittr); library(lubridate)
library(trend);  #calculates trends using non-parametric formulas
library(leaflet)

Download data from NWIS

#Identify gauge to download
siteNo <- '02087500' #don't forget to add the zero if it is missing

#Identify parameter of interest: 
pcode = '00060' #discharge (cfs)

#Identify statistic code for daily values: 
scode = "00003"  #mean

#Identify start and end dates
start.date = "1930-10-01"
end.date = "2017-09-30"

#Load in data
neuse <- readNWISdv(siteNumbers = siteNo, parameterCd = pcode, statCd = scode, startDate=start.date, endDate=end.date)
#rename columns to something more useful
  neuse <- renameNWISColumns(neuse); colnames(neuse)

## [1] "agency_cd" "site_no"   "Date"      "Flow"      "Flow_cd"

#Create cms column to plot cubic meters per second
neuse$Flow_cms <- neuse$Flow*0.028316847

Aggregate Daily Data to Monthly and Annual for Trend Analysis

There is a lot of noise in daily streamflow data that makes it hard to pull out trends. Often, when looking at long-term trends, the data are aggregated by month and/or year. We will use pipes to aggregate the data by month and year. See previous questions for a brief explanation on how pipes work.

#Look at trends based on monthly total streamflow
neuse$Year <- year(neuse$Date);   neuse$Month <- month(neuse$Date)
neuse$WaterYear <- ifelse(neuse$Month>=10,neuse$Year+1,neuse$Year)

#Use pipes to summarize by month
month.flow <- neuse %>%
  group_by(Year, Month) %>%
  summarise(Total_cms = sum(Flow_cms, na.rm=T), n=n()) %>%  round(3)
month.flow <- as.data.frame(month.flow);  

#lets also look at water year
annual.flow <- neuse %>%
  group_by(WaterYear) %>%
  summarise(Total_cms = sum(Flow_cms, na.rm=T), n=n()) %>%  round(3)
annual.flow <- as.data.frame(annual.flow);  
#remove those missing more than 10%
annual.flow <- subset(annual.flow, n>=330)

Plot the annual trends over time and fit linear and Mann Kendall regressions

In excel, we were able to run a linear regression on annual streamflow trends. Let’s replicate that activity in R now.

############################################
# linear regression
############################################
linear = lm(Total_cms ~ WaterYear , data = annual.flow);  summary(linear)

## 
## Call:
## lm(formula = Total_cms ~ WaterYear, data = annual.flow)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7341.4 -2997.8  -498.1  2606.9 10508.2 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 64953.09   35324.48   1.839   0.0694 .
## WaterYear     -27.08      17.89  -1.514   0.1338  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4191 on 85 degrees of freedom
## Multiple R-squared:  0.02624,    Adjusted R-squared:  0.01479 
## F-statistic: 2.291 on 1 and 85 DF,  p-value: 0.1338

  x.est <- as.data.frame(seq(1931,2020,1)); colnames(x.est)<-"WaterYear"
  y.est <- as.data.frame(predict(linear,x.est, interval="confidence"))

Exercise 1: Plot points and linear regression

Plot the total flow by water year
Add a line with the linear regression

Oftentimes, time series data (such as water quantity or water quality) do not meet the assumptions of linear regressions. We won’t go into those here, but for a list, see http://r-statistics.co/Assumptions-of-Linear-Regression.html. It is common for streamflow trends to be assessed using non-parametric methods, such as the Mann-Kendall (MK) test. The MK test is appropriate for identifying where changes show a monotonic increase or decrease over time, and whether that change is significant. The MK test is often paired with Theil-Sen line to estimate the slope of the trend. This method is robust against outliers and essentially calculates the median of the slope between all pairs of sample points within your data. We calculate the MK test to get the significance of the trend and the Theil-Sen test to get the slope, intercept and confidence interval for plotting the data.

The MK and Sen test are located in the trend library. Make sure you have installed and loaded this library. Both tests require the data to be provided as a time series object. The function ts(data column of interest, time of first observation, number of observations per unit of time) is used to create a time-series object. This function assumes the data are at equi-distant units of time.
- For example, if you are looking at annual data you would have a frequency of 1, because there is one point per unit of time. ts(data,1930,1).
- Or if you wanted to look at the annual trend but include monthly values you would have 12 observations per year (natural time period). In this case, ts(data, 1930.083, 12). Where 1/12 = 0.083 and there are 12 months in each year.
To plot the regression we need to use the regression to estimate the trend for each point in time. Because the Sen test provides confidence intervals, we can also show the range of possibilities for the trend line.
Plot the confidence interval as a polygon and the estimate as a line. The polygon() command essentially draws a polygon based on polygon(x vectors, y vectors). Oftentimes, drawing a polygon looks like polygon(c(x,rev(x)), c(y2,rev(y1))). The first half of the x-vector is the value of x, corresponding to the part of the polygon tracing the upper curve. The second part is the reverse rev of x, corresponding to the part of the polygon that is tracing out the lower curve along the decreasing value of x. Similarly, the first part of the polygon (e.g. higher values) and the second part is the reverse (e.g lower values) of the polygon.

############################################  
# Mann Kendall test
############################################
temp.ts<-ts(annual.flow$Total_cms, start=min(annual.flow$WaterYear, freq=1));  
  mk <- mk.test(temp.ts); mk

## Mann-Kendall Test
##  
## two-sided homogeinity test
## H0: S = 0 (no trend)
## HA: S != 0 (monotonic trend)
##  
## Statistics for total series
##      S     varS    Z    tau  pvalue
## 1 -401 74404.33 -1.5 -0.107 0.14253

  sen <- sens.slope(temp.ts); sen

##  
## Sen's slope and intercept
##  
##  
## slope:  -27.8002
## 95 percent confidence intervall for slope
## 10.0201 -65.247
##  
## intercept: 12216.0641
## nr. of observations: 87

  #sen$b.sen*sen$nobs;
  #sen$b.sen.up*sen$nobs;  sen$b.sen.lo*sen$nobs
  #sen$intercept  
  
# create lines to plot on graph
  #y-axis. We are using the equation y=a+b*x, where a is the intercept and b is the slope
  estimate<-round((sen$intercept+sen$b.sen*seq(0,dim(annual.flow)[1]-1,1)),1)
  #upper confidence interval
  upperc1<-round((sen$intercept+sen$b.sen.up*seq(0,dim(annual.flow)[1]-1,1)),1)
  #lower confidence interval
  lowerc1<-round((sen$intercept+sen$b.sen.lo*seq(0,dim(annual.flow)[1]-1,1)),1)

#Plot Annual Flow over time
par(mar = c(5,5,3,5)) #set plot margins
plot(annual.flow$WaterYear, annual.flow$Total_cms, type='n', yaxt="n",
     ylab="Total Streamflow (cms)", xlab = '', cex.lab=0.9)
axis(2, las=2, cex.axis=0.8)
box()
#add data to the plot
  points(annual.flow$WaterYear, annual.flow$Total_cms, col=rgb(0,0,0,0.8), cex=0.8, pch=19)  
  lines(annual.flow$WaterYear, annual.flow$Total_cms, col=rgb(0,0,0,0.8))  

#Add linear regression to plot
  lines(x.est$WaterYear, y.est$fit, col="red", lty=2);
  
#draw a polygon between the lower and upper confidence interval  
  polygon(c(rev(annual.flow$WaterYear), (annual.flow$WaterYear)), c(rev(lowerc1), (upperc1)),  col=rgb(0,0,0.2,0.2), border=NA)
  lines(annual.flow$WaterYear, estimate, lty=2, col=rgb(0,0,.2,1))
  legend("bottomleft",c("Annual Flow", "Linear Regression", "MK regression", "MK confidence interval"), col=c("black","red",rgb(0,0,0.2),rgb(0,0,0.2,0.2)),
         pch=c(19,NA,NA,22), pt.cex=c(0.8,1,1,2), lty=c(1,2,2,NA), pt.bg=c(NA,NA,NA,rgb(0,0,0.2,0.2)), cex=0.8)

Based on this plot, you can see that it is likely streamflow has decreased over time. However, we are not 100% confident that is true.

Does the trend change before or after Falls Lake was built?

Here, we will repeat the MK and Sen tests for subsets of the data (before Falls Lake, after Falls Lake) and plot those data on the same graph.

#create the plot
par(mar = c(5,5,3,5)) #set plot margins
  plot(annual.flow$WaterYear, annual.flow$Total_cms, type='n', yaxt="n", ylim=c(0,max(annual.flow$Total_cms, na.rm=TRUE)),
       ylab="Total Streamflow (cms)", xlab = '', cex.lab=0.9)
  axis(2, las=2, cex.axis=0.8)
  
  #add the data
  points(annual.flow$WaterYear, annual.flow$Total_cms, col=rgb(0,0,0,0.8), cex=0.8, pch=19)  
  lines(annual.flow$WaterYear, annual.flow$Total_cms, col=rgb(0,0,0,0.8))  
  #draw a rectangle around the period Falls Lake was built (x1,y1,x2,y2)
  rect(1980,0,1984,100000, col=rgb(0.7,0,0,0.2), border=NA)

#Subset and run the MK test
######################################
# Pre Falls Lake
#######################################
pre.falls <- subset(annual.flow, WaterYear<=1980)
  #create time series
  temp.ts<-ts(pre.falls$Total_cms, start=min(pre.falls$WaterYear, freq=1));  
  mk <- mk.test(temp.ts); mk

## Mann-Kendall Test
##  
## two-sided homogeinity test
## H0: S = 0 (no trend)
## HA: S != 0 (monotonic trend)
##  
## Statistics for total series
##     S     varS    Z    tau  pvalue
## 1 -77 14291.67 -0.6 -0.063 0.52495

  sen <- sens.slope(temp.ts); sen

##  
## Sen's slope and intercept
##  
##  
## slope:  -25.9626
## 95 percent confidence intervall for slope
## 61.8574 -107.4761
##  
## intercept: 12148.8019
## nr. of observations: 50

  # create lines to plot on graph
  estimate<-round((sen$intercept+sen$b.sen*seq(0,dim(pre.falls)[1]-1,1)),1)
  upperc1<-round((sen$intercept+sen$b.sen.up*seq(0,dim(pre.falls)[1]-1,1)),1)
  lowerc1<-round((sen$intercept+sen$b.sen.lo*seq(0,dim(pre.falls)[1]-1,1)),1)
  
  #plot the polygon and line
  polygon(c(rev(pre.falls$WaterYear), (pre.falls$WaterYear)), c(rev(lowerc1), (upperc1)),  col=rgb(0,0,0.2,0.2), border=NA)
    lines(pre.falls$WaterYear, estimate, lty=2, col=rgb(0,0,.2,1))

######################################
# Post Falls Lake
#######################################
post.falls <- subset(annual.flow, WaterYear>=1984)
    #create time series
    temp.ts<-ts(post.falls$Total_cms, start=min(post.falls$WaterYear, freq=1));  
    mk <- mk.test(temp.ts); mk

## Mann-Kendall Test
##  
## two-sided homogeinity test
## H0: S = 0 (no trend)
## HA: S != 0 (monotonic trend)
##  
## Statistics for total series
##   S     varS Z   tau  pvalue
## 1 3 4550.333 0 0.005 0.97635

    sen <- sens.slope(temp.ts); sen

##  
## Sen's slope and intercept
##  
##  
## slope:  1.9004
## 95 percent confidence intervall for slope
## 171.663 -178.9858
##  
## intercept: 10195.6163
## nr. of observations: 34

    # create lines to plot on graph
    estimate<-round((sen$intercept+sen$b.sen*seq(0,dim(post.falls)[1]-1,1)),1)
    upperc1<-round((sen$intercept+sen$b.sen.up*seq(0,dim(post.falls)[1]-1,1)),1)
    lowerc1<-round((sen$intercept+sen$b.sen.lo*seq(0,dim(post.falls)[1]-1,1)),1)
    
    #plot the polygon and line on the graph
    polygon(c(rev(post.falls$WaterYear), (post.falls$WaterYear)), c(rev(lowerc1), (upperc1)),  col=rgb(0,0,0.2,0.2), border=NA)
      lines(post.falls$WaterYear, estimate, lty=2, col=rgb(0,0,.2,1))

What do you notice? This is often a challenge in looking at streamflow trends because the time period assessed can drastically change your results.

Are there seasonal trends?

Streamflow is greatly influenced by climate patterns and shifts in climate can occur seasonally. Let’s look at each month to see what patterns may emerge. To do this, we are going to create a page for 12 plots to sit on in 3 rows and 4 columns. We will then use a for loop to subset the data, perform the trend test, and create a plot for each month of the year.

#Create dataframe
month.stats <- as.data.frame(matrix(nrow=0,ncol=5)); 
#create meaningul column names
colnames(month.stats) <- c("Month", "Pval","Intercept","Slope","Nobs");

#set up plots frame
par(mfrow=c(3,4))     #3 rows and 4 columns
par(mar = c(2,4,2,1)) #set plot margins (how much white space around each plot) (bottom, left, top right)
      
#loop through each month  
for (i in 1:12) {
  zt <- subset(month.flow, Month==i)       #subset data by momth
  
  temp.ts<-ts(zt$Total_cms, start=min(zt$Year, freq=1));   #create time series
  mk <- mk.test(temp.ts); mk                             #mk test
  sen <- sens.slope(temp.ts); sen                        #sen test
  
  #fill dataframe so months can be directly compared
  month.stats[i,1] <- i;             month.stats[i,2]<-mk$pvalg;            month.stats[i,3]<-sen$intercept
  month.stats[i,4] <- sen$b.sen;     month.stats[i,5]<-sen$nobs;
  
  # create lines to plot on graph
  estimate<-round((sen$intercept+sen$b.sen*seq(0,dim(zt)[1]-1,1)),1)
  upperc1<-round((sen$intercept+sen$b.sen.up*seq(0,dim(zt)[1]-1,1)),1)
  lowerc1<-round((sen$intercept+sen$b.sen.lo*seq(0,dim(zt)[1]-1,1)),1)

  #plot each month
  plot(zt$Year, zt$Total_cms, type='n', yaxt="n", main=i, ylim=c(0,5000),    #good practice to keep y-axis the same if possible on multiple charts you are comparing
       ylab="", xlab = '', cex.lab=0.9)
  axis(2, las=2, cex.axis=0.8)
  points(zt$Year, zt$Total_cms, col=rgb(0,0,0,0.8), cex=0.8, pch=19)  
  lines(zt$Year, zt$Total_cms, col=rgb(0,0,0,0.2))  
  polygon(c(rev(zt$Year), (zt$Year)), c(rev(lowerc1), (upperc1)),  col=rgb(0,0,0.2,0.2), border=NA)
    lines(zt$Year, estimate, lty=2, col=rgb(0,0,.2,1))
}

#mtext('Total Flow (cms)', side=2, line=-3, outer=TRUE)
month.stats

##    Month       Pval Intercept      Slope Nobs
## 1      1 0.21259403 1358.3553 -4.3194231   87
## 2      2 0.04151639 1736.0924 -7.0708824   87
## 3      3 0.54281095 1559.8907 -2.7467500   87
## 4      4 0.09031758 1375.6212 -5.3923966   87
## 5      5 0.79746829  643.1237 -0.4451803   87
## 6      6 0.80313427  514.6020  0.2348293   87
## 7      7 0.44137408  495.1974 -1.1534333   87
## 8      8 0.21259403  519.5464 -1.7904375   87
## 9      9 0.25575488  270.0089  1.2552429   87
## 10    10 0.06679728  229.8388  1.6588333   87
## 11    11 0.87762946  368.9823  0.2083333   87
## 12    12 0.37893674  747.1815 -1.7405333   87

Exercise 2: Interpreting results

Use filter/subset to answer th following questions:

How many months have a positive trend? A negative trend?
How many of those trends are significant at the 0.05 level?, The 0.10 level?
Which months have significant trends?

How does the annual trend compare with sites across North Carolina?

Often, it is good practice to put your results in the context of a broader geography (stream reaches, river basin, state). Here we are going to run the analysis for sites across North Carolina and plot the results.

First, find sites that meet your criteria in NC
Next, create a data frame to hold the stats for each site. Then create a for loop that reads in the site data, aggregates to annual flow, and calculates the mk and sen test.

#What sites are available in NC?
nc.sites <- readNWISdata(stateCd="NC", parameterCd = pcode, service = "site", seriesCatalogOutput=TRUE)
  length(unique(nc.sites$site_no));   #tells how many sites are present

## [1] 732

#filter data using pipes!
nc.sites2 <- filter(nc.sites, parm_cd %in% pcode) %>%                             #only sites with discharge
  filter(stat_cd %in% scode) %>%                                                 #mean daily values
  filter(begin_date <= "1950_01_01") %>%  filter(end_date >= "2010_01_01") %>%   #in this range
  mutate(period = as.Date(end_date) - as.Date("1970-01-01")) %>%                #how many days of data
  filter(period > 30*365)   #keep those with 30 years of data 

length(unique(nc.sites2$site_no))

## [1] 67

#calculate unique sites
unique.sites <- unique(nc.sites2$site_no)

#set up data frame  
stats <- as.data.frame(matrix(nrow=0,ncol=7));        
#Add column headers
colnames(stats) <- c("Site", "Pval","Intercept","Slope","Nobs","StartYr","EndYr"); 
start.date = "1970-10-01"
end.date = "2017-09-30"

#Loop through each site and calculate statistics
for (i in 1:length(unique.sites)){
  #read in data
  zt <- readNWISdv(siteNumbers = unique.sites[i], parameterCd = pcode, statCd = scode, startDate=start.date, endDate = end.date)
    zt <- renameNWISColumns(zt);
  zt$Flow_cms <- zt$Flow*0.028316847
  zt$Year <- year(zt$Date)
    
  #use pipes to aggregate flow by year
    annual.flow <- zt %>%  group_by(Year) %>%
      summarise(Total_cms = sum(Flow_cms, na.rm=T), n=n()) %>%  round(3)
    annual.flow <- as.data.frame(annual.flow);  
  
  #run trend test
    temp.ts<-ts(annual.flow$Total_cms, start=min(annual.flow$Year, freq=1));
    mk <- mk.test(temp.ts); 
    sen <- sens.slope(temp.ts); 
    
  #fill dataframe
    stats[i,1] <- as.character(unique.sites[i]);             stats[i,2]<-round(mk$pvalg,3);            stats[i,3]<-round(sen$intercept,3)
    stats[i,4] <- round(sen$b.sen,3);                        stats[i,5]<-sen$nobs;                     stats[i,6]<-min(zt$Year)
    stats[i,7] <- max(zt$Year);
    
    #print(i)
}
summary(stats)

##      Site                Pval          Intercept            Slope         
##  Length:67          Min.   :0.0020   Min.   :   106.9   Min.   :-780.556  
##  Class :character   1st Qu.:0.0685   1st Qu.:  1785.8   1st Qu.: -59.303  
##  Mode  :character   Median :0.2120   Median :  4428.6   Median : -16.717  
##                     Mean   :0.3197   Mean   : 11595.7   Mean   : -44.084  
##                     3rd Qu.:0.5125   3rd Qu.: 13756.3   3rd Qu.:  -4.735  
##                     Max.   :1.0000   Max.   :100958.7   Max.   : 884.029  
##       Nobs          StartYr         EndYr     
##  Min.   : 4.00   Min.   :1970   Min.   :2016  
##  1st Qu.:47.00   1st Qu.:1970   1st Qu.:2017  
##  Median :48.00   Median :1970   Median :2017  
##  Mean   :43.58   Mean   :1973   Mean   :2017  
##  3rd Qu.:48.00   3rd Qu.:1970   3rd Qu.:2017  
##  Max.   :48.00   Max.   :2013   Max.   :2017

#remove sites with less than 30 years observations
stats <- subset(stats, Nobs>=30); dim(stats)

## [1] 59  7

Plot trends across the state using leaflet

I want to represent each site as a triangle. I want the triangle to point upwards for positive slopes and downwards for negative slopes. Furthermore, I want to color the triangle based on the direction (red color for negative and blue color for positive) and bold color for significant slopes with pastels for insignificant slopes. Leaflet does not have an automatic way to create such markers. Therefore, we need to create a function to make this icon. I did not write this function - I searched for someone else’s code who had already ran into this problem. This is a great way to learn how to code.

#Function to grab pch icons and transform to leaflet markers
pchIcons = function(pch = 1, width = 30, height = 30, bg = "transparent", col = "black", ...) {
  n = length(pch)
  files = character(n)
  # create a sequence of png images
  for (i in seq_len(n)) {
    f = tempfile(fileext = '.png')
    png(f, width = width, height = height, bg = bg)
    par(mar = c(0, 0, 0, 0))
    plot.new()
    points(.5, .5, pch = pch[i], col = col[i], cex = min(width, height) / 8, ...)
    dev.off()
    files[i] = f
  }
  files
}

Now that we know what the function looks like, we can create the variables I want to load into the function to draw symbols based on significance and slope. We will do this by adding a column to color by Direction and Significance.

Now, we call the function pchIcons into a new variable we created called iconFiles. When we plot the data into leaflet, we will include the variable iconFiles.

#Add a column with the pch (triangle direction) I want based on the slope
#(2 is an upward triangle, 6 a downward triangle)
stats$Direction <- ifelse(stats$Slope>0, 2, 6)    #shapes based on pch values
  stats$Direction <- ifelse(stats$Slope==0, 19, stats$Direction)
  table(stats$Direction)

## 
##  2  6 
##  6 53

stats$Significance <- ifelse(stats$Slope>0 & stats$Pval<=0.05, "blue","lightgrey")
  stats$Significance <- ifelse(stats$Slope>0 & stats$Pval>0.05, "lightblue",stats$Significance)
  stats$Significance <- ifelse(stats$Slope<0 & stats$Pval>0.05, "lightpink",stats$Significance)
  stats$Significance <- ifelse(stats$Slope<0 & stats$Pval<=0.05, "red",stats$Significance)
  table(stats$Significance)

## 
## lightblue lightpink       red 
##         6        40        13

iconFiles = pchIcons(stats$Direction, 20, 20, col = stats$Significance, lwd = 4)

Now I want to plot the data on leaflet. Notice, that the data I want to plot is in the stats table, while the data holding the location of the sites is in the nc.sites2 table. I need to merge these two tables together using merge(). This function works by: merge(table1, table2, by column in table 1, by column in table 2), where the columns in table 1 and table 2 should be a unique identifier that links the two tables together. In this case, it will be the site number.

#merge with site lat/long
stats.merge <- merge(stats, nc.sites2[,c(2,5:6,22:23)], by.x="Site", by.y="site_no"); dim(stats.merge)

## [1] 59 13

#Map results
library(leaflet)
leaflet(data=stats.merge) %>%                                       #data to plot
  addProviderTiles("CartoDB.Positron") %>%                          #creates background mpa
  addMarkers(~dec_long_va,~dec_lat_va,                              #adds markers with location
             icon = ~ icons(
               iconUrl = iconFiles,                                 #markers with triangles
               popupAnchorX = 20, popupAnchorY = 0
             ),
            popup=~paste0(Site,": ",StartYr,"-",EndYr)              #data in popup
  ) %>% addLegend("bottomright",                                    #add legend to map
                  values=c("red","lightpink","lightgrey","lightblue","blue"), color=c("red","lightpink","lightgrey", "lightblue","blue"),
                  labels= c("Significant Decrease","Insignificant Decrease", "No Change", "Insignificant Increase", "Significant Increase"),
                  opacity = 1)

Based on the map, what can you say about how annual streamflow volume has changed since 1970? Can you find your site?

Exercise 3: Run analysis for other regions

Run this analysis for a different time period (e.g. 1930 to present). How did the map change?
Plot mann kendall trends for the Neuse River Basin (looka t notes from LoadStreamflowDescription.Rmd on how to subset sites to a river basins. You will have to find the HUC for the Neuse).
Run this analysis for a different state.

Streamflow Trends

L. Patterson and J. Fay

Spring, 2018