Water security is becoming increasingly important as population and water demand continue to grow. This is especially true with changing climate conditions that introduce new variability into our expectations of water supply. Briefly, we want to know whether the average annual streamflow has changed over time.
The methods for installing and loading libraries, and for downloading data from the USGS, are explained in the LoadStreamflowDescription.Rmd file.
#install.packages("dataRetrieval", repos=c("http://owi.usgs.gov/R", getOption("repos")))
library(dataRetrieval); library (ggplot2); library(EGRET)
library(dplyr); library(magrittr); library(lubridate)
library(trend); #calculates trends using non-parametric formulas
library(leaflet)
#Identify gauge to download
siteNo <- '02087500' #don't forget to add the zero if it is missing
#Identify parameter of interest:
pcode = '00060' #discharge (cfs)
#Identify statistic code for daily values:
scode = "00003" #mean
#Identify start and end dates
start.date = "1930-10-01"
end.date = "2017-09-30"
#Load in data
neuse <- readNWISdv(siteNumbers = siteNo, parameterCd = pcode, statCd = scode, startDate=start.date, endDate=end.date)
#rename columns to something more useful
neuse <- renameNWISColumns(neuse); colnames(neuse)
## [1] "agency_cd" "site_no" "Date" "Flow" "Flow_cd"
#Create cms column to plot cubic meters per second
neuse$Flow_cms <- neuse$Flow*0.028316847
There is a lot of noise in daily streamflow data that makes it hard to pull out trends. Often, when looking at long-term trends, the data are aggregated by month and/or year. We will use pipes to aggregate the data by month and year. See previous questions for a brief explanation on how pipes work.
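As a quick refresher, the pipe operator %>% hands the result on its left to the first argument of the function on its right, so x %>% f(y) is just another way of writing f(x, y). The two lines in this small illustration (not part of the analysis itself) return the same value:
#pipes pass the left-hand result into the first argument of the next function
round(mean(c(1.234, 5.678)), 1) #nested form
c(1.234, 5.678) %>% mean() %>% round(1) #piped form; both return 3.5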
#Look at trends based on monthly total streamflow
neuse$Year <- year(neuse$Date); neuse$Month <- month(neuse$Date)
neuse$WaterYear <- ifelse(neuse$Month>=10,neuse$Year+1,neuse$Year)
#Use pipes to summarize by month
month.flow <- neuse %>%
group_by(Year, Month) %>%
summarise(Total_cms = sum(Flow_cms, na.rm=T), n=n()) %>% round(3)
month.flow <- as.data.frame(month.flow);
#lets also look at water year
annual.flow <- neuse %>%
group_by(WaterYear) %>%
summarise(Total_cms = sum(Flow_cms, na.rm=T), n=n()) %>% round(3)
annual.flow <- as.data.frame(annual.flow);
#remove those missing more than 10%
annual.flow <- subset(annual.flow, n>=330)
In Excel, we were able to run a linear regression on annual streamflow trends. Let’s replicate that activity in R now.
############################################
# linear regression
############################################
linear = lm(Total_cms ~ WaterYear , data = annual.flow); summary(linear)
##
## Call:
## lm(formula = Total_cms ~ WaterYear, data = annual.flow)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7341.4 -2997.8 -498.1 2606.9 10508.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 64953.09 35324.48 1.839 0.0694 .
## WaterYear -27.08 17.89 -1.514 0.1338
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4191 on 85 degrees of freedom
## Multiple R-squared: 0.02624, Adjusted R-squared: 0.01479
## F-statistic: 2.291 on 1 and 85 DF, p-value: 0.1338
x.est <- as.data.frame(seq(1931,2020,1)); colnames(x.est)<-"WaterYear"
y.est <- as.data.frame(predict(linear,x.est, interval="confidence"))
Oftentimes, time series data (such as water quantity or water quality) do not meet the assumptions of linear regression. We won’t go into those here, but for a list, see http://r-statistics.co/Assumptions-of-Linear-Regression.html. It is common for streamflow trends to be assessed using non-parametric methods, such as the Mann-Kendall (MK) test. The MK test is appropriate for identifying whether a series shows a monotonic increase or decrease over time, and whether that change is significant. The MK test is often paired with the Theil-Sen slope estimator, which is robust against outliers and essentially takes the median of the slopes between all pairs of sample points in your data. We run the MK test to get the significance of the trend and the Sen test to get the slope, intercept, and confidence interval for plotting the data.
Both tests are available in the trend library, so make sure you have installed and loaded it. Both tests require the data to be provided as a time series object. The function ts(data column of interest, time of first observation, number of observations per unit of time) is used to create a time-series object; it assumes the data are at equidistant units of time. For annual data this looks like ts(data, 1930, 1); for monthly data it looks like ts(data, 1930.083, 12), where 1/12 = 0.083 and there are 12 months in each year.
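Before applying these tests to the Neuse data, here is a small synthetic example (not part of the original analysis) showing the full pattern: build a time series with ts(), then run mk.test() and sens.slope() from the trend library. Note that newer versions of the trend package return these results as htest objects, so element names such as pvalg and b.sen used later in this document may appear as p.value and estimates in your installation.
#synthetic example: 30 "years" of data with a built-in upward trend of about 2 units/year
set.seed(1)
fake.flow <- 100 + 2*(1:30) + rnorm(30, sd=5)
fake.ts <- ts(fake.flow, start=1990, frequency=1) #annual series starting in 1990
mk.test(fake.ts) #small p-value indicates a significant monotonic trend
sens.slope(fake.ts) #Sen's slope estimate should be close to 2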
To plot the regression, we use it to estimate the trend at each point in time. Because the Sen test provides confidence intervals, we can also show the range of possibilities for the trend line. We plot the confidence interval as a polygon and the estimate as a line. The polygon() command essentially draws a polygon based on polygon(x vector, y vector). Oftentimes, drawing a confidence band looks like polygon(c(x, rev(x)), c(y2, rev(y1))). The first half of the x-vector is x itself, corresponding to the part of the polygon that traces the upper curve from left to right; the second half is the reverse (rev) of x, tracing the lower curve back along decreasing values of x. The y-vector is built the same way: the upper-bound values (y2) come first, followed by the reversed lower-bound values (rev(y1)).
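As a small standalone illustration of this pattern (made-up curves, not the streamflow data), the snippet below shades the band between an upper and a lower line:
#toy example of polygon(): shade the band between two curves
x <- 1:10
y1 <- x #lower curve
y2 <- x + 3 #upper curve
plot(x, y2, type='n', ylab="y") #empty plot scaled to the upper curve
polygon(c(x, rev(x)), c(y2, rev(y1)), col=rgb(0,0,0.2,0.2), border=NA)
lines(x, y1); lines(x, y2)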
############################################
# Mann Kendall test
############################################
temp.ts<-ts(annual.flow$Total_cms, start=min(annual.flow$WaterYear), frequency=1);
mk <- mk.test(temp.ts); mk
## Mann-Kendall Test
##
## two-sided homogeinity test
## H0: S = 0 (no trend)
## HA: S != 0 (monotonic trend)
##
## Statistics for total series
## S varS Z tau pvalue
## 1 -401 74404.33 -1.5 -0.107 0.14253
sen <- sens.slope(temp.ts); sen
##
## Sen's slope and intercept
##
##
## slope: -27.8002
## 95 percent confidence intervall for slope
## 10.0201 -65.247
##
## intercept: 12216.0641
## nr. of observations: 87
#sen$b.sen*sen$nobs;
#sen$b.sen.up*sen$nobs; sen$b.sen.lo*sen$nobs
#sen$intercept
# create lines to plot on the graph: y = a + b*x, where a is the intercept and b is the slope
estimate<-round((sen$intercept+sen$b.sen*seq(0,dim(annual.flow)[1]-1,1)),1)
#upper confidence interval
upperc1<-round((sen$intercept+sen$b.sen.up*seq(0,dim(annual.flow)[1]-1,1)),1)
#lower confidence interval
lowerc1<-round((sen$intercept+sen$b.sen.lo*seq(0,dim(annual.flow)[1]-1,1)),1)
#Plot Annual Flow over time
par(mar = c(5,5,3,5)) #set plot margins
plot(annual.flow$WaterYear, annual.flow$Total_cms, type='n', yaxt="n",
ylab="Total Streamflow (cms)", xlab = '', cex.lab=0.9)
axis(2, las=2, cex.axis=0.8)
box()
#add data to the plot
points(annual.flow$WaterYear, annual.flow$Total_cms, col=rgb(0,0,0,0.8), cex=0.8, pch=19)
lines(annual.flow$WaterYear, annual.flow$Total_cms, col=rgb(0,0,0,0.8))
#Add linear regression to plot
lines(x.est$WaterYear, y.est$fit, col="red", lty=2);
#draw a polygon between the lower and upper confidence interval
polygon(c(rev(annual.flow$WaterYear), (annual.flow$WaterYear)), c(rev(lowerc1), (upperc1)), col=rgb(0,0,0.2,0.2), border=NA)
lines(annual.flow$WaterYear, estimate, lty=2, col=rgb(0,0,.2,1))
legend("bottomleft",c("Annual Flow", "Linear Regression", "MK regression", "MK confidence interval"), col=c("black","red",rgb(0,0,0.2),rgb(0,0,0.2,0.2)),
pch=c(19,NA,NA,22), pt.cex=c(0.8,1,1,2), lty=c(1,2,2,NA), pt.bg=c(NA,NA,NA,rgb(0,0,0.2,0.2)), cex=0.8)
Here, we will repeat the MK and Sen tests for subsets of the data (before Falls Lake, after Falls Lake) and plot those data on the same graph.
#create the plot
par(mar = c(5,5,3,5)) #set plot margins
plot(annual.flow$WaterYear, annual.flow$Total_cms, type='n', yaxt="n", ylim=c(0,max(annual.flow$Total_cms, na.rm=TRUE)),
ylab="Total Streamflow (cms)", xlab = '', cex.lab=0.9)
axis(2, las=2, cex.axis=0.8)
#add the data
points(annual.flow$WaterYear, annual.flow$Total_cms, col=rgb(0,0,0,0.8), cex=0.8, pch=19)
lines(annual.flow$WaterYear, annual.flow$Total_cms, col=rgb(0,0,0,0.8))
#draw a rectangle around the period Falls Lake was built (x1,y1,x2,y2)
rect(1980,0,1984,100000, col=rgb(0.7,0,0,0.2), border=NA)
#Subset and run the MK test
######################################
# Pre Falls Lake
#######################################
pre.falls <- subset(annual.flow, WaterYear<=1980)
#create time series
temp.ts<-ts(pre.falls$Total_cms, start=min(pre.falls$WaterYear), frequency=1);
mk <- mk.test(temp.ts); mk
## Mann-Kendall Test
##
## two-sided homogeinity test
## H0: S = 0 (no trend)
## HA: S != 0 (monotonic trend)
##
## Statistics for total series
## S varS Z tau pvalue
## 1 -77 14291.67 -0.6 -0.063 0.52495
sen <- sens.slope(temp.ts); sen
##
## Sen's slope and intercept
##
##
## slope: -25.9626
## 95 percent confidence intervall for slope
## 61.8574 -107.4761
##
## intercept: 12148.8019
## nr. of observations: 50
# create lines to plot on graph
estimate<-round((sen$intercept+sen$b.sen*seq(0,dim(pre.falls)[1]-1,1)),1)
upperc1<-round((sen$intercept+sen$b.sen.up*seq(0,dim(pre.falls)[1]-1,1)),1)
lowerc1<-round((sen$intercept+sen$b.sen.lo*seq(0,dim(pre.falls)[1]-1,1)),1)
#plot the polygon and line
polygon(c(rev(pre.falls$WaterYear), (pre.falls$WaterYear)), c(rev(lowerc1), (upperc1)), col=rgb(0,0,0.2,0.2), border=NA)
lines(pre.falls$WaterYear, estimate, lty=2, col=rgb(0,0,.2,1))
######################################
# Post Falls Lake
#######################################
post.falls <- subset(annual.flow, WaterYear>=1984)
#create time series
temp.ts<-ts(post.falls$Total_cms, start=min(post.falls$WaterYear), frequency=1);
mk <- mk.test(temp.ts); mk
## Mann-Kendall Test
##
## two-sided homogeinity test
## H0: S = 0 (no trend)
## HA: S != 0 (monotonic trend)
##
## Statistics for total series
## S varS Z tau pvalue
## 1 3 4550.333 0 0.005 0.97635
sen <- sens.slope(temp.ts); sen
##
## Sen's slope and intercept
##
##
## slope: 1.9004
## 95 percent confidence intervall for slope
## 171.663 -178.9858
##
## intercept: 10195.6163
## nr. of observations: 34
# create lines to plot on graph
estimate<-round((sen$intercept+sen$b.sen*seq(0,dim(post.falls)[1]-1,1)),1)
upperc1<-round((sen$intercept+sen$b.sen.up*seq(0,dim(post.falls)[1]-1,1)),1)
lowerc1<-round((sen$intercept+sen$b.sen.lo*seq(0,dim(post.falls)[1]-1,1)),1)
#plot the polygon and line on the graph
polygon(c(rev(post.falls$WaterYear), (post.falls$WaterYear)), c(rev(lowerc1), (upperc1)), col=rgb(0,0,0.2,0.2), border=NA)
lines(post.falls$WaterYear, estimate, lty=2, col=rgb(0,0,.2,1))
What do you notice? This is often a challenge in looking at streamflow trends because the time period assessed can drastically change your results.
Streamflow is greatly influenced by climate patterns, and shifts in climate can occur seasonally. Let’s look at each month to see what patterns may emerge. To do this, we will set up a plotting page with 12 panels arranged in 3 rows and 4 columns. We will then use a for loop to subset the data, perform the trend test, and create a plot for each month of the year.
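Before the full version, here is a stripped-down sketch (illustrative only, with a made-up summary statistic) of the looping pattern: pre-allocate a results table, subset by month, and fill one row per iteration.
#sketch of the loop pattern: fill one row of a results table per month
results <- as.data.frame(matrix(nrow=0, ncol=2))
colnames(results) <- c("Month", "MeanTotal_cms")
for (i in 1:12) {
  zt <- subset(month.flow, Month==i) #subset data for month i
  results[i,1] <- i
  results[i,2] <- round(mean(zt$Total_cms, na.rm=TRUE), 1)
}
results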
#Create dataframe
month.stats <- as.data.frame(matrix(nrow=0,ncol=5));
#create meaningful column names
colnames(month.stats) <- c("Month", "Pval","Intercept","Slope","Nobs");
#set up plots frame
par(mfrow=c(3,4)) #3 rows and 4 columns
par(mar = c(2,4,2,1)) #set plot margins (how much white space around each plot) (bottom, left, top right)
#loop through each month
for (i in 1:12) {
zt <- subset(month.flow, Month==i) #subset data by month
temp.ts<-ts(zt$Total_cms, start=min(zt$Year), frequency=1); #create time series
mk <- mk.test(temp.ts); mk #mk test
sen <- sens.slope(temp.ts); sen #sen test
#fill dataframe so months can be directly compared
month.stats[i,1] <- i; month.stats[i,2]<-mk$pvalg; month.stats[i,3]<-sen$intercept
month.stats[i,4] <- sen$b.sen; month.stats[i,5]<-sen$nobs;
# create lines to plot on graph
estimate<-round((sen$intercept+sen$b.sen*seq(0,dim(zt)[1]-1,1)),1)
upperc1<-round((sen$intercept+sen$b.sen.up*seq(0,dim(zt)[1]-1,1)),1)
lowerc1<-round((sen$intercept+sen$b.sen.lo*seq(0,dim(zt)[1]-1,1)),1)
#plot each month
plot(zt$Year, zt$Total_cms, type='n', yaxt="n", main=i, ylim=c(0,5000), #good practice to keep y-axis the same if possible on multiple charts you are comparing
ylab="", xlab = '', cex.lab=0.9)
axis(2, las=2, cex.axis=0.8)
points(zt$Year, zt$Total_cms, col=rgb(0,0,0,0.8), cex=0.8, pch=19)
lines(zt$Year, zt$Total_cms, col=rgb(0,0,0,0.2))
polygon(c(rev(zt$Year), (zt$Year)), c(rev(lowerc1), (upperc1)), col=rgb(0,0,0.2,0.2), border=NA)
lines(zt$Year, estimate, lty=2, col=rgb(0,0,.2,1))
}
#mtext('Total Flow (cms)', side=2, line=-3, outer=TRUE)
month.stats
## Month Pval Intercept Slope Nobs
## 1 1 0.21259403 1358.3553 -4.3194231 87
## 2 2 0.04151639 1736.0924 -7.0708824 87
## 3 3 0.54281095 1559.8907 -2.7467500 87
## 4 4 0.09031758 1375.6212 -5.3923966 87
## 5 5 0.79746829 643.1237 -0.4451803 87
## 6 6 0.80313427 514.6020 0.2348293 87
## 7 7 0.44137408 495.1974 -1.1534333 87
## 8 8 0.21259403 519.5464 -1.7904375 87
## 9 9 0.25575488 270.0089 1.2552429 87
## 10 10 0.06679728 229.8388 1.6588333 87
## 11 11 0.87762946 368.9823 0.2083333 87
## 12 12 0.37893674 747.1815 -1.7405333 87
Use filter/subset to answer the following questions:
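For example, a hypothetical question such as "which months show a trend that is significant at the 0.05 level?" could be answered either way:
#filter (dplyr) and subset (base R) return the same rows here
filter(month.stats, Pval <= 0.05)
subset(month.stats, Pval <= 0.05)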
Often, it is good practice to put your results in the context of a broader geography (stream reaches, river basin, state). Here we are going to run the analysis for sites across North Carolina and plot the results.
#What sites are available in NC?
nc.sites <- readNWISdata(stateCd="NC", parameterCd = pcode, service = "site", seriesCatalogOutput=TRUE)
length(unique(nc.sites$site_no)); #tells how many sites are present
## [1] 732
#filter data using pipes!
nc.sites2 <- filter(nc.sites, parm_cd %in% pcode) %>% #only sites with discharge
filter(stat_cd %in% scode) %>% #mean daily values
filter(begin_date <= "1950-01-01") %>% filter(end_date >= "2010-01-01") %>% #records spanning at least 1950-2010
mutate(period = as.Date(end_date) - as.Date("1970-01-01")) %>% #days of record after 1 Jan 1970 (approximate start of the analysis period below)
filter(period > 30*365) #keep sites with at least ~30 years of data after 1970
length(unique(nc.sites2$site_no))
## [1] 67
#calculate unique sites
unique.sites <- unique(nc.sites2$site_no)
#set up data frame
stats <- as.data.frame(matrix(nrow=0,ncol=7));
#Add column headers
colnames(stats) <- c("Site", "Pval","Intercept","Slope","Nobs","StartYr","EndYr");
start.date = "1970-10-01"
end.date = "2017-09-30"
#Loop through each site and calculate statistics
for (i in 1:length(unique.sites)){
#read in data
zt <- readNWISdv(siteNumbers = unique.sites[i], parameterCd = pcode, statCd = scode, startDate=start.date, endDate = end.date)
zt <- renameNWISColumns(zt);
zt$Flow_cms <- zt$Flow*0.028316847
zt$Year <- year(zt$Date)
#use pipes to aggregate flow by year
annual.flow <- zt %>% group_by(Year) %>%
summarise(Total_cms = sum(Flow_cms, na.rm=T), n=n()) %>% round(3)
annual.flow <- as.data.frame(annual.flow);
#run trend test
temp.ts<-ts(annual.flow$Total_cms, start=min(annual.flow$Year), frequency=1);
mk <- mk.test(temp.ts);
sen <- sens.slope(temp.ts);
#fill dataframe
stats[i,1] <- as.character(unique.sites[i]); stats[i,2]<-round(mk$pvalg,3); stats[i,3]<-round(sen$intercept,3)
stats[i,4] <- round(sen$b.sen,3); stats[i,5]<-sen$nobs; stats[i,6]<-min(zt$Year)
stats[i,7] <- max(zt$Year);
#print(i)
}
summary(stats)
## Site Pval Intercept Slope
## Length:67 Min. :0.0020 Min. : 106.9 Min. :-780.556
## Class :character 1st Qu.:0.0685 1st Qu.: 1785.8 1st Qu.: -59.303
## Mode :character Median :0.2120 Median : 4428.6 Median : -16.717
## Mean :0.3197 Mean : 11595.7 Mean : -44.084
## 3rd Qu.:0.5125 3rd Qu.: 13756.3 3rd Qu.: -4.735
## Max. :1.0000 Max. :100958.7 Max. : 884.029
## Nobs StartYr EndYr
## Min. : 4.00 Min. :1970 Min. :2016
## 1st Qu.:47.00 1st Qu.:1970 1st Qu.:2017
## Median :48.00 Median :1970 Median :2017
## Mean :43.58 Mean :1973 Mean :2017
## 3rd Qu.:48.00 3rd Qu.:1970 3rd Qu.:2017
## Max. :48.00 Max. :2013 Max. :2017
#remove sites with less than 30 years observations
stats <- subset(stats, Nobs>=30); dim(stats)
## [1] 59 7
I want to represent each site as a triangle: pointing upward for positive slopes and downward for negative slopes. Furthermore, I want to color the triangle based on the direction (red for negative and blue for positive), with bold colors for significant slopes and pastels for insignificant slopes. Leaflet does not have an automatic way to create such markers, so we need to create a function to make this icon. I did not write this function - I searched for code from someone else who had already run into this problem. This is a great way to learn how to code.
#Function to grab pch icons and transform to leaflet markers
pchIcons = function(pch = 1, width = 30, height = 30, bg = "transparent", col = "black", ...) {
n = length(pch)
files = character(n)
# create a sequence of png images
for (i in seq_len(n)) {
f = tempfile(fileext = '.png')
png(f, width = width, height = height, bg = bg)
par(mar = c(0, 0, 0, 0))
plot.new()
points(.5, .5, pch = pch[i], col = col[i], cex = min(width, height) / 8, ...)
dev.off()
files[i] = f
}
files
}
Now that we know what the function looks like, we can create the variables to pass into it so that symbols are drawn based on significance and slope. We do this by adding a Direction column (which controls the triangle shape) and a Significance column (which controls the color). We then call the function pchIcons and store the result in a new variable called iconFiles. When we plot the data in leaflet, we will include the variable iconFiles.
#Add a column with the pch (triangle direction) I want based on the slope
#(2 is an upward triangle, 6 a downward triangle)
stats$Direction <- ifelse(stats$Slope>0, 2, 6) #shapes based on pch values
stats$Direction <- ifelse(stats$Slope==0, 19, stats$Direction)
table(stats$Direction)
##
## 2 6
## 6 53
stats$Significance <- ifelse(stats$Slope>0 & stats$Pval<=0.05, "blue","lightgrey")
stats$Significance <- ifelse(stats$Slope>0 & stats$Pval>0.05, "lightblue",stats$Significance)
stats$Significance <- ifelse(stats$Slope<0 & stats$Pval>0.05, "lightpink",stats$Significance)
stats$Significance <- ifelse(stats$Slope<0 & stats$Pval<=0.05, "red",stats$Significance)
table(stats$Significance)
##
## lightblue lightpink red
## 6 40 13
iconFiles = pchIcons(stats$Direction, 20, 20, col = stats$Significance, lwd = 4)
Now I want to plot the data in leaflet. Notice that the data I want to plot are in the stats table, while the site locations are in the nc.sites2 table. I need to merge these two tables together using merge(). This function works as merge(table1, table2, by.x = column in table 1, by.y = column in table 2), where the columns in table 1 and table 2 are a unique identifier that links the two tables together. In this case, it is the site number.
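As a small made-up example of how merge() matches rows on a shared identifier:
#toy example of merge(): join trend results to site coordinates by site number
a <- data.frame(site_no = c("02087500", "02089000"), Slope = c(-27.8, 5.1)) #illustrative values
b <- data.frame(station = c("02087500", "02089000"), lat = c(35.6, 35.3)) #made-up coordinates
merge(a, b, by.x="site_no", by.y="station")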
#merge with site lat/long
stats.merge <- merge(stats, nc.sites2[,c(2,5:6,22:23)], by.x="Site", by.y="site_no"); dim(stats.merge)
## [1] 59 13
#Map results
library(leaflet)
leaflet(data=stats.merge) %>% #data to plot
addProviderTiles("CartoDB.Positron") %>% #creates background map
addMarkers(~dec_long_va,~dec_lat_va, #adds markers with location
icon = ~ icons(
iconUrl = iconFiles, #markers with triangles
popupAnchorX = 20, popupAnchorY = 0
),
popup=~paste0(Site,": ",StartYr,"-",EndYr) #data in popup
) %>% addLegend("bottomright", #add legend to map
values=c("red","lightpink","lightgrey","lightblue","blue"), color=c("red","lightpink","lightgrey", "lightblue","blue"),
labels= c("Significant Decrease","Insignificant Decrease", "No Change", "Insignificant Increase", "Significant Increase"),
opacity = 1)
Based on the map, what can you say about how annual streamflow volume has changed since 1970? Can you find your site?