Unit 1: Analysis of Flood Return Interval Using Rcran

Background: Calculating return intervals

Our team hydrologist suggested that one method for evaluating the impacts of dam construction is to monitor changes in flood return intervals. Falls Lake is a flood control reservoir, so it should decrease the amount of downstream flooding.

Reservoir Pools

Reservoir Pools

Figure: Reservoirs should moderate downstream flows. There is a flood control pool to hold flood waters that can be released slowly over time. There is also a conservation pool that holds water that can be released downstream during drier conditions to meet minimum streamflow requirements.

Flood insurance policy is built around the concept of the 100-year flood event. The housing industry has been working to explain what that risk actually means to homeowners with 30 year mortgages. Understanding the flood risk relative to mortage’s is helpful for insurance companies to know. Has Falls Lake decreased the flood risk for downstream homes?

Descripting Flood Probabilities

Descripting Flood Probabilities

Reservoirs decrease the likelihood of downstream flooding, but that often means development occurs in areas that would have been frequently flooded prior to the reservoir. We’ve seen examples of this just his year with Hurricane Harvey.

Framing and executing the analysis

We will use Leopold’s (1994) flood frequency curve and the Weibull equation to calculate the recurrence interval. Here the return interval is computed as \(\frac{n+1}{m}\) where n is the number of years of data and m is the rank of the year from largest to smallest.

  • NOTE: The accuracy of a return interval is highly impacted by the length of the time series.

So, for us to do this analysis, we need to first compute maximum annual discharge, i.e., extract the largest discharge observed from each water year. Next, sort and rank our data on max annual discharge and then compute a regression line from which we can determine the discharge of a 100 and 500 year flood.

Prep the workspace by installing (if needed) and loading libraries

The method for installing and loading libraries, as well as downloading data from the USGS, are explained in the LoadStreamflowDescription.Rmd file.

#install.packages("dataRetrieval", repos=c("http://owi.usgs.gov/R", getOption("repos")))
library(dataRetrieval); library(EGRET); library (ggplot2)
library(dplyr); library(magrittr); library(lubridate)

Download data from NWIS

#Identify gauge to download
siteNo <- '02087500' #don't forget to add the zero if it is missing

#Identify parameter of interest
pcode = '00060' #discharge (cfs)

#Identify statistic code for daily values
scode = "00003"  #mean

#Identify start and end dates
start.date = "1930-10-01"
end.date = "2017-09-30"

#Load in Ndata using the site ID
neuse <- readNWISdv(siteNumbers = siteNo, parameterCd = pcode, statCd = scode, startDate=start.date, endDate=end.date)
#rename columns to something more useful
neuse <- renameNWISColumns(neuse); colnames(neuse)
## [1] "agency_cd" "site_no"   "Date"      "Flow"      "Flow_cd"
#Create cms column to plot cubic meters per second
neuse$Flow_cms <- neuse$Flow*0.028316847


Calculate the Flood Return Interval using dplyr and pipes

For a basic understanding of variables and pipes, see the LoadStreamflowDescription.Rmd file. Here we will use pipes to calculate the maximum daily streamflow within each water year. The water year can be calculated using ifelse(condition, if true, if false), which essentially says if this condition is true then do ‘x’, otherwise do ‘y’. Then use pipes to calculate the peak flow by water year.

#calculate year and month variables
neuse$Year <- year(neuse$Date);  neuse$Month <- month(neuse$Date)

#calculate the Water Year using ifelse
#if the month is greater than 10, then WaterYear = Year+1, else WaterYear=Year
neuse$WaterYear <- ifelse(neuse$Month>=10, neuse$Year+1, neuse$Year)

#Calculate the maximum annual flow
peak.flow <- neuse %>%
  group_by(WaterYear) %>%
  summarise(Peak_cms = max(Flow_cms, na.rm=T), n=n()) %>%  round(3)
peak.flow <- as.data.frame(peak.flow); peak.flow  
##    WaterYear Peak_cms   n
## 1       1931  300.159 365
## 2       1932  264.763 366
## 3       1933  212.376 365
## 4       1934  237.862 365
## 5       1935  461.565 365
## 6       1936  348.297 366
## 7       1937  271.559 365
## 8       1938  334.139 365
## 9       1939  325.644 365
## 10      1940  311.485 366
## 11      1941  170.467 365
## 12      1942  152.911 365
## 13      1943  227.951 365
## 14      1944  245.790 366
## 15      1945  637.129 365
## 16      1946  247.206 365
## 17      1947  178.679 365
## 18      1948  297.327 366
## 19      1949  253.436 365
## 20      1950  179.246 365
## 21      1951  132.523 365
## 22      1952  339.802 366
## 23      1953  239.277 365
## 24      1954  370.951 365
## 25      1955  317.149 365
## 26      1956  236.446 366
## 27      1957  256.267 365
## 28      1958  382.277 365
## 29      1959  222.287 365
## 30      1960  275.806 366
## 31      1961  208.412 365
## 32      1962  256.267 365
## 33      1963  219.456 365
## 34      1964  239.277 366
## 35      1965  302.990 365
## 36      1966  270.426 365
## 37      1967  331.307 365
## 38      1968  181.228 366
## 39      1969  174.715 365
## 40      1970  169.052 365
## 41      1971  182.644 365
## 42      1972  199.351 366
## 43      1973  430.416 365
## 44      1974  183.210 365
## 45      1975  455.901 365
## 46      1976  139.319 366
## 47      1977  194.254 365
## 48      1978  399.268 365
## 49      1979  393.604 365
## 50      1980  180.945 366
## 51      1981  106.471 365
## 52      1982  214.359 365
## 53      1983  227.667 365
## 54      1984  198.501 366
## 55      1985  144.416 365
## 56      1986  188.307 365
## 57      1987  236.446 365
## 58      1988   60.315 366
## 59      1989  222.287 365
## 60      1990  178.396 365
## 61      1991  165.087 365
## 62      1992  108.737 366
## 63      1993  224.836 365
## 64      1994  190.856 365
## 65      1995  247.772 365
## 66      1996  535.188 366
## 67      1997  172.733 365
## 68      1998  249.188 365
## 69      1999  557.842 365
## 70      2000  185.192 366
## 71      2001  136.770 365
## 72      2002  121.479 365
## 73      2003  214.925 365
## 74      2004  150.362 366
## 75      2005   88.915 365
## 76      2006  328.475 365
## 77      2007  174.432 365
## 78      2008  150.929 366
## 79      2009  146.681 365
## 80      2010  198.784 365
## 81      2011  117.515 365
## 82      2012   55.501 366
## 83      2013  221.721 365
## 84      2014  219.456 365
## 85      2015  171.034 365
## 86      2016  177.263 366
## 87      2017  532.357 365


  • Next, we want to remove years that are missing a lot of data. We arbitrarily decide to throw out any years with less than 90% of the data available: peak.flow <- subset(peak.flow, n>=(365-365*.1))

  • Then we arrange the peak flows from the largest to the smallest value using arrange(data, column) and desc() to arrange the values in descending order.

  • Add a rank column using the command rank(). The - symbol in front of peak.flow tells R to rank the data with the highest value as (1) and the lowest value as (max n).

  • Calculate return intervals in a new column named ReturnInterval
    • Determine how many years of data you have (e.g. count of year rows or max of rank).
    • Compute \(\frac{n+1}{m}\) where n is the number of years of data and m is the rank
  • Calculate the Annual Exceedance Probability (just the inverse of RI) in a new column named Pe where Pe = \(\frac{1}{RI}\)

#remove rows missing more than 10% of data
peak.flow <- subset(peak.flow, n>=(365-365*.1))

#rank flows
peak.flow <- arrange(peak.flow, desc(Peak_cms)); peak.flow[1:5,]   #arranges data in descending order based on the Peak_cms column
##   WaterYear Peak_cms   n
## 1      1945  637.129 365
## 2      1999  557.842 365
## 3      1996  535.188 366
## 4      2017  532.357 365
## 5      1935  461.565 365
peak.flow$Rank <- rank(-peak.flow$Peak_cms); peak.flow[1:5,]       #adds a column with rank from 1 to n
##   WaterYear Peak_cms   n Rank
## 1      1945  637.129 365    1
## 2      1999  557.842 365    2
## 3      1996  535.188 366    3
## 4      2017  532.357 365    4
## 5      1935  461.565 365    5
#calculate the return internval
n.years <- dim(peak.flow)[1]; n.years
## [1] 87
peak.flow$ReturnInterval <- (n.years+1)/peak.flow$Rank; peak.flow[1:5,]
##   WaterYear Peak_cms   n Rank ReturnInterval
## 1      1945  637.129 365    1       88.00000
## 2      1999  557.842 365    2       44.00000
## 3      1996  535.188 366    3       29.33333
## 4      2017  532.357 365    4       22.00000
## 5      1935  461.565 365    5       17.60000
peak.flow$AnnualProb <- round(1/peak.flow$ReturnInterval*100,3);  peak.flow[1:5,]
##   WaterYear Peak_cms   n Rank ReturnInterval AnnualProb
## 1      1945  637.129 365    1       88.00000      1.136
## 2      1999  557.842 365    2       44.00000      2.273
## 3      1996  535.188 366    3       29.33333      3.409
## 4      2017  532.357 365    4       22.00000      4.545
## 5      1935  461.565 365    5       17.60000      5.682


Plot the data and compute a regression equation

For more information on commands to plot data, see LoadStreamflowDescription.Rmd. The Return Interval (x-axis) and sometimes the Peak Discharge (y-axis) are easier to see on a log plot. We convert the x-axis to log by including the command: log="x" in the plot() function. To add tick marks and labels on a log axis, we create the ticks manually and add them to the x-axis.

Next we want to estimate the peak discharge at the 100, 200, 500, and 1000 year interval. To do that, we need to fit a regression to the data. We will try two types of regressions: (1) linear and (2) log.

  • Linear regressions are called using the lm(formula, data) command.
    • RI.linear = lm(Peak_cms ~ ReturnInterval , data = peak.flow); RI.linear
    • This returns the coefficients of the regression. To get the significance, standard error, r2 and f-statistic, use summary(RI.linear)
  • Log regressions are also called using lm(), except the x-axis (independent) variable is set to log().
    • RI.log = lm(Peak_cms ~ log(ReturnInterval), data=peak.flow); summary(RI.log)

*We notice the log regression has the highest r2 value. Using RI.log we estimate the y-value at the 100, 200, 500 and 1000 year interval using the predict() function.

#Plot the return interval with the peak flow
par(mfrow=c(1,1))     #one graph per plot
par(mar = c(5,5,3,5)) #set plot margins
plot(peak.flow$ReturnInterval, peak.flow$Peak_cms, log="x", type='n', yaxt="n", xlim=c(1,1000), ylim=c(0,1000),
     ylab="Peak Streamflow (cms)", xlab = 'Return Interval (Years)')
  axis(2, las=2, cex.axis=0.9)
  #create minor tick marks
  minor.ticks <- c(2,3,4,6,7,8,9,20,30,40,60,70,80,90,200,300,400,600,700,800,900)    
  #add minor tick marks to x-ais
  axis(1,at=minor.ticks,labels=FALSE, col="darkgray")                                
box() #draw a box around the plot
#add points to the plot
  points(peak.flow$ReturnInterval, peak.flow$Peak_cms, col=rgb(0,0,0,0.6), cex=1.2, pch=19)  #add points to plot

#determine whether a linear or log regression has teh best fit...
RI.linear = lm(Peak_cms ~ ReturnInterval , data = peak.flow); RI.linear
## 
## Call:
## lm(formula = Peak_cms ~ ReturnInterval, data = peak.flow)
## 
## Coefficients:
##    (Intercept)  ReturnInterval  
##        204.412           7.608
  summary(RI.linear)
## 
## Call:
## lm(formula = Peak_cms ~ ReturnInterval, data = peak.flow)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -236.79  -39.91    0.33   39.24  160.57 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    204.4117     8.4979   24.05   <2e-16 ***
## ReturnInterval   7.6080     0.7047   10.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 71.8 on 85 degrees of freedom
## Multiple R-squared:  0.5783, Adjusted R-squared:  0.5733 
## F-statistic: 116.5 on 1 and 85 DF,  p-value: < 2.2e-16
RI.log = lm(Peak_cms ~ log(ReturnInterval), data=peak.flow)
  summary(RI.log)
## 
## Call:
## lm(formula = Peak_cms ~ log(ReturnInterval), data = peak.flow)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -74.040  -5.703   3.307   9.011  39.418 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          128.193      2.553   50.21   <2e-16 ***
## log(ReturnInterval)  118.001      1.908   61.84   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.3 on 85 degrees of freedom
## Multiple R-squared:  0.9783, Adjusted R-squared:  0.978 
## F-statistic:  3824 on 1 and 85 DF,  p-value: < 2.2e-16
  #Log regression had the best fit
#Estimate the streamflow at the following return intervals using the log regression
x.est <- as.data.frame(c(100,200,500,1000)); colnames(x.est)<-"ReturnInterval"
y.est <- predict(RI.log,x.est, interval="confidence")
  y.est <- as.data.frame(y.est)
  y100 = cbind(x.est, y.est);  y100 <- subset(y100, x.est==100)$fit
  y100
## [1] 671.607
#Add to plot
points(x.est$ReturnInterval, y.est$fit, col="red", pch=2, lwd=2);
legend("bottomright", c("Observed","Log Estimated"), pch=c(19,2), col=c("darkgray","red"))


Using functions to calculate the return interval

For more information on how functions work in R, see LoadStreamflowDescription.Rmd. Essentially we take all of the commands from above and stick them inside a function we name flood_int. The function will then return a data frame called peak.flow with the return interval and annual probabilities included.

#create the flood_int function that reads in the data parameter
flood_int = function(data){ 
  #Calculate the maximum annual flow by water year
  peak.flow <- data %>%
    group_by(WaterYear) %>%
    summarise(Peak_cms = max(Flow_cms, na.rm=T), n=n()) %>%  round(3)
  peak.flow <- as.data.frame(peak.flow); 
  #remove rows missing more than 10% of data
  peak.flow <- subset(peak.flow, n>=(365-365*.1))
  
  #rank flows
  peak.flow <- arrange(peak.flow, desc(Peak_cms)); peak.flow[1:5,]
  peak.flow$Rank <- rank(-peak.flow$Peak_cms); peak.flow[1:5,]
  
  #calculate teh return interval
  n.years <- dim(peak.flow)[1]; n.years
  peak.flow$ReturnInterval <- (n.years+1)/peak.flow$Rank; peak.flow[1:5,]
  peak.flow$AnnualProb <- round(1/peak.flow$ReturnInterval*100,3);  peak.flow[1:5,]
  
  #return the data frame so it can be used outside the function
  return (peak.flow)
}


Now we can call this function to calculate the return interval for different periods of time. In this case, we will look at how the return interval has changed prior to 1979 (before Falls Lake reservoir) and after 1984 (following Falls Lake reservoir).

Call the function flood_int on a subset of the data (pre 1979), run a regression, and plot the data with the original data for the period of record (POR).

#Call for the early period
neuse.early <- subset(neuse, Date>="1930-01-01" & Date<="1979-12-31");         # summary(neuse.early)
peak.flow.early <- flood_int(neuse.early)   ;
summary(peak.flow.early)  
##    WaterYear       Peak_cms           n              Rank   
##  Min.   :1931   Min.   :132.5   Min.   :365.0   Min.   : 1  
##  1st Qu.:1943   1st Qu.:199.4   1st Qu.:365.0   1st Qu.:13  
##  Median :1955   Median :253.4   Median :365.0   Median :25  
##  Mean   :1955   Mean   :272.7   Mean   :365.2   Mean   :25  
##  3rd Qu.:1967   3rd Qu.:325.6   3rd Qu.:365.0   3rd Qu.:37  
##  Max.   :1979   Max.   :637.1   Max.   :366.0   Max.   :49  
##  ReturnInterval     AnnualProb
##  Min.   : 1.020   Min.   : 2  
##  1st Qu.: 1.351   1st Qu.:26  
##  Median : 2.000   Median :50  
##  Mean   : 4.571   Mean   :50  
##  3rd Qu.: 3.846   3rd Qu.:74  
##  Max.   :50.000   Max.   :98
#Basic plot
par(mar = c(5,5,3,5)) #set plot margins
plot(peak.flow.early$ReturnInterval, peak.flow.early$Peak_cms, log="x", type='n', yaxt="n", xlim=c(1,1000), ylim=c(0,1000),
     ylab="Peak Streamflow (cms)", xlab = 'Return Interval (Years)')
axis(2, las=2, cex.axis=0.9)
axis(1,at=minor.ticks,labels=FALSE, col="darkgray")                                 
  box()

#plot original data 
points(peak.flow$ReturnInterval, peak.flow$Peak_cms, col=rgb(0,0,1,0.5), cex=0.8, pch=19)    
#plot early period data
points(peak.flow.early$ReturnInterval, peak.flow.early$Peak_cms, col=rgb(0,0,0,0.6), cex=1.2, pch=19)  
 
#linear regression
RI.linear.early = lm(Peak_cms ~ ReturnInterval , data = peak.flow.early);
  summary(RI.linear.early)
## 
## Call:
## lm(formula = Peak_cms ~ ReturnInterval, data = peak.flow.early)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -103.610  -44.921    1.095   51.534   83.097 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    225.6256     9.0628   24.90  < 2e-16 ***
## ReturnInterval  10.2976     0.9954   10.35 1.06e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 54.87 on 47 degrees of freedom
## Multiple R-squared:  0.6948, Adjusted R-squared:  0.6884 
## F-statistic:   107 on 1 and 47 DF,  p-value: 1.059e-13
RI.log.early = lm(Peak_cms ~ log(ReturnInterval), data=peak.flow.early)
  summary(RI.log.early) 
## 
## Call:
## lm(formula = Peak_cms ~ log(ReturnInterval), data = peak.flow.early)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.672  -7.447   3.826  10.229  42.181 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          167.649      3.426   48.94   <2e-16 ***
## log(ReturnInterval)  109.227      2.630   41.53   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.18 on 47 degrees of freedom
## Multiple R-squared:  0.9735, Adjusted R-squared:  0.9729 
## F-statistic:  1725 on 1 and 47 DF,  p-value: < 2.2e-16
#plot log line
  x.est <- as.data.frame(c(100,200,500,1000)); colnames(x.est)<-"ReturnInterval"
  y.est.pre <- predict(RI.log.early,x.est, interval="confidence")
  y.est.pre <- as.data.frame(y.est.pre)

#Add regression ponits
points(x.est$ReturnInterval, y.est$fit, col="black", pch=5, lwd=2);  #original POR
points(x.est$ReturnInterval, y.est.pre$fit, col="blue", pch=2, lwd=2);  #early 

#add straight lines at points of interest
    abline(h=y.est.pre$fit[1], col="black", lty=3); abline(v=100, col="red", lty=3)
    abline(h=y100, col="blue", lty=3)

#add a legend
legend("bottomright", c("Period of Record","1930-1979 data", "Est. Flow POR", "Est.Flow 1930-1979"), col=c("blue","black","blue","black"), pch=c(19,19,2,5))


Call the function for on a subset of data (post 1984)

neuse.late <- subset(neuse, Date>="1984-01-01");         # summary(neuse.late)
peak.flow.late <- flood_int(neuse.late)   ;
summary(peak.flow.late)  
##    WaterYear       Peak_cms           n              Rank   
##  Min.   :1985   Min.   : 55.5   Min.   :365.0   Min.   : 1  
##  1st Qu.:1993   1st Qu.:146.7   1st Qu.:365.0   1st Qu.: 9  
##  Median :2001   Median :178.4   Median :365.0   Median :17  
##  Mean   :2001   Mean   :208.3   Mean   :365.2   Mean   :17  
##  3rd Qu.:2009   3rd Qu.:222.3   3rd Qu.:365.0   3rd Qu.:25  
##  Max.   :2017   Max.   :557.8   Max.   :366.0   Max.   :33  
##  ReturnInterval     AnnualProb    
##  Min.   : 1.030   Min.   : 2.941  
##  1st Qu.: 1.360   1st Qu.:26.471  
##  Median : 2.000   Median :50.000  
##  Mean   : 4.213   Mean   :50.000  
##  3rd Qu.: 3.778   3rd Qu.:73.529  
##  Max.   :34.000   Max.   :97.059
#Basic plot
par(mar = c(5,5,3,5)) #set plot margins
plot(peak.flow.late$ReturnInterval, peak.flow.late$Peak_cms, log="x", type='n', yaxt="n", xlim=c(1,1000), ylim=c(0,1000),
     ylab="Peak Streamflow (cms)", xlab = 'Return Interval (Years)')
axis(2, las=2, cex.axis=0.9)
axis(1,at=minor.ticks,labels=FALSE, col="darkgray")                                 
box()

#plot original data
points(peak.flow$ReturnInterval, peak.flow$Peak_cms, col=rgb(0,0,1,0.5), cex=0.8, pch=19)  
#plot post 1984 data
points(peak.flow.late$ReturnInterval, peak.flow.late$Peak_cms, col=rgb(0.7,0.4,0,0.6), cex=1.2, pch=19)  

#linear regression on post 1984
RI.linear.late = lm(Peak_cms ~ ReturnInterval , data = peak.flow.late);
  summary(RI.linear.late)
## 
## Call:
## lm(formula = Peak_cms ~ ReturnInterval, data = peak.flow.late)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -144.96  -15.67    6.35   16.99  205.84 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     138.376     12.736  10.865 4.26e-12 ***
## ReturnInterval   16.601      1.693   9.804 5.13e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 60.61 on 31 degrees of freedom
## Multiple R-squared:  0.7561, Adjusted R-squared:  0.7483 
## F-statistic: 96.12 on 1 and 31 DF,  p-value: 5.132e-11
RI.log.late = lm(Peak_cms ~ log(ReturnInterval), data=peak.flow.late)
  summary(RI.log.late) 
## 
## Call:
## lm(formula = Peak_cms ~ log(ReturnInterval), data = peak.flow.late)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -87.801 -23.122   4.093  20.035 127.460 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           82.158     10.546    7.79 8.63e-09 ***
## log(ReturnInterval)  132.938      8.299   16.02  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 40.3 on 31 degrees of freedom
## Multiple R-squared:  0.8922, Adjusted R-squared:  0.8887 
## F-statistic: 256.6 on 1 and 31 DF,  p-value: < 2.2e-16
#plot regression log line
y.est.post <- predict(RI.log.late,x.est, interval="confidence")
  y.est.post <- as.data.frame(y.est.post)
points(x.est$ReturnInterval, y.est.post$fit, col="goldenrod3", pch=12, lwd=2);
#plot original regression
points(x.est$ReturnInterval, y.est$fit, col="blue", pch=2, lwd=2);
#plot early return interval
points(x.est$ReturnInterval, y.est.pre$fit, col="black", pch=5, lwd=2);

#draw ablines
abline(h=c(y100,y.est.pre$fit[1],y.est.post$fit[1]), col=c("blue","black","goldenrod3"), lty=3);
abline(v=100, col="black", lty=3)

legend("bottomright", c("Period of Record","1984-2017 data", "Est. Flow POR", "Est.Flow 1930-1979", "Est.Flow 1984-2017", "Est. Flow No Hurricane"), 
       col=c("blue","goldenrod3","blue","black","goldenrod3","red"), pch=c(19,19,2,5,12,16))

#remove 3 hurricane events
RI.log.hur <- lm(Peak_cms ~log(ReturnInterval), data=peak.flow.late[c(4:dim(peak.flow.late)[1]),])
  y.est.hur <- as.data.frame(predict(RI.log.hur,x.est, interval="confidence")); y.est.hur
##        fit      lwr      upr
## 1 536.5954 485.7139 587.4770
## 2 601.6321 541.7861 661.4780
## 3 687.6058 615.8768 759.3348
## 4 752.6425 671.9087 833.3762
#plot hurricane
  points(x.est$ReturnInterval, y.est.hur$fit, col="red", pch=16, lwd=2);
  abline(h=y.est.hur$fit[1], col="red", lty=3)


Create Return Interval Table

A table may be the easiest way to compare return intervals. To do this, create a data frame and fill in the desired values. Here we want the 100, 500, and 1000 year intervals. The number of years used to calculate the return interval and the adjusted r2.

#create data frame
RI.table <- as.data.frame(matrix(nrow=4, ncol=6));    
#give column names
colnames(RI.table) <- c("Date.Range", "RI_100yr","RI_500yr","RI_1000yr","Nyears","AdjustedR2")

#fill in columns
  RI.table$Date.Range <- c("1930-2017","1930-1979","1984-2017","Less 3 Hurricanes")
  RI.table$RI_100yr <- c(y.est$fit[1],y.est.pre$fit[1],y.est.post$fit[1], y.est.hur$fit[1])
  RI.table$RI_500yr <- c(y.est$fit[3],y.est.pre$fit[3],y.est.post$fit[3], y.est.hur$fit[3])
  RI.table$RI_1000yr <- c(y.est$fit[4],y.est.pre$fit[4],y.est.post$fit[4], y.est.hur$fit[4])
  RI.table$Nyears <- c(dim(peak.flow)[1], dim(peak.flow.early)[1], dim(peak.flow.late)[1], dim(peak.flow.late)[1]-3)
  RI.table$AdjustedR2 <- c(summary(RI.log)$adj.r.squared, summary(RI.log.early)$adj.r.squared, 
                          summary(RI.log.late)$adj.r.squared, summary(RI.log.hur)$adj.r.squared)
#view table
RI.table
##          Date.Range RI_100yr RI_500yr RI_1000yr Nyears AdjustedR2
## 1         1930-2017 671.6070 861.5222  943.3143     87  0.9779999
## 2         1930-1979 670.6583 846.4524  922.1628     49  0.9729103
## 3         1984-2017 694.3578 908.3126 1000.4579     33  0.8887402
## 4 Less 3 Hurricanes 536.5954 687.6058  752.6425     30  0.8815000


Calculate the probability of an event in a 30 year mortgage

  • Compute the the probability of the 100, 500, and 1,000 year flood occurring over the next 30 years as a binomial distribution: Pe = 1 - [1-\(frac/{1}{T}\))]^n where T is the return period (e.g. 100 years) and n is the number of years of interest (30 years in our case).
  • To do this, we will use a for loop and print the results using print(). The paste0() function concatenates information into a string without including spaces. If you want to include spaces, simply use paste().
#What's the probability of these events occurring in a 30 year mortgage??
Rperiod = c(100,500,1000)
n.years = 30
for (i in 1:3){
  print(paste0("Percent likelihood over ", n.years, " years for a ", Rperiod[i]," year flood: ", round((1-(1-(1/Rperiod[i]))^n.years)*100,2), "%"))
}
## [1] "Percent likelihood over 30 years for a 100 year flood: 26.03%"
## [1] "Percent likelihood over 30 years for a 500 year flood: 5.83%"
## [1] "Percent likelihood over 30 years for a 1000 year flood: 2.96%"