#Import pandas and matplotlib
import pandas as pd
from matplotlib import pyplot as plt
#Magic command to allow plots in Jupyter
%matplotlib inline
#Read the data into a data frame
df = pd.read_csv('../data/State_Data_Formatted_All.csv')
df.dtypes
Use head() to view the first few rows of the data frame. Alternatively, try sample(5) to view a random sample or tail() to view the last lines of data.
#View the first 5 rows
df.head()
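The state data file isn't reproduced here, so the sketch below uses a small made-up DataFrame (the column names mirror the ones in this notebook, but the values, including the State column, are invented) to show what head(), tail(), and sample() each return.

```python
import pandas as pd

# Hypothetical stand-in for the state water-use data (values are invented)
df = pd.DataFrame({
    "State": ["NC", "NC", "NC", "VA", "VA", "VA", "SC"],
    "Category": ["Irrigation", "Livestock", "Mining", "Irrigation",
                 "Livestock", "Mining", "Irrigation"],
    "Withdrawal_MGD": [12.5, 0.0, 3.1, 8.0, 1.2, 0.0, 4.4],
})

first_rows = df.head()                       # first 5 rows by default
last_rows = df.tail(2)                       # last 2 rows
random_rows = df.sample(3, random_state=42)  # 3 random rows; random_state makes it repeatable

print(first_rows)
print(last_rows)
print(random_rows)
```

Passing random_state is optional; without it, sample() returns a different set of rows each time the cell runs.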
It's important here to examine your data to see which fields are categorical and which are continuous. Think about the various ways we can filter, group, and aggregate the data to provide meaningful summaries.
Use unique() to list the distinct values in a field.
#List unique values in the Type field
df['Type'].unique()
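unique() returns the distinct values themselves; its companion nunique() returns just how many there are. A quick sketch on a hypothetical frame, using the Fresh/Saline values of Type that appear later in this notebook:

```python
import pandas as pd

# Hypothetical records; only the Type column matters here
df = pd.DataFrame({"Type": ["Fresh", "Saline", "Fresh", "Fresh", "Saline"]})

types = df["Type"].unique()     # distinct values, in order of first appearance
n_types = df["Type"].nunique()  # number of distinct values

print(types)
print(n_types)  # 2
```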
Use value_counts() to tabulate the number of records within each unique value.
#Count the number of records in each category with the value_counts() function
df['Category'].value_counts()
#Keep only records with a nonzero withdrawal, then count again
dfNonZero = df.query('Withdrawal_MGD > 0')
dfNonZero['Category'].value_counts()
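query() takes the filter as a string expression; it selects the same rows as an ordinary boolean mask. A minimal sketch on hypothetical data shows that the two forms agree and that zero-withdrawal categories drop out of the counts:

```python
import pandas as pd

# Hypothetical data: some records report zero withdrawals
df = pd.DataFrame({
    "Category": ["Irrigation", "Livestock", "Mining", "Irrigation", "Mining"],
    "Withdrawal_MGD": [12.5, 0.0, 3.1, 8.0, 0.0],
})

# String-expression filter, as used in this notebook
by_query = df.query("Withdrawal_MGD > 0")
# Equivalent boolean-mask filter
by_mask = df[df["Withdrawal_MGD"] > 0]

counts_all = df["Category"].value_counts()
counts_nonzero = by_query["Category"].value_counts()
print(counts_all)
print(counts_nonzero)
```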
To plot these counts, add .plot() at the end of the statement.
dfNonZero['Category'].value_counts().plot()
The type of plot is set with the kind parameter. First, we'll show that line is indeed the default. (Also note that adding a ; at the end of the statement suppresses the <matplotlib.axes... message.)
#Plot the number of records in each category; default is line
dfNonZero['Category'].value_counts().plot(kind='line');
Set the kind of plot to bar to change it to a bar plot.
dfNonZero['Category'].value_counts().plot(kind='bar');
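Each kind value also has a matching method on the .plot accessor, so plot(kind='bar') and plot.bar() draw the same chart. The sketch below uses hypothetical counts and the non-interactive Agg backend (an assumption so it runs outside Jupyter), draws both forms side by side, and checks that each draws one bar per category.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical category counts, shaped like the output of value_counts()
counts = pd.Series({"Irrigation": 7, "Livestock": 4, "Mining": 2})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
counts.plot(kind="bar", ax=ax1)  # keyword form used in this notebook
counts.plot.bar(ax=ax2)          # equivalent accessor method

# Each bar is drawn as one Rectangle patch, in the order of the Series
heights1 = [p.get_height() for p in ax1.patches]
heights2 = [p.get_height() for p in ax2.patches]
print(heights1)
```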
Other kind values include barh, pie, and box.
NOTE that different plots have specific uses:
There are many other types of plots and charts, of course, and each has its own best use. Check the literature for a more in-depth discussion. For example, see Stephanie Evergreen's Chart Chooser utilities for some ideas.
dfNonZero['Category'].value_counts().plot(kind='pie');
Set the color of the plot with the color option. You can use any named color shown here: https://matplotlib.org/examples/color/named_colors.html, or you can specify a color by its hex code (see https://htmlcolorcodes.com/) preceded by a #, e.g. #ff5733. Try changing the color below to maroon.
dfNonZero['Category'].value_counts().plot(kind='barh',color='#ff5733');
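Named colors and hex codes both resolve to the same kind of RGB value. A short check with matplotlib.colors converts the named color maroon to its hex code and the hex code #ff5733 to an (r, g, b) tuple with components between 0 and 1:

```python
from matplotlib import colors as mcolors

maroon_hex = mcolors.to_hex("maroon")   # named color -> hex code
custom_rgb = mcolors.to_rgb("#ff5733")  # hex code -> (r, g, b), each in 0..1

print(maroon_hex)  # #800000
print(custom_rgb)
```

Either form (color='maroon' or color='#800000') can then be passed to plot().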
Change the size of the plot with figsize=(). Supply a tuple of width and height, in inches, to change the size of your plot; play around with the values. (Also note that I've changed the format of the command, putting parameters on separate lines to make it more readable.)
dfNonZero['Category'].value_counts().plot(kind='barh',
                                          color='navy',
                                          figsize=(8,5));
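Because figsize is measured in inches, the size in pixels depends on the figure's dots-per-inch (dpi) setting. A sketch on hypothetical counts, again assuming the Agg backend, reads the size back from the figure:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

plt.close("all")  # start with no open figures so plot() creates a new one
counts = pd.Series({"Irrigation": 7, "Livestock": 4, "Mining": 2})  # hypothetical counts

ax = counts.plot(kind="barh", color="navy", figsize=(8, 5))
fig = ax.get_figure()

width_in, height_in = fig.get_size_inches()  # (width, height) in inches
width_px = width_in * fig.dpi                # pixel width at the figure's dpi
print(width_in, height_in, width_px)
```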
dfNonZero['Category'].value_counts().plot(kind='pie',
colormap ='Pastel1',
figsize=(5,5));
There are 3 classes of colormap, each with its own use:
See this link for a nice discussion: https://matplotlib.org/users/colormaps.html
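As a small illustration of the difference between colormap classes: Pastel1 is one of Matplotlib's qualitative colormaps, a short list of distinct colors suited to unordered categories like pie slices, while Blues is sequential and is sampled with values from 0 to 1. A sketch, assuming the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside Jupyter
import matplotlib.pyplot as plt

# Qualitative colormap: a fixed list of distinct colors, indexed by integer
pastel = plt.get_cmap("Pastel1")
print(pastel.N)                             # number of discrete colors
pie_colors = [pastel(i) for i in range(3)]  # one RGBA tuple per slice

# Sequential colormap: sampled with floats from 0 (light) to 1 (dark)
blues = plt.get_cmap("Blues")
light, dark = blues(0.1), blues(0.9)
```

A list like pie_colors can be passed to a pandas pie plot through its colors argument, which is forwarded to Matplotlib.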
Matplotlib can do a lot more than what we've done here. However, it can get confusing; you can see more examples in past notebooks.
Seaborn works in conjunction with Matplotlib and was created to make Matplotlib easier to use. Like Matplotlib, Seaborn has methods for bar plots, histograms, and box plots. Let's take a look at one of these methods, countplot.
#Importing seaborn
import seaborn as sns
Countplot's parameters will look familiar from the plots above. The data parameter is where you provide the DataFrame (the source of the data), x names the column whose values are counted, and hue names a second categorical variable that splits each bar into colored sub-bars. (Recall that a categorical variable is one that can only take a fixed number of values...)
Let's create a countplot first to show the count of records by Type (Fresh vs Saline) and then by both Type and Source (Surface vs Ground).
#Show the count of values, grouped by Type
sns.countplot(data=dfNonZero,x="Type");
#Show the count of values, grouped by Type AND Source
sns.countplot(data=dfNonZero,x="Type",hue='Source');
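The numbers behind these countplots can be checked in pandas: value_counts() gives the bar heights for x="Type", and pd.crosstab() gives the heights for x="Type" with hue="Source" (rows are the x values, columns are the hue values). A sketch on hypothetical records:

```python
import pandas as pd

# Hypothetical records with the Type and Source fields used above
dfNonZero = pd.DataFrame({
    "Type":   ["Fresh", "Fresh", "Saline", "Fresh", "Saline", "Fresh"],
    "Source": ["Surface", "Ground", "Surface", "Surface", "Surface", "Ground"],
})

# Bar heights for countplot(x="Type")
by_type = dfNonZero["Type"].value_counts()
# Bar heights for countplot(x="Type", hue="Source"); empty combinations show as 0
by_type_source = pd.crosstab(dfNonZero["Type"], dfNonZero["Source"])
print(by_type)
print(by_type_source)
```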
The folium package is a Python interface to Leaflet, a JavaScript library that allows us to plot markers on a map. In the exercise below, we'll first extract a set of NWIS sample points for a given state and then map the locations of these on a folium map.
Full documentation on the folium package is here: http://python-visualization.github.io/folium/docs-v0.5.0/
#Get the list of site names for NC
theURL = ('https://waterdata.usgs.gov/nwis/inventory?'
          'state_cd=nc&'
          'group_key=NONE&'
          'format=sitefile_output&'
          'sitefile_output_format=rdb&'
          'column_name=site_no&'
          'column_name=station_nm&'
          'column_name=site_tp_cd&'
          'column_name=dec_lat_va&'
          'column_name=dec_long_va&'
          'column_name=drain_area_va&'
          'list_of_search_criteria=state_cd')
colnames=['site_no','station_nm','site_tp_cd','lat','lng','agent','datum','d_area']
#Pull the data from the URL
dfNWIS = pd.read_csv(theURL,skiprows=29,sep='\t',names=colnames,index_col='site_no')
#Drop rows with null values
dfNWIS.dropna(inplace=True)
#Display
dfNWIS.head()
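skiprows=29 assumes the header block of the NWIS response is always exactly 29 lines long. An alternative sketch: RDB files mark their header lines with '#' and follow the column-name row with a row of column-format codes (5s, 15s, ...), so reading with comment='#' and then dropping that first row avoids hard-coding the count. The RDB text below is made up for illustration. Note that comment='#' would also cut off any field that happens to contain a '#'.

```python
import io
import pandas as pd

# Made-up, abbreviated RDB text: '#' comment lines, a header row,
# then a row of column-format codes before the data rows
rdb_text = (
    "# Hypothetical NWIS-style site file\n"
    "# (not real sites)\n"
    "agency_cd\tsite_no\tstation_nm\tdec_lat_va\tdec_long_va\n"
    "5s\t15s\t50s\t16s\t16s\n"
    "USGS\t0000001\tSAMPLE CREEK NR TOWN A\t35.5\t-79.0\n"
    "USGS\t0000002\tSAMPLE RIVER AT TOWN B\t36.0\t-78.5\n"
)

# comment='#' skips the header block however many lines it has;
# reading site_no as text keeps any leading zeros
dfRDB = pd.read_csv(io.StringIO(rdb_text), sep="\t", comment="#",
                    dtype={"site_no": str})
dfRDB = dfRDB.iloc[1:].copy()  # drop the format-code row
for col in ["dec_lat_va", "dec_long_va"]:
    dfRDB[col] = pd.to_numeric(dfRDB[col])  # were read as text because of the format row
dfRDB = dfRDB.set_index("site_no")
print(dfRDB)
```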
We now have site numbers, site names, location coordinates, and drainage areas. Let's plot these on a map using the location coordinates. The steps in this process are as follows:
#Determine the median lat/lng
medianLat = dfNWIS['lat'].median()
medianLng = dfNWIS['lng'].median()
print(medianLat, medianLng)
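The median makes a reasonable map center because a few far-flung sites pull it less than they would pull the mean. Another option is to compute the bounding box of all sites in the [[south, west], [north, east]] form that folium's Map.fit_bounds() accepts, and let the map zoom to fit it. A sketch with hypothetical coordinates:

```python
import pandas as pd

# Hypothetical site coordinates in decimal degrees
dfNWIS = pd.DataFrame({
    "lat": [35.1, 35.8, 36.2, 34.9, 35.5],
    "lng": [-83.2, -79.0, -78.6, -77.9, -80.1],
})

# Median center, as computed above
medianLat = dfNWIS["lat"].median()
medianLng = dfNWIS["lng"].median()

# Bounding box as [[south, west], [north, east]]
bounds = [[dfNWIS["lat"].min(), dfNWIS["lng"].min()],
          [dfNWIS["lat"].max(), dfNWIS["lng"].max()]]
print(medianLat, medianLng, bounds)
```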
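The default tiles are OpenStreetMap, but Stamen Terrain, Stamen Toner, Mapbox Bright, Mapbox Control Room, and many other tiles are built in. (The set of built-in tiles has changed across folium versions, so check the documentation for the version you have installed; the version is printed below.)
#Import the package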
import folium
print(folium.__version__)
#Construct the map
m = folium.Map(location=[medianLat,medianLng],
zoom_start = 7,
tiles='OpenStreetMap'
)
#Display the map
m
Try changing the zoom_start and tiles parameters and see how the map appears.
#Create the marker, we'll use a circle Marker
myMarker = folium.CircleMarker(location=[medianLat,medianLng],
color='red',
fill=True,
fill_opacity=0.5,
radius=30
)
myMarker.add_to(m)
m
#Recreate the map object to clear markers
m = folium.Map(location=[medianLat,medianLng],
zoom_start = 7,
tiles='OpenStreetMap'
)
#Loop through all features and add them to the map as markers
for row in dfNWIS.itertuples():
    #Get info for the record
    lat = row.lat
    lng = row.lng
    #Create the marker object and add it to the map object
    folium.CircleMarker(location=[lat,lng],
                        color='blue',
                        fill=True,
                        fill_opacity=0.6,
                        radius=3).add_to(m)
#Show the map
m
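itertuples() yields one namedtuple per row: each column becomes an attribute, and the index value (here site_no) comes through as row.Index. That makes it easy to label each marker; in recent folium versions CircleMarker also accepts popup and tooltip arguments. The sketch below only builds the marker arguments for two hypothetical sites, so it runs without folium; each dict could then be passed as folium.CircleMarker(**spec, radius=3).add_to(m).

```python
import pandas as pd

# Hypothetical subset of the NWIS table, indexed by site number
dfNWIS = pd.DataFrame(
    {"station_nm": ["SAMPLE CREEK", "SAMPLE RIVER"],
     "lat": [35.5, 36.0],
     "lng": [-79.0, -78.5]},
    index=pd.Index(["0000001", "0000002"], name="site_no"),
)

marker_specs = []
for row in dfNWIS.itertuples():
    # Columns are attributes (row.lat, row.station_nm); the index is row.Index
    marker_specs.append({
        "location": [row.lat, row.lng],
        "tooltip": f"{row.Index}: {row.station_nm}",
    })
print(marker_specs[0])
```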