Plot with Seaborn
Statistical Plotting with Seaborn
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Load an example dataset from Seaborn's online data repository
In [2]:
tip = sns.load_dataset('tips')
In [3]:
tip.info()
In [4]:
tip.head(3)
Out[4]:
Visualizing regressions
- sns.lmplot() plots data and regression model fits across a FacetGrid.
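By default, the line that sns.lmplot() draws is an ordinary least-squares fit of y on x, shown with a 95% confidence band. The standard-library sketch below computes that slope and intercept by hand; the helper name ols_fit and the toy numbers are hypothetical and are not part of seaborn or the tips dataset.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data lying exactly on tip = 1 + 0.15 * total_bill
bills = [10.0, 20.0, 30.0, 40.0]
tips_toy = [2.5, 4.0, 5.5, 7.0]
intercept, slope = ols_fit(bills, tips_toy)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # intercept=1.00, slope=0.15
```

On real data the points scatter around the line, and lmplot adds the confidence band on top of this fit.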
In [5]:
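# x, y and data are keyword arguments; 'height' replaced the older 'size' argument in seaborn 0.9
sns.lmplot(x='total_bill', y='tip', data=tip, height=3, aspect=2)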
Out[5]:
Group by a categorical column
In [6]:
sns.lmplot(x='total_bill', y='tip', data=tip, height=3,
           col='sex')
Out[6]:
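With col='sex', sns.lmplot() splits the rows by the levels of the categorical column and fits a separate regression line in each panel. A minimal sketch of that split-then-fit step, assuming rows are plain dicts; the function names and toy rows are hypothetical.

```python
from collections import defaultdict


def ols_fit(xs, ys):
    """Least-squares (intercept, slope) for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_group(rows, x, y, group):
    """Split rows on the value of `group` and fit one line per level."""
    buckets = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = buckets[row[group]]
        xs.append(row[x])
        ys.append(row[y])
    return {level: ols_fit(xs, ys) for level, (xs, ys) in buckets.items()}


# Hypothetical toy rows: each group follows its own exact line
rows = [
    {"total_bill": 10, "tip": 2.0, "sex": "Male"},
    {"total_bill": 20, "tip": 4.0, "sex": "Male"},
    {"total_bill": 10, "tip": 1.0, "sex": "Female"},
    {"total_bill": 30, "tip": 4.0, "sex": "Female"},
]
fits = fit_by_group(rows, "total_bill", "tip", "sex")
print(fits)
```

Using hue='sex' instead of col='sex' performs the same per-group fits but draws them on one set of axes.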
Plot grouped data in the same graph
- Seaborn color palettes: http://seaborn.pydata.org/tutorial/color_palettes.html
In [7]:
sns.lmplot(x='total_bill', y='tip', data=tip, height=3, aspect=2,
           hue='sex', palette='Set1')
Out[7]:
Plot residuals
- sns.residplot() plots the residuals of a linear regression of y on x
In [8]:
tip.head(1)
Out[8]:
In [9]:
sns.residplot(x='total_bill',y='tip',data=tip,color='indianred')
Out[9]:
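sns.residplot() fits a linear regression of y on x and plots each residual (observed y minus fitted y) against x. Residuals scattered randomly around zero suggest a straight line is adequate; a curved pattern suggests a higher-order fit, as in the next section. The sketch below computes those residuals with the standard library; the helper name and toy data are hypothetical.

```python
def linear_residuals(xs, ys):
    """Fit y = a + b*x by least squares and return the residuals y - (a + b*x)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical toy data with a curved (quadratic) trend: y = x**2
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]
res = linear_residuals(xs, ys)
print([round(r, 2) for r in res])  # [2.0, -1.0, -2.0, -1.0, 2.0]: a U-shaped pattern
```

The residuals of a least-squares line with an intercept always sum to zero, so what matters in a residual plot is their pattern, not their average.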
Higher-order regressions
When there are more complex relationships between two variables, a simple first-order (linear) regression is often not sufficient to capture the relationship accurately. Seaborn makes it easy to compute and visualize regressions of varying orders.
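sns.regplot()
The function sns.lmplot() is a higher-level interface to sns.regplot(): it combines sns.regplot() with a FacetGrid.
A principal difference is that sns.regplot() is an axes-level function: it draws onto a single matplotlib Axes (and accepts an ax argument), so it combines more freely with ordinary matplotlib calls, while sns.lmplot() creates its own figure.
For both sns.lmplot() and sns.regplot(), the keyword order sets the degree of the polynomial regression (the default, order=1, is a straight line).
Setting scatter=False (a falsy value such as None has the same effect) tells sns.regplot() to draw only the fitted line, which avoids plotting the scatter points a second time when they are already on the axes.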
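The order keyword fits a polynomial of that degree by least squares, which is why the order 2 and order 3 curves in the next cell can bend to follow the data. To show what is being computed, the sketch below solves the least-squares normal equations with the standard library. It is an illustration, not seaborn's implementation: seaborn delegates the fit to NumPy, and solving the normal equations directly becomes numerically fragile for high orders. The function name and toy data are hypothetical.

```python
def polyfit(xs, ys, order):
    """Least-squares polynomial fit.

    Returns [c0, c1, ..., c_order] such that y ≈ c0 + c1*x + ... + c_order*x**order.
    """
    m = order + 1
    # Normal equations (X^T X) c = X^T y, with design matrix X[i][j] = xs[i] ** j
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]

    # Forward elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]

    # Back substitution
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - tail) / a[r][r]
    return coeffs


# Hypothetical toy data generated from y = 1 + 2x + 3x^2
xs = [0, 1, 2, 3, 4, 5]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
print([round(c, 6) for c in polyfit(xs, ys, order=2)])  # [1.0, 2.0, 3.0]
print([round(c, 6) for c in polyfit(xs, ys, order=1)])  # best straight line through the curve
```

A higher order never fits the plotted points worse, but it can start chasing noise, so the order is best chosen by looking at the residuals rather than maximized.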
In [10]:
tip.head(1)
Out[10]:
In [11]:
# Scatter plot of 'total_bill' vs 'tip' using semi-transparent red circles
plt.scatter(tip['total_bill'], tip['tip'], label='data', color='red', marker='o', alpha=.5)
# Blue: linear (order 1) regression of 'tip' on 'total_bill'; scatter=False draws only the fit
sns.regplot(x='total_bill', y='tip', data=tip, scatter=False, color='blue', label='order 1')
# Green: polynomial regression of order 2
sns.regplot(x='total_bill', y='tip', data=tip, scatter=False, order=2, color='green', label='order 2')
# Purple: polynomial regression of order 3
sns.regplot(x='total_bill', y='tip', data=tip, scatter=False, order=3, color='purple', label='order 3')
# Add a legend and display the plot
plt.legend(loc='upper right')
plt.show()
Strip and swarm plots
In [12]:
sns.stripplot(y='tip', data=tip)
plt.ylabel('tip ($)')
Out[12]:
In [13]:
sns.stripplot(x='day', y='tip', data=tip)
plt.ylabel('tip ($)')
Out[13]:
In [14]:
sns.stripplot(x='day', y='tip', data=tip, size=4, jitter=True)
plt.ylabel('tip ($)')
Out[14]:
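Without jitter, every point in a category is drawn at the same horizontal position, so equal or nearby tips overlap. Jitter adds a small random offset along the categorical axis only; the tip values themselves are unchanged. A minimal sketch of the idea, where the helper name, the seeded generator, and the 0.1 half-width are assumptions rather than seaborn's internals (the real stripplot also follows the column's category order instead of order of appearance):

```python
import random


def jitter_positions(categories, values, half_width=0.1, seed=0):
    """Place category k at x = k (in order of first appearance) and add
    uniform noise in [-half_width, +half_width] to each point's x position."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    levels = list(dict.fromkeys(categories))
    index = {level: k for k, level in enumerate(levels)}
    return [
        (index[cat] + rng.uniform(-half_width, half_width), value)
        for cat, value in zip(categories, values)
    ]


# Hypothetical toy points: three identical tips on the same day would overlap
days = ["Thur", "Thur", "Thur", "Fri"]
tips_toy = [2.0, 2.0, 2.0, 3.5]
for x, y in jitter_positions(days, tips_toy):
    print(f"x={x:.3f}  y={y}")
```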
In [15]:
sns.swarmplot(x='day', y='tip', data=tip)
plt.ylabel('tip ($)')
Out[15]:
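sns.swarmplot() removes overlap deterministically instead of randomly: points are shifted sideways just far enough that no two markers touch, so the width of the swarm also hints at how many values share a range. The sketch below is a simplified version of that idea, assuming circular markers of a fixed diameter measured in the same units as the values; seaborn itself works in screen coordinates and uses a more refined placement, so treat the function names and the algorithm as illustrative only.

```python
import math


def _candidate_offsets(step):
    """Yield 0, +step, -step, +2*step, -2*step, ... (closest to the center first)."""
    yield 0.0
    k = 1
    while True:
        yield k * step
        yield -k * step
        k += 1


def swarm_offsets(values, diameter=1.0):
    """Return a sideways offset per value so that no two points overlap.

    Points are placed in order of increasing value; each one takes the first
    candidate offset that keeps it at least `diameter` away from every point
    already placed.
    """
    placed = []  # (value, offset) of points positioned so far
    offsets = [0.0] * len(values)
    for idx in sorted(range(len(values)), key=lambda i: values[i]):
        v = values[idx]
        for offset in _candidate_offsets(diameter):
            if all(math.hypot(v - pv, offset - po) >= diameter for pv, po in placed):
                break
        placed.append((v, offset))
        offsets[idx] = offset
    return offsets


# Hypothetical toy tips: three identical values spread out sideways,
# while a distant value stays on the center line.
print(swarm_offsets([2.0, 2.0, 2.0, 5.0]))  # [0.0, 1.0, -1.0, 0.0]
```

The search always terminates: offsets are multiples of the diameter, so a column not yet used by any placed point can never cause a collision.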
In [16]:
sns.swarmplot(x='day', y='tip', data=tip, hue='sex', palette='Set1')
plt.ylabel('tip ($)')
Out[16]:
In [17]:
sns.swarmplot(x='tip', y='day', data=tip, hue='sex', orient='h')
plt.xlabel('tip ($)')  # tip is on the x-axis in this horizontal plot
Out[17]:
Box and violin plots
In [18]:
plt.subplot(1,2,1)
sns.boxplot(x='day', y='tip', data=tip)
plt.ylabel('tip ($)')
plt.subplot(1,2,2)
sns.violinplot(x='day', y='tip', data=tip)
plt.ylabel('tip ($)')
plt.tight_layout()
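In the box plot, the box spans the first to third quartile with a line at the median, and by default the whiskers reach the most extreme observations within 1.5 times the interquartile range (IQR) of the box; anything beyond is drawn as an individual outlier. The violin plot replaces the box with a mirrored kernel density estimate of the same data. A sketch of the box-plot statistics using the standard library, where the helper name and toy values are hypothetical and the 'inclusive' quantile method is assumed to match the linear interpolation used by matplotlib, which draws seaborn's boxes:

```python
import statistics


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends, and outliers as drawn by a standard box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme observations inside the fences
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical toy tips with one unusually large value
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```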
In [19]:
sns.violinplot(x='day', y='tip', data=tip, inner=None,
color='lightgray')
sns.stripplot(x='day', y='tip', data=tip, size=4,
jitter=True)
plt.ylabel('tip ($)')
Out[19]:
Joint plot
In [20]:
sns.jointplot(x='total_bill', y='tip', data=tip, height=5)
Out[20]:
Using kind='kde'
- kernel density estimate (KDE) of the joint and marginal distributions
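A kernel density estimate smooths the data into a continuous curve by placing a small Gaussian bump on every observation and averaging them; the bump width (bandwidth) controls how smooth the result is. With kind='kde', the marginal panels of the joint plot show 1-D estimates like the one sketched below, and the central panel shows the 2-D analogue. The helper name is hypothetical, and the default bandwidth uses Scott's rule of thumb (h = σ·n^(-1/5) in one dimension), which is assumed here to approximate seaborn's default.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density, built from Gaussian kernels."""
    n = len(samples)
    if bandwidth is None:
        # Scott's rule of thumb for 1-D data: h = sample std * n ** (-1/5)
        bandwidth = statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of one normal curve (std = bandwidth) centred on each sample
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical toy tips
toy_tips = [1.0, 2.0, 2.5, 3.0, 3.2, 5.0]
f = gaussian_kde(toy_tips)
for x in (0.0, 2.5, 6.0):
    print(f"density at {x}: {f(x):.3f}")
```

Because each kernel integrates to 1 and the kernels are averaged, the estimated curve also integrates to 1, like a normalized histogram.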
In [21]:
sns.jointplot(x='total_bill', y='tip', data=tip,
              kind='kde', height=5)
Out[21]:
Pair plot
In [22]:
sns.pairplot(tip, height=2)
Out[22]:
In [23]:
sns.pairplot(tip, hue='sex', kind='reg')
Out[23]:
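sns.pairplot() draws a square grid with one row and one column per numeric variable (in the tips data these are total_bill, tip, and size). Off-diagonal panels show the relationship between the row and column variables (a scatter plot, or a scatter plot with a regression fit when kind='reg'), and diagonal panels show each variable's own distribution. The sketch below only enumerates that layout; the function name and panel labels are hypothetical.

```python
def pairplot_layout(variables, kind="scatter"):
    """Return a len(variables) x len(variables) grid of panel descriptions.

    Row i / column j plots variables[j] on the x-axis against variables[i]
    on the y-axis; diagonal panels show one variable's distribution.
    """
    grid = []
    for y_var in variables:
        row = []
        for x_var in variables:
            if x_var == y_var:
                row.append(("distribution", x_var))
            else:
                row.append((kind, x_var, y_var))
        grid.append(row)
    return grid


numeric_columns = ["total_bill", "tip", "size"]  # numeric columns of the tips data
for row in pairplot_layout(numeric_columns, kind="reg"):
    print(row)
```

Each pair appears twice, mirrored across the diagonal, so k numeric columns produce k*k panels of which k*(k-1)/2 are distinct relationships.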
Heatmap
- covariance and correlation matrices of the numeric columns, with the correlation matrix drawn as a heatmap
In [24]:
tip.cov(numeric_only=True)  # skip the categorical columns (pandas >= 1.5; pandas 2.x raises without it)
Out[24]:
In [25]:
tip.corr(numeric_only=True)
Out[25]:
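tip.cov() and tip.corr() return one entry per pair of numeric columns. Covariance is measured in the product of the two columns' units (dollars squared for total_bill and tip), which makes entries hard to compare across pairs; correlation divides the covariance by both standard deviations, giving a unitless value between -1 and 1, which makes it the more readable input for a heatmap. A standard-library sketch of one entry of each matrix, assuming pandas' defaults of the sample (n - 1) denominator and Pearson correlation; the function names and toy numbers are hypothetical.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_corr(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


# Hypothetical toy columns (dollars)
bills = [10.0, 20.0, 30.0, 40.0]
tips_toy = [1.5, 3.5, 4.0, 6.5]
print(sample_cov(bills, tips_toy))
print(pearson_corr(bills, tips_toy))
```

Rescaling a column (for example, expressing bills in cents) multiplies the covariance by the same factor but leaves the correlation unchanged.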
In [26]:
sns.heatmap(tip.corr(numeric_only=True))
Out[26]: