Visualization with Matplotlib - 1: Basics
Customizing plots
subplot
layout
In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Data set
Records of undergraduate degrees awarded to women in a variety of fields from 1970 to 2011:
- physical_sciences (the percentage of Physical Sciences degrees awarded to women in each corresponding year)
- computer_science (the percentage of Computer Science degrees awarded to women in each corresponding year)
In [2]:
year=np.arange(1970,2012)
In [3]:
physical_sciences = np.array([ 13.8, 14.9, 14.8, 16.5, 18.2, 19.1, 20. , 21.3, 22.5,
23.7, 24.6, 25.7, 27.3, 27.6, 28. , 27.5, 28.4, 30.4,
29.7, 31.3, 31.6, 32.6, 32.6, 33.6, 34.8, 35.9, 37.3,
38.3, 39.7, 40.2, 41. , 42.2, 41.1, 41.7, 42.1, 41.6,
40.8, 40.7, 40.7, 40.7, 40.2, 40.1])
In [4]:
computer_science = np.array([ 13.6, 13.6, 14.9, 16.4, 18.9, 19.8, 23.9, 25.7, 28.1,
30.2, 32.5, 34.8, 36.3, 37.1, 36.8, 35.7, 34.7, 32.4,
30.8, 29.9, 29.4, 28.7, 28.2, 28.5, 28.5, 27.5, 27.1,
26.8, 27. , 28.1, 27.7, 27.6, 27. , 25.1, 22.2, 20.6,
18.6, 17.6, 17.8, 18.1, 17.6, 18.2])
In [5]:
plt.figure(figsize=[6,3])
# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')
# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')
# Display the plot
plt.show()
Using axes()
- Calling plt.axes([xlo, ylo, width, height]) creates a set of axes, makes it active, and places its lower-left corner at (xlo, ylo) with the specified width and height. Note that these four values are passed to plt.axes() as a single list.
- The coordinates and lengths are values between 0 and 1, expressed as fractions of the figure's width and height. After a plt.axes() call, subsequent plots are drawn in that set of axes.
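Because the rectangle is given in figure fractions, the numbers for evenly spaced panels can be computed rather than guessed. The helper below, side_by_side_rects, is a hypothetical name introduced only for illustration (it is not part of Matplotlib). It is a minimal sketch that assumes equal margins on all sides and a uniform gap between panels.

```python
def side_by_side_rects(n_panels, margin=0.05, gap=0.05):
    """Return [xlo, ylo, width, height] lists for n_panels axes in one row.

    All values are fractions of the figure size, as expected by plt.axes().
    """
    # Horizontal space left after removing both outer margins and the gaps
    usable = 1.0 - 2 * margin - gap * (n_panels - 1)
    width = usable / n_panels
    height = 1.0 - 2 * margin
    rects = []
    for i in range(n_panels):
        xlo = margin + i * (width + gap)
        # Round to suppress floating-point noise in the printed values
        rects.append([round(xlo, 6), round(margin, 6),
                      round(width, 6), round(height, 6)])
    return rects


rects = side_by_side_rects(2)
for rect in rects:
    print(rect)
```

Each list can be passed directly to plt.axes(). With the default margin and gap, the two rectangles should come out close to the hand-written values used in the cell that follows.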
In [6]:
plt.figure(figsize=[9,3])
# Create plot axes for the first line plot
plt.axes([0.05,0.05,0.425,0.9])
# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year,physical_sciences, color='blue')
# Create plot axes for the second line plot
plt.axes([.525,0.05,0.425,0.9])
# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year,computer_science, color='red')
# Display the plot
plt.show()
Using subplot()
- plt.axes() takes effort to use well because the coordinates of each set of axes must be set manually. plt.subplot() is usually a better alternative because it determines the layout automatically.
- Call plt.subplot(m, n, k) to create a subplot grid with m rows and n columns and to make the kth subplot active. Subplots are numbered from 1, row by row, starting at the top-left corner of the grid.
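The row-wise numbering can be made concrete with a little arithmetic. The function subplot_row_col below is a hypothetical helper written for this note, not a Matplotlib function; it simply maps the 1-based index k to a 0-based (row, column) pair.

```python
def subplot_row_col(m, n, k):
    """Map a 1-based subplot index k to a 0-based (row, column) in an m-by-n grid."""
    if not 1 <= k <= m * n:
        raise ValueError(f"k must be between 1 and {m * n}, got {k}")
    # Indices advance along a row first, then wrap to the next row
    return divmod(k - 1, n)


# Walk through a 2x2 grid in the order plt.subplot() numbers it
for k in range(1, 5):
    print(k, subplot_row_col(2, 2, k))
```

In a 2x2 grid, index 3 therefore lands in the second row, first column, which is the bottom-left panel.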
In [7]:
plt.figure(figsize=[9,3])
# Create a figure with 1x2 subplot and make the left subplot active
plt.subplot(1,2,1)
# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')
plt.title('Physical Sciences')
# Make the right subplot active in the current 1x2 subplot grid
plt.subplot(1,2,2)
# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')
plt.title('Computer Science')
# Use plt.tight_layout() to improve the spacing between subplots
plt.tight_layout()
plt.show()
Add more data
- health (the percentage of Health Professions degrees awarded to women in each corresponding year)
- education (the percentage of Education degrees awarded to women in each corresponding year)
In [8]:
health = np.array([ 77.1, 75.5, 76.9, 77.4, 77.9, 78.9, 79.2, 80.5, 81.9,
82.3, 83.5, 84.1, 84.4, 84.6, 85.1, 85.3, 85.7, 85.5,
85.2, 84.6, 83.9, 83.5, 83. , 82.4, 81.8, 81.5, 81.3,
81.9, 82.1, 83.5, 83.5, 85.1, 85.8, 86.5, 86.5, 86. ,
85.9, 85.4, 85.2, 85.1, 85. , 84.8])
education = np.array([ 77.1, 75.5, 76.9, 77.4, 77.9, 78.9, 79.2, 80.5, 81.9,
82.3, 83.5, 84.1, 84.4, 84.6, 85.1, 85.3, 85.7, 85.5,
85.2, 84.6, 83.9, 83.5, 83. , 82.4, 81.8, 81.5, 81.3,
81.9, 82.1, 83.5, 83.5, 85.1, 85.8, 86.5, 86.5, 86. ,
85.9, 85.4, 85.2, 85.1, 85. , 84.8])
2x2 subplot layout
In [9]:
# Create a figure with 2x2 subplot layout and make the top left subplot active
plt.subplot(2,2,1)
# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')
plt.title('Physical Sciences')
# Make the top right subplot active in the current 2x2 subplot grid
plt.subplot(2,2,2)
# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')
plt.title('Computer Science')
# Make the bottom left subplot active in the current 2x2 subplot grid
plt.subplot(2,2,3)
# Plot in green the % of degrees awarded to women in Health Professions
plt.plot(year, health, color='green')
plt.title('Health Professions')
# Make the bottom right subplot active in the current 2x2 subplot grid
plt.subplot(2,2,4)
# Plot in yellow the % of degrees awarded to women in Education
plt.plot(year, education, color='yellow')
plt.title('Education')
# Improve the spacing between subplots and display them
plt.tight_layout()
plt.show()
Using xlim(), ylim()
- Set the x- and y-limits of a plot: plt.xlim() sets the x-axis range and plt.ylim() sets the y-axis range.
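Called without arguments, plt.xlim() and plt.ylim() return the current limits, which is a convenient way to check what a call changed. The sketch below uses a tiny made-up series (not the degree data) and selects the off-screen "Agg" backend, an assumption made here so that it also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Small illustrative series (not the degree data)
x = [1970, 1990, 2011]
y = [10, 30, 20]

plt.figure(figsize=[6, 3])
plt.plot(x, y)

# Before any explicit call, the limits are chosen automatically from the data
auto_xlim = plt.xlim()

# Set explicit ranges, then read them back
plt.xlim(1990, 2010)
plt.ylim(0, 50)
new_xlim = plt.xlim()
new_ylim = plt.ylim()

print("automatic x-limits:", auto_xlim)
print("explicit x-limits: ", new_xlim)
print("explicit y-limits: ", new_ylim)
plt.close()
```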
In [10]:
plt.figure(figsize=[9,3])
# Plot the % of degrees awarded to women in Computer Science and the Physical Sciences
plt.plot(year,computer_science, color='red')
plt.plot(year, physical_sciences, color='blue')
# Add the axis labels
plt.xlabel('Year')
plt.ylabel('Degrees awarded to women (%)')
# Set the x-axis range
plt.xlim([1990,2010])
# Set the y-axis range
plt.ylim([0,50])
# Add a title
plt.title('Degrees awarded to women (1990-2010)\nComputer Science (red)\nPhysical Sciences (blue)')
# Save the image as 'xlim_and_ylim.png' before showing it;
# calling savefig() after show() can write a blank image
plt.savefig('xlim_and_ylim.png')
# Display the plot
plt.show()
Using axis()
- Alternatively, pass a 4-tuple (xmin, xmax, ymin, ymax) to plt.axis() to set the limits of both axes at once.
- Save the plot with plt.savefig().
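A minimal sketch of both points follows, assuming the off-screen "Agg" backend and three values taken from the computer_science series (1990, 2000, 2010). It writes the PNG into an in-memory buffer instead of a file so that nothing is left on disk; passing a file name to plt.savefig() works the same way.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so the sketch runs anywhere
import matplotlib.pyplot as plt

plt.figure(figsize=[7, 3])
plt.plot([1990, 2000, 2010], [29.4, 27.7, 17.6], color="blue")

# One call sets (xmin, xmax, ymin, ymax)
plt.axis((1990, 2010, 0, 50))
limits = plt.axis()  # with no arguments, returns the current limits

# Save while the figure still exists, i.e. before any plt.show() call
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
plt.close()

png_bytes = buffer.getvalue()
print(limits)
print(len(png_bytes), "bytes written")
```

Saving before showing matters mostly in notebooks and scripts where the figure may be released once plt.show() returns; a savefig() call issued afterwards may then operate on a new, empty figure.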
In [11]:
plt.figure(figsize=[7,3])
# Plot in blue the % of degrees awarded to women in Computer Science
plt.plot(year,computer_science, color='blue')
# Plot in red the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences,color='red')
# Set the x-axis and y-axis limits
plt.axis((1990,2010,0,50))
# Save the figure as 'axis_limits.png' before showing it
plt.savefig('axis_limits.png')
# Show the figure
plt.show()
Other axis() options
Invocation | Result |
---|---|
axis('off') | turns off axis lines and labels |
axis('equal') | equal scaling on the x- and y-axes |
axis('square') | forces a square plot |
axis('tight') | sets xlim() and ylim() to show all data |
In [12]:
plt.figure(figsize=[20,3])
plt.subplot(1,2,1)
# Plot in blue the % of degrees awarded to women in Computer Science
plt.plot(year,computer_science, color='blue')
# Plot in red the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences,color='red')
# Use equal scaling on the x- and y-axes
plt.axis('equal')
plt.subplot(1,2,2)
# Plot in blue the % of degrees awarded to women in Computer Science
plt.plot(year,computer_science, color='blue')
# Plot in red the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences,color='red')
# Fit the axis limits tightly around the data
plt.axis('tight')
# Show the figure
plt.show()
Using legend()
In [13]:
plt.figure(figsize=[7,2.5])
# Specify the label 'Computer Science'
plt.plot(year, computer_science, color='red', label='Computer Science')
# Specify the label 'Physical Sciences'
plt.plot(year, physical_sciences, color='blue', label='Physical Sciences')
# Add a legend at the upper right
plt.legend(loc='upper right')
# Add axis labels and title
plt.xlabel('Year')
plt.ylabel('Enrollment (%)')
plt.title('Undergraduate enrollment of women')
plt.show()
Legend locations
string | code | string | code | string | code |
---|---|---|---|---|---|
'upper left' | 2 | 'upper center' | 9 | 'upper right' | 1 |
'center left' | 6 | 'center' | 10 | 'center right' | 7 |
'lower left' | 3 | 'lower center' | 8 | 'lower right' | 4 |
'best' | 0 | 'right' | 5 | | |
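The string and the numeric code in the table are interchangeable as the loc argument. The sketch below draws the same two lines twice, once with 'lower center' and once with 8, and reads the legend entries back. It uses plt.subplots() and the axes-level methods for brevity, and it assumes the off-screen "Agg" backend; the three data points per line are the 1970, 1990 and 2011 values from the arrays above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=[8, 2.5])

# Same plot twice: once with the location string, once with its numeric code
for ax, loc in zip(axes, ["lower center", 8]):
    ax.plot([1970, 1990, 2011], [13.6, 29.4, 18.2],
            color="red", label="Computer Science")
    ax.plot([1970, 1990, 2011], [13.8, 31.6, 40.1],
            color="blue", label="Physical Sciences")
    legend = ax.legend(loc=loc)
    ax.set_title(f"loc={loc!r}")

# The legend entries come from the label= arguments, in plotting order
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
plt.close(fig)
```

The strings are usually easier to read in code than the numeric codes.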
Using annotate()
- To enable an arrow, set arrowprops=dict(facecolor='black'). The arrow will point to the location given by xy, and the text will appear at the location given by xytext.
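As a small self-contained illustration of xy versus xytext, the sketch below annotates the peak of a six-point slice of the computer_science series (1980 to 1985). The xytext offsets are arbitrary choices made for this example, and the off-screen "Agg" backend is an assumption so that it runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# A short slice of the Computer Science series around its peak
years = [1980, 1981, 1982, 1983, 1984, 1985]
values = [32.5, 34.8, 36.3, 37.1, 36.8, 35.7]

plt.figure(figsize=[7, 2.5])
plt.plot(years, values, color="red")

# Locate the peak with plain Python (the index of the largest value)
peak_index = max(range(len(values)), key=values.__getitem__)
peak_year, peak_value = years[peak_index], values[peak_index]

# xy is the point the arrow tip touches; xytext is where the label is drawn
annotation = plt.annotate(
    "Maximum",
    xy=(peak_year, peak_value),
    xytext=(peak_year - 2.5, peak_value - 1),  # arbitrary offset for this example
    arrowprops=dict(facecolor="black"),
)
print(peak_year, peak_value)
plt.close()
```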
In [14]:
plt.figure(figsize=[7,2.5])
# Plot with legend as before
plt.plot(year, computer_science, color='red', label='Computer Science')
plt.plot(year, physical_sciences, color='blue', label='Physical Sciences')
plt.legend(loc='lower right')
# Compute the maximum enrollment of women in Computer Science: cs_max
cs_max = computer_science.max()
# Calculate the year in which there was maximum enrollment of women in Computer Science: yr_max
yr_max = year[computer_science.argmax()]
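# Add an annotation with a cyan arrow; the text is passed positionally
# because newer Matplotlib releases no longer accept the s= keyword
plt.annotate('Maximum', xy=(yr_max, cs_max), xytext=(yr_max-30,cs_max+8), arrowprops={'facecolor':'cyan'})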
# Add axis labels and title
plt.xlabel('Year')
plt.ylabel('Enrollment (%)')
plt.title('Undergraduate enrollment of women')
plt.show()
Modifying styles
- Matplotlib comes with a number of different stylesheets to customize the overall look of different plots. To activate a particular stylesheet you can simply call plt.style.use() with the name of the style sheet you want.
- To list all the available style sheets you can execute: print(plt.style.available)
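plt.style.use() changes the style for the rest of the session. When a style should apply to one plot only, the plt.style.context() context manager is an alternative. The sketch below is one possible way to use it; it assumes the off-screen "Agg" backend and compares a single setting (axes.facecolor) before, inside and after the block to show that the change is temporary.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so the sketch runs anywhere
import matplotlib.pyplot as plt

print("ggplot" in plt.style.available)

facecolor_before = plt.rcParams["axes.facecolor"]

# Inside the with-block the ggplot settings are active
with plt.style.context("ggplot"):
    facecolor_inside = plt.rcParams["axes.facecolor"]
    plt.figure(figsize=[7, 2.5])
    plt.plot([1970, 1983, 2011], [13.6, 37.1, 18.2], "ro-")
    plt.title("Computer Science")
    plt.close()

# Leaving the block restores the previous settings
facecolor_after = plt.rcParams["axes.facecolor"]
print(facecolor_before, facecolor_inside, facecolor_after)
```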
In [15]:
print(plt.style.available)
In [16]:
plt.figure(figsize=[7,2.5])
# Set the style to 'ggplot'
plt.style.use('ggplot')
# Plot the enrollment % of women in Computer Science
plt.plot(year, computer_science, 'ro-',alpha=.2,linewidth=2, markersize=12)
plt.title('Computer Science',fontsize=11,alpha=.8,color='orange')
plt.xlabel('test x label',fontsize=8,color='g')
plt.ylabel('test y label',fontsize=9,color='purple',alpha=.8)
plt.tick_params(labelsize=7)
# Add annotation
cs_max = computer_science.max()
yr_max = year[computer_science.argmax()]
plt.annotate('Maximum', xy=(yr_max, cs_max), xytext=(yr_max-1, cs_max-15), arrowprops=dict(facecolor='green'))
# Tidy the layout and display the plot
plt.tight_layout()
plt.show()
Simple URL-based APIs tutorial
What is an API?
- Set of protocols and routines
- Bunch of code
- Allows two software programs to communicate with each other
In [7]:
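import requests
# Query the API for the movie 'Split'
url = 'http://www.omdbapi.com/?t=Split'
r = requests.get(url)
# Decode the JSON response into a dictionary
json_data = r.json()
for key, value in json_data.items():
    print(key + ':', value)
# Save the raw response text to a file
with open("a_movie.json", 'w+') as save:
    save.write(r.text)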
Loading and exploring a JSON
- with open(file_path) as file:
In [9]:
import json
# Load JSON: json_data
with open("a_movie.json") as json_file:
    json_data = json.load(json_file)
# Print each key-value pair in json_data
for k in json_data.keys():
    print(k + ': ', json_data[k])
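The save-then-load round trip can be tried without calling any API. In the sketch below the movie dictionary is a made-up placeholder (not real API output), and a temporary directory is used so that no file is left behind.

```python
import json
import os
import tempfile

# Placeholder record standing in for an API response (values are made up)
movie = {"Title": "Example Movie", "Year": "2000", "Response": "True"}

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "a_movie.json")

    # Write the dictionary as JSON text
    with open(file_path, "w", encoding="utf-8") as json_file:
        json.dump(movie, json_file)

    # Read it back into a dictionary
    with open(file_path, encoding="utf-8") as json_file:
        loaded = json.load(json_file)

# Print each key-value pair of the reloaded dictionary
for key, value in loaded.items():
    print(key + ":", value)
```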
API requests
- Pull some movie data down from the Open Movie Database (OMDB) using their API.
- The movie you'll query the API about is The Social Network.
- The query string should have one argument: t=social+network
- Apply the json() method to the response object r and store the resulting dictionary in the variable json_data.
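The query string does not have to be typed by hand. As a sketch using only the standard library, urlencode() builds it from a dictionary and encodes the space in the title as '+'. The variable names base_url and params are choices made for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = "http://www.omdbapi.com/"
params = {"t": "social network"}

# urlencode() escapes each value; a space becomes '+'
query_string = urlencode(params)
url = base_url + "?" + query_string
print(url)

# Going the other way: split a URL and decode its query arguments
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

requests.get() also accepts a params dictionary and builds the query string itself, which avoids manual string concatenation.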
In [20]:
# Import requests package
import requests
# Assign URL to variable: url
url = 'http://www.omdbapi.com/?t=social+network'
# Package the request, send the request and catch the response: r
r = requests.get(url)
# Print the text of the response
print(r.text)
# The raw text is a string; r.json() decodes it into a dictionary
print(type(r.text))
print(type(r.json()))
# Decode the JSON data into a dictionary: json_data
json_data = r.json()
print()
# Print each key-value pair in json_data
for key in json_data.keys():
    print(key + ': ', json_data[key])
Wikipedia API
In [2]:
# Import package
import requests
# Assign URL to variable: url
url = 'https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles=machine+learning'
# Package the request, send the request and catch the response: r
r = requests.get(url)
# Decode the JSON data into a dictionary: json_data
json_data = r.json()
# Print the Wikipedia page extract
ml_extract = json_data['query']['pages']['233488']['extract']
print(ml_extract)
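The page ID '233488' is hard-coded in the cell above, so the lookup only works for that one article. A sketch of a more general lookup follows. It runs on a sample dictionary that mirrors the nesting used above; the extract text is a shortened placeholder, not real API output.

```python
# Sample dictionary with the same nesting as the response used above;
# the extract text is a placeholder, not real API output.
json_data = {
    "query": {
        "pages": {
            "233488": {
                "pageid": 233488,
                "title": "Machine learning",
                "extract": "<p>Placeholder extract text.</p>",
            }
        }
    }
}

pages = json_data["query"]["pages"]

# Take the first (here: only) page rather than hard-coding its ID
page = next(iter(pages.values()))
extract = page.get("extract", "")

print(page["title"])
print(extract)
```

Using .get() with a default avoids a KeyError when a page comes back without an extract.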