bokeh 4th: server

how to write simple bokeh program that runs on a server?

bokeh 3rd: high-level charts

Where to get Bokeh high-level charts that can be simply created through pandas DataFrame?

bokeh 2nd: layouts

Why layout for different or similar charts is so attracting?

bokeh 1st: fundamentals python data visualization

Let’s dive into the simple but powerful Bokeh—-create sophisticated D3.js like graphs with few Python codes!

 

Visualization with Matplotlib -1 basics

Customizing plots

subplot

layout

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

data set

records of undergraduate degrees awarded to women in a variety of fields from 1970 to 2011

  • physical_sciences (representing the percentage of Physical Sciences degrees awarded to women each in corresponding year)
  • computer_science (representing the percentage of Computer Science degrees awarded to women in each corresponding year)
In [2]:
year=np.arange(1970,2012)
In [3]:
physical_sciences = np.array([ 13.8,  14.9,  14.8,  16.5,  18.2,  19.1,  20. ,  21.3,  22.5,
        23.7,  24.6,  25.7,  27.3,  27.6,  28. ,  27.5,  28.4,  30.4,
        29.7,  31.3,  31.6,  32.6,  32.6,  33.6,  34.8,  35.9,  37.3,
        38.3,  39.7,  40.2,  41. ,  42.2,  41.1,  41.7,  42.1,  41.6,
        40.8,  40.7,  40.7,  40.7,  40.2,  40.1])
In [4]:
computer_science = np.array([ 13.6,  13.6,  14.9,  16.4,  18.9,  19.8,  23.9,  25.7,  28.1,
        30.2,  32.5,  34.8,  36.3,  37.1,  36.8,  35.7,  34.7,  32.4,
        30.8,  29.9,  29.4,  28.7,  28.2,  28.5,  28.5,  27.5,  27.1,
        26.8,  27. ,  28.1,  27.7,  27.6,  27. ,  25.1,  22.2,  20.6,
        18.6,  17.6,  17.8,  18.1,  17.6,  18.2])
In [5]:
plt.figure(figsize=[6,3])
# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')

# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')

# Display the plot
plt.show()

Using axes()

  • In calling plt.axes([xlo, ylo, width, height]), a set of axes is created and made active with lower corner at coordinates (xlo, ylo) of the specified width and height. Note that these coordinates are passed to plt.axes() in the form of a list.
  • The coordinates and lengths are values between 0 and 1 representing lengths relative to the dimensions of the figure. After issuing a plt.axes() command, plots generated are put in that set of axes.
In [6]:
plt.figure(figsize=[9,3])

# Create plot axes for the first line plot
plt.axes([0.05,0.05,0.425,0.9])

# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year,physical_sciences, color='blue')

# Create plot axes for the second line plot
plt.axes([.525,0.05,0.425,0.9])


# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year,computer_science, color='red')


# Display the plot
plt.show()

Using subplot()

  • The command plt.axes() requires a lot of effort to use well because the coordinates of the axes need to be set manually. A better alternative is to use plt.subplot() to determine the layout automatically.
  • plt.subplot(m, n, k) to make the subplot grid of dimensions m by n and to make the kth subplot active (subplots are numbered starting from 1 row-wise from the top left corner of the subplot grid).
In [7]:
plt.figure(figsize=[9,3])

# Create a figure with 1x2 subplot and make the left subplot active
plt.subplot(1,2,1)

# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')
plt.title('Physical Sciences')

# Make the right subplot active in the current 1x2 subplot grid
plt.subplot(1,2,2)


# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')
plt.title('Computer Science')

# Use plt.tight_layout() to improve the spacing between subplots
plt.tight_layout()
plt.show()

add more data

health (representing the percentage of Computer Science degrees awarded to women in each corresponding year

education

In [8]:
health = np.array([ 77.1,  75.5,  76.9,  77.4,  77.9,  78.9,  79.2,  80.5,  81.9,
        82.3,  83.5,  84.1,  84.4,  84.6,  85.1,  85.3,  85.7,  85.5,
        85.2,  84.6,  83.9,  83.5,  83. ,  82.4,  81.8,  81.5,  81.3,
        81.9,  82.1,  83.5,  83.5,  85.1,  85.8,  86.5,  86.5,  86. ,
        85.9,  85.4,  85.2,  85.1,  85. ,  84.8])
education = np.array([ 77.1,  75.5,  76.9,  77.4,  77.9,  78.9,  79.2,  80.5,  81.9,
        82.3,  83.5,  84.1,  84.4,  84.6,  85.1,  85.3,  85.7,  85.5,
        85.2,  84.6,  83.9,  83.5,  83. ,  82.4,  81.8,  81.5,  81.3,
        81.9,  82.1,  83.5,  83.5,  85.1,  85.8,  86.5,  86.5,  86. ,
        85.9,  85.4,  85.2,  85.1,  85. ,  84.8])

2x2 subplot layout

In [9]:
# Create a figure with 2x2 subplot layout and make the top left subplot active
plt.subplot(2,2,1)

# Plot in blue the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences, color='blue')
plt.title('Physical Sciences')

# Make the top right subplot active in the current 2x2 subplot grid 
plt.subplot(2,2,2)

# Plot in red the % of degrees awarded to women in Computer Science
plt.plot(year, computer_science, color='red')
plt.title('Computer Science')

# Make the bottom left subplot active in the current 2x2 subplot grid
plt.subplot(2,2,3)

# Plot in green the % of degrees awarded to women in Health Professions
plt.plot(year, health, color='green')
plt.title('Health Professions')

# Make the bottom right subplot active in the current 2x2 subplot grid
plt.subplot(2,2,4)

# Plot in yellow the % of degrees awarded to women in Education
plt.plot(year, education, color='yellow')
plt.title('Education')

# Improve the spacing between subplots and display them
plt.tight_layout()
plt.show()

Using xlim(), ylim()

  • set x- and y-limits of plots, e.g. plt.xlim() to set the x-axis range
In [10]:
plt.figure(figsize=[9,3])

# Plot the % of degrees awarded to women in Computer Science and the Physical Sciences
plt.plot(year,computer_science, color='red') 
plt.plot(year, physical_sciences, color='blue')

# Add the axis labels
plt.xlabel('Year')
plt.ylabel('Degrees awarded to women (%)')

# Set the x-axis range
plt.xlim([1990,2010])

# Set the y-axis range
plt.ylim([0,50])

# Add a title and display the plot
plt.title('Degrees awarded to women (1990-2010)\nComputer Science (red)\nPhysical Sciences (blue)')
plt.show()

# Save the image as 'xlim_and_ylim.png'
plt.savefig('xlim_and_ylim.png')
<matplotlib.figure.Figure at 0x7f32a7dca850>

Using axis()

  • alternatively, you can pass a 4-tuple to plt.axis() to set limits for both axes at once.
  • save plot using savefig()
In [11]:
plt.figure(figsize=[7,3])

# Plot in blue the % of degrees awarded to women in Computer Science
plt.plot(year,computer_science, color='blue')

# Plot in red the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences,color='red')

# Set the x-axis and y-axis limits
plt.axis((1990,2010,0,50))

# Show the figure
plt.show()

# Save the figure as 'axis_limits.png'

plt.savefig('axis_limits.png')
<matplotlib.figure.Figure at 0x7f32a7e68e90>

Other axis() options

Invocation Result
axis(‘off’) turns off axis lines, labels
axis(‘equal’) equal scaling on x, y axes
axis(‘square’) forces square plot
axis(‘tight’) sets xlim(), ylim() to show all data
In [12]:
plt.figure(figsize=[20,3])
plt.subplot(1,2,1)

# Plot in blue the % of degrees awarded to women in Computer Science
plt.plot(year,computer_science, color='blue')
# Plot in red the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences,color='red')
# Set the x-axis and y-axis limits
plt.axis('equal')

plt.subplot(1,2,2)

# Plot in blue the % of degrees awarded to women in Computer Science
plt.plot(year,computer_science, color='blue')
# Plot in red the % of degrees awarded to women in the Physical Sciences
plt.plot(year, physical_sciences,color='red')
# Set the x-axis and y-axis limits
plt.axis('tight')

# Show the figure
plt.show()

Using legend()

In [13]:
plt.figure(figsize=[7,2.5])

# Specify the label 'Computer Science'
plt.plot(year, computer_science, color='red', label='Computer Science') 

# Specify the label 'Physical Sciences' 
plt.plot(year, physical_sciences, color='blue', label='Physical Sciences')

# Add a legend at the lower center
plt.legend(loc='upper right')

# Add axis labels and title
plt.xlabel('Year')
plt.ylabel('Enrollment (%)')
plt.title('Undergraduate enrollment of women')
plt.show()

Legend locations

string code string code string code
'upper left' 2 'upper center' 9 'upper right' 1
'center left' 6 'center' ' 10 'center right' 7
'lower left' 3 'lower center' 8 'lower right' 4
'best' 0 'right' 5

Using annotate()

  • To enable an arrow, set arrowprops=dict(facecolor='black'). The arrow will point to the location given by xy and the text will appear at the location given by xytext
In [14]:
plt.figure(figsize=[7,2.5])

# Plot with legend as before
plt.plot(year, computer_science, color='red', label='Computer Science') 
plt.plot(year, physical_sciences, color='blue', label='Physical Sciences')
plt.legend(loc='lower right')

# Compute the maximum enrollment of women in Computer Science: cs_max
cs_max = computer_science.max()

# Calculate the year in which there was maximum enrollment of women in Computer Science: yr_max
yr_max = year[computer_science.argmax()]

# Add a black arrow annotation
plt.annotate(s='Maximum', xy=(yr_max, cs_max), xytext=(yr_max-30,cs_max+8), arrowprops={'facecolor':'cyan'})

# Add axis labels and title
plt.xlabel('Year')
plt.ylabel('Enrollment (%)')
plt.title('Undergraduate enrollment of women')
plt.show()

Modifying styles

  • Matplotlib comes with a number of different stylesheets to customize the overall look of different plots. To activate a particular stylesheet you can simply call plt.style.use() with the name of the style sheet you want.
  • To list all the available style sheets you can execute: print(plt.style.available)
In [15]:
print(plt.style.available)
[u'seaborn-darkgrid', u'seaborn-notebook', u'classic', u'seaborn-ticks', u'dark_background', u'bmh', u'seaborn-talk', u'grayscale', u'ggplot', u'fivethirtyeight', u'seaborn-colorblind', u'seaborn-deep', u'seaborn-whitegrid', u'seaborn-bright', u'seaborn-poster', u'seaborn-muted', u'seaborn-paper', u'seaborn-white', u'seaborn-pastel', u'seaborn-dark', u'seaborn-dark-palette']

set diff style

set smaller font of axis

In [16]:
plt.figure(figsize=[7,2.5])


# Set the style to 'ggplot'
plt.style.use('ggplot')


# Plot the enrollment % of women in Computer Science
plt.plot(year, computer_science, 'ro-',alpha=.2,linewidth=2, markersize=12)
plt.title('Computer Science',fontsize=11,alpha=.8,color='orange')
plt.xlabel('test x lable',fontsize=8,color='g')
plt.ylabel('test y lable',fontsize=9,color='purple',alpha=.8)


plt.tick_params(labelsize=7)


# Add annotation
cs_max = computer_science.max()
yr_max = year[computer_science.argmax()]
plt.annotate('Maximum', xy=(yr_max, cs_max), xytext=(yr_max-1, cs_max-15), arrowprops=dict(facecolor='green'))


# Improve spacing between subplots and display them
plt.tight_layout()
plt.show()
In [ ]:
 

simple url_based APIs tutorial

simple url_based APIs

 

APIs

Application Programming Interface

  • Protocols and routines
    • Building and interacting with software applications
  • fun: OMDB API
    • the Open Movie Database

JSONs

JavaScript Object Notation

  • Real-time server-to-browser communication
  • Douglas Crockford
  • Human readable

What is an API?

  • Set of protocols and routines
  • Bunch of code
    • Allows two so!ware programs to communicate with each other

Connecting to an API in Python

fun: OMDB API

In [7]:
import requests
url = 'http://www.omdbapi.com/?t=Split'

r = requests.get(url)
json_data = r.json()

for key, value in json_data.items():
    print(key + ':', value)
    
with open("a_movie.json", 'w+') as save:
    save.write(r.text)
(u'Plot:', u'After three girls are kidnapped by a man with 24 distinct personalities they must find some of the different personalities that can help them while running away and staying alive from the others.')
(u'Rated:', u'PG-13')
(u'Response:', u'True')
(u'Language:', u'English')
(u'Title:', u'Split')
(u'Country:', u'USA')
(u'Writer:', u'M. Night Shyamalan')
(u'Metascore:', u'75')
(u'imdbRating:', u'7.6')
(u'Director:', u'M. Night Shyamalan')
(u'Released:', u'20 Jan 2017')
(u'Actors:', u'Anya Taylor-Joy, James McAvoy, Haley Lu Richardson, Kim Director')
(u'Year:', u'2016')
(u'Genre:', u'Horror, Thriller')
(u'Awards:', u'1 nomination.')
(u'Runtime:', u'117 min')
(u'Type:', u'movie')
(u'Poster:', u'https://images-na.ssl-images-amazon.com/images/M/MV5BOWFiNjViN2UtZjIwYS00ZmNhLWIzMTYtYTRiMTczOGMzZGE0L2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyMjY5ODI4NDk@._V1_SX300.jpg')
(u'imdbVotes:', u'864')
(u'imdbID:', u'tt4972582')

Loading and exploring a JSON

  • with open(file_path) as file:
In [9]:
import json
# Load JSON: json_data
with open("a_movie.json") as json_file:
    json_data = json.load(json_file)

# Print each key-value pair in json_data
for k in json_data.keys():
    print(k + ': ', json_data[k])
(u'Plot: ', u'After three girls are kidnapped by a man with 24 distinct personalities they must find some of the different personalities that can help them while running away and staying alive from the others.')
(u'Rated: ', u'PG-13')
(u'Response: ', u'True')
(u'Language: ', u'English')
(u'Title: ', u'Split')
(u'Country: ', u'USA')
(u'Writer: ', u'M. Night Shyamalan')
(u'Metascore: ', u'75')
(u'imdbRating: ', u'7.6')
(u'Director: ', u'M. Night Shyamalan')
(u'Released: ', u'20 Jan 2017')
(u'Actors: ', u'Anya Taylor-Joy, James McAvoy, Haley Lu Richardson, Kim Director')
(u'Year: ', u'2016')
(u'Genre: ', u'Horror, Thriller')
(u'Awards: ', u'1 nomination.')
(u'Runtime: ', u'117 min')
(u'Type: ', u'movie')
(u'Poster: ', u'https://images-na.ssl-images-amazon.com/images/M/MV5BOWFiNjViN2UtZjIwYS00ZmNhLWIzMTYtYTRiMTczOGMzZGE0L2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyMjY5ODI4NDk@._V1_SX300.jpg')
(u'imdbVotes: ', u'864')
(u'imdbID: ', u'tt4972582')

API requests

  • pull some movie data down from the Open Movie Database (OMDB) using their API.
  • he movie you'll query the API about is The Social Network
    • The query string should have one argument t=social+network
  • Apply the json() method to the response object r and store the resulting dictionary in the variable json_data.
In [20]:
# Import requests package
import requests

# Assign URL to variable: url
url = 'http://www.omdbapi.com/?t=social+network'

# Package the request, send the request and catch the response: r
r = requests.get(url)

# Print the text of the response
print(r.text)

print type(r.text)

print type(r.json())

# Decode the JSON data into a dictionary: json_data
json_data = r.json()

print 
# Print each key-value pair in json_data
for key in json_data.keys():
    print(key + ': ', json_data[key])
{"Title":"The Social Network","Year":"2010","Rated":"PG-13","Released":"01 Oct 2010","Runtime":"120 min","Genre":"Biography, Drama","Director":"David Fincher","Writer":"Aaron Sorkin (screenplay), Ben Mezrich (book)","Actors":"Jesse Eisenberg, Rooney Mara, Bryan Barter, Dustin Fitzsimons","Plot":"Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, but is later sued by two brothers who claimed he stole their idea, and the co-founder who was later squeezed out of the business.","Language":"English, French","Country":"USA","Awards":"Won 3 Oscars. Another 161 wins & 162 nominations.","Poster":"https://images-na.ssl-images-amazon.com/images/M/MV5BMTM2ODk0NDAwMF5BMl5BanBnXkFtZTcwNTM1MDc2Mw@@._V1_SX300.jpg","Metascore":"95","imdbRating":"7.7","imdbVotes":"496,009","imdbID":"tt1285016","Type":"movie","Response":"True"}
<type 'unicode'>
<type 'dict'>

(u'Plot: ', u'Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, but is later sued by two brothers who claimed he stole their idea, and the co-founder who was later squeezed out of the business.')
(u'Rated: ', u'PG-13')
(u'Response: ', u'True')
(u'Language: ', u'English, French')
(u'Title: ', u'The Social Network')
(u'Country: ', u'USA')
(u'Writer: ', u'Aaron Sorkin (screenplay), Ben Mezrich (book)')
(u'Metascore: ', u'95')
(u'imdbRating: ', u'7.7')
(u'Director: ', u'David Fincher')
(u'Released: ', u'01 Oct 2010')
(u'Actors: ', u'Jesse Eisenberg, Rooney Mara, Bryan Barter, Dustin Fitzsimons')
(u'Year: ', u'2010')
(u'Genre: ', u'Biography, Drama')
(u'Awards: ', u'Won 3 Oscars. Another 161 wins & 162 nominations.')
(u'Runtime: ', u'120 min')
(u'Type: ', u'movie')
(u'Poster: ', u'https://images-na.ssl-images-amazon.com/images/M/MV5BMTM2ODk0NDAwMF5BMl5BanBnXkFtZTcwNTM1MDc2Mw@@._V1_SX300.jpg')
(u'imdbVotes: ', u'496,009')
(u'imdbID: ', u'tt1285016')
In [2]:
# Import package
import requests

# Assign URL to variable: url
url = 'https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles=machine+learning'

# Package the request, send the request and catch the response: r
r = requests.get(url)

# Decode the JSON data into a dictionary: json_data
json_data = r.json()

# Print the Wikipedia page extract
pizza_extract = json_data['query']['pages']['233488']['extract']
print(pizza_extract)
<p><b>Machine learning</b> is the subfield of computer science that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data – such algorithms overcome following strictly static program instructions by making data driven predictions or decisions, through building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms is infeasible; example applications include spam filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), search engines and computer vision.</p>
<p>Machine learning is closely related to (and often overlaps with) computational statistics, which also focuses in prediction-making through the use of computers. It has strong ties to mathematical optimization, which delivers methods, theory and application domains to the field. Machine learning is sometimes conflated with data mining, where the latter subfield focuses more on exploratory data analysis and is known as unsupervised learning. Machine learning can also be unsupervised and be used to learn and establish baseline behavioral profiles for various entities and then used to find meaningful anomalies.</p>
<p>Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction; in commercial use, this is known as predictive analytics. These analytical models allow researchers, data scientists, engineers, and analysts to "produce reliable, repeatable decisions and results" and uncover "hidden insights" through learning from historical relationships and trends in the data.</p>
<p></p>
In [ ]: