Simple URL-based APIs tutorial
What is an API?
- A set of protocols and routines
- A bunch of code
- Allows two software programs to communicate with each other
In [7]:
import requests
url = 'http://www.omdbapi.com/?t=Split'
r = requests.get(url)
json_data = r.json()
for key, value in json_data.items():
    print(key + ':', value)

with open("a_movie.json", 'w+') as save:
    save.write(r.text)
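Since r.json() already decodes the response into a dictionary, an alternative to writing the raw response text is serializing that dictionary with json.dump. A minimal sketch, with a small sample dictionary standing in for the real decoded response:

```python
import json

# Sample dict standing in for the dictionary decoded from the response
json_data = {'Title': 'Split', 'Year': '2016'}

with open('a_movie.json', 'w') as save:
    json.dump(json_data, save, indent=2)   # pretty-printed JSON on disk

with open('a_movie.json') as f:
    print(json.load(f) == json_data)       # True: round-trips to the same dict
```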
Loading and exploring a JSON
- with open(file_path) as file:
In [9]:
import json
# Load JSON: json_data
with open("a_movie.json") as json_file:
    json_data = json.load(json_file)

# Print each key-value pair in json_data
for k in json_data.keys():
    print(k + ': ', json_data[k])
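For nested JSON, json.dumps with indent makes the structure easier to scan than printing the keys one by one. A sketch with a made-up nested dictionary:

```python
import json

# Made-up nested structure, for illustration only
json_data = {'Title': 'Split', 'Ratings': [{'Source': 'IMDb', 'Value': '7.3/10'}]}

# indent=2 pretty-prints the nesting; sort_keys gives stable ordering
print(json.dumps(json_data, indent=2, sort_keys=True))
```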
API requests
- Pull some movie data down from the Open Movie Database (OMDb) using their API.
- The movie you'll query the API about is The Social Network.
- The query string should have one argument: t=social+network.
- Apply the json() method to the response object r and store the resulting dictionary in the variable json_data.
In [20]:
# Import requests package
import requests
# Assign URL to variable: url
url = 'http://www.omdbapi.com/?t=social+network'
# Package the request, send the request and catch the response: r
r = requests.get(url)
# Print the text of the response
print(r.text)
print(type(r.text))
print(type(r.json()))

# Decode the JSON data into a dictionary: json_data
json_data = r.json()

# Print each key-value pair in json_data
for key in json_data.keys():
    print(key + ': ', json_data[key])
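Rather than spelling out the query string by hand, the standard library's urllib.parse.urlencode can build it from a dictionary (a sketch; only the t parameter from the exercise is used):

```python
from urllib.parse import urlencode

# Build the OMDb query string from a dict instead of by hand
params = {'t': 'social network'}
url = 'http://www.omdbapi.com/?' + urlencode(params)
print(url)  # http://www.omdbapi.com/?t=social+network
```

urlencode takes care of escaping, turning the space into + automatically.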
Wikipedia API
In [2]:
# Import package
import requests
# Assign URL to variable: url
url = 'https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles=machine+learning'
# Package the request, send the request and catch the response: r
r = requests.get(url)
# Decode the JSON data into a dictionary: json_data
json_data = r.json()
# Print the Wikipedia page extract
ml_extract = json_data['query']['pages']['233488']['extract']
print(ml_extract)
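The numeric page ID ('233488' above) differs per article, so hardcoding it breaks for any other title. A sketch that iterates over whatever page IDs come back instead (the nested dictionary below just mimics the shape of the API response):

```python
# Sample response shaped like the Wikipedia API result above
json_data = {'query': {'pages': {'233488': {'extract': '<p>Machine learning is...</p>'}}}}

# Works for any page ID, no hardcoding needed
for page_id, page in json_data['query']['pages'].items():
    print(page.get('extract', ''))
```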
Get data from the web using Python - 1: BeautifulSoup, requests, urllib
Basics
- urllib
- requests
- BeautifulSoup
Importing flat files from the web
- Data from the University of California, Irvine's Machine Learning Repository: http://archive.ics.uci.edu/ml/index.html
- 'winequality-red.csv': a flat file containing tabular data on physiochemical properties of red wine, such as pH, alcohol content and citric acid content, along with a wine quality rating.
In [1]:
# Import urlretrieve (Python 3; in Python 2 this was urllib.urlretrieve)
from urllib.request import urlretrieve
# Import pandas
import pandas as pd
# Assign url of file: url
url = 'https://s3.amazonaws.com/assets.datacamp.com/production/course_1606/datasets/winequality-red.csv'
# Save file locally
urlretrieve(url, 'winequality-red.csv')
# Read file into a DataFrame and print its shape
df = pd.read_csv('winequality-red.csv', sep=';')
print(df.shape)
In [2]:
df.head(3)
Out[2]:
Opening and reading flat files from the web
- To load a file from the web into a DataFrame without first saving it locally, you can use pandas directly.
In [3]:
# Import packages
import matplotlib.pyplot as plt
import pandas as pd
# Assign url of file: url
url = 'https://s3.amazonaws.com/assets.datacamp.com/production/course_1606/datasets/winequality-red.csv'
# Read file into a DataFrame: df
df = pd.read_csv(url, sep=';')
# Print the shape of the DataFrame
# print(df.head())
print(df.shape)
# Plot first column of df
pd.DataFrame.hist(df.iloc[:, 0:1], alpha=.4, figsize=(6, 3))
plt.xlabel('fixed acidity (g(tartaric acid)/dm$^3$)')
plt.ylabel('count')
plt.show()
Importing non-flat files from the web
- use pd.read_excel() to import an Excel spreadsheet.
In [4]:
# Import package
import pandas as pd
# Assign url of file: url
url = 'http://s3.amazonaws.com/assets.datacamp.com/course/importing_data_into_r/latitude.xls'
# Read in all sheets of Excel file: xl
xl = pd.read_excel(url, sheet_name=None)
# Print the sheetnames to the shell
print(xl.keys())
# Print the head of the first sheet (using its name, NOT its index)
print(xl['1700'].head())
print(type(xl))
type(xl['1700'])
Out[4]:
In [5]:
from urllib.request import urlopen, Request
request = Request('http://jishichao.com')
response = urlopen(request)
html = response.read()
response.close()
In [6]:
print(type(html))
len(html)
Out[6]:
Printing HTTP request results in Python using urllib
- You have just packaged and sent a GET request to "http://docs.datacamp.com/teach/" and caught the response. You saw that such a response is an http.client.HTTPResponse object. The question remains: what can you do with this response?
- Well, as it came from an HTML page, you can read it to extract the HTML; in fact, an http.client.HTTPResponse object has an associated read() method.
In [7]:
# Import packages
from urllib.request import urlopen, Request
# Specify the url
url = "http://docs.datacamp.com/teach/"
# This packages the request
request = Request(url)
# Sends the request and catches the response: response
response = urlopen(request)
# Extract the response: html
html = response.read()
print(type(html))
print()
# Print the html
print(html[:300])
# Be polite and close the response!
response.close()
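A with statement closes the response automatically, making the explicit response.close() unnecessary. A sketch of the pattern, using a data: URL so it runs without network access:

```python
from urllib.request import urlopen

# The context manager closes the response on exit
with urlopen('data:,Hello') as response:
    html = response.read()

print(html)  # b'Hello'
```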
In [8]:
import requests
r = requests.get('http://jishichao.com')
text = r.text
In [9]:
print(type(text))
print(type(text.encode('utf-8')))
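The encode call above returns bytes, the raw UTF-8 representation of the str. A minimal sketch of that round trip:

```python
# str in, bytes out, and back again
text = 'café'
raw = text.encode('utf-8')          # bytes: b'caf\xc3\xa9'
print(type(text).__name__, type(raw).__name__)  # str bytes
print(raw.decode('utf-8') == text)  # True
```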
Beautiful Soup
Parsing HTML with BeautifulSoup
- Import the function BeautifulSoup from the package bs4
- Package the request to the URL, send the request and catch the response with a single function requests.get(), assigning the response to the variable r.
- Use the text attribute of the object r to return the HTML of the webpage as a string; store the result in a variable html_doc.
- Create a BeautifulSoup object soup from the resulting HTML using the function BeautifulSoup()
- Use the method prettify() on soup and assign the result to pretty_soup
In [14]:
# Import packages
import requests
from bs4 import BeautifulSoup
# Specify url: url
url = 'http://jishichao.com'
# Package the request, send the request and catch the response: r
r = requests.get(url)
# Extracts the response as html: html_doc
html_doc = r.text
# Create a BeautifulSoup object from the HTML: soup
soup = BeautifulSoup(html_doc, "lxml")
# Prettify the BeautifulSoup object: pretty_soup
pretty_soup = soup.prettify()
# Print the response
print(type(pretty_soup))
print()
print(pretty_soup[:300])
Turning a webpage into data using BeautifulSoup: getting the text
- Extract the title from the HTML soup soup using the attribute title and assign the result to guido_title.
- Extract the text from the HTML soup soup using the method get_text() and assign to guido_text.
In [11]:
# Import packages
import requests
from bs4 import BeautifulSoup
# Specify url: url
url = 'http://jishichao.com'
# Package the request, send the request and catch the response: r
r = requests.get(url)
# Extract the response as html: html_doc
html_doc = r.text
# Create a BeautifulSoup object from the HTML: soup
soup = BeautifulSoup(html_doc, "lxml")
# Get the title of Guido's webpage: guido_title
guido_title = soup.title
# Print the title of Guido's webpage to the shell
print(guido_title)
# Get Guido's text: guido_text
guido_text = soup.get_text()
# Print Guido's text to the shell
print(guido_text[100:300])
Turning a webpage into data using BeautifulSoup: getting the hyperlinks
- Use the method find_all() to find all hyperlinks in soup, remembering that hyperlinks are defined by the HTML tag <a>; store the result in the variable a_tags.
- The variable a_tags is a result set: your job now is to iterate over it with a for loop and print the actual URLs of the hyperlinks; to do this, for every element link in a_tags, print link.get('href').
In [12]:
# Import packages
import requests
from bs4 import BeautifulSoup
# Specify url
url = 'http://jishichao.com'
# Package the request, send the request and catch the response: r
r = requests.get(url)
# Extracts the response as html: html_doc
html_doc = r.text
# create a BeautifulSoup object from the HTML: soup
soup = BeautifulSoup(html_doc, "lxml")
# Print the title of Guido's webpage
print(soup.title)
# Find all 'a' tags (which define hyperlinks): a_tags
a_tags = soup.find_all('a')
# Print the URLs to the shell
for link in a_tags:
    print(link.get('href'))
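For comparison, the same href extraction can be done with the standard library's html.parser alone. A sketch using an inline snippet of HTML (the hrefs are made up for illustration):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == 'a':
            self.links.extend(value for name, value in attrs if name == 'href')

parser = LinkCollector()
parser.feed('<a href="http://jishichao.com">home</a><a href="/about">about</a>')
print(parser.links)  # ['http://jishichao.com', '/about']
```

BeautifulSoup is more convenient for real pages, but this shows there is no magic behind find_all('a').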
In [13]:
type(a_tags), type(a_tags[0])
Out[13]: