Add Anaconda new Python environment & add to jupyter notebook kernel

Create a virtual environment for your project

conda create -n yourenvname python=x.x anaconda

Activate your virtual environment

source activate yourenvname

list conda environment

conda info --envs

 

Install additional Python packages to a virtual environment

conda install -n yourenvname [package]

Add the new environment to jupyter notebook

pip install ipykernel

python -m ipykernel install --user --name=yourenvname

list jupyter notebook’s kernels info (config) , list kernel environment

jupyter kernelspec list

conda info --envs

Change name/ properties of the environment:

go to the path under “jupyter kernelspec list”, edit the config file directly

delete environment

conda remove --name myenv --all

 

 

 

database python connection basic Sqlalchemy & Pandas

connecting a database through Python

Sqlite

Sqlalchemy

Pandas

Create a database engine

  • SQLite database
    • simple and fast
  • SQLAlchemy
    • Works with many Relational Database Management Systems
      • databasetype:///schema_name.extention
In [3]:
from sqlalchemy import create_engine
engine = create_engine('sqlite:///test.sqlite')
In [5]:
table_names = engine.table_names()
table_names
Out[5]:
[]

example: Chinook Database

  • http://chinookdatabase.codeplex.com/
    • the Chinook database contains information about a semi-fictional digital media store in which media data is real and customer, employee and sales data has been manually created.
In [8]:
# Import necessary module
from sqlalchemy import create_engine

# Create engine: engine
engine = create_engine('sqlite:///Chinook_Sqlite.sqlite')

# Save the table names to a list: table_names
table_names = engine.table_names()

# Print the table names to the shell
print(table_names)
[u'Album', u'Artist', u'Customer', u'Employee', u'Genre', u'Invoice', u'InvoiceLine', u'MediaType', u'Playlist', u'PlaylistTrack', u'Track']

The Hello World of SQL Queries!

  • SELECT, in order to retrieve all columns of the table Album in the Chinook database. Recall that the query SELECT * selects all columns.
In [12]:
# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook_Sqlite.sqlite')

# Open engine connection
con = engine.connect()

# Perform query: rs
rs = con.execute('select * from Album')

# Save results of the query to list: ll
ll = rs.fetchall()


# Close connection
con.close()

# Print head of list
print len(ll),type(ll)
ll[:3]
347 <type 'list'>
Out[12]:
[(1, u'For Those About To Rock We Salute You', 1),
 (2, u'Balls to the Wall', 2),
 (3, u'Restless and Wild', 2)]
In [13]:
# Save results of the query to DataFrame
df=pd.DataFrame(ll)
df.head(3)
Out[13]:
0 1 2
0 1 For Those About To Rock We Salute You 1
1 2 Balls to the Wall 2
2 3 Restless and Wild 2

Customizing the Hello World of SQL Queries

  • Select specified columns from a table;
  • Select a specified number of rows;
  • Import column names from the database table.
In [28]:
# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
    rs = con.execute("SELECT LastName, Title FROM Employee")
    df = pd.DataFrame(rs.fetchmany(size=3))
    
    # Using the rs object, set the DataFrame's column names to the corresponding names of the table columns.
    df.columns = rs.keys()

# Print the length of the DataFrame df
print(len(df))

# Print the head of the DataFrame df
df
3
Out[28]:
LastName Title
0 Adams General Manager
1 Edwards Sales Manager
2 Peacock Sales Support Agent

Filtering your database records using SQL's WHERE

In [34]:
# Create engine: engine
engine = create_engine('sqlite:///Chinook_Sqlite.sqlite')

# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
    rs = con.execute('select * from Employee where EmployeeID >= 6')
    df = pd.DataFrame(rs.fetchall())
    df.columns = rs.keys()

# Print the head of the DataFrame df
print df.shape
df.head(1)
(3, 15)
Out[34]:
EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City State Country PostalCode Phone Fax Email
0 6 Mitchell Michael IT Manager 1 1973-07-01 00:00:00 2003-10-17 00:00:00 5827 Bowness Road NW Calgary AB Canada T3B 0C5 +1 (403) 246-9887 +1 (403) 246-9899 michael@chinookcorp.com

Ordering your SQL records with ORDER BY

In [37]:
# Create engine: engine
engine = create_engine('sqlite:///Chinook_Sqlite.sqlite')

# Open engine in context manager
with engine.connect() as con:
    rs = con.execute("SELECT * FROM Employee ORDER BY BirthDate")
    df = pd.DataFrame(rs.fetchall())

    # Set the DataFrame's column names
    df.columns = rs.keys()

# Print head of DataFrame
print df.shape
df.head(3)
(8, 15)
Out[37]:
EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City State Country PostalCode Phone Fax Email
0 4 Park Margaret Sales Support Agent 2.0 1947-09-19 00:00:00 2003-05-03 00:00:00 683 10 Street SW Calgary AB Canada T2P 5G3 +1 (403) 263-4423 +1 (403) 263-4289 margaret@chinookcorp.com
1 2 Edwards Nancy Sales Manager 1.0 1958-12-08 00:00:00 2002-05-01 00:00:00 825 8 Ave SW Calgary AB Canada T2P 2T3 +1 (403) 262-3443 +1 (403) 262-3322 nancy@chinookcorp.com
2 1 Adams Andrew General Manager NaN 1962-02-18 00:00:00 2002-08-14 00:00:00 11120 Jasper Ave NW Edmonton AB Canada T5K 2N1 +1 (780) 428-9482 +1 (780) 428-3457 andrew@chinookcorp.com

Querying relational databases directly with pandas

  • df = pd.read_sql_query("queries", engine)
In [40]:
# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook_Sqlite.sqlite')

# Execute query and store records in DataFrame: df
df = pd.read_sql_query("SELECT * FROM Album", engine)
print df.shape
(347, 3)
  • compare
In [41]:
# Open engine in context manager
# Perform query and save results to DataFrame: df1
with engine.connect() as con:
    rs = con.execute("SELECT * FROM Album")
    df1 = pd.DataFrame(rs.fetchall())
    df1.columns = rs.keys()

# Confirm that both methods yield the same result: does df = df1 ?
print(df.equals(df1))
True

Pandas for more complex querying

  • You'll build a DataFrame that contains the rows of the Employee table for which the EmployeeId is greater than or equal to 6 and you'll order these entries by BirthDate.
In [44]:
# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook_Sqlite.sqlite')

# Execute query and store records in DataFrame: df
df = pd.read_sql_query(
    "SELECT * FROM Employee WHERE EmployeeId >= 6 ORDER BY BirthDate",
    engine
)

# Print head of DataFrame
print df.shape
df
(3, 15)
Out[44]:
EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City State Country PostalCode Phone Fax Email
0 8 Callahan Laura IT Staff 6 1968-01-09 00:00:00 2004-03-04 00:00:00 923 7 ST NW Lethbridge AB Canada T1H 1Y8 +1 (403) 467-3351 +1 (403) 467-8772 laura@chinookcorp.com
1 7 King Robert IT Staff 6 1970-05-29 00:00:00 2004-01-02 00:00:00 590 Columbia Boulevard West Lethbridge AB Canada T1K 5N8 +1 (403) 456-9986 +1 (403) 456-8485 robert@chinookcorp.com
2 6 Mitchell Michael IT Manager 1 1973-07-01 00:00:00 2003-10-17 00:00:00 5827 Bowness Road NW Calgary AB Canada T3B 0C5 +1 (403) 246-9887 +1 (403) 246-9899 michael@chinookcorp.com

INNER JOIN

In [47]:
# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
    rs = con.execute("SELECT Title, Name FROM Album INNER JOIN Artist on Album.ArtistID = Artist.ArtistID")
    df = pd.DataFrame(rs.fetchall())
    df.columns = rs.keys()

# Print head of DataFrame df
print df.shape
df.head(3)
(347, 2)
Out[47]:
Title Name
0 For Those About To Rock We Salute You AC/DC
1 Balls to the Wall Accept
2 Restless and Wild Accept

pandas filter

In [50]:
# Execute query and store records in DataFrame: df
df = pd.read_sql_query(
    "SELECT * FROM PlaylistTrack INNER JOIN Track on PlaylistTrack.TrackId = Track.TrackId WHERE Milliseconds < 250000",
    engine
)

# Print head of DataFrame
print df.shape
df.head(3)
(4073, 11)
Out[50]:
PlaylistId TrackId TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice
0 1 3390 3390 One and the Same 271 2 23 None 217732 3559040 0.99
1 1 3392 3392 Until We Fall 271 2 23 None 230758 3766605 0.99
2 1 3393 3393 Original Fire 271 2 23 None 218916 3577821 0.99
In [ ]: