bokeh 4th: server

How to write a simple Bokeh program that runs on a server?

bokeh 3rd: high-level charts

Where to get Bokeh high-level charts that can be created directly from a pandas DataFrame?

Pandas basics


Question | Answer | Explanation
How to get how big in memory a DataFrame object is? | |
What is the best representation of a null value in a pandas object? | np.nan | import numpy as np
What is the best way to slice a DataFrame by integer position? | df.iloc[-5:, 2:] | use the iloc indexer
How to convert a DataFrame (excluding the index) to a NumPy ndarray? | df.values | it is an attribute, so it is not called with ()
What is the most basic way to create a DataFrame? | pd.DataFrame(dict) | pass a dictionary; its keys become the column names
What is broadcasting? | df['new'] = 7 | all the values of the new column will be 7
How to change a df's column names and index labels? | df.columns = ['a', 'b', …]; df.index = ['c', 'd', …] | assign the values directly
When reading a csv, how to specify the names of the columns? | pd.read_csv(path, names=['a', 'b', …]) | alternatively, passing header=None stops pandas from using the first data row as column names; it uses 0, 1, 2, 3, … instead
When reading a csv, how to make pandas turn specific values into NaN? | pd.read_csv(path, na_values='-1'); pd.read_csv(path, na_values={'column3': [' -2', 'wtf', …]}) | every value that is the string '-1' is rendered as NaN; a dictionary sets the markers per column
How to parse dates when reading a csv? | pd.read_csv(path, parse_dates=[[0, 1, 2]]) | pandas parses the first three columns (positions 0, 1, 2) into one datetime column
Does the index of a df have a name? | df.index.name = 'xxx' | assigns a name to the index of df
How to save a df to a csv file with a delimiter other than ','? | df.to_csv(path, sep='\t') | saves a file whose fields are separated by tabs
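A few of the rows above can be tried together in one short session. This is only a sketch for Python 3 with a recent pandas: the data, the column names and the labels are invented for illustration, and `memory_usage` is one possible answer to the memory question rather than the only one.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data, made up for this example.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, np.nan, 6.0]})

# Memory footprint in bytes, summed over the index and all columns.
total_bytes = int(df.memory_usage(deep=True).sum())

# Broadcasting: a scalar assigned to a new column fills every row.
df["new"] = 7

# Rename columns and index labels by direct assignment, then name the index.
df.columns = ["x", "y", "z"]
df.index = ["r1", "r2", "r3"]
df.index.name = "row"

# Positional slicing: last two rows, columns from position 1 onwards.
tail = df.iloc[-2:, 1:]

# Underlying NumPy array (the index is not included).
arr = df.values

# Write tab-separated text (no path given, so a string is returned).
tsv_text = df.to_csv(sep="\t")

# Read headerless text, supply column names, and map the string '-1' to NaN.
parsed = pd.read_csv(io.StringIO("1\t-1\n3\t4\n"), sep="\t",
                     header=None, names=["p", "q"], na_values="-1")
```

`io.StringIO` stands in for a file path so that the example does not touch the disk.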

How to batch convert strings to a date type?
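One possible answer is `pd.to_datetime`, which converts a whole column in a single call. A minimal sketch for Python 3, assuming the strings all share one known format; the column name `day` and the dates are hypothetical.

```python
import pandas as pd

# Hypothetical column of date strings.
df = pd.DataFrame({"day": ["2016-01-05", "2016-02-10", "2016-03-15"]})

# Convert the whole column at once; format= is optional but makes the
# expected layout explicit.
df["day"] = pd.to_datetime(df["day"], format="%Y-%m-%d")

# The .dt accessor now exposes date parts for every row.
months = df["day"].dt.month.tolist()
```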


How to join two DataFrames together, and how to append one df to another?
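A sketch of one way to do both with `pd.concat`, written for Python 3. The frames and column names are made up. `DataFrame.append` used to cover the second case, but it was removed in pandas 2.0, so `concat` is used for both here.

```python
import pandas as pd

# Hypothetical frames with the same columns.
top = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
bottom = pd.DataFrame({"a": [5], "b": [6]})

# Append rows: stack one frame under the other and renumber the index.
stacked = pd.concat([top, bottom], ignore_index=True)

# Join side by side: axis=1 aligns rows on the index and adds columns.
extra = pd.DataFrame({"c": [7, 8, 9]})
side_by_side = pd.concat([stacked, extra], axis=1)
```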

from here

What is __init__.py used for?

The primary use of __init__.py is to initialize Python packages. The easiest way to demonstrate this is to take a look at the structure of a standard Python package.

package/
    __init__.py
    file.py
    subpackage/
        __init__.py
        submodule1.py
        submodule2.py

As you can see in the structure above, the inclusion of the __init__.py file in a directory indicates to the Python interpreter that the directory should be treated like a Python package.


What goes in __init__.py?

__init__.py can be an empty file, but it is often used to perform setup needed for the package (importing things, loading things into the path, etc.).

One common thing to do in your __init__.py is to import selected classes, functions, etc. into the package level so they can be conveniently imported from the package.

In our example above we can say that file.py has the class File. So without anything in your __init__.py you would import it with this syntax:

from package.file import File

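However, you can import File into your __init__.py to make it available at the package level:

# in your __init__.py
from .file import File  # explicit relative import; the implicit "from file import File" only works in Python 2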

# now import File from package
from package import File

Another thing to do at the package level is to make subpackages/modules available with the __all__ variable. When the interpreter sees an __all__ variable defined in an __init__.py, it imports the modules listed in the __all__ variable when you do:

from package import *

__all__ is a list containing the names of modules that you want to be imported with import *, so looking at our above example again, if we wanted to import the submodules in subpackage, the __all__ variable in subpackage/__init__.py would be:

__all__ = ['submodule1', 'submodule2']

With the __all__ variable populated like that, when you perform

from subpackage import *

it would import submodule1 and submodule2.
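Both behaviours can be checked end to end by building a throwaway package on disk. This is a sketch for Python 3 under stated assumptions: the package is named `demo_package` (a hypothetical name, chosen so it cannot clash with a real package called `package`), the `write` helper is invented for this example, and the contents of `file.py` and the submodules are placeholders.

```python
import importlib
import os
import sys
import tempfile

# Build the package in a temporary directory.
root = tempfile.mkdtemp()


def write(relpath, source):
    """Create root/relpath (and its parent directories) with the given source."""
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(source)


# Package-level __init__.py re-exports File from file.py.
write("demo_package/__init__.py", "from .file import File\n")
write("demo_package/file.py", "class File:\n    pass\n")

# Subpackage __init__.py lists the submodules that import * should load.
write("demo_package/subpackage/__init__.py",
      "__all__ = ['submodule1', 'submodule2']\n")
write("demo_package/subpackage/submodule1.py", "VALUE = 1\n")
write("demo_package/subpackage/submodule2.py", "VALUE = 2\n")

# Make the temporary directory importable.
sys.path.insert(0, root)
importlib.invalidate_caches()

# File is available at the package level because of the re-export.
from demo_package import File

# Run the star import in its own namespace to see exactly what it brings in.
namespace = {}
exec("from demo_package.subpackage import *", namespace)
imported = sorted(name for name in namespace if name.startswith("submodule"))
print(imported)
```

Running the star import through `exec` with a separate dictionary keeps the imported names out of the current module, which makes it easy to inspect them.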

As you can see, __init__.py can be very useful besides its primary function of indicating that a directory is a package.

Python class inheritance example

class Staff:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        print 'Create Staff: ', self.name
    def tell(self):
        print 'name:%s; age:%s' % (self.name, self.age)
    def speak(self):
        print 'I\'m %s' % self.age

class Teacher(Staff):
    def __init__(self, name, age, salary):
        Staff.__init__(self, name, age)  # run the base-class initialiser first
        self.salary = salary
        print 'Create Teacher: ', self.name
    def tell(self):
        Staff.tell(self)  # reuse the base-class output, then add to it
        print 'salary: ', self.salary

class Student(Staff):
    def __init__(self, name, age, marks):
        Staff.__init__(self, name, age)
        self.marks = marks
        print 'Create Student: ', self.name
    def tell(self):
        Staff.tell(self)
        print 'marks: ', self.marks

tea = Teacher('Eva', 28, 3000)
stu = Student('Adam', 16, 77)

have = [tea, stu]

for i in have:
    i.tell()   # the overridden method of each subclass is called
    i.speak()  # inherited unchanged from Staff

Create Staff:  Eva
Create Teacher:  Eva
Create Staff:  Adam
Create Student:  Adam
name:Eva; age:28
salary:  3000
I'm 28
name:Adam; age:16
marks:  77
I'm 16
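For comparison, a sketch of the same idea in Python 3, where `super()` replaces the explicit `Staff.__init__(self, ...)` style call. To keep it short and easy to check, this version is an adaptation rather than a line-for-line port: `tell` returns its lines instead of printing them, and only the Teacher subclass is shown.

```python
class Staff:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def tell(self):
        # Return the lines instead of printing, so callers decide what to do.
        return ["name:%s; age:%s" % (self.name, self.age)]


class Teacher(Staff):
    def __init__(self, name, age, salary):
        super().__init__(name, age)  # initialise the Staff part first
        self.salary = salary

    def tell(self):
        # Extend the base-class result rather than replacing it.
        return super().tell() + ["salary: %s" % self.salary]


tea = Teacher("Eva", 28, 3000)
for line in tea.tell():
    print(line)
```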

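To determine if a character is Chinese

The string should be a unicode string.

To know whether a character is Chinese, we can first decode it from utf-8 to unicode and then compare it with the range of Chinese characters:

def is_chinese(uchar):
    # U+4E00 to U+9FA5 covers the commonly used Chinese characters
    if uchar >= u'\u4e00' and uchar <= u'\u9fa5':
        return True
    return False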


In Python 2, convert utf-8 to unicode:

utf8_string.decode('utf-8')

Convert unicode to utf-8:

unicode_string.encode('utf-8')


for i in '下:@uVT4HLJLA: 二、我是用MAC的,所以可以骂你脑残'.decode('utf-8'):
    print i, is_chinese(i)

下 True
: False
@ False
u False
V False
T False
4 False
H False
L False
J False
L False
A False
: False
二 True
、 False
我 True
是 True
用 True
M False
A False
C False
的 True
, False
所 True
以 True
可 True
以 True
骂 True
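In Python 3 every `str` is already unicode, so the decode step is only needed when starting from bytes. A small sketch of the same check under that assumption; the sample text is made up.

```python
def is_chinese(char):
    """Return True if char lies in the range U+4E00..U+9FA5."""
    return "\u4e00" <= char <= "\u9fa5"


raw = "下:我A".encode("utf-8")  # str -> UTF-8 bytes
text = raw.decode("utf-8")      # UTF-8 bytes -> str

flags = [(char, is_chinese(char)) for char in text]
for char, flag in flags:
    print(char, flag)
```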

get file extension using Python

import os

have = os.listdir('.')  # entries of the current directory

for i in have:
    # splitext splits 'neg.csv' into ('neg', '.csv')
    name, extension = os.path.splitext(i)
    print name, 'ext:', extension


.ipynb_checkpoints ext:
Chinese-Sentiment ext:
corenlp-python ext:
corenlp.tar ext: .gz
extract ext:
k-means ext: .ipynb
neg ext: .csv
neg ext: .txt
neg ext: .xls
new_model ext:
new_model2 ext:
pos ext: .csv
pos ext: .txt
pos ext: .xls
Sentiment-Analysis ext:
sentiment ext: .ipynb
sentiPY ext:
snownlp ext:
test100 ext: .ipynb
Untitled ext: .ipynb
Untitled1 ext: .ipynb
Untitled2 ext: .ipynb
use_analysis ext: .csv
week1 ext: .csv
week31_divided(no_use_in_model) ext: .csv
wtf ext:

validate folder


for i in have:
    # an entry without an extension is assumed to be a folder
    if os.path.splitext(i)[-1] == '':
        print i
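The extension test is only a heuristic: a file such as README has no extension, and a folder name may contain a dot. A sketch in Python 3 comparing it with `os.path.isdir`, which asks the file system directly; the entries are created in a temporary directory and their names are made up.

```python
import os
import tempfile

# Build a small directory with two folders and one extension-less file.
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "wtf"))
os.mkdir(os.path.join(base, "archive.d"))
open(os.path.join(base, "README"), "w").close()

entries = os.listdir(base)

# Heuristic from above: no extension means folder.
by_extension = sorted(n for n in entries if os.path.splitext(n)[-1] == "")

# Direct check against the file system.
by_isdir = sorted(n for n in entries if os.path.isdir(os.path.join(base, n)))

print(by_extension)
print(by_isdir)
```

Note that `os.path.isdir` needs the full path, not just the entry name, unless the listing is of the current directory.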


Machine Learning: Regression with GraphLab Create

bias-variance trade-off gradient descent
Ridge regression        cross-validation      measure of fit + measure of  model complexity  
Lasso regression coordinate descent feature selection measure of fit + (different) measure of model complexity
Nearest Neighbor Regression & Kernel Regression
concave: has a maximum value
convex: has a minimum value
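The phrase "measure of fit + measure of model complexity" can be written out as follows. This is the usual textbook form, not copied from the course material; the symbols (weights w, tuning parameter λ, N observations, prediction ŷ) are notation assumed here.

```latex
\begin{align*}
\mathrm{RSS}(\mathbf{w}) &= \sum_{i=1}^{N} \bigl(y_i - \hat{y}_i(\mathbf{w})\bigr)^2
  && \text{measure of fit} \\
\text{Ridge cost}(\mathbf{w}) &= \mathrm{RSS}(\mathbf{w}) + \lambda \lVert \mathbf{w} \rVert_2^2
  && \text{L2 penalty} \\
\text{Lasso cost}(\mathbf{w}) &= \mathrm{RSS}(\mathbf{w}) + \lambda \lVert \mathbf{w} \rVert_1
  && \text{L1 penalty, which can set weights exactly to zero}
\end{align*}
```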


Models
• Linear regression
• Regularization: Ridge (L2), Lasso (L1)
• Nearest neighbor and kernel regression

Algorithms
• Gradient descent
• Coordinate descent

Concepts
• Loss functions, bias-variance tradeoff, cross-validation, sparsity, overfitting, model selection, feature selection





python format print example

In [1]:
print 'before: {:.2f} after'.format(1.5555)
before: 1.56 after
In [2]:
print '{1},{0},{1},{2},{0}'.format('pos',777,True)
777,pos,777,True,pos
In [3]:
print '{name},{age}'.format(age=18,name='cutie')
cutie,18
In [4]:
has=['first', 2.00, 'third']
print '1st {0[0]} all: {0} last {0[2]} end'.format(has)
1st first all: ['first', 2.0, 'third'] last third end
In [5]:
print 'start--- {:,} ---end'.format(9876543210)
start--- 9,876,543,210 ---end
In [6]:
print 'start:{:>8}'.format(123)
start:     123
In [7]:
print 'start:{:0>8}'.format(123)
start:00000123
In [8]:
print 'start:{:A>8}'.format(123)
start:AAAAA123
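The same format specifications also work inside f-strings in Python 3.6 and later. A short sketch, with hypothetical variable names, showing that both spellings give the same text:

```python
value = 9876543210
number = 123

# str.format and f-strings share the same format-spec mini-language.
with_format = [
    "{:.2f}".format(1.5555),   # two decimal places
    "{:,}".format(value),      # thousands separator
    "{:>8}".format(number),    # right-align in 8 columns, padded with spaces
    "{:0>8}".format(number),   # right-align, padded with zeros
]
with_fstring = [
    f"{1.5555:.2f}",
    f"{value:,}",
    f"{number:>8}",
    f"{number:0>8}",
]

for text in with_fstring:
    print(text)
```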
