bokeh 4th: server

How do you write a simple Bokeh program that runs on a server?

bokeh 3rd: high-level charts

Where can you get Bokeh high-level charts that can be created directly from a pandas DataFrame?

Pandas basics

 

Question / Answer / Explanation

Q: How do you find out how much memory a DataFrame uses?
A: df.info()
Explanation: the last line of the output reports the memory usage.

Q: What is the best representation of a null value in a pandas object?
A: np.nan
Explanation: requires import numpy as np.

Q: What is the best way to slice a DataFrame by index position?
A: df.iloc[-5:, 2:]
Explanation: use the iloc indexer; this selects the last five rows and every column from the third onward.

Q: How do you convert a DataFrame (excluding the index) to a NumPy ndarray?
A: df.values
Explanation: it is an attribute, not a method, so it cannot be called.

Q: What is the most basic way to create a DataFrame?
A: pd.DataFrame(dict)
Explanation: pass a dictionary; its keys become the column names.

Q: What is broadcasting?
A: df['new'] = 7
Explanation: every value in the new column will be 7.

Q: How do you change a DataFrame's column names and index labels?
A: df.columns = ['a', 'b', ...]
   df.index = ['c', 'd', ...]
Explanation: assign the new values directly.

Q: When reading a CSV, how do you specify the column names?
A: pd.read_csv(path, names=['a', 'b', ...])
Explanation: alternatively, passing header=None stops pandas from using the first row of data as column names; the columns are labeled 0, 1, 2, 3, ... instead.

Q: When reading a CSV, how do you make pandas turn specific values into NaN?
A: pd.read_csv(path, na_values='-1')
   pd.read_csv(path, na_values={'column3': [' -2', 'wtf', ...]})
Explanation: every value that is the string '-1' is read as NaN; the dictionary form applies the listed values only to the named column.

Q: How do you parse dates when reading a CSV?
A: pd.read_csv(path, parse_dates=[[0, 1, 2]])
Explanation: pandas combines the first three columns (positions 0, 1, 2) and parses them as a single datetime column. Recent pandas releases deprecate this nested-list form.

Q: Does the index of a DataFrame have a name?
A: df.index.name = 'xxx'
Explanation: assigns a name to the index of df.

Q: How do you save a DataFrame to a CSV file with a delimiter other than ','?
A: df.to_csv(path, sep='\t')
Explanation: writes a file whose fields are separated by tabs.

Q: How do you batch-convert strings to a date type?
A: df['datestring'] = pd.to_datetime(df['datestring'])
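Several of the answers above can be tried together on a small in-memory CSV. This is only a sketch written for Python 3: the CSV text, the column names (datestring, count, label) and the index name 'row' are made up for illustration and are not from any real dataset.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV content with no header row.
csv_text = "2015-01-05,-1,x\n2015-01-06,3,wtf\n"

# names= supplies the column names; the na_values dict marks
# '-1' as missing in 'count' and 'wtf' as missing in 'label'.
df = pd.read_csv(
    io.StringIO(csv_text),
    names=['datestring', 'count', 'label'],
    na_values={'count': ['-1'], 'label': ['wtf']},
)

# Batch-convert the date strings to datetimes.
df['datestring'] = pd.to_datetime(df['datestring'])

# Broadcasting: the scalar 7 fills every row of the new column.
df['new'] = 7

# Give the index a name.
df.index.name = 'row'

df.info()
```

The 'count' column ends up as floats, because NaN cannot be stored in an integer column.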

How do you combine two DataFrames, or append one DataFrame to another?
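One possible answer is pd.concat, which can stack DataFrames vertically or place them side by side. The sketch below is written for Python 3, and the frames (top, bottom, extra) are made-up examples. Note that the older DataFrame.append method was removed in pandas 2.0, so pd.concat is the safer choice.

```python
import pandas as pd

# Two hypothetical frames with the same columns.
top = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
bottom = pd.DataFrame({'a': [5], 'b': [6]})

# Append rows: stack vertically and renumber the index 0, 1, 2.
stacked = pd.concat([top, bottom], ignore_index=True)

# Put frames side by side: axis=1 aligns rows on the index.
extra = pd.DataFrame({'c': [7, 8]})
side_by_side = pd.concat([top, extra], axis=1)

print(stacked)
print(side_by_side)
```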

__init__.py

from here

What is __init__.py used for?

The primary use of __init__.py is to initialize Python packages. The easiest way to demonstrate this is to look at the structure of a standard Python package.

package/
    __init__.py
    file.py
    file2.py
    file3.py
    subpackage/
        __init__.py
        submodule1.py
        submodule2.py

As you can see in the structure above, the inclusion of an __init__.py file in a directory tells the Python interpreter to treat that directory as a Python package.

 

What goes in __init__.py?

__init__.py can be an empty file, but it is often used to perform setup needed for the package (importing things, loading things into the path, etc.).

One common thing to do in your __init__.py is to import selected classes, functions, etc. into the package level so they can be conveniently imported from the package.

In our example above, suppose file.py defines the class File. With nothing in __init__.py, you would import it with this syntax:

from package.file import File

However, you can import File into your __init__.py to make it available at the package level:

# in your __init__.py (the leading dot makes the import relative to the package)
from .file import File

# now import File from package
from package import File

Another thing to do at the package level is to make subpackages and modules available with the __all__ variable. When the interpreter sees an __all__ variable defined in an __init__.py, it imports the modules listed in __all__ when you do:

from package import *

__all__ is a list containing the names of the modules that you want imported with import *. Looking at our example again, if we wanted to import the submodules in subpackage, the __all__ variable in subpackage/__init__.py would be:

__all__ = ['submodule1', 'submodule2']

With the __all__ variable populated like that, when you perform

from package.subpackage import *

it would import submodule1 and submodule2.

As you can see, __init__.py can be very useful beyond its primary function of indicating that a directory is a package.
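To see both ideas working, here is a rough sketch (Python 3) that builds a throwaway copy of the layout above in a temporary directory and imports from it. The package is named demo_package rather than package to avoid clashing with anything already installed, and the write helper and the VALUE constants are made up for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Build a temporary package tree mirroring the layout above.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_package')
sub_dir = os.path.join(pkg_dir, 'subpackage')
os.makedirs(sub_dir)


def write(path, text):
    """Hypothetical helper: create a small source file."""
    with open(path, 'w') as handle:
        handle.write(text)


write(os.path.join(pkg_dir, 'file.py'), "class File(object):\n    pass\n")
# Lift File to the package level.
write(os.path.join(pkg_dir, '__init__.py'), "from .file import File\n")
write(os.path.join(sub_dir, 'submodule1.py'), "VALUE = 1\n")
write(os.path.join(sub_dir, 'submodule2.py'), "VALUE = 2\n")
# List the submodules that import * should pull in.
write(os.path.join(sub_dir, '__init__.py'),
      "__all__ = ['submodule1', 'submodule2']\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Works because demo_package/__init__.py imported File.
from demo_package import File

# Run the star import in its own namespace so the result is easy to inspect.
namespace = {}
exec("from demo_package.subpackage import *", namespace)
print(sorted(name for name in namespace if name.startswith('submodule')))
```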

Python class inheritance example

class Staff:  
    def __init__(self,name,age):  
        self.name = name  
        self.age = age  
        print 'Create Staff: ', self.name  
  
    def tell(self):  
        print 'name:%s; age:%s' % (self.name, self.age)  
        
    def speak(self):
        print 'I\'m %s'%self.age
  
class Teacher(Staff):  
    def __init__(self,name,age,salary):  
        Staff.__init__(self,name,age)  
        self.salary = salary  
        print 'Create Teacher: ', self.name  
  
    def tell(self):  
        Staff.tell(self)  
        print 'salary: ', self.salary  
  
class Student(Staff):  
    def __init__(self,name,age,marks):  
        Staff.__init__(self,name,age)  
        self.marks = marks  
        print 'Create Student: ', self.name  
  
    def tell(self):  
        Staff.tell(self)  
        print 'marks: ', self.marks  
  
tea = Teacher('Eva', 28, 3000)  
stu = Student('Adam', 16, 77)  

have= [tea,stu]  

print 
  
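# tell() and speak() print their own text and return None,
# so "print i.tell()" also prints the word None afterwards.
for i in have:
    print i.tell()
    print
    print i.speak()
    print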


Create Staff:  Eva
Create Teacher:  Eva
Create Staff:  Adam
Create Student:  Adam

name:Eva; age:28
salary:  3000
None

I'm 28
None

name:Adam; age:16
marks:  77
None

I'm 16
None
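The None lines in the output appear because tell() and speak() print their text and return nothing, and the loop then prints that return value. A possible variation, sketched here for Python 3, is to have the methods return strings and let the caller do the printing. It also uses super() instead of naming the parent class. This is just one way to arrange it, not a correction of the example above.

```python
class Staff:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def tell(self):
        # Return the text instead of printing it.
        return 'name:%s; age:%s' % (self.name, self.age)


class Teacher(Staff):
    def __init__(self, name, age, salary):
        # super() finds the parent class without naming it.
        super().__init__(name, age)
        self.salary = salary

    def tell(self):
        # Extend the parent's text with the extra field.
        return '%s; salary:%s' % (super().tell(), self.salary)


tea = Teacher('Eva', 28, 3000)
print(tea.tell())
```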

To determine if a character is Chinese

The string must be a unicode string.

To check whether a character is Chinese, first decode the UTF-8 byte string to unicode.

def is_chinese(uchar):
    """Return True if a single unicode character is a Chinese character."""
    # u'\u4e00' to u'\u9fa5' covers the common Chinese characters.
    return u'\u4e00' <= uchar <= u'\u9fa5'

 

In Python 2, convert a UTF-8 byte string to unicode:

string.decode('utf-8')

Convert unicode to a UTF-8 byte string:

string.encode('utf-8')

 

for i in '下:@uVT4HLJLA: 二、我是用MAC的,所以可以骂你脑残'.decode('utf-8'):
    print i, is_chinese(i)


下 True
: False
@ False
u False
V False
T False
4 False
H False
L False
J False
L False
A False
: False
  False
二 True
、 False
我 True
是 True
用 True
M False
A False
C False
的 True
, False
所 True
以 True
可 True
以 True
骂 True
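For comparison, a rough Python 3 version: there str is already Unicode, so only bytes need decoding. The sample text below is a shortened, made-up string, not the one used above.

```python
def is_chinese(uchar):
    """Return True if a single character falls in the range used above."""
    return '\u4e00' <= uchar <= '\u9fa5'


# Bytes, as they would arrive from a file or the network.
raw = '下:@uV 二、我'.encode('utf-8')

# Decode once; afterwards every element is a full character.
text = raw.decode('utf-8')

for ch in text:
    print(ch, is_chinese(ch))
```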

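Get file extensions using Python

import os

# List the entries in the current directory (Python 2 requires the path argument).
have = os.listdir('.')

for i in have:
    # splitext splits at the last dot: 'corenlp.tar.gz' -> ('corenlp.tar', '.gz')
    name, extension = os.path.splitext(i)
    print name, 'ext:', extension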


 

.ipynb_checkpoints ext:
Chinese-Sentiment ext:
corenlp-python ext:
corenlp.tar ext: .gz
extract ext:
k-means ext: .ipynb
neg ext: .csv
neg ext: .txt
neg ext: .xls
new_model ext:
new_model2 ext:
pos ext: .csv
pos ext: .txt
pos ext: .xls
Sentiment-Analysis ext:
sentiment ext: .ipynb
sentiPY ext:
snownlp ext:
test100 ext: .ipynb
Untitled ext: .ipynb
Untitled1 ext: .ipynb
Untitled2 ext: .ipynb
use_analysis ext: .csv
week1 ext: .csv
week31_divided(no_use_in_model) ext: .csv
wtf ext:

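Identify folders

def list_folders(names):
    # list_folders is a name chosen here; the original snippet was not wrapped in a function.
    # Heuristic: treat every entry without an extension as a folder.
    folder = []
    for i in names:
        if os.path.splitext(i)[-1] == '':
            folder.append(i)
    return folder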
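The empty-extension check is only a heuristic: it misses folders with a dot in their name and wrongly includes files that have no extension. A sketch of a more reliable check with os.path.isdir follows (Python 3). The entries created in the temporary directory are made up for the demonstration.

```python
import os
import tempfile

# Build a small directory with tricky entries.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'new_model'))
os.mkdir(os.path.join(root, 'data.v2'))           # folder with a dot in its name
open(os.path.join(root, 'README'), 'w').close()   # file with no extension
open(os.path.join(root, 'neg.csv'), 'w').close()

names = os.listdir(root)

# Heuristic from above: no extension means folder.
by_extension = sorted(n for n in names if os.path.splitext(n)[-1] == '')

# Ask the file system instead.
by_isdir = sorted(n for n in names if os.path.isdir(os.path.join(root, n)))

print(by_extension)
print(by_isdir)
```

Note that os.path.isdir needs the full path, not just the bare name returned by os.listdir.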

 

Machine Learning: Regression with GraphLab Create

bias-variance trade-off; gradient descent
Ridge regression: cross-validation; measure of fit + measure of model complexity
Lasso regression: coordinate descent; feature selection; measure of fit + (different) measure of model complexity
Nearest neighbor regression and kernel regression
 
concave: has a maximum value
convex: has a minimum value
Models

• Linear regression
• Regularization: Ridge (L2), Lasso (L1)
• Nearest neighbor and kernel regression

Algorithms

• Gradient descent
• Coordinate descent

Concepts

• Loss functions, bias-variance tradeoff, cross-validation, sparsity, overfitting, model selection, feature selection
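To make "measure of fit + measure of model complexity" concrete, here is a minimal sketch of ridge regression fitted by gradient descent in plain Python 3. It assumes a single feature and no intercept, so the cost is sum((y - w*x)^2) + l2_penalty * w^2. The function name, data, step size and iteration count are all made up for illustration, and this is not GraphLab Create's implementation.

```python
def ridge_gradient_descent(xs, ys, l2_penalty, step_size=0.01, iterations=200):
    """Minimize sum((y - w*x)^2) + l2_penalty * w^2 over a single weight w."""
    w = 0.0
    for _ in range(iterations):
        # Gradient of the fit term plus gradient of the L2 complexity term.
        fit_gradient = -2.0 * sum(x * (y - w * x) for x, y in zip(xs, ys))
        penalty_gradient = 2.0 * l2_penalty * w
        w -= step_size * (fit_gradient + penalty_gradient)
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # exactly y = 2x

w_plain = ridge_gradient_descent(xs, ys, l2_penalty=0.0)
w_ridge = ridge_gradient_descent(xs, ys, l2_penalty=10.0)

print(w_plain, w_ridge)
```

With no penalty the weight settles near 2, the true slope. With l2_penalty=10 it shrinks toward sum(x*y) / (sum(x^2) + l2_penalty) = 60 / 40 = 1.5, which is the trade-off between fit and complexity in action. Because the cost is convex, gradient descent reaches its single minimum.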

 

 

 

 

Python format print examples

In [1]:
print 'before: {:.2f} after'.format(1.5555)
before: 1.56 after
In [2]:
print '{1},{0},{1},{2},{0}'.format('pos',777,True) 
777,pos,777,True,pos
In [3]:
print '{name},{age}'.format(age=18,name='cutie')  
cutie,18
In [4]:
has=['first', 2.00, 'third']
print '1st {0[0]} all: {0} last {0[2]} end'.format(has)
1st first all: ['first', 2.0, 'third'] last third end
In [5]:
print 'start--- {:,} ---end'.format(9876543210)
start--- 9,876,543,210 ---end
In [6]:
print 'start:{:>8}'.format(123)
start:     123
In [7]:
print 'start:{:0>8}'.format(123)
start:00000123
In [8]:
print 'start:{:A>8}'.format(123)
start:AAAAA123
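The same format specs carry over to Python 3, where print is a function and f-strings accept the spec after a colon. A short sketch, with variable names made up here, that also shows left and centered alignment alongside the right alignment used above:

```python
value = 123

# Right-align in 8 columns (same spec as the example above).
right = 'start:{:>8}'.format(value)

# Left-align and center; the trailing | makes the padding visible.
left = 'start:{:<8}|'.format(value)
center = 'start:{:^8}|'.format(value)

# f-string form of the zero-padded example.
padded = f'start:{value:0>8}'

for line in (right, left, center, padded):
    print(line)
```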