Category Archives: Python
Pandas basics
| Question | Answer | Explanation |
|---|---|---|
| How do I see how much memory a DataFrame uses? | df.info() | the last line reports memory usage; df.info(memory_usage='deep') or df.memory_usage(deep=True) also counts the contents of object (string) columns |
| What is the standard representation of a null value in a pandas object? | np.nan | requires import numpy as np |
| How do I slice a DataFrame by position? | df.iloc[-5:, 2:] | iloc selects by integer position: here the last 5 rows and every column from the third on (use df.loc to select by label) |
| How do I convert a DataFrame (excluding the index) to a NumPy ndarray? | df.values | an attribute, not a method, so don't call it; pandas 0.24+ also offers df.to_numpy() |
| What is the most basic way to create a DataFrame? | pd.DataFrame(dict) | pass a dictionary; its keys become the column names |
| What is broadcasting? | df['new'] = 7 | the scalar is broadcast, so every value in the new column is 7 |
| How do I change a DataFrame's column names and index labels? | df.columns = ['a', 'b', ...]; df.index = ['c', 'd', ...] | assign the new labels directly |
| When reading a CSV, how do I specify the column names? | pd.read_csv(path, names=['a', 'b', ...]) | alternatively, pass header=None so pandas does not use the first row as column names and numbers the columns 0, 1, 2, ... instead |
| When reading a CSV, how do I make pandas turn specific values into NaN? | pd.read_csv(path, na_values='-1') or pd.read_csv(path, na_values={'column3': [' -2', 'wtf', ...]}) | the first form turns every '-1' value into NaN; the dict form sets the NaN markers per column |
| How do I parse dates while reading a CSV? | pd.read_csv(path, parse_dates=[[0, 1, 2]]) | pandas combines the first three columns (positions 0, 1, 2) into one datetime column; this nested-list form is deprecated in pandas 2.2 |
| Does a DataFrame's index have a name? | df.index.name = 'xxx' | yes: this assigns a name to the index |
| How do I save a DataFrame to a CSV file with a delimiter other than ','? | df.to_csv(path, sep='\t') | writes a file whose fields are separated by tabs |
| How do I batch-convert strings to datetimes? | df['datestring'] = pd.to_datetime(df['datestring']) | converts the whole column in one call |
| How do I combine two DataFrames, or append one to another? | | |
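A minimal sketch that exercises several rows of the table on a small in-memory CSV. The CSV contents and column names are made up for illustration, and pd.concat is shown as one common answer to the last, unanswered question (an assumption, not from the original table).

```python
import io

import pandas as pd

# Made-up CSV contents with no header row.
RAW = """2016,1,15,Eva,-1
2016,2,3,Adam,77
"""

df = pd.read_csv(
    io.StringIO(RAW),
    names=['year', 'month', 'day', 'name', 'marks'],  # column names supplied while reading
    na_values='-1',                                   # the text '-1' becomes NaN
)

df['new'] = 7              # broadcasting: every row of the new column is 7
df.index.name = 'row'      # give the index a name

# Assemble one datetime column from the year/month/day columns
# (an alternative to the nested parse_dates=[[0, 1, 2]] form).
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])

print(df.iloc[-5:, 3:])    # positional slice: last 5 rows, columns from the 4th on

arr = df[['marks', 'new']].to_numpy()   # NumPy array, index dropped

# One common way to append rows: pd.concat stacks DataFrames vertically.
extra = pd.DataFrame({'name': ['Lily'], 'marks': [90.0]})
combined = pd.concat([df[['name', 'marks']], extra], ignore_index=True)
print(combined)
```

Building the date from named year/month/day columns avoids relying on the nested parse_dates form.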
__init__.py
What is __init__.py used for?
The primary use of __init__.py is to initialize Python packages. The easiest way to demonstrate this is to look at the structure of a standard Python package:
package/
    __init__.py
    file.py
    file2.py
    file3.py
    subpackage/
        __init__.py
        submodule1.py
        submodule2.py
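As you can see in the structure above, including an __init__.py file in a directory tells the Python interpreter to treat that directory as a Python package. (Since Python 3.3, a directory without __init__.py can still be imported as a namespace package, but a regular package keeps the file.)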
What goes in __init__.py?
__init__.py can be an empty file, but it is often used to perform setup the package needs (importing things, adding things to the path, etc.).
One common thing to do in your __init__.py is to import selected classes, functions, etc. into the package level so they can be conveniently imported from the package.
In our example above, say that file.py defines the class File. With nothing in our __init__.py, you would import it with this syntax:
from package.file import File
However, you can import File into your __init__.py to make it available at the package level:
# in your __init__.py
from .file import File    # explicit relative import (required in Python 3)

# now import File from package
from package import File
Another thing you can do at the package level is make subpackages/modules available with the __all__ variable. When the interpreter sees an __all__ variable defined in an __init__.py, it imports the modules listed in __all__ when you do:
from package import *
__all__ is a list containing the names of the modules you want imported with import *. Looking at our example again, if we wanted to import the submodules in subpackage, the __all__ variable in subpackage/__init__.py would be:
__all__ = ['submodule1', 'submodule2']
With the __all__ variable populated like that, when you perform
from package.subpackage import *
it would import submodule1 and submodule2.
As you can see, __init__.py can be very useful beyond its primary function of marking a directory as a package.
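To see both ideas working end to end, the sketch below writes a throwaway copy of part of the package tree into a temporary directory and imports from it. The file contents, the empty File class and the NAME constants are hypothetical, and it assumes Python 3, where the re-export has to use the relative form from .file import File.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical contents for part of the package tree described above.
FILES = {
    'package/__init__.py': 'from .file import File\n',   # re-export at package level
    'package/file.py': 'class File:\n    pass\n',
    'package/subpackage/__init__.py': "__all__ = ['submodule1', 'submodule2']\n",
    'package/subpackage/submodule1.py': "NAME = 'submodule1'\n",
    'package/subpackage/submodule2.py': "NAME = 'submodule2'\n",
}


def build_package(root):
    """Write every entry of FILES below root, creating folders as needed."""
    for rel_path, source in FILES.items():
        path = os.path.join(root, *rel_path.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'w') as f:
            f.write(source)


root = tempfile.mkdtemp()
build_package(root)
sys.path.insert(0, root)
importlib.invalidate_caches()      # make sure the importer sees the new files

from package import File           # works because package/__init__.py re-exports File
print(File.__module__)             # package.file

# exec runs the star import in its own namespace dict so we can inspect what it bound.
namespace = {}
exec('from package.subpackage import *', namespace)
print(sorted(name for name in namespace if name.startswith('submodule')))
# ['submodule1', 'submodule2']
```

Without the __all__ line, the star import would bind neither submodule, because nothing in subpackage/__init__.py would have imported them.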
Python class inheritance example
class Staff:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        print('Create Staff:', self.name)

    def tell(self):
        print('name:%s; age:%s' % (self.name, self.age))

    def speak(self):
        print("I'm %s" % self.age)


class Teacher(Staff):
    def __init__(self, name, age, salary):
        Staff.__init__(self, name, age)   # run the base-class initializer first
        self.salary = salary
        print('Create Teacher:', self.name)

    def tell(self):
        Staff.tell(self)                  # reuse the base-class output, then extend it
        print('salary:', self.salary)


class Student(Staff):
    def __init__(self, name, age, marks):
        Staff.__init__(self, name, age)
        self.marks = marks
        print('Create Student:', self.name)

    def tell(self):
        Staff.tell(self)
        print('marks:', self.marks)


tea = Teacher('Eva', 28, 3000)
stu = Student('Adam', 16, 77)
have = [tea, stu]
print()
for i in have:
    i.tell()     # tell() and speak() print and return None, so don't wrap them in print()
    i.speak()    # speak() is inherited unchanged from Staff
    print()

Output:
Create Staff: Eva
Create Teacher: Eva
Create Staff: Adam
Create Student: Adam

name:Eva; age:28
salary: 3000
I'm 28

name:Adam; age:16
marks: 77
I'm 16
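For comparison, here is a Python 3 sketch of the same hierarchy that calls the parent class through super() and has a describe() method return a string instead of printing, which makes the result easier to reuse or test. The describe name is an illustrative choice, not part of the original example.

```python
class Staff:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def describe(self):
        return 'name:%s; age:%s' % (self.name, self.age)


class Teacher(Staff):
    def __init__(self, name, age, salary):
        super().__init__(name, age)      # delegates to Staff.__init__
        self.salary = salary

    def describe(self):
        # Extend the parent's text instead of replacing it.
        return super().describe() + '; salary: %s' % self.salary


class Student(Staff):
    def __init__(self, name, age, marks):
        super().__init__(name, age)
        self.marks = marks

    def describe(self):
        return super().describe() + '; marks: %s' % self.marks


people = [Teacher('Eva', 28, 3000), Student('Adam', 16, 77)]
for person in people:
    print(person.describe())
# name:Eva; age:28; salary: 3000
# name:Adam; age:16; marks: 77
```

super() follows the method resolution order, so the subclasses keep working if a class is later inserted between them and Staff.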
To determine if a character is Chinese
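Before testing individual characters, the string should be Unicode text rather than UTF-8 bytes, so decode UTF-8 input to Unicode first. A character is then treated as Chinese if it falls in the CJK Unified Ideographs range U+4E00 to U+9FA5: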
def is_chinese(uchar):
    """Return True if the Unicode character uchar is a Chinese character (hanzi)."""
    return u'\u4e00' <= uchar <= u'\u9fa5'
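To convert UTF-8 to Unicode in Python 2:
string.decode('utf-8')
To convert Unicode to UTF-8:
string.encode('utf-8')
In Python 3, str is already Unicode: bytes.decode('utf-8') returns a str and str.encode('utf-8') returns bytes.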
# Python 3: a str literal is already Unicode, so no .decode('utf-8') is needed
for i in '下:@uVT4HLJLA: 二、我是用MAC的,所以可以骂你脑残':
    print(i, is_chinese(i))
下 True
: False
@ False
u False
V False
T False
4 False
H False
L False
J False
L False
A False
: False
False
二 True
、 False
我 True
是 True
用 True
M False
A False
C False
的 True
, False
所 True
以 True
可 True
以 True
骂 True
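A Python 3 sketch of the same check plus the UTF-8 round trip described above. The only decode step happens when data arrives as bytes (for example from a file opened in binary mode). The chinese_chars helper is hypothetical, and the range U+4E00 to U+9FA5 is kept from the original, so rarer ideographs outside it are not counted.

```python
def is_chinese(uchar):
    """True if uchar lies in the U+4E00..U+9FA5 CJK ideograph range."""
    return '\u4e00' <= uchar <= '\u9fa5'


def chinese_chars(text):
    """Hypothetical helper: keep only the characters is_chinese accepts."""
    return ''.join(ch for ch in text if is_chinese(ch))


raw = '我是用MAC的'.encode('utf-8')   # bytes, as they might come from a file
text = raw.decode('utf-8')             # bytes -> str (Unicode)
print(len(raw), len(text))             # 15 7: each of the 4 hanzi takes 3 bytes in UTF-8
print(chinese_chars(text))             # 我是用的
```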
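Get a file extension using Python
import os

have = os.listdir()   # entries in the current directory (Python 3 defaults to '.')
for i in have:
    name, extension = os.path.splitext(i)   # splits off only the last '.suffix'
    print(name, 'ext:', extension)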
.ipynb_checkpoints ext:
Chinese-Sentiment ext:
corenlp-python ext:
corenlp.tar ext: .gz
extract ext:
k-means ext: .ipynb
neg ext: .csv
neg ext: .txt
neg ext: .xls
new_model ext:
new_model2 ext:
pos ext: .csv
pos ext: .txt
pos ext: .xls
Sentiment-Analysis ext:
sentiment ext: .ipynb
sentiPY ext:
snownlp ext:
test100 ext: .ipynb
Untitled ext: .ipynb
Untitled1 ext: .ipynb
Untitled2 ext: .ipynb
use_analysis ext: .csv
week1 ext: .csv
week31_divided(no_use_in_model) ext: .csv
wtf ext:
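Find the folders in a directory
def list_folders(path='.'):
    """Return the names of the sub-directories of path."""
    folder = []
    for i in os.listdir(path):
        # The extension test os.path.splitext(i)[-1] == '' only guesses: files such as
        # "Makefile" have no extension, and folders such as "week.31" do.
        if os.path.isdir(os.path.join(path, i)):
            folder.append(i)
    return folder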
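A self-contained sketch of both snippets against a throwaway directory, so the behaviour can be checked without depending on what happens to be in the current folder. The file and folder names are made up. It also shows why the empty-extension test is only a heuristic: splitext ignores a leading dot and splits off just the last suffix.

```python
import os
import tempfile


def list_folders(path='.'):
    """Return the names of the sub-directories of path."""
    return [i for i in os.listdir(path) if os.path.isdir(os.path.join(path, i))]


root = tempfile.mkdtemp()
for folder in ('new_model', 'week.31', '.ipynb_checkpoints'):   # hypothetical folders
    os.mkdir(os.path.join(root, folder))
for filename in ('corenlp.tar.gz', 'Makefile'):                  # hypothetical files
    open(os.path.join(root, filename), 'w').close()

print(os.path.splitext('corenlp.tar.gz'))       # ('corenlp.tar', '.gz'): last suffix only
print(os.path.splitext('.ipynb_checkpoints'))   # ('.ipynb_checkpoints', ''): leading dot is not a suffix

# The extension heuristic flags Makefile and misses week.31.
guess = sorted(i for i in os.listdir(root) if os.path.splitext(i)[-1] == '')
print(guess)                       # ['.ipynb_checkpoints', 'Makefile', 'new_model']
print(sorted(list_folders(root)))  # ['.ipynb_checkpoints', 'new_model', 'week.31']
```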
Machine Learning: Regression with GraphLab Create
| Topic | Notes |
|---|---|
| Bias-variance trade-off | gradient descent |
| Ridge regression | cross-validation; cost = measure of fit + measure of model complexity |
| Lasso regression | coordinate descent; feature selection; cost = measure of fit + (different) measure of model complexity |
| Nearest neighbor regression & kernel regression | |
| Concave function | any local maximum is a global maximum |
| Convex function | any local minimum is a global minimum |
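The two costs in the table, written out for linear regression with N observations, feature vector h(x_i), weights w and tuning parameter λ ≥ 0. This notation follows the usual residual-sum-of-squares (RSS) formulation and is an assumption, not taken from the original notes:

```latex
\text{Ridge:}\quad \min_{\mathbf{w}}\;
  \underbrace{\sum_{i=1}^{N}\bigl(y_i - \mathbf{w}^\top \mathbf{h}(\mathbf{x}_i)\bigr)^2}_{\text{measure of fit (RSS)}}
  \;+\; \lambda \underbrace{\lVert \mathbf{w} \rVert_2^2}_{\text{model complexity}}

\text{Lasso:}\quad \min_{\mathbf{w}}\;
  \sum_{i=1}^{N}\bigl(y_i - \mathbf{w}^\top \mathbf{h}(\mathbf{x}_i)\bigr)^2
  \;+\; \lambda \lVert \mathbf{w} \rVert_1,
  \qquad \lVert \mathbf{w} \rVert_1 = \sum_j \lvert w_j \rvert
```

λ is typically chosen by cross-validation. The L1 term can drive weights exactly to zero, which is why lasso performs feature selection, and because it is not differentiable at zero, coordinate descent is the usual solver. Both costs are convex, so any local minimum is a global minimum.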
| Category | Topics |
|---|---|
| Models | Linear regression; regularization: Ridge (L2), Lasso (L1); nearest neighbor and kernel regression |
| Algorithms | Gradient descent; coordinate descent |
| Concepts | Loss functions, bias-variance tradeoff, cross-validation, sparsity, overfitting, model selection, feature selection |
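A minimal pure-Python sketch of the first algorithm listed: batch gradient descent on the ridge cost RSS(w) + λ·w1² for one feature plus an intercept. The step size, iteration count, λ values and toy data are illustrative assumptions, and leaving the intercept unpenalized is a common convention rather than something the notes specify.

```python
def ridge_gradient_descent(xs, ys, l2_penalty, step_size=0.01, iterations=5000):
    """Fit y ~ w0 + w1*x by minimizing RSS + l2_penalty * w1**2 with batch gradient descent."""
    w0, w1 = 0.0, 0.0
    for _ in range(iterations):
        # Residuals of the current fit: y_i - (w0 + w1*x_i).
        errors = [y - (w0 + w1 * x) for x, y in zip(xs, ys)]
        # Partial derivatives of the cost; the intercept w0 is not penalized.
        grad_w0 = -2.0 * sum(errors)
        grad_w1 = -2.0 * sum(e * x for e, x in zip(errors, xs)) + 2.0 * l2_penalty * w1
        w0 -= step_size * grad_w0
        w1 -= step_size * grad_w1
    return w0, w1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]          # exactly y = 1 + 2x

print(ridge_gradient_descent(xs, ys, l2_penalty=0.0))    # close to (1.0, 2.0)
print(ridge_gradient_descent(xs, ys, l2_penalty=10.0))   # close to (3.0, 1.0): slope shrunk
```

Because the cost is convex, the iterates settle at the single global minimum; increasing l2_penalty trades a worse fit for a smaller slope.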
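Python format print example
print('before: {:.2f} after'.format(1.5555))                # before: 1.56 after
print('{1},{0},{1},{2},{0}'.format('pos', 777, True))         # 777,pos,777,True,pos
print('{name},{age}'.format(age=18, name='cutie'))            # cutie,18
has = ['first', 2.00, 'third']
print('1st {0[0]} all: {0} last {0[2]} end'.format(has))      # 1st first all: ['first', 2.0, 'third'] last third end
print('start--- {:,} ---end'.format(9876543210))              # start--- 9,876,543,210 ---end
print('start:{:>8}'.format(123))                              # start:     123 (right-aligned, width 8)
print('start:{:0>8}'.format(123))                             # start:00000123 (fill with '0')
print('start:{:A>8}'.format(123))                             # start:AAAAA123 (fill with 'A')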
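The same format-spec mini-language also works inside f-strings (Python 3.6+) and the built-in format(), so the specs above carry over directly. A short sketch; the variable names are illustrative.

```python
price = 1.5555
big = 9876543210
n = 123

print(f'before: {price:.2f} after')   # before: 1.56 after
print(f'start--- {big:,} ---end')     # start--- 9,876,543,210 ---end
print(f'start:{n:0>8}')               # start:00000123
print(format(n, 'A>8'))               # AAAAA123
print(f'{n:<8}|')                     # 123     | (left-aligned, width 8)
```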