Pandas basics
Question | Answer | Explanation | ||
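How do you find out how much memory a DataFrame uses? | df.info() | the last line of the output reports memory usage; df.memory_usage(deep=True) gives a per-column breakdown | ||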
What is the usual representation of a missing (null) value in a pandas object? | np.nan | requires import numpy as np | ||
How do you slice a DataFrame by integer position? | df.iloc[-5:, 2:] | use the iloc indexer; this selects the last 5 rows and every column from the third onward | ||
How do you convert a DataFrame (excluding the index) to a NumPy ndarray? | df.values | it is an attribute, not a method, so it is written without parentheses; newer pandas versions recommend df.to_numpy() instead | ||
What is the most basic way to create a DataFrame? | pd.DataFrame(dict) | pass a dictionary to the constructor; the keys become column names and the values become the column data | ||
What is broadcasting? | df['new'] = 7 | the scalar is repeated for every row, so every value in the new column is 7 | ||
How do you change a DataFrame's column labels and index labels? | df.columns = ['a', 'b', …]
df.index = ['c', 'd', …] |
assign a list of new labels directly; its length must match the number of columns or rows | ||
When reading a CSV, how do you specify the column names? | pd.read_csv(path, names=['a', 'b', …]) | alternatively, passing header=None stops pandas from using the first data row as column names and labels the columns 0, 1, 2, 3, … instead | ||
When reading a CSV, how do you make pandas treat specific values as NaN? | pd.read_csv(path, na_values='-1')
pd.read_csv(path, na_values={'column3': [' -2', 'wtf', …]}) |
every value equal to the string '-1' is read as NaN; the dict form applies the listed values only to the named column | ||
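How do you parse dates while reading a CSV? | pd.read_csv(path, parse_dates=[[0, 1, 2]]) | pandas combines the columns at positions 0, 1 and 2 (the first three) and parses them into a single datetime column; recent pandas releases deprecate this nested-list form | ||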
Can a DataFrame's index have a name? | df.index.name = 'xxx' | yes; assign a string to name the index | ||
How do you save a DataFrame to a CSV file with a delimiter other than ','? | df.to_csv(path, sep='\t') | writes a file whose fields are separated by tabs |
How do you convert a whole column of date strings to datetime in one step? | df['datestring'] = pd.to_datetime(df['datestring']) | ||
How do you combine two DataFrames, or append one DataFrame to another?
How does .all() work? | default parameter: axis=0 | checks each column of the DataFrame and returns True for a column only if every row in that column is truthy | |
How does .any() work? | same default | returns True for a column if at least one row in that column is truthy | |
With axis=1, the same check runs across each row instead of each column |
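Several of the rows above can be tried together without a file on disk. The following is a minimal sketch, not a definitive recipe: the CSV text, the column names, and the 'sensor_a' label are made up for illustration, and io.StringIO stands in for a real path. It builds the date column with pd.to_datetime on the year/month/day columns rather than with the nested-list parse_dates form from the table.

```python
import io

import pandas as pd

# Hypothetical CSV text with no header row; '-1' marks a missing reading.
raw = "2020,1,15,3.5,-1\n2020,1,16,-1,wtf\n"

# names= supplies the column labels; na_values= lists, per column,
# the strings that should be read as NaN.
df = pd.read_csv(
    io.StringIO(raw),
    names=["year", "month", "day", "reading", "note"],
    na_values={"reading": ["-1"], "note": ["-1", "wtf"]},
)

# Assemble one datetime column from the year/month/day columns.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Broadcasting: the scalar is repeated for every row.
df["source"] = "sensor_a"

# Give the index a name; it becomes the first header field on export.
df.index.name = "row"

# Positional slice: last row, every column from the third onward.
tail = df.iloc[-1:, 2:]

# .values is an attribute; the index is not part of the array.
arr = df.values

# Write tab-separated text to an in-memory buffer instead of a file.
buf = io.StringIO()
df.to_csv(buf, sep="\t")
text = buf.getvalue()
```

Printing df, tail, and text after running this shows the NaN substitution, the positional slice, and the tab-separated output side by side.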
In [14]:
import pandas as pd
df = pd.DataFrame({'col_1': [True, True, True, True], 'col_2': [False, False, False, False],
                   'col_3': [True, False, True, False], 'col_4': [0, 0, 0, 1],
                   'col_5': [0, 0, 0, 0], 'col_6': [1, 1, 1, 1], 'col_7': [0, 1, 2, 3], 'col_8': [7, 6, 5, 4]})
df
Out[14]:
In [15]:
df.all()
Out[15]:
In [16]:
df.any()
Out[16]:
In [17]:
df.all(axis=1)
Out[17]:
In [18]:
df.any(axis=1)
Out[18]:
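The cell outputs are not shown above, so the expected results are written here as assertions. This is a sketch that rebuilds the same DataFrame; the variable names col_all, col_any, row_all, and row_any are introduced only for illustration. Integer columns are treated as booleans, with 0 counting as False and any other number as True.

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": [True, True, True, True],
    "col_2": [False, False, False, False],
    "col_3": [True, False, True, False],
    "col_4": [0, 0, 0, 1],
    "col_5": [0, 0, 0, 0],
    "col_6": [1, 1, 1, 1],
    "col_7": [0, 1, 2, 3],
    "col_8": [7, 6, 5, 4],
})

col_all = df.all()        # axis=0: one result per column
col_any = df.any()
row_all = df.all(axis=1)  # axis=1: one result per row
row_any = df.any(axis=1)

# Only col_1, col_6 and col_8 contain no falsy value.
assert col_all.tolist() == [True, False, False, False, False, True, False, True]
# Only col_2 and col_5 contain no truthy value.
assert col_any.tolist() == [True, False, True, True, False, True, True, True]
# Every row includes col_2 (False) and col_1 (True).
assert row_all.tolist() == [False, False, False, False]
assert row_any.tolist() == [True, True, True, True]
```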