Pandas basics
Question | Answer | Explanation | ||
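how to find out how much memory a DataFrame uses? | df.info() | the last line of the summary reports memory usage; df.memory_usage(deep=True) gives bytes per column | ||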
what is the usual representation of a null value in a pandas object? | np.nan | requires import numpy as np | ||
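what is the best way to slice a DataFrame by integer position? | df.iloc[-5:, 2:] | use the iloc indexer; this selects the last 5 rows and every column from the third onward (use loc to slice by label) | ||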
how to convert a DataFrame (excluding the index) to a NumPy ndarray? | df.values | it is an attribute, not a method, so it takes no parentheses; df.to_numpy() is the method equivalent | ||
what is the most basic way to create a DataFrame? | pd.DataFrame(dict) | pass a dictionary to the constructor; keys become column names and values become the column data | ||
what is broadcasting? | df['new'] = 7 | assigning a scalar to a column fills every row of the new column with 7 | ||
how to change a DataFrame's column labels and index labels? | df.columns = ['a', 'b', …]; df.index = ['c', 'd', …] | assign the new labels directly; each list must have as many entries as there are columns or rows | ||
when reading a CSV, how to specify the column names? | pd.read_csv(path, names=['a', 'b', …]) | alternatively, passing header=None stops pandas from using the first data row as column names and labels the columns 0, 1, 2, 3, … instead | ||
when reading a CSV, how to make pandas turn specific values into NaN? | pd.read_csv(path, na_values='-1'); pd.read_csv(path, na_values={'column3': [' -2', 'wtf', …]}) | every field that reads '-1' is converted to NaN; the dict form applies the listed markers only to the named column | ||
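how to parse dates when reading a CSV? | pd.read_csv(path, parse_dates=[[0, 1, 2]]) | pandas combines the first three columns (positions 0, 1, 2) and parses them into a single datetime column; recent pandas releases deprecate this nested-list form | ||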
does the index of a DataFrame have a name? | df.index.name = 'xxx' | yes; assigning to df.index.name names the index | ||
how to save a DataFrame to a CSV file with a delimiter other than ','? | df.to_csv(path, sep='\t') | writes a file whose fields are separated by tabs | ||
how to batch-convert a column of strings to a datetime type? | df['datestring'] = pd.to_datetime(df['datestring']) | converts the whole column in one call | ||
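
The DataFrame rows above (construction from a dictionary, broadcasting, relabeling, positional slicing, conversion to an ndarray, and memory usage) can be tried together in one short session. This is a minimal sketch, not a canonical recipe: the sample data and the labels `x`, `y`, `z`, `r0`, `r1`, `r2`, and `row_id` are placeholders invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the keys become the column names.
data = {"a": [1, 2, 3], "b": [4.0, np.nan, 6.0]}
df = pd.DataFrame(data)

# Broadcasting: a scalar assigned to a column fills every row.
df["new"] = 7

# Relabel columns and rows by direct assignment.
df.columns = ["x", "y", "z"]
df.index = ["r0", "r1", "r2"]
df.index.name = "row_id"  # the index itself can carry a name

# Positional slice: last 2 rows, columns from the second onward.
last_two = df.iloc[-2:, 1:]

# .values is an attribute (no parentheses) and returns a NumPy ndarray.
arr = df.values

# Bytes used by the index and by each column.
bytes_per_column = df.memory_usage(deep=True)
```

Note that `np.nan` forces column `y` to a floating-point dtype, which is one reason integer columns with missing values often show up as floats.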
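
The CSV rows above (column names, NaN markers, date handling, and tab-separated output) can be exercised without touching the disk by reading from an in-memory string. The sketch below assumes a made-up two-line CSV; the column names `year`, `month`, `day`, `reading`, `note`, and `datestring` are hypothetical. It assembles the date with `pd.to_datetime` after reading rather than with the nested `parse_dates` form, which is a swapped-in technique, not the one listed in the table.

```python
import io
import pandas as pd

# Hypothetical CSV text with no header row; "-1" and "wtf" mark missing data.
raw = "2020,1,15,3.5,-1\n2020,1,16,-1,wtf\n"

df = pd.read_csv(
    io.StringIO(raw),
    names=["year", "month", "day", "reading", "note"],      # supply column names
    na_values={"reading": ["-1"], "note": ["-1", "wtf"]},   # per-column NaN markers
)

# Build one datetime column from the year, month, and day columns.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Batch-convert a column of date strings in a single call.
df["datestring"] = ["2020-01-15", "2020-01-16"]
df["datestring"] = pd.to_datetime(df["datestring"])

# Write tab-separated output to an in-memory buffer instead of a file.
out = io.StringIO()
df.to_csv(out, sep="\t", index=False)
```

Swapping `io.StringIO(raw)` and `out` for file paths gives the on-disk equivalents of the same calls.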