Manipulating DataFrames with pandas¶

Extracting, filtering, and transforming data from DataFrames
Advanced indexing with multiple levels
Tidying, rearranging and restructuring your data
Pivoting, melting, and stacking DataFrames
Identifying and spli!ing DataFrames by groups

Indexing and column rearrangement¶

# Import pandas
import pandas as pd

# Read in filename and set the index: election
df = pd.read_csv('tweets.csv', index_col='created_at', parse_dates=True)

# Create a separate dataframe with the columns ['entities', 'favorited', 'text']: results
results = df[['entities', 'favorited', 'text']]

# Print the output of results.head()
# print(results.head())
results.head(3)

df.index

DatetimeIndex(['2017-02-10 05:24:51', '2017-02-10 05:24:57',
               '2017-02-10 05:25:38', '2017-02-10 05:25:42',
               '2017-02-10 05:25:48', '2017-02-10 05:25:53',
               '2017-02-10 05:25:58', '2017-02-10 05:26:12',
               '2017-02-10 05:26:27', '2017-02-10 05:26:30',
               ...
               '2017-02-10 04:27:03', '2017-02-10 04:27:18',
               '2017-02-10 04:27:31', '2017-02-10 04:27:34',
               '2017-02-10 04:27:36', '2017-02-10 04:27:39',
               '2017-02-10 04:27:41', '2017-02-10 04:27:54',
               '2017-02-10 04:28:25', '2017-02-10 04:28:36'],
              dtype='datetime64[ns]', name=u'created_at', length=615, freq=None)

Slicing and indexing a Series¶

.loc[] if NO "," used, will slice index

df.loc['02-10-17 05:24':'02-10-17 05:24']

use :num for filtering and ordering¶

df.loc['02-10-17 05:24':'02-10-17 05:24':-2]

df.iloc[1:7:2,4:14:3]

[df.entities]==[df['entities']]

True

df['text'][2:7]

created_at
2017-02-10 05:25:38    RT @alaskantexanQCT: I'm not "White" but still...
2017-02-10 05:25:42    We dont discriminate like Libs, divede by sex,...
2017-02-10 05:25:48    @TMattis7 @babsmarshall1 @Randazzoj @realDonal...
2017-02-10 05:25:53    @Trumpliwood @heifer109 she's just mad because...
2017-02-10 05:25:58    RT @lordaedonis: if you a grown man afraid of ...
Name: text, dtype: object

Using .loc[]¶

df.ix[3:7,'entities':'favorited']

df.loc[:,'entities':'favorited'].head()

df.loc['2-10-17 05:20','entities':'favorited']

df.loc['2-10-17 05:20','entities':'favorited']

enclude list¶

Subselecting DataFrames with lists¶

df.loc['2-10-17 05:24',['entities','text','favorited']]

Using .iloc[]¶

df.iloc[7:11,11:14]

df.iloc[[3,74,233],[11,14,7]]

comparison¶

series and df¶

df['text'].head()

created_at
2017-02-10 05:24:51    RT @MeckeringBoy: Does Pauline secretly wear h...
2017-02-10 05:24:57    RT @lordaedonis: if you a grown man afraid of ...
2017-02-10 05:25:38    RT @alaskantexanQCT: I'm not "White" but still...
2017-02-10 05:25:42    We dont discriminate like Libs, divede by sex,...
2017-02-10 05:25:48    @TMattis7 @babsmarshall1 @Randazzoj @realDonal...
Name: text, dtype: object

df[['text']].head()

Filtering DataFrames¶

Creating a Boolean Series¶

df['truncated']==1

created_at
2017-02-10 05:24:51    False
2017-02-10 05:24:57    False
2017-02-10 05:25:38    False
2017-02-10 05:25:42    False
2017-02-10 05:25:48    False
2017-02-10 05:25:53    False
2017-02-10 05:25:58    False
2017-02-10 05:26:12    False
2017-02-10 05:26:27    False
2017-02-10 05:26:30    False
2017-02-10 05:26:35    False
2017-02-10 05:26:48    False
2017-02-10 05:27:56    False
2017-02-10 05:28:28    False
2017-02-10 05:28:28    False
2017-02-10 05:28:40    False
2017-02-10 05:28:55    False
2017-02-10 05:30:06    False
2017-02-10 05:30:18    False
2017-02-10 05:30:20    False
2017-02-10 05:30:53    False
2017-02-10 05:30:55     True
2017-02-10 05:31:41    False
2017-02-10 05:32:20    False
2017-02-10 05:32:23    False
2017-02-10 05:32:32    False
2017-02-10 05:34:11    False
2017-02-10 05:34:17    False
2017-02-10 05:36:07    False
2017-02-10 05:38:17     True
                       ...  
2017-02-10 04:23:54    False
2017-02-10 04:23:58    False
2017-02-10 04:24:06    False
2017-02-10 04:24:07    False
2017-02-10 04:24:17    False
2017-02-10 04:24:31    False
2017-02-10 04:24:32    False
2017-02-10 04:24:35    False
2017-02-10 04:24:48    False
2017-02-10 04:25:06    False
2017-02-10 04:25:09    False
2017-02-10 04:25:15    False
2017-02-10 04:25:23    False
2017-02-10 04:25:48    False
2017-02-10 04:25:50    False
2017-02-10 04:26:02    False
2017-02-10 04:26:13    False
2017-02-10 04:26:16     True
2017-02-10 04:26:23    False
2017-02-10 04:26:52    False
2017-02-10 04:27:03    False
2017-02-10 04:27:18    False
2017-02-10 04:27:31    False
2017-02-10 04:27:34    False
2017-02-10 04:27:36    False
2017-02-10 04:27:39    False
2017-02-10 04:27:41     True
2017-02-10 04:27:54    False
2017-02-10 04:28:25    False
2017-02-10 04:28:36    False
Name: truncated, dtype: bool

df[df['truncated']==1].head(3)

Combining filters¶

&¶

|¶

df[(df['text'].str.len()>140) & (df['truncated']==1)].head(3)

all()¶

any()¶

Select columns with all nonzeros¶

df.all()

df.all()

contributors                  True
coordinates                   True
display_text_range            True
entities                      True
extended_entities             True
extended_tweet                True
favorite_count               False
favorited                    False
filter_level                  True
geo                           True
id                            True
id_str                        True
in_reply_to_screen_name       True
in_reply_to_status_id         True
in_reply_to_status_id_str     True
in_reply_to_user_id           True
in_reply_to_user_id_str       True
is_quote_status              False
lang                          True
place                         True
possibly_sensitive           False
quoted_status                 True
quoted_status_id              True
quoted_status_id_str          True
retweet_count                False
retweeted                    False
retweeted_status              True
source                        True
text                          True
timestamp_ms                  True
truncated                    False
user                          True
dtype: bool

Select columns with any nonzeros¶

df.any()

contributors                 False
coordinates                  False
display_text_range            True
entities                      True
extended_entities             True
extended_tweet                True
favorite_count               False
favorited                    False
filter_level                  True
geo                          False
id                            True
id_str                        True
in_reply_to_screen_name       True
in_reply_to_status_id         True
in_reply_to_status_id_str     True
in_reply_to_user_id           True
in_reply_to_user_id_str       True
is_quote_status               True
lang                          True
place                         True
possibly_sensitive            True
quoted_status                 True
quoted_status_id              True
quoted_status_id_str          True
retweet_count                False
retweeted                    False
retweeted_status              True
source                        True
text                          True
timestamp_ms                  True
truncated                     True
user                          True
dtype: bool

Select columns with any NaNs¶

df.loc[:, df.isnull().any()]

df.loc[:, df.isnull().all()]

df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 615 entries, 2017-02-10 05:24:51 to 2017-02-10 04:28:36
Data columns (total 32 columns):
contributors                 0 non-null float64
coordinates                  0 non-null float64
display_text_range           161 non-null object
entities                     0 non-null float64
extended_entities            4 non-null object
extended_tweet               32 non-null object
favorite_count               615 non-null int64
favorited                    615 non-null bool
filter_level                 615 non-null object
geo                          0 non-null float64
id                           615 non-null int64
id_str                       615 non-null int64
in_reply_to_screen_name      83 non-null object
in_reply_to_status_id        67 non-null float64
in_reply_to_status_id_str    67 non-null float64
in_reply_to_user_id          83 non-null float64
in_reply_to_user_id_str      83 non-null float64
is_quote_status              615 non-null bool
lang                         615 non-null object
place                        12 non-null object
possibly_sensitive           135 non-null object
quoted_status                281 non-null object
quoted_status_id             281 non-null float64
quoted_status_id_str         281 non-null float64
retweet_count                615 non-null int64
retweeted                    615 non-null bool
retweeted_status             405 non-null object
source                       615 non-null object
text                         615 non-null object
timestamp_ms                 615 non-null int64
truncated                    615 non-null bool
user                         615 non-null object
dtypes: bool(4), float64(10), int64(5), object(13)
memory usage: 141.7+ KB

drop ROWS with NAN values¶

df.shape

(615, 32)

df.dropna(how='all').shape

(615, 32)

df.dropna(how='any').shape

(0, 32)

use the thresh= 20 keyword argument to drop rows from full dataset that have more than 20 missing values.¶

df.dropna(thresh=20).shape

(102, 32)

drop COLUMNS with NAN values¶

on y-axis¶

df.dropna(how='all',axis=1).shape

(615, 29)

df.dropna(how='any',axis=1).shape

(615, 14)

use the thresh= 100 keyword argument to drop columns from full dataset that have more than 100 missing values.¶

df.dropna(how='any',axis='columns',thresh=100).shape

(615, 20)

assign NAN values¶

import numpy as np

df.entities=np.nan

df.head(3)

df.loc[df['text'].str.len()>144,'text']

created_at
2017-02-10 05:32:20    RT @kimthecynic: Grab 'em by the pussy-bandet ...
2017-02-10 06:02:52    RT @alii_babygirl: Horny😘 #sex #horny #snapch...
2017-02-10 06:04:56    RT @alii_babygirl: This is perfect😍 #sex #hor...
2017-02-10 05:32:20    RT @kimthecynic: Grab 'em by the pussy-bandet ...
2017-02-10 06:02:52    RT @alii_babygirl: Horny😘 #sex #horny #snapch...
2017-02-10 06:04:56    RT @alii_babygirl: This is perfect😍 #sex #hor...
2017-02-10 03:29:45    #Hannity,if that was Hillary R Obama,yall wuld...
2017-02-10 03:34:52    Gotta support u on that my friend!  Should be ...
2017-02-10 04:08:06    omg its a giant commy liberal entitled pussy w...
2017-02-10 04:21:52    RT @Alanna_T7: If I add any more to this it'll...
Name: text, dtype: object

(df['text'].str.len()>144).sum()

10

Transforming DataFrames¶

### broadcasting

### use build-in function

### use apply() on DataFrame

test = df[['timestamp_ms']].head()

test

test.timestamp_ms.floordiv(7**11)

created_at
2017-02-10 05:24:51    751
2017-02-10 05:24:57    751
2017-02-10 05:25:38    751
2017-02-10 05:25:42    751
2017-02-10 05:25:48    751
Name: timestamp_ms, dtype: int64

test['timestamp_ms'].apply(lambda x: x//7**11)

created_at
2017-02-10 05:24:51    751
2017-02-10 05:24:57    751
2017-02-10 05:25:38    751
2017-02-10 05:25:42    751
2017-02-10 05:25:48    751
Name: timestamp_ms, dtype: int64

test.timestamp_ms.floordiv(7**11) == test['timestamp_ms'].apply(lambda x: x//7**11)

created_at
2017-02-10 05:24:51    True
2017-02-10 05:24:57    True
2017-02-10 05:25:38    True
2017-02-10 05:25:42    True
2017-02-10 05:25:48    True
Name: timestamp_ms, dtype: bool

use string method on DataFrame¶

text = df[['text']].head(10)
text

text['text'] = text['text'].str.upper()
text

for index¶

* No apply method¶

* Use Map() method¶

example:
use map() to transform index's type Datetime to String

text.index

DatetimeIndex(['2017-02-10 05:24:51', '2017-02-10 05:24:57',
               '2017-02-10 05:25:38', '2017-02-10 05:25:42',
               '2017-02-10 05:25:48', '2017-02-10 05:25:53',
               '2017-02-10 05:25:58', '2017-02-10 05:26:12',
               '2017-02-10 05:26:27', '2017-02-10 05:26:30'],
              dtype='datetime64[ns]', name=u'created_at', freq=None)

text.index.map(str)

array(['2017-02-10 05:24:51', '2017-02-10 05:24:57', '2017-02-10 05:25:38',
       '2017-02-10 05:25:42', '2017-02-10 05:25:48', '2017-02-10 05:25:53',
       '2017-02-10 05:25:58', '2017-02-10 05:26:12', '2017-02-10 05:26:27',
       '2017-02-10 05:26:30'], dtype=object)

text.index.weekday_name

array(['Friday', 'Friday', 'Friday', 'Friday', 'Friday', 'Friday',
       'Friday', 'Friday', 'Friday', 'Friday'], dtype=object)

apply() on DataFrame with multiple columns¶

df[['timestamp_ms','quoted_status_id']].apply(lambda x: x%7).head()

Using .map() with a dictionary¶

The .map() method is used to transform values according to a Python dictionary look-up.¶

pre = {True: 'truth', False: 'lie'}

df['truncated'].map(pre).head(30)

created_at
2017-02-10 05:24:51      lie
2017-02-10 05:24:57      lie
2017-02-10 05:25:38      lie
2017-02-10 05:25:42      lie
2017-02-10 05:25:48      lie
2017-02-10 05:25:53      lie
2017-02-10 05:25:58      lie
2017-02-10 05:26:12      lie
2017-02-10 05:26:27      lie
2017-02-10 05:26:30      lie
2017-02-10 05:26:35      lie
2017-02-10 05:26:48      lie
2017-02-10 05:27:56      lie
2017-02-10 05:28:28      lie
2017-02-10 05:28:28      lie
2017-02-10 05:28:40      lie
2017-02-10 05:28:55      lie
2017-02-10 05:30:06      lie
2017-02-10 05:30:18      lie
2017-02-10 05:30:20      lie
2017-02-10 05:30:53      lie
2017-02-10 05:30:55    truth
2017-02-10 05:31:41      lie
2017-02-10 05:32:20      lie
2017-02-10 05:32:23      lie
2017-02-10 05:32:32      lie
2017-02-10 05:34:11      lie
2017-02-10 05:34:17      lie
2017-02-10 05:36:07      lie
2017-02-10 05:38:17    truth
Name: truncated, dtype: object

Using vectorized functions¶

When performance is paramount, you should avoid using .apply() and .map()¶

### because those constructs perform Python for-loops over the data stored in a pandas Series or DataFrame.
### By using vectorized functions instead, you can loop over the data at the same speed as compiled code (C, Fortran, etc.)!
### NumPy, SciPy and pandas come with a variety of vectorized functions (called Universal Functions or UFuncs in NumPy).
- You can even write your own vectorized functions

In statistics, the z-score is the number of standard deviations by which an observation is above the mean - so if it is negative, it means the observation is below the mean.
- nstead of using .apply() as you did in the earlier exercises, the zscore UFunc will take a pandas Series as input and return a NumPy array. You will then assign the values of the NumPy array to a new column in the DataFrame

import pandas as pd

f = pd.read_csv('census.csv',chunksize=10000)
df2=f.next()
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3193 entries, 0 to 3192
Data columns (total 100 columns):
SUMLEV                   3193 non-null int64
REGION                   3193 non-null int64
DIVISION                 3193 non-null int64
STATE                    3193 non-null int64
COUNTY                   3193 non-null int64
STNAME                   3193 non-null object
CTYNAME                  3193 non-null object
CENSUS2010POP            3193 non-null int64
ESTIMATESBASE2010        3193 non-null int64
POPESTIMATE2010          3193 non-null int64
POPESTIMATE2011          3193 non-null int64
POPESTIMATE2012          3193 non-null int64
POPESTIMATE2013          3193 non-null int64
POPESTIMATE2014          3193 non-null int64
POPESTIMATE2015          3193 non-null int64
NPOPCHG_2010             3193 non-null int64
NPOPCHG_2011             3193 non-null int64
NPOPCHG_2012             3193 non-null int64
NPOPCHG_2013             3193 non-null int64
NPOPCHG_2014             3193 non-null int64
NPOPCHG_2015             3193 non-null int64
BIRTHS2010               3193 non-null int64
BIRTHS2011               3193 non-null int64
BIRTHS2012               3193 non-null int64
BIRTHS2013               3193 non-null int64
BIRTHS2014               3193 non-null int64
BIRTHS2015               3193 non-null int64
DEATHS2010               3193 non-null int64
DEATHS2011               3193 non-null int64
DEATHS2012               3193 non-null int64
DEATHS2013               3193 non-null int64
DEATHS2014               3193 non-null int64
DEATHS2015               3193 non-null int64
NATURALINC2010           3193 non-null int64
NATURALINC2011           3193 non-null int64
NATURALINC2012           3193 non-null int64
NATURALINC2013           3193 non-null int64
NATURALINC2014           3193 non-null int64
NATURALINC2015           3193 non-null int64
INTERNATIONALMIG2010     3193 non-null int64
INTERNATIONALMIG2011     3193 non-null int64
INTERNATIONALMIG2012     3193 non-null int64
INTERNATIONALMIG2013     3193 non-null int64
INTERNATIONALMIG2014     3193 non-null int64
INTERNATIONALMIG2015     3193 non-null int64
DOMESTICMIG2010          3193 non-null int64
DOMESTICMIG2011          3193 non-null int64
DOMESTICMIG2012          3193 non-null int64
DOMESTICMIG2013          3193 non-null int64
DOMESTICMIG2014          3193 non-null int64
DOMESTICMIG2015          3193 non-null int64
NETMIG2010               3193 non-null int64
NETMIG2011               3193 non-null int64
NETMIG2012               3193 non-null int64
NETMIG2013               3193 non-null int64
NETMIG2014               3193 non-null int64
NETMIG2015               3193 non-null int64
RESIDUAL2010             3193 non-null int64
RESIDUAL2011             3193 non-null int64
RESIDUAL2012             3193 non-null int64
RESIDUAL2013             3193 non-null int64
RESIDUAL2014             3193 non-null int64
RESIDUAL2015             3193 non-null int64
GQESTIMATESBASE2010      3193 non-null int64
GQESTIMATES2010          3193 non-null int64
GQESTIMATES2011          3193 non-null int64
GQESTIMATES2012          3193 non-null int64
GQESTIMATES2013          3193 non-null int64
GQESTIMATES2014          3193 non-null int64
GQESTIMATES2015          3193 non-null int64
RBIRTH2011               3193 non-null float64
RBIRTH2012               3193 non-null float64
RBIRTH2013               3193 non-null float64
RBIRTH2014               3193 non-null float64
RBIRTH2015               3193 non-null float64
RDEATH2011               3193 non-null float64
RDEATH2012               3193 non-null float64
RDEATH2013               3193 non-null float64
RDEATH2014               3193 non-null float64
RDEATH2015               3193 non-null float64
RNATURALINC2011          3193 non-null float64
RNATURALINC2012          3193 non-null float64
RNATURALINC2013          3193 non-null float64
RNATURALINC2014          3193 non-null float64
RNATURALINC2015          3193 non-null float64
RINTERNATIONALMIG2011    3193 non-null float64
RINTERNATIONALMIG2012    3193 non-null float64
RINTERNATIONALMIG2013    3193 non-null float64
RINTERNATIONALMIG2014    3193 non-null float64
RINTERNATIONALMIG2015    3193 non-null float64
RDOMESTICMIG2011         3193 non-null float64
RDOMESTICMIG2012         3193 non-null float64
RDOMESTICMIG2013         3193 non-null float64
RDOMESTICMIG2014         3193 non-null float64
RDOMESTICMIG2015         3193 non-null float64
RNETMIG2011              3193 non-null float64
RNETMIG2012              3193 non-null float64
RNETMIG2013              3193 non-null float64
RNETMIG2014              3193 non-null float64
RNETMIG2015              3193 non-null float64
dtypes: float64(30), int64(68), object(2)
memory usage: 2.4+ MB

df3 =df2[['CENSUS2010POP','POPESTIMATE2015']]
df3.head()

# Import zscore from scipy.stats
from scipy.stats import zscore
import numpy as np

# Call zscore with election['turnout'] as input: turnout_zscore
turnout_zscore = zscore(df3['CENSUS2010POP'])

# Print the type of turnout_zscore
print(type(turnout_zscore))

# Assign turnout_zscore to a new column: election['turnout_zscore']
df3['turnout_zscore'] = turnout_zscore

# Print the output of election.head()
print(df3.head())

<type 'numpy.ndarray'>
   CENSUS2010POP  POPESTIMATE2015  turnout_zscore
0        4779736          4858979        3.899899
1          54571            55347       -0.118039
2         182265           203709       -0.009457
3          27457            26489       -0.141095
4          22915            22583       -0.144957

C:\Program Files\Anaconda2\lib\site-packages\ipykernel\__main__.py:12: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

df3.head()

%%timeit
turnout_zscore = zscore(df3['CENSUS2010POP'])

10000 loops, best of 3: 49.4 µs per loop

%%timeit
a = df3['CENSUS2010POP'].std()

10000 loops, best of 3: 24.9 µs per loop

%%timeit
b = np.std(df3['CENSUS2010POP'])

10000 loops, best of 3: 26.7 µs per loop

Sorting¶

df3.sort_values('turnout_zscore',ascending=False)

	CENSUS2010POP	POPESTIMATE2015	turnout_zscore
190	37253956	39144818	31.513627
2566	25145561	27469114	21.217524
1860	19378102	19795791	16.313294
329	18801310	20271272	15.822832
608	12830632	12859995	10.745799
2282	12702379	12802503	10.636742
2078	11536504	11613423	9.645366
1253	9883640	9922576	8.239890
209	9818605	10170292	8.184589
397	9687653	10214860	8.073237
1923	9535483	10042802	7.943843
1804	8791894	8958013	7.311548
2866	8001024	8382993	6.639050
3000	6724540	7170351	5.553620
1238	6547629	6794422	5.403188
711	6483802	6619680	5.348914
98	6392017	6828065	5.270867
2470	6346105	6600299	5.231826
1508	5988927	6083672	4.928108
1213	5773552	6006401	4.744969
3096	5686986	5771337	4.671359
1337	5303925	5489594	4.345632
624	5194675	5238216	4.252733
249	5029196	5456574	4.112022
0	4779736	4858979	3.899899
2356	4625364	4896146	3.768633
1131	4533372	4670724	3.690409
1010	4339367	4425092	3.525441
2667	4092459	4538028	3.315489
2245	3831074	4028977	3.093226
...	...	...	...
2763	929	916	-0.163652
1643	884	827	-0.163691
277	843	774	-0.163726
1733	824	804	-0.163742
1773	818	750	-0.163747
2698	808	764	-0.163755
1780	783	829	-0.163777
2028	783	936	-0.163777
1738	763	777	-0.163794
1727	736	732	-0.163817
2068	727	767	-0.163824
1676	718	697	-0.163832
290	712	726	-0.163837
2722	707	820	-0.163841
306	699	701	-0.163848
1838	695	698	-0.163851
1685	690	788	-0.163856
96	662	613	-0.163879
1767	647	684	-0.163892
2583	641	648	-0.163897
1739	632	585	-0.163905
1719	614	641	-0.163920
1740	539	475	-0.163984
1659	494	475	-0.164022
1686	478	487	-0.164036
1684	460	456	-0.164051
2697	416	407	-0.164089
2701	286	282	-0.164199
560	90	89	-0.164366
2717	82	112	-0.164373

	entities	favorited	text
created_at
2017-02-10 05:24:51	{u'user_mentions': [{u'indices': [3, 16], u'sc...	False	RT @MeckeringBoy: Does Pauline secretly wear h...
2017-02-10 05:24:57	{u'user_mentions': [{u'indices': [3, 15], u'sc...	False	RT @lordaedonis: if you a grown man afraid of ...
2017-02-10 05:25:38	{u'user_mentions': [{u'indices': [3, 19], u'sc...	False	RT @alaskantexanQCT: I'm not "White" but still...

	entities	extended_entities	extended_tweet	favorite_count	favorited
created_at
2017-02-10 05:25:42	{u'user_mentions': [], u'symbols': [], u'hasht...	NaN	NaN	0	False
2017-02-10 05:25:48	{u'user_mentions': [{u'indices': [0, 9], u'scr...	NaN	NaN	0	False
2017-02-10 05:25:53	{u'user_mentions': [{u'indices': [0, 12], u'sc...	NaN	NaN	0	False
2017-02-10 05:25:58	{u'user_mentions': [{u'indices': [3, 15], u'sc...	NaN	NaN	0	False

	entities	extended_entities	extended_tweet	favorite_count	favorited
created_at
2017-02-10 05:20:01	{u'user_mentions': [], u'symbols': [], u'hasht...	NaN	NaN	0	False
2017-02-10 05:20:20	{u'user_mentions': [{u'indices': [3, 19], u'sc...	NaN	NaN	0	False
2017-02-10 05:20:31	{u'user_mentions': [{u'indices': [3, 15], u'sc...	NaN	NaN	0	False
2017-02-10 05:20:57	{u'user_mentions': [{u'indices': [3, 19], u'sc...	NaN	NaN	0	False

	id_str	in_reply_to_screen_name	in_reply_to_status_id
created_at
2017-02-10 05:26:12	829924408428945408	juliahosack	8.298853e+17
2017-02-10 05:26:27	829924471314141184	NaN	NaN
2017-02-10 05:26:30	829924483867697154	warier163	8.299240e+17
2017-02-10 05:26:35	829924504683974657	NaN	NaN

	id_str	in_reply_to_status_id_str	favorited
created_at
2017-02-10 05:25:42	829924279940558850	NaN	False
2017-02-10 05:51:15	829930712815919104	NaN	False
2017-02-10 05:12:36	829920984362684416	NaN	False

	contributors	coordinates	display_text_range	entities	extended_entities	extended_tweet	favorite_count	favorited	filter_level	geo	...	quoted_status_id	quoted_status_id_str	retweet_count	retweeted	retweeted_status	source	text	timestamp_ms	truncated	user
created_at
2017-02-10 05:30:55	NaN	NaN	[0, 140]	{u'user_mentions': [], u'symbols': [], u'hasht...	NaN	{u'display_text_range': [0, 130], u'entities':...	0	False	low	NaN	...	8.298905e+17	8.298905e+17	0	False	NaN	<a href="http://twitter.com/download/android" ...	They're whole party is imploding. They're all ...	1486704655983	True	{u'follow_request_sent': None, u'profile_use_b...
2017-02-10 05:38:17	NaN	NaN	[0, 140]	{u'user_mentions': [], u'symbols': [], u'hasht...	NaN	{u'display_text_range': [0, 139], u'entities':...	0	False	low	NaN	...	8.206567e+17	8.206567e+17	0	False	NaN	<a href="http://twitter.com" rel="nofollow">Tw...	Just another spineless pussy, too afraid to st...	1486705097961	True	{u'follow_request_sent': None, u'profile_use_b...
2017-02-10 05:41:06	NaN	NaN	[0, 140]	{u'user_mentions': [], u'symbols': [], u'hasht...	NaN	{u'display_text_range': [0, 136], u'entities':...	0	False	low	NaN	...	8.299271e+17	8.299271e+17	0	False	NaN	<a href="http://twitter.com" rel="nofollow">Tw...	One swift move, and Trump proves he is both ig...	1486705266575	True	{u'follow_request_sent': None, u'profile_use_b...

	timestamp_ms
created_at
2017-02-10 05:24:51	1486704291616
2017-02-10 05:24:57	1486704297304
2017-02-10 05:25:38	1486704338285
2017-02-10 05:25:42	1486704342242
2017-02-10 05:25:48	1486704348065

	CENSUS2010POP	POPESTIMATE2015
0	4779736	4858979
1	54571	55347
2	182265	203709
3	27457	26489
4	22915	22583

	text
created_at
2017-02-10 05:24:51	RT @MECKERINGBOY: DOES PAULINE SECRETLY WEAR H...
2017-02-10 05:24:57	RT @LORDAEDONIS: IF YOU A GROWN MAN AFRAID OF ...
2017-02-10 05:25:38	RT @ALASKANTEXANQCT: I'M NOT "WHITE" BUT STILL...
2017-02-10 05:25:42	WE DONT DISCRIMINATE LIKE LIBS, DIVEDE BY SEX,...
2017-02-10 05:25:48	@TMATTIS7 @BABSMARSHALL1 @RANDAZZOJ @REALDONAL...
2017-02-10 05:25:53	@TRUMPLIWOOD @HEIFER109 SHE'S JUST MAD BECAUSE...
2017-02-10 05:25:58	RT @LORDAEDONIS: IF YOU A GROWN MAN AFRAID OF ...
2017-02-10 05:26:12	@JULIAHOSACK @MARINA_SIRTIS YOUR THE PROBLEM ...
2017-02-10 05:26:27	RT @ALASKANTEXANQCT: I'M NOT "WHITE" BUT STILL...
2017-02-10 05:26:30	@WARIER163 @STEFANMOLYNEUX FROM WHO? OBVIOUSLY...