the Python challenge

Python: to get local variables, call

locals()
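
A small sketch of what locals() gives you, written in Python 3 syntax (the notes below use Python 2). The helper name build_url and its variables are made up for illustration.

```python
def build_url(nothing):
    # Hypothetical helper: 'base' and 'nothing' are both local names here.
    base = 'http://www.pythonchallenge.com/pc/def/linkedlist.php'
    # locals() returns a dict of the current local names, so it can feed
    # %-style formatting with named placeholders.
    return '%(base)s?nothing=%(nothing)s' % locals()

url = build_url('12345')
print(url)
```

This is the same trick the level 4 loop uses to fill in the URL.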

the Python challenge

level 0 

2**38

1<<38

pow(2,38)

level 1

s="""
g fmnc wms bgblr rpylqjyrc gr zw fylb. rfyrq ufyr amknsrcpq ypc dmp. bmgle gr gl zw fylb gq glcddgagclr ylb rfyr'q ufw rfgq rcvr gq qm jmle. sqgle qrpgle.kyicrpylq() gq pcamkkclbcb. lmu ynnjw ml rfc spj.
"""
import string

# shift every lowercase letter two places forward (a->c, b->d, ..., y->a, z->b)
mapping=string.maketrans('abcdefghijklmnopqrstuvwxyz','cdefghijklmnopqrstuvwxyzab')

print s.translate(mapping)
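
For reference, a minimal Python 3 sketch of the same shift-by-two idea. In Python 3 the table is built with str.maketrans instead of string.maketrans; the function name shift_by_two is my own.

```python
import string

def shift_by_two(text):
    # Build a table mapping a->c, b->d, ..., y->a, z->b.
    letters = string.ascii_lowercase
    table = str.maketrans(letters, letters[2:] + letters[:2])
    # Characters without an entry (spaces, punctuation) pass through unchanged.
    return text.translate(table)

decoded = shift_by_two("g fmnc wms bgblr rpylqjyrc gr zw fylb.")
print(decoded)
```

Applying the same function to the level's URL word "map" should give the next page name.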

level 2

 

import re

data = """copy html source code highlighted"""

print "".join(re.findall("[A-Za-z]", data))

level 3

print "".join(re.findall("[^A-Z]+[A-Z]{3}([a-z])[A-Z]{3}[^A-Z]+", data))
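
A hedged variant of the level 3 pattern in Python 3 syntax, tried on a made-up string. It swaps the [^A-Z]+ guards for lookarounds, which is my own change and not the pattern used above; the sample text is invented.

```python
import re

# Exactly three capitals, one lowercase letter, exactly three capitals.
# The lookarounds reject a fourth capital on either side and, unlike
# [^A-Z]+, also allow a match at the very start or end of the text.
PATTERN = re.compile(r"(?<![A-Z])[A-Z]{3}([a-z])[A-Z]{3}(?![A-Z])")

def small_letters(text):
    return "".join(PATTERN.findall(text))

sample = "xABCdEFGy ABCDeFGH qQRStUVW"   # made-up test string
print(small_letters(sample))
```

The middle group is skipped because its lowercase letter has four capitals on the left.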

level 4

import urllib

nothing = '12345'
while True:
    content = urllib.urlopen('http://pythonchallenge.com/pc/def/linkedlist.php?nothing=%(nothing)s' % locals()).read()
    try:
        nothing = str(int(content.split(' ')[-1]))
    except ValueError:
        if 'Divide' in content:
            nothing = str(int(nothing) / 2)
        else:
            break

print content

OR

import requests
import re

nothing = "12345"

while nothing.isdigit():
    p = requests.get('http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=%s' % nothing)
    if 'Divide' in p.content:
        nothing = str(int(nothing) / 2)
    else:
        # capture the digits directly; lstrip() strips a set of characters, not a prefix
        match = re.search('nothing is ([0-9]+)', p.content)
        nothing = match.group(1) if match else ''

print p.content

 

calculate 7-day retention rate for mobile apps on a csv dataset

source

assumptions

  • The dataset is small enough to fit in memory: the function loads the entire dataset and runs the calculation over all of it when analyze() is called, so that retrieving the 7-day retention rate for a specific time period afterwards is instant.

    • If the data were streaming, or the dataset were too big, I would design the model to load data as a stream or in blocks, and/or to run the calculation only on the subset sliced for the requested time period.
  • The filter by OS is cross-platform.

    • For example, suppose we want the 7-day retention rate for 9-10-2016 filtered by iOS, and user X first opened the app on iOS that day but had EARLIER opened it on 9-5-2016 on an Android device. In that case the model does not count user X as a new user, and user X is not used when calculating the 7-day retention rate from 9-10-2016 onwards.
      • This assumption can be reversed by changing a few lines of code, so that the model counts user X as a new user even if user X opened the app earlier on another OS.
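
The cross-platform rule can be sketched in plain Python 3 like this. It is only an illustration of the assumption, not the code in the retention module; the records and the function names first_seen and new_users are made up.

```python
from datetime import date

# Made-up records: (user_id, day, os_name)
records = [
    ("X", date(2016, 9, 5), "android"),
    ("X", date(2016, 9, 10), "ios"),
    ("Y", date(2016, 9, 10), "ios"),
]

def first_seen(rows):
    # Earliest day each user opened the app, on ANY OS.
    first = {}
    for user, day, _os in rows:
        if user not in first or day < first[user]:
            first[user] = day
    return first

def new_users(rows, day, os_name):
    # First-seen dates come from the unfiltered rows, so the OS filter
    # cannot turn a returning user into a "new" one.
    first = first_seen(rows)
    return {user for user, d, os_ in rows
            if d == day and os_ == os_name and first[user] == day}

print(new_users(records, date(2016, 9, 10), "ios"))
```

Only user Y counts as new on iOS that day. Reversing the assumption would amount to building first_seen from rows that were already filtered by OS.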

import the functions from the model

In [1]:
from retention import load_data, filter_data, analyze, interval_rate  
from retention import get_stat_df, get_all_data, get_filltered_data, plotting

Use question 1 as an example

a. What was the overall Day-7 retention over the month of September?

load the CSV data

In [2]:
path='sample_data.csv'
load_data(path)
used: 1.47s
from  2016-09-01 to  2016-10-31
142327 records

Perform the calculation on the entire dataset. If a filter is needed, filter_data() should be called before analyze().

In [3]:
analyze()
used: 5.38s
during 61 days period between 2016-09-01 and 2016-10-31
The overall 7 days retention rate is 4.97%
call get_stat_df() to get details, which will return a DataFrame
call interval_rate("start_date","end_date") for 7-day retention rate of specific time period

calculate 7-day retention rate of a specified time period

In [4]:
interval_rate('9-1-16','9-30-16')
Out[4]:
0.0492223692918597

Optional

Calling get_stat_df() returns the calculated DataFrame, which holds the details for each day. Its columns, for reference:

  • day1 is the number of NEW users on that day
  • day7 is the number of unique (not necessarily new) users 7 days after that day
  • matched is the number of day1 users who reopened the app 7 days after that day
In [5]:
df=get_stat_df()
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 61 entries, 2016-09-01 to 2016-10-31
Data columns (total 4 columns):
day1                    61 non-null int64
day7                    61 non-null int64
matched                 61 non-null int64
single_day_retention    58 non-null float64
dtypes: float64(1), int64(3)
memory usage: 2.4 KB
In [6]:
df.head()
Out[6]:
day1 day7 matched single_day_retention
date
2016-09-01 154 188 6 0.038961
2016-09-02 171 209 5 0.029240
2016-09-03 232 235 5 0.021552
2016-09-04 180 219 3 0.016667
2016-09-05 114 237 6 0.052632
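
In the rows above, single_day_retention equals matched / day1 (for example 6 / 154 ≈ 0.038961). A rate over several days could be pooled the same way; the sketch below (Python 3 syntax) assumes that aggregation, which the output does not confirm for interval_rate(). The function names are my own.

```python
# The five rows printed by df.head() above: (date, day1, matched)
rows = [
    ("2016-09-01", 154, 6),
    ("2016-09-02", 171, 5),
    ("2016-09-03", 232, 5),
    ("2016-09-04", 180, 3),
    ("2016-09-05", 114, 6),
]

def single_day_retention(day1, matched):
    # No new users on a day means the rate is undefined (NaN in the DataFrame).
    return matched / day1 if day1 else None

def pooled_retention(rows):
    # Assumed aggregation: total returning users over total new users.
    total_new = sum(day1 for _, day1, _ in rows)
    total_matched = sum(matched for _, _, matched in rows)
    return total_matched / total_new if total_new else None

for day, day1, matched in rows:
    print(day, round(single_day_retention(day1, matched), 6))
print(pooled_retention(rows))
```

Pooling weights each day by its number of new users, which is not the same as averaging the daily rates.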

Calling get_all_data() returns the raw data used for the analysis, for reference

In [7]:
data=get_all_data()
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 142327 entries, 2016-09-01 to 2016-10-31
Data columns (total 3 columns):
user_id        142327 non-null object
os_name        142327 non-null object
app_version    142327 non-null object
dtypes: object(3)
memory usage: 4.3+ MB
In [8]:
data.head()
Out[8]:
user_id os_name app_version
2016-09-01 8f07aa09-530d-4571-86fd-269cf6a255e7 android 2.5.1
2016-09-01 8f07aa09-530d-4571-86fd-269cf6a255e7 android 2.5.1
2016-09-01 8f07aa09-530d-4571-86fd-269cf6a255e7 android 2.5.1
2016-09-01 d60aae20-1c98-48e6-8b2f-72fbcb13bef5 android 2.5.1
2016-09-01 d60aae20-1c98-48e6-8b2f-72fbcb13bef5 android 2.5.1

Calling plotting() simply plots line charts from the DataFrame returned by get_stat_df()

  • just simple plotting, no further analysis
  • passing a NAME saves the chart as an HTML file; otherwise it is plotted in the Jupyter notebook
In [9]:
plotting()
Loading BokehJS ...

save graph as test.html

In [11]:
# plotting('test')

b. What was the Day-7 retention from September 8 through September 10 for Android users?

filter data by os 'android'

In [12]:
filter_data('android')
filtered by android
86393 records
  • every time the data is filtered, the retention rates must be recalculated
    • so call analyze() again after filter_data()
In [ ]:
analyze()

get 7-day retention rate of September 8 through September 10

In [13]:
interval_rate('9-8-16','9-10-16')
Out[13]:
0.05851063829787234

c. What was the Day-7 retention over the month of September for iOS users using version 6.5?

filter data first

In [14]:
filter_data('ios','6.5.0')
filtered by ios, 6.5.0
808 records

then analyze and get the 7-day retention rate for September

In [15]:
analyze()
used: 1.90s
during 61 days period between 2016-09-01 and 2016-10-31
The overall 7 days retention rate is 0.00%
call get_stat_df() to get details, which will return a DataFrame
call interval_rate("start_date","end_date") for 7-day retention rate of specific time period
In [16]:
interval_rate('9-1-16','9-30-16')
Out[16]:
0.0

Why 0? Examine the details

In [17]:
df=get_stat_df()
In [18]:
df.head()
Out[18]:
day1 day7 matched single_day_retention
date
2016-09-01 0 0 0 NaN
2016-09-02 0 0 0 NaN
2016-09-03 0 0 0 NaN
2016-09-04 0 0 0 NaN
2016-09-05 0 0 0 NaN
In [19]:
df['matched'].sum()
Out[19]:
0L

The retention rate is 0 because 0 records matched: none of the new users in this subset reopened the app 7 days later

In [20]:
df
Out[20]:
day1 day7 matched single_day_retention
date
2016-09-01 0 0 0 NaN
2016-09-02 0 0 0 NaN
2016-09-03 0 0 0 NaN
2016-09-04 0 0 0 NaN
2016-09-05 0 0 0 NaN
2016-09-06 0 0 0 NaN
2016-09-07 0 0 0 NaN
2016-09-08 0 0 0 NaN
2016-09-09 0 0 0 NaN
2016-09-10 0 0 0 NaN
2016-09-11 0 0 0 NaN
2016-09-12 0 0 0 NaN
2016-09-13 0 0 0 NaN
2016-09-14 0 0 0 NaN
2016-09-15 0 0 0 NaN
2016-09-16 0 0 0 NaN
2016-09-17 0 0 0 NaN
2016-09-18 0 0 0 NaN
2016-09-19 0 0 0 NaN
2016-09-20 0 0 0 NaN
2016-09-21 0 0 0 NaN
2016-09-22 0 132 0 NaN
2016-09-23 0 86 0 NaN
2016-09-24 0 16 0 NaN
2016-09-25 0 9 0 NaN
2016-09-26 0 1 0 NaN
2016-09-27 0 0 0 NaN
2016-09-28 0 1 0 NaN
2016-09-29 48 0 0 0.0
2016-09-30 6 1 0 0.0
... ... ... ... ...
2016-10-02 0 1 0 NaN
2016-10-03 0 0 0 NaN
2016-10-04 0 0 0 NaN
2016-10-05 0 0 0 NaN
2016-10-06 0 0 0 NaN
2016-10-07 0 0 0 NaN
2016-10-08 0 0 0 NaN
2016-10-09 0 0 0 NaN
2016-10-10 0 0 0 NaN
2016-10-11 0 1 0 NaN
2016-10-12 0 0 0 NaN
2016-10-13 0 0 0 NaN
2016-10-14 0 0 0 NaN
2016-10-15 0 0 0 NaN
2016-10-16 0 1 0 NaN
2016-10-17 0 0 0 NaN
2016-10-18 0 0 0 NaN
2016-10-19 0 0 0 NaN
2016-10-20 0 1 0 NaN
2016-10-21 0 0 0 NaN
2016-10-22 0 0 0 NaN
2016-10-23 1 0 0 0.0
2016-10-24 0 0 0 NaN
2016-10-25 0 0 0 NaN
2016-10-26 0 0 0 NaN
2016-10-27 0 0 0 NaN
2016-10-28 0 0 0 NaN
2016-10-29 0 0 0 NaN
2016-10-30 0 0 0 NaN
2016-10-31 0 0 0 NaN

61 rows × 4 columns

In [ ]:
 

Linux login without passwd, port forwarding, X11 forwarding

Log In Without a Password

Simply run

$     ssh-keygen

Press Enter at every prompt until it finishes, then run

$    ssh-copy-id  user@host

(run this on the client; host is the server you want to log in to)

Enter the password once and you are done. Next time you won’t need a password to log in.

 

Port forwarding and X11 forwarding 

When logging in, simply add -L (port forwarding) and, optionally, -X (X11 forwarding):

$    ssh  -X -L    port:localhost:port   user@host

The first port is the port on the client machine’s localhost.

localhost is the target host as seen from the server machine. It can be something other than localhost, e.g. google.com, as long as the server machine can reach it. That is the idea behind a proxy: e.g. to access Google from China (client machine), log in to the server machine and forward the server machine’s access to Google to the client machine’s localhost.

The second port is the port on that target host (localhost in this case). If this port is set to 80, it is the default web (HTTP) port, which normally does not need to be specified in a URL.
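
To make the three parts of the -L argument concrete, here is a small Python 3 sketch that only assembles the command line (it does not run ssh). The helper names and the port numbers 8080 and 80 are made up for illustration.

```python
def forward_spec(local_port, target_host, target_port):
    # -L takes local_port:target_host:target_port, with no spaces.
    return "%d:%s:%d" % (local_port, target_host, target_port)

def ssh_command(user_host, local_port, target_host, target_port, x11=False):
    # Hypothetical helper that only builds the argument list.
    args = ["ssh"]
    if x11:
        args.append("-X")
    args += ["-L", forward_spec(local_port, target_host, target_port), user_host]
    return args

print(" ".join(ssh_command("user@host", 8080, "localhost", 80, x11=True)))
```

With that command running, opening http://localhost:8080 on the client would reach port 80 on the server.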

downgrade wordpress

I just found out that some plugins do not work with WordPress 4.7.

So downgrading to an earlier version, enjoying 100% of the plugins, and waiting for a more stable new release is a good choice.

Install the plugin

WP Downgrade

then go to Settings -> WP Downgrade,

type in the version you want to use, and reinstall.

Boom

set up jupyter notebook for R

First install Anaconda, whichever version.

https://www.continuum.io/blog/developer/jupyter-and-conda-r

 

in the command window

Once you have conda, you may install “R Essentials” into the current environment:

conda install -c r r-essentials

or create a new environment just for “R essentials”:

conda create -n my-r-env -c r r-essentials

Jupyter

Jupyter provides a great notebook interface to write your analysis and share it with your peers. Open a shell and run this command to start the Jupyter notebook interface in your browser:

jupyter notebook

Start a new R notebook:

create an R notebook with jupyter

Algorithms

Simple Array Sum

a='1 2 3 4 10 11'
arr = map(int,a.strip().split(' '))
print arr

#!/bin/python

import sys


n = int(raw_input().strip())
arr = map(int,raw_input().strip().split(' '))
print sum(arr)

 

Compare the Triplets

#!/bin/python

import sys


a0,a1,a2 = raw_input().strip().split(' ')
a0,a1,a2 = [int(a0),int(a1),int(a2)]
b0,b1,b2 = raw_input().strip().split(' ')
b0,b1,b2 = [int(b0),int(b1),int(b2)]

x=0;y=0

def je(m,n):
    global x,y
    if m>n:
        x+=1
    elif m<n:    
        y+=1

je(a0,b0)
je(a1,b1)
je(a2,b2)
print x,y
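
A sketch of the same comparison without global counters, in Python 3 syntax. The function name and the sample triplets are my own.

```python
def compare_triplets(a, b):
    # One point to whoever has the larger value in each position;
    # equal values score nothing for either side.
    score_a = sum(1 for x, y in zip(a, b) if x > y)
    score_b = sum(1 for x, y in zip(a, b) if x < y)
    return score_a, score_b

print(compare_triplets([5, 6, 7], [3, 6, 10]))
```

Returning the two scores as a tuple keeps the function easy to check by hand.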

 

Diagonal Difference

Sample Input

3
11 2 4
4 5 6
10 8 -12

Sample Output

15

 

Explanation

The primary diagonal is:
11
5
-12

Sum across the primary diagonal: 11 + 5 – 12 = 4

The secondary diagonal is:
4
5
10
Sum across the secondary diagonal: 4 + 5 + 10 = 19
Difference: |4 – 19| = 15

#!/bin/python

import sys


n = int(raw_input().strip())
a = []
for a_i in xrange(n):
    a_temp = map(int,raw_input().strip().split(' '))
    a.append(a_temp)

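n_rows = len(a)
l=0  # primary diagonal sum
r=0  # secondary diagonal sum
for i in range(n_rows):
    l+=a[i][i]
    r+=a[i][n_rows-1-i]  # same elements as a[::-1][i][i], without reversing the list each time
print abs(l-r)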
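
The worked sample above can be checked with a small function, written here in Python 3 syntax. The function name is my own.

```python
def diagonal_difference(matrix):
    n = len(matrix)
    # Primary diagonal: row i, column i.
    primary = sum(matrix[i][i] for i in range(n))
    # Secondary diagonal: row i, column n - 1 - i.
    secondary = sum(matrix[i][n - 1 - i] for i in range(n))
    return abs(primary - secondary)

sample = [[11, 2, 4], [4, 5, 6], [10, 8, -12]]
print(diagonal_difference(sample))
```

For the sample matrix the two sums are 4 and 19, as in the explanation.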

 

 staircase

n = int(input())
for i in range(1,n+1):
    print(('#'*i).rjust(n,' '))

OR

n = int(raw_input())
for i in range(1,n+1):
    print " "*(n-i) + "#"*i

OR

#!/bin/python

import sys


n = int(raw_input().strip())

for i in range(n,0,-1):
    print ' '*(i-1)+'#'*(n-(i-1))

 

Time Conversion

#!/bin/python

import sys


time = raw_input().strip()

if time[-2:]=='PM':
    if int(time[:2])==12:
        h='12'+time[2:-2]
    else:
        h=str(int(time[:2])+12)+time[2:-2]
elif time[-2:]=='AM':
    if int(time[:2])==12:
        h='00'+time[2:-2]
    else:
        # keep the leading zero, e.g. 07:05:45AM -> 07:05:45
        h=time[:-2]

print h



#!/bin/python

from time import strptime, strftime

print strftime("%H:%M:%S", strptime(raw_input(), "%I:%M:%S%p"))

 

 Circular Array Rotation

#!/bin/python

import sys


n,k,m = raw_input().strip().split(' ')
n,k,m = [int(n),int(k),int(m)]
arr = map(int,raw_input().strip().split(' '))



k %= n
arr = arr[-k:] + arr[:-k]
for i in range(m):
    # raw_input() matches the rest of the script; input() would eval the line in Python 2
    print(arr[int(raw_input())])
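
An alternative sketch in Python 3 syntax that answers each query by index arithmetic instead of building the rotated list. It assumes the rotation is to the right, as the slicing above implies; the function name and the sample array are made up.

```python
def rotated_value(arr, k, i):
    # After k right rotations, position i holds what was at (i - k) mod n.
    n = len(arr)
    return arr[(i - k) % n]

arr = [1, 2, 3]
k = 2
# Slicing approach used above, for comparison.
shift = k % len(arr)
rotated = arr[-shift:] + arr[:-shift]
print(rotated)
print([rotated_value(arr, k, i) for i in range(len(arr))])
```

Both lines print the same list, so the two approaches agree on this sample.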

 

exclude directory using find

find . -name "*.js" -not -path "./directory/*"

#example
find . -name "*.py" -not -path "./anaconda2/*"


# copy the files found to a directory
find . -name "*.js" -not -path "./directory/*" -exec cp {} temp/ \;

find . -name "*.py" -not -path "./anaconda2/*" -exec cp {} pycoll/ \;
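
For comparison, a rough Python 3 equivalent of the exclude step using os.walk. It is a sketch, not a drop-in replacement for find: it prunes the excluded directory instead of filtering paths afterwards, and the function name find_files and the demo file names are made up.

```python
import fnmatch
import os
import tempfile

def find_files(root, pattern, exclude_dir):
    # Walk 'root', skipping the top-level directory named 'exclude_dir'.
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root and exclude_dir in dirnames:
            dirnames.remove(exclude_dir)  # prune: os.walk will not descend into it
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)

# Small demo in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "anaconda2"))
    os.makedirs(os.path.join(root, "src"))
    for rel in ("a.py", "src/b.py", "anaconda2/c.py", "src/notes.txt"):
        open(os.path.join(root, *rel.split("/")), "w").close()
    found = [os.path.relpath(p, root) for p in find_files(root, "*.py", "anaconda2")]
print(found)
```

The copy step could then be done with shutil.copy on each returned path.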