Data Science Notebook

Category Archives: data cleansing

My Data Science & Data Engineer Project

Distributed computing with 120 CPUs using H2O

I want to share a data science project I completed recently, which integrates data engineering concepts into data science.

Data Engineer, data science, H2O, python
data cleansing, data retrieve, Deploy on Linux, jupyter notebook, Python, statistic, visualization
Posted on May 15, 2018

Simple LDA Topic Modeling in Python: implementation and visualization, without delving into the Math

A very simple approach to training an LDA topic model, within 10 minutes!

Plotting word importance
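The post trains its model with gensim (per its tags, alongside pyLDAvis and nltk). To give a rough idea of what "training" an LDA model involves, here is a minimal plain-Python sketch using collapsed Gibbs sampling. This is a swapped-in inference method: gensim's `LdaModel` uses online variational Bayes, so its numbers will differ. The toy corpus, `alpha`, `beta`, and iteration count are illustrative assumptions.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): doc_topic[d][k] estimates P(topic k | doc d)
    and topic_word[k][w] estimates P(word w | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    num_words = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]               # tokens per (doc, topic)
    n_kw = [defaultdict(int) for _ in range(num_topics)]  # tokens per (topic, word)
    n_k = [0] * num_topics                                # tokens per topic

    # Start from a random topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts,
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # resample its topic from the full conditional,
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + num_words * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # and put it back under the new topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + num_words * beta) for w in vocab}
        for k in topics
    ]
    return doc_topic, topic_word


# Hypothetical toy corpus (already tokenized and stop-word filtered).
docs = [
    ["apple", "banana", "fruit", "juice"],
    ["python", "code", "bug", "test"],
    ["fruit", "apple", "juice", "banana"],
    ["code", "python", "test", "debug"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    top_words = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top_words}")
```

The per-topic word probabilities in `topic_word` are the quantities a word-importance plot ranks and draws.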

topic modeling, topic modeling python lda visualization gensim pyldavis nltk
data cleansing, Python, text mining, topic modeling, unsupervised learning
Posted on April 25, 2017

unsupervised learning-4 discovering interpretable features

Non-negative matrix factorization (NMF)

word-frequency array

applying dimension reduction techniques to images

  • PCA

  • NMF

  • SVD

NMF components, PCA components

Cosine similarity
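After NMF, each item is described by a row of non-negative features, and items with similar topic mixtures point in similar directions. Cosine similarity compares those directions while ignoring vector length. Below is a minimal standard-library sketch. It assumes the NMF feature rows have already been computed, and the numbers are made up for illustration. In scikit-learn the same result is commonly obtained by normalizing the rows with `sklearn.preprocessing.normalize` and taking dot products.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(features, query_index):
    """Rank every other row of `features` by cosine similarity to row `query_index`."""
    query = features[query_index]
    scores = [
        (i, cosine_similarity(query, row))
        for i, row in enumerate(features)
        if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical NMF feature rows (one per document); values are illustrative only.
nmf_features = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
]
print(most_similar(nmf_features, 0))  # document 1 ranks above document 2
```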

learning, longly, midnight, super
data cleansing, jupyter notebook, project, Python, text mining, unsupervised learning
Posted on February 20, 2017

Manipulate database -2 SQLAlchemy PostgreSQL, filtering, grouping, aggregating funcs To Pandas DataFrame and plotting with matplotlib and seaborn

SQLAlchemy: connecting to PostgreSQL

filtering

grouping and aggregate functions

converting results to a Pandas DataFrame

plotting with matplotlib and seaborn
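The post runs its queries through SQLAlchemy against PostgreSQL. To keep the sketch self-contained, it uses the standard-library `sqlite3` module with an in-memory database as a stand-in. The SQL has the same filter / group / aggregate shape that SQLAlchemy's `select(...).where(...).group_by(...)` with `func.sum` builds. The `census` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (state TEXT, sex TEXT, age INTEGER, pop INTEGER)")
conn.executemany(
    "INSERT INTO census VALUES (?, ?, ?, ?)",
    [
        ("New York", "M", 30, 120),
        ("New York", "F", 30, 130),
        ("Texas", "M", 30, 200),
        ("Texas", "F", 30, 210),
        ("Texas", "F", 80, 15),
    ],
)

# Filter (WHERE), group (GROUP BY) and aggregate (SUM, COUNT) in one query.
query = """
    SELECT state, SUM(pop) AS total_pop, COUNT(*) AS n_rows
    FROM census
    WHERE age < ?
    GROUP BY state
    ORDER BY total_pop DESC
"""
cursor = conn.execute(query, (65,))
rows = cursor.fetchall()
columns = [col[0] for col in cursor.description]
print(columns)
print(rows)
```

From here, `pandas.DataFrame(rows, columns=columns)` gives a DataFrame ready for plotting with matplotlib or seaborn.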

database, fun, learning, sqlalchemy
data cleansing, data retrieve, database, jupyter notebook, Python, visualization
Posted on February 18, 2017

unsupervised learning-3 Dimension reduction: PCA, tf-idf, sparse matrix, twitter posts clustering, Intrinsic dimension, text mining, Word frequency arrays, csr_matrix, TruncatedSVD

Dimension reduction: PCA, intrinsic dimension

tf-idf, word frequency arrays

sparse matrix, csr_matrix, TruncatedSVD
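A word frequency array is mostly zeros, which is why it is stored as a sparse matrix. The plain-Python sketch below computes textbook tf-idf weights (term count divided by document length, times log(N / document frequency)). It then stores only the non-zero weights in the three CSR arrays `data`, `indices`, and `indptr`. This weighting is an assumption for illustration: scikit-learn's `TfidfVectorizer` uses a smoothed idf and row normalization by default, so its values differ. Whitespace tokenization is likewise a simplification.

```python
import math
from collections import Counter


def tfidf_csr(docs):
    """Build a tf-idf matrix for whitespace-tokenized docs in CSR layout.

    Returns (vocab, data, indices, indptr). Row r's non-zero weights are
    data[indptr[r]:indptr[r + 1]], stored at columns indices[indptr[r]:indptr[r + 1]].
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    column = {w: j for j, w in enumerate(vocab)}
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each word.
    doc_freq = Counter(w for doc in tokenized for w in set(doc))

    data, indices, indptr = [], [], [0]
    for doc in tokenized:
        counts = Counter(doc)
        for w in sorted(counts, key=column.get):
            tf = counts[w] / len(doc)
            idf = math.log(n_docs / doc_freq[w])
            weight = tf * idf
            if weight != 0.0:          # only non-zero entries are stored
                data.append(weight)
                indices.append(column[w])
        indptr.append(len(data))        # end of this row
    return vocab, data, indices, indptr


docs = ["cats like milk", "dogs like bones", "cats chase dogs"]
vocab, data, indices, indptr = tfidf_csr(docs)
print(vocab)
print(indices, indptr)
```

With SciPy available, `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(docs), len(vocab)))` wraps these arrays as a `csr_matrix`, and scikit-learn's `TruncatedSVD` accepts such sparse input directly.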

fun, pca, text mining, tf-idf
data cleansing, jupyter notebook, Python, statistic, text mining, unsupervised learning
Posted on February 18, 2017

Visualization with Seaborn statistic Python Seaborn

visualizing regressions

grouping by a categorical feature

plotting residuals

Higher-order regressions

Visualizing univariate distributions

Visualizing multivariate distributions
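A residual plot shows observed minus fitted values. A visible pattern there suggests the straight line is the wrong model, which is where higher-order regressions come in. In seaborn, `residplot` draws these residuals and `regplot(..., order=2)` fits a quadratic. The sketch below only computes the underlying ordinary least-squares line and its residuals in plain Python, on hypothetical data.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    """Observed minus fitted values: the quantity a residual plot shows."""
    intercept, slope = linear_fit(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data with a curved (quadratic) relationship.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(linear_fit(xs, ys))   # (-7.0, 6.0)
print(residuals(xs, ys))    # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The U-shaped residuals are the signal that a second-order fit would describe this data better than a straight line.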

learning, seaborn
data cleansing, jupyter notebook, Python, statistic, visualization
Posted on February 17, 2017

unsupervised learning -1 k-means clustering, Cross tabulation, Inertia, PCA, StandardScaler, Pipeline

k-means clustering,

Cross tabulation,

Inertia,

PCA,

StandardScaler,

Pipeline
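Here is a plain-Python sketch of how these pieces fit together. Features are standardized the way `StandardScaler` does it (zero mean, population standard deviation of one). Clustering uses Lloyd's k-means. Inertia is the sum of squared distances from each point to its assigned centroid, and a cross tabulation counts cluster labels against known categories. In scikit-learn, `make_pipeline(StandardScaler(), KMeans(n_clusters=k))` chains the first two steps and exposes `inertia_`. Note that scikit-learn's `KMeans` uses k-means++ initialization by default, whereas this sketch picks random initial centroids. The samples are hypothetical.

```python
import random
from collections import Counter


def standardize(points):
    """Scale each column to mean 0 and (population) std 1, as StandardScaler does."""
    n, dims = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(dims)]
    stds = [(sum((p[j] - means[j]) ** 2 for p in points) / n) ** 0.5 for j in range(dims)]
    return [
        [(p[j] - means[j]) / stds[j] if stds[j] else 0.0 for j in range(dims)]
        for p in points
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm with random initial centroids; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    # Inertia: sum of squared distances from each point to its centroid.
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def crosstab(labels, categories):
    """Count (cluster label, known category) pairs, similar in spirit to pandas.crosstab."""
    return Counter(zip(labels, categories))


# Hypothetical 2-D samples with known categories.
samples = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [10.0, 10.0], [10.2, 9.9], [9.8, 10.1]]
species = ["a", "a", "a", "b", "b", "b"]
labels, centroids, inertia = kmeans(standardize(samples), k=2)
print(crosstab(labels, species), inertia)
```

Running `kmeans` for several values of `k` and plotting the returned inertia gives the usual "elbow" view for choosing the number of clusters.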

fun, learning, Machine Learning
data cleansing, jupyter notebook, Python, unsupervised learning
Posted on February 16, 2017

famous iris dataset visualization

matplotlib

seaborn

3d scatterplot

jointplot

FacetGrid

boxplot

stripplot

violinplot

kdeplot

pairplot

Andrews Curves

parallel_coordinates

radviz
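Of the plots listed, Andrews curves are the least self-explanatory. Each sample becomes a finite Fourier series whose coefficients are its feature values, so similar flowers trace similar curves over t in [-pi, pi]. pandas draws the plot with `pandas.plotting.andrews_curves(frame, class_column)`. The sketch below only evaluates the standard formula for a single sample, the first row of the iris data, as an illustration.

```python
import math


def andrews_curve(x, ts):
    """Evaluate the Andrews curve of feature vector x at each t in ts.

    f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + x5*cos(2t) + ...
    """
    values = []
    for t in ts:
        total = x[0] / math.sqrt(2)
        for i, xi in enumerate(x[1:], start=1):
            harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
            if i % 2 == 1:
                total += xi * math.sin(harmonic * t)
            else:
                total += xi * math.cos(harmonic * t)
        values.append(total)
    return values


def grid(n):
    """n evenly spaced points over [-pi, pi] (n >= 2)."""
    return [-math.pi + 2 * math.pi * k / (n - 1) for k in range(n)]


# First iris sample: sepal length, sepal width, petal length, petal width.
setosa_row = [5.1, 3.5, 1.4, 0.2]
curve = andrews_curve(setosa_row, grid(5))
print([round(v, 3) for v in curve])
```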

fun, learning, plotting
data cleansing, jupyter notebook, matplotlib, Python, visualization
Posted on February 16, 2017

Pandas -5 practicing

grouping

filtering

visualizing

learning
data cleansing, jupyter notebook, Pandas, Python
Posted on February 14, 2017

contact me

Richard Ji

Categories

  • bokeh (5)
  • Chinese (3)
  • data cleansing (25)
  • data retrieve (13)
  • database (5)
  • deep learning (18)
  • Deploy on Linux (16)
  • excel (2)
  • fun (1)
  • git (1)
  • jupyter notebook (57)
  • keras (12)
  • machine learning (11)
  • matplotlib (6)
  • Pandas (9)
  • practice (9)
  • project (7)
  • Python (69)
  • R (1)
  • source (2)
  • statistic (14)
  • tensorflow (5)
  • text mining (11)
  • tips (1)
  • topic modeling (3)
  • Uncategorized (14)
  • unsupervised learning (6)
  • visualization (27)
  • wordpress (2)

Recent Posts

  • My Data Science & Data Engineer Project
  • A simple “click” that creates LDA topic models for text mining
  • Face Similarity searching ~ landmark detecting
  • Simple LDA Topic Modeling in Python: implementation and visualization, without delving into the Math
  • Sentiment Analysis model deployed!

Archives

  • May 2018 (1)
  • February 2018 (1)
  • September 2017 (1)
  • April 2017 (5)
  • March 2017 (12)
  • February 2017 (14)
  • January 2017 (36)
  • December 2016 (3)
  • November 2016 (6)
  • October 2016 (10)
  • August 2016 (2)
  • July 2016 (3)
  • March 2016 (2)
  • February 2016 (10)
  • January 2016 (6)
  • December 2015 (2)
  • November 2015 (1)
  • October 2015 (1)
  • September 2015 (1)
  • April 2015 (1)
  • January 2010 (1)

iridescent data science 
