My Data Science & Data Engineer Project: Distributed computing with 120 CPUs using H2O
I just want to share a data science project I completed recently, which integrates data engineering concepts into data science.
Tags: Data Engineer, data science, H2O, python; data cleansing, data retrieval, Deploy on Linux, jupyter notebook, Python, statistics, visualization
Posted on May 15, 2018

Simple LDA Topic Modeling in Python: implementation and visualization, without delving into the math
A very simple approach to training an LDA topic model within 10 minutes! Plot word importance.
Tags: topic modeling, topic modeling python lda visualization gensim pyldavis nltk; data cleansing, Python, text mining, topic modeling, unsupervised learning
Posted on April 25, 2017

unsupervised learning-4
Discovering interpretable features: non-negative matrix factorization (NMF), word-frequency arrays, applying dimension reduction techniques to images (PCA, NMF, SVD), NMF components and PCA components, cosine similarity.
Tags: learning, longly, midnight, super; data cleansing, jupyter notebook, project, Python, text mining, unsupervised learning
Posted on February 20, 2017

Manipulate database -2
SQLAlchemy connecting to PostgreSQL; filtering, grouping, and aggregating functions; converting results to a Pandas DataFrame and plotting with matplotlib and seaborn.
Tags: database, fun, learning, sqlalchemy; data cleansing, data retrieval, database, jupyter notebook, Python, visualization
Posted on February 18, 2017

unsupervised learning-3
Dimension reduction: PCA, intrinsic dimension, tf-idf, word-frequency arrays, sparse matrices (csr_matrix), TruncatedSVD, text mining, and clustering Twitter posts.
Tags: fun, pca, text mining, tf-idf; data cleansing, jupyter notebook, Python, statistics, text mining, unsupervised learning
Posted on February 18, 2017

Visualization with Seaborn
Statistical plotting with Python and Seaborn: visualizing regressions, grouping by a categorical feature, plotting residuals, higher-order regressions, visualizing univariate distributions, visualizing multivariate distributions.
Tags: learning, seaborn; data cleansing, jupyter notebook, Python, statistics, visualization
Posted on February 17, 2017

unsupervised learning -1
k-means clustering, cross tabulation, inertia, PCA, StandardScaler, Pipeline.
Tags: fun, learning, Machine Learning; data cleansing, jupyter notebook, Python, unsupervised learning
Posted on February 16, 2017

famous iris dataset visualization
matplotlib and seaborn: 3D scatterplot, jointplot, FacetGrid, boxplot, stripplot, violinplot, kdeplot, pairplot, Andrews curves, parallel_coordinates, radviz.
Tags: fun, learning, plotting; data cleansing, jupyter notebook, matplotlib, Python, visualization
Posted on February 16, 2017

Pandas -5
Practicing grouping, filtering, and visualizing.
Tags: learning; data cleansing, jupyter notebook, Pandas, Python
Posted on February 14, 2017