District Data Labs

Hands-on data science tutorials, lessons, and other awesome content.

Forward Propagation: Building a Skip-Gram Net From the Ground Up

Part 1: Skip-gram Feedforward

Editor's Note: This post is part of a series based on the research conducted in District Data Labs' NLP Research Lab. Make sure to check out the other posts in the series so far:

Let's . . .

Posted in: machine learning nlp python

January 12, 2017

Principal Component Analysis with Python

An Overview and Tutorial

The amount of data generated each day from sources such as scientific experiments, cell phones, and smartwatches has been growing exponentially over the last several years. Not only is the number of data sources increasing, but the data itself is also growing richer as the number of features in the data increases. Datasets with a large number . . .

Posted in: machine learning python

August 31, 2016

NLP Research Lab Part 2: Skip-Gram Architecture Overview

Editor's Note: This post is part of a series based on the research conducted in District Data Labs' NLP Research Lab. Make sure to check out NLP Research Lab Part 1: Distributed Representations.

Chances are, if you’ve been working in Natural Language Processing (NLP) or machine learning, you’ve heard of the class of . . .

Posted in: machine learning nlp python

August 02, 2016

NLP Research Lab Part 1: Distributed Representations

How I Learned To Stop Worrying And Love Word Embeddings

Editor's Note: This post is part of a series based on the research conducted in District Data Labs' NLP Research Lab.

This post is about Distributed Representations, a concept that is foundational not only to the understanding of data processing in machine learning, but also to the understanding of information processing and . . .

Posted in: machine learning nlp python

July 27, 2016

Visual Diagnostics for More Informed Machine Learning: Part 3

Visual Evaluation and Parameter Tuning

Note: Before starting Part 3, be sure to read Part 1 and Part 2!

Welcome back! In this final installment of Visual Diagnostics for More Informed Machine Learning, we'll close the loop on visualization tools for navigating the different phases of the machine learning workflow. Recall that we are framing the workflow in terms of . . .

Posted in: machine learning python visualization

May 25, 2016

Visual Diagnostics for More Informed Machine Learning: Part 2

Demystifying Model Selection

Note: Before starting Part 2, be sure to read Part 1!

When it comes to machine learning, ultimately the most important picture to have is the big picture. Discussions of (i.e., arguments about) machine learning are usually about which model is the best. Whether it's logistic regression, random forests, Bayesian methods, support . . .

Posted in: machine learning python visualization

May 23, 2016

Visual Diagnostics for More Informed Machine Learning: Part 1

Feature Analysis

How could they see anything but the shadows if they were never allowed to move their heads?

— Plato The Allegory of the Cave

Python and high-level libraries like Scikit-learn, TensorFlow, NLTK, PyBrain, Theano, and MLPY have made machine learning accessible to a broad programming community that might never . . .

Posted in: machine learning python visualization

May 19, 2016
