District Data Labs

Getting Started in Open Source

A Primer for Data Scientists

I really, honestly love programming... I also love collaborating, exchanging ideas, learning better and faster ways to accomplish things that I'm already familiar with or, even better, learning completely new things that broaden my horizons as a developer or person... I enjoy getting feedback from friends - or programmers I'm . . .

Posted in: open source

November 05, 2016

Principal Component Analysis with Python

An Overview and Tutorial

The amount of data generated each day from sources such as scientific experiments, cell phones, and smartwatches has been growing exponentially over the last several years. Not only is the number of data sources increasing, but the data itself is also growing richer as the number of features in the data increases. Datasets with a large number . . .

August 31, 2016

NLP Research Lab Part 2: Skip-Gram Architecture Overview

Editor's Note: This post is part of a series based on the research conducted in District Data Labs' NLP Research Lab. Make sure to check out NLP Research Lab Part 1: Distributed Representations.

Chances are, if you’ve been working in Natural Language Processing (NLP) or machine learning, you’ve heard of the class of approaches called . . .

August 02, 2016

NLP Research Lab Part 1: Distributed Representations

How I Learned To Stop Worrying And Love Word Embeddings

Editor's Note: This post is part of a series based on the research conducted in District Data Labs' NLP Research Lab.

This post is about Distributed Representations, a concept that is foundational not only to the understanding of data processing in machine learning, but also to the understanding of information processing and storage . . .

July 27, 2016

Beyond the Word Cloud

Visualizing Text with Python

In this article, we explore two extremely powerful ways to visualize text: word bubbles and word networks. These two visualizations are replacing word clouds as the de facto text visualization of choice because they are simple to create, understandable, and provide deep and valuable at-a-glance insights. In this post, we will examine how to . . .

July 26, 2016

District Data Labs PyCon Recap

Overview of our Talk, Tutorial, Posters, and Sprints

Last week, a group of us from District Data Labs flew to Portland, Oregon, to attend PyCon, the largest annual gathering for the Python community. We had a talk, a tutorial, and two posters accepted to the conference, and we also hosted development sprints for several open source projects. With this blog post, we are putting everything . . .

June 09, 2016

Visual Diagnostics for More Informed Machine Learning: Part 3

Visual Evaluation and Parameter Tuning

Note: Before starting Part 3, be sure to read Part 1 and Part 2!

Welcome back! In this final installment of Visual Diagnostics for More Informed Machine Learning, we'll close the loop on visualization tools for navigating the different phases of the machine learning workflow. Recall that we are framing the workflow in terms of the . . .

May 25, 2016