Basics of Entity Resolution

with Python and Dedupe

Entity resolution (ER) is the task of disambiguating records that correspond to real-world entities across and within datasets. The applications of entity resolution are tremendous, particularly for public sector and federal datasets related to health, transportation, finance, law enforcement, and antiterrorism.

March 11, 2017

Data Exploration with Python, Part 2

Preparing Your Data to be Explored

This is the second post in our Data Exploration with Python series. Before reading this post, make sure to check out Data Exploration with Python, Part 1!

Mise en place (noun): In a professional kitchen, the disciplined organization and preparation of equipment and food before service begins.

February 07, 2017

Forward Propagation: Building a Skip-Gram Net From the Ground Up

Part 1: Skip-gram Feedforward

Editor's Note: This post is part of a series based on the research conducted in District Data Labs' NLP Research Lab. Make sure to check out the other posts in the series so far:

January 12, 2017

Data Exploration with Python, Part 1

Preparing Yourself to Become a Great Explorer

Exploratory data analysis (EDA) is an important pillar of data science, a critical step required to complete every project regardless of the domain or the type of data you are working with. It is exploratory analysis that gives us a sense of what additional work should be performed to quantify and extract insights from our data. It also . . .

December 29, 2016

Python Exception Handling Basics

Exceptions are a crucial part of higher-level languages, and although exceptions might be frustrating when they occur, they are your friend. The alternative to an exception is a panic: an error in execution that at best simply makes the program die and at worst can cause a blue screen of death. Exceptions, on the other hand, are tools

December 04, 2016

Principal Component Analysis with Python

An Overview and Tutorial

The amount of data generated each day from sources such as scientific experiments, cell phones, and smartwatches has been growing exponentially over the last several years. Not only is the number of data sources increasing, but the data itself is also growing richer as the number of features in the data increases. Datasets with a large number

August 31, 2016

NLP Research Lab Part 2: Skip-Gram Architecture Overview

Editor's Note: This post is part of a series based on the research conducted in District Data Labs' NLP Research Lab. Make sure to check out NLP Research Lab Part 1: Distributed Representations.

Chances are, if you've been working in Natural Language Processing (NLP) or machine learning, you've heard of the class of approaches called

August 02, 2016