District Data Labs

Hands-on data science tutorials, lessons, and other awesome content.


Data Exploration with Python, Part 2

Preparing Your Data to be Explored

This is the second post in our Data Exploration with Python series. Before reading this post, make sure to check out Data Exploration with Python, Part 1!

Mise en place (noun): In a professional kitchen, the disciplined organization and preparation of equipment and food before service begins.

When performing . . .

Posted in: exploratory analysis python visualization wrangling

February 07, 2017

Data Exploration with Python, Part 1

Preparing Yourself to Become a Great Explorer

Exploratory data analysis (EDA) is an important pillar of data science, a critical step required to complete every project regardless of the domain or the type of data you are working with. It is exploratory analysis that gives us a sense of what additional work should be performed to quantify and extract insights from our data. It also . . .

Posted in: exploratory analysis python visualization wrangling

December 29, 2016

Building a Classifier from Census Data

An end-to-end machine learning example using Pandas and Scikit-Learn

One of the machine learning workshops given to students in the Georgetown Data Science Certificate program asks them to build a classification, regression, or clustering model using one of the UCI Machine Learning Repository datasets. The idea behind the workshop is to ingest data from a website, perform some initial analyses to get a sense for what's . . .

Posted in: machine learning python wrangling

May 02, 2016

A Practical Guide to Anonymizing Datasets with Python & Faker

How Not to Lose Friends and Alienate People

If you want to keep a secret, you must also hide it from yourself.

- George Orwell, 1984

In order to learn (or teach) data science you need data (surprise!). The best libraries often come with a toy dataset to illustrate examples of how the code works. However, nothing can replace an actual, non-trivial . . .

Posted in: python wrangling

March 02, 2016

Simple CSV Data Wrangling with Python

Efficient Processing, Schemas, and Serialization

I wanted to write a quick post today about a task that most of us do routinely but often think very little about: loading CSV (comma-separated values) data into Python. This simple action comes with a variety of obstacles that stem from the nature of serialization and data transfer. In fact, I'm routinely surprised how often I . . .
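As a rough illustration of the task this post describes, here is a minimal sketch of loading CSV data with Python's standard `csv` module and converting each field with a simple schema. It is not taken from the post itself: the sample data, the column names, the `SCHEMA` mapping, and the `read_rows` helper are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical sample data; the column names are invented for illustration.
SAMPLE = "name,age,score\nAda,36,91.5\nGrace,45,88.0\n"

# Hypothetical schema: maps each column name to a converter function.
SCHEMA = {"name": str, "age": int, "score": float}


def read_rows(handle, schema):
    """Yield one dict per CSV row, converting each field with the schema."""
    reader = csv.DictReader(handle)  # uses the first line as field names
    for row in reader:
        yield {field: convert(row[field]) for field, convert in schema.items()}


# io.StringIO stands in for an open file so the example is self-contained.
rows = list(read_rows(io.StringIO(SAMPLE), SCHEMA))
print(rows)
```

Because `read_rows` is a generator, rows are converted one at a time as the file is read, so a large file would not need to be held in memory all at once. With a real file, the handle would typically be opened with `open(path, newline="")`, as the `csv` module documentation recommends.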