District Data Labs
Data Exploration with Python, Part 2
Preparing Your Data to be Explored
This is the second post in our Data Exploration with Python series. Before reading this post, make sure to check out Data Exploration with Python, Part 1!
Mise en place (noun): In a professional kitchen, the disciplined organization and preparation of equipment and food before service begins.
When performing . . .
Data Exploration with Python, Part 1
Preparing Yourself to Become a Great Explorer
Exploratory data analysis (EDA) is an important pillar of data science, a critical step required to complete every project regardless of the domain or the type of data you are working with. It is exploratory analysis that gives us a sense of what additional work should be performed to quantify and extract insights from our data. It also . . .
Building a Classifier from Census Data
An end-to-end machine learning example using Pandas and Scikit-Learn
One of the machine learning workshops given to students in the Georgetown Data Science Certificate asks them to build a classification, regression, or clustering model using one of the UCI Machine Learning Repository datasets. The idea behind the workshop is to ingest data from a website, perform some initial analyses to get a sense for what's . . .
Posted in: machine learning, python, wrangling
A Practical Guide to Anonymizing Datasets with Python & Faker
How Not to Lose Friends and Alienate People
If you want to keep a secret, you must also hide it from yourself.
— George Orwell 1984
In order to learn (or teach) data science you need data (surprise!). The best libraries often come with a toy dataset to illustrate examples of how the code works. However, nothing can replace an actual, non-trivial . . .
Simple CSV Data Wrangling with Python
Efficient Processing, Schemas, and Serialization
I wanted to write a quick post today about a task that most of us do routinely but rarely think much about: loading CSV (comma-separated value) data into Python. This simple action has a variety of obstacles that need to be overcome due to the nature of serialization and data transfer. In fact, I'm routinely surprised how often I . . .