District Data Labs
Graph Analytics Over Relational Datasets with Python
The analysis of interconnection structures of entities connected through relationships has proven to be of immense value in understanding the inner workings of networks in a variety of data domains, including finance, health care, business, and computer science. These analyses have emerged in the form of Graph Analytics -- the . . .
Posted in: graph analytics, network graph
A Practical Guide to Anonymizing Datasets with Python & Faker
How Not to Lose Friends and Alienate People
If you want to keep a secret, you must also hide it from yourself.
— George Orwell, 1984
In order to learn (or teach) data science you need data (surprise!). The best libraries often come with a toy dataset to illustrate examples of how the code works. However, nothing can replace an actual, non-trivial . . .
An Introduction to Machine Learning with Python
For the mind does not require filling like a bottle, but rather, like wood, it only requires kindling to create in it an impulse to think independently and an ardent desire for the truth.
— Plutarch, On Listening to Lectures
The impulse to ingest more data is our first and most powerful instinct. Born with . . .
Posted in: machine learning, python
Parameter Tuning with Hyperopt
This post will cover a few things needed to quickly implement a fast, principled method for machine learning model parameter tuning. There are two common methods of parameter tuning: grid search and random search. Each has its pros and cons. Grid search is slow but effective at searching the whole search space, while random search is fast, . . .
Time Maps: Visualizing Discrete Events Across Many Timescales
Discrete events pervade our daily lives. These include phone calls, online transactions, and heartbeats. Despite the simplicity of discrete event data, it’s hard to visualize many events over a long time period without hiding details about shorter timescales.
The plot below illustrates this problem. It shows the number of website . . .
Posted in: python, visualization
The Age of the Data Product
We are living through an information revolution. Like any economic revolution, it has had a transformative effect on society, academia, and business. The present revolution, driven as it is by networked communication systems and the Internet, is unique in that it has created a surplus of a valuable new material - data - and transformed us all . . .
Posted in: data products
Markup for Fast Data Science Publication
A central lesson of science is that to understand complex issues (or even simple ones), we must try to free our minds of dogma and to guarantee the freedom to publish, to contradict, and to experiment.
— Carl Sagan in Billions & Billions: Thoughts on Life and Death at the Brink of the Millennium
As data . . .