Taking a Deeper Look at Gentrifying Census Tracts with Cluster Classification

Photo by Miltiadis Fragkidis on Unsplash

Born and raised in New York City, I have watched Harlem go from the overwhelmingly minority neighborhood I knew as a child in the mid-to-late ’90s to the gentrified haven of brownstones it is today. Academic papers have studied these neighborhood changes, and one of the more recent, Ellen & Ding (2016), was the benchmark against which I compared my model.

Taking a data science approach, I wanted to ask a very specific question:

Could a machine learning algorithm detect gentrification?

The most important feature of this project to understand is that this is unsupervised learning. This means that the target variable was…

The Difference Between Alpha & Standardized Alpha

Photo by Markus Winkler on Unsplash

Statistical tests are an integral part of a data scientist's repertoire. Every day, we clean, sort, and model data with the assumption that the differences we find in the numbers actually matter.

Do the salaries of Segment A and Segment B of the population really differ enough to matter? Could the two-dollar difference be explained by sheer luck in sampling? That is why we run statistical tests: to show that the differences we observe are unlikely to be the result of chance alone.
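One way to make the "luck in sampling" question concrete is a permutation test. The sketch below is only an illustration: the function name `permutation_p_value` and the salary figures are made up for this example, and it is not necessarily the test the article goes on to use. It shuffles the group labels many times and counts how often a difference at least as large as the observed one appears by chance.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate how often a random relabeling of the data produces a
    difference in means at least as large as the observed one."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)            # pretend the group labels are arbitrary
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        if abs(mean(shuffled_a) - mean(shuffled_b)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical hourly wages for two segments (illustrative values only).
segment_a = [21.5, 19.0, 24.0, 22.5, 20.0, 23.0]
segment_b = [23.5, 21.0, 26.0, 24.5, 22.0, 25.0]
p_value = permutation_p_value(segment_a, segment_b)
print(f"Estimated p-value: {p_value:.3f}")
```

A small p-value suggests the gap would rarely show up from sampling luck alone; a large one suggests the two-dollar difference could easily be noise.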

Hypothesis Tests Overview

When we use data, we know that the numbers we work with are not every example of…

Algorithm Interview Topics, Vol. 2

How to Design An Efficient Search Algorithm

Photo by Marten Newhall on Unsplash

We’ve all done exercises where we had to check whether a string contained a character, or return a Boolean after searching for a numeric value. The keywords are all there: in, contains, if x == value.

But how many of us who are just starting out truly understand what is going on behind the scenes, and what mechanics determine the best algorithm to use?

This article aims to go over the differences in algorithms like sequential search, binary search, and hashing to better understand what is going on and what to use in certain situations. …
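As a rough sketch of the three approaches named above (the function names and sample values are my own, chosen for illustration), here is how each one answers the same "is this value present?" question:

```python
def sequential_search(items, target):
    """Check every element in order: O(n) comparisons in the worst case."""
    for item in items:
        if item == target:
            return True
    return False

def binary_search(sorted_items, target):
    """Halve the search range each step: O(log n), but the input must be sorted."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return True
        if sorted_items[mid] < target:
            low = mid + 1      # target can only be in the upper half
        else:
            high = mid - 1     # target can only be in the lower half
    return False

def hashed_search(item_set, target):
    """Hash lookup: O(1) on average, after paying to build the set once."""
    return target in item_set

values = [3, 8, 15, 23, 42, 57]   # already sorted, illustrative data
lookup = set(values)

print(sequential_search(values, 23))
print(binary_search(values, 23))
print(hashed_search(lookup, 23))
```

Sequential search needs no preparation, binary search needs sorted input, and hashing trades extra memory for fast lookups, so the right choice depends on how often you search and whether the data is already ordered.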

Algorithm Interview Topics, Vol. 1

How to Use Single & Doubly Linked Lists

Photo by Bryson Hammer on Unsplash

A linked list is a linear data structure whose elements are not stored in contiguous memory locations. Instead, a linked list is made up of separate containers known as ‘nodes’, each of which holds its data and a reference to another node in the list.
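A minimal sketch of that node-and-reference idea might look like the following. The class and method names (`Node`, `SinglyLinkedList`, `prepend`, `to_list`) are my own choices for illustration, not a standard library API.

```python
class Node:
    """One element of the list: the stored data plus a reference to the next node."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node

class SinglyLinkedList:
    def __init__(self):
        self.head = None   # an empty list has no first node

    def prepend(self, data):
        """Insert at the front in O(1): the new node simply points at the old head."""
        self.head = Node(data, self.head)

    def to_list(self):
        """Walk the references from head to tail, which takes O(n) steps."""
        out = []
        current = self.head
        while current is not None:
            out.append(current.data)
            current = current.next
        return out

letters = SinglyLinkedList()
for ch in "cba":
    letters.prepend(ch)
print(letters.to_list())   # ['a', 'b', 'c']
```

A doubly linked list would add a second reference on each node pointing back to the previous one, at the cost of a little more bookkeeping on every insert.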

Two advantages of a linked list are its non-contiguous storage and a read time that never exceeds O(n). A linked list does not need to be stored contiguously because each node holds a reference to the location of the next node in order. If space in your computer’s memory is scarce, a…


Creating Working Environments for Data Science Projects

Photo by Robert Lukeman on Unsplash

Once I started my journey into data science, I immediately regretted the decision my younger self made when purchasing my MacBook Pro. I went for storage over memory or processing speed, and it showed.

Ensembles took longer to run, daily observation time series broke my kernels, and I couldn’t even entertain deep learning as a viable option. So when Apple debuted its M1 chip, which was supposedly great for data science, I jumped all over it. I maxed it out (which, considering it was an early model, wasn’t hard) and got ready for some serious machine learning projects…

Predicting Health Insurance Coverage using Machine Learning

Photo by Luis Melendez on Unsplash

Across the country, universal healthcare has been a major talking point for people from all walks of life. While there is much debate about the efficacy of such a program, the lack of health insurance harms both the uninsured and the system in general, and that is a detriment we can all feel.

My partner on this project, Albert Um, and I decided to build a machine learning model that could predict the health insurance status of a resident of New York State. We chose New York State purposefully. It is a…


Feel Like a Hacker on Terminal

Photo by Markus Spiske on Unsplash

Back when I used my laptop as a netbook/word processor, using the Terminal or Command Prompt was always to be avoided.

Slowly but surely, I worked on it until I was climbing through my files like a pro. Then came the git commands, but that is a tale for another article. Using the command line is an integral and, honestly, cool way of using your computer’s power. It can be used for anything from saving and deleting files, to opening programs, to interacting with online repositories of data.

I am going to show you how to change the look of…

When A Straight Line Doesn’t Help Your Predictions

Photo by Ludovic Charlet on Unsplash

Linear regression is an awesome tool for predicting continuous values. If X explains Y in a linear fashion then linear regression is a safe bet to predict future values. But the relationship needs to be linear — meaning that if X changes at a steady rate, then Y also needs to change at a steady rate. I created a data frame with the following code to show this relationship.

And that creates this graph:

How to Quickly Transform, Train, & Model Your Data

Photo by SELİM ARDA ERYILMAZ on Unsplash

When training machine learning models, the process of transforming, training, and then testing models can get cumbersome. An ML pipeline is a quick way to code a workflow that allows us to do everything from transforming data to training models.

Using the scikit-learn package in Python, we can write automated code that takes in data and returns a trained model.
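As a minimal sketch of that idea, the example below chains a scaler and a classifier with scikit-learn's `Pipeline`. The synthetic dataset, the step names (`"scale"`, `"model"`), and the choice of `StandardScaler` with `LogisticRegression` are assumptions made for illustration, not the steps from the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a real dataset (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each step is a (name, estimator) pair; every step but the last transforms the data.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

pipeline.fit(X_train, y_train)              # fits the scaler, then the model
predictions = pipeline.predict(X_test)      # applies the same scaling before predicting
accuracy = pipeline.score(X_test, y_test)   # mean accuracy on the held-out data
print(f"Held-out accuracy: {accuracy:.2f}")
```

Because the scaler is fitted only inside `fit`, the test data never leaks into the transformation step, which is one of the main reasons to wrap the workflow in a pipeline.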

To build a functioning pipeline that returns the predicted values or score we want, we need to remember our process for creating a machine learning model. Steps like exploratory analysis and…

Efficient Coding, Vol 3, CODEX

It’s an Algorithm Inside an Algorithm

Photo by Tine Ivanič on Unsplash

A recursive algorithm is one that calls itself, each time on progressively simpler and smaller values or structures. A very quick example is often taught by writing a new factorial function.

If you don’t already know, a factorial is the product of every integer from some given n down to 1, and it is denoted by an exclamation point. So 4! = 4 x 3 x 2 x 1 = 24. Most languages already have a function, if not a package containing a function, that deals with factorials. …
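A minimal recursive sketch of the factorial described above might look like this. The function name and the error handling for negative input are my own choices. Python's standard library already provides `math.factorial`, which is used here only as a cross-check.

```python
import math

def factorial(n):
    """Return n! by calling itself on a smaller value each time."""
    if n < 0:
        raise ValueError("factorial is not defined for negative integers")
    if n <= 1:
        return 1                      # base case: stops the chain of calls
    return n * factorial(n - 1)       # recursive case: n! = n x (n - 1)!

print(factorial(4))                            # 24
print(factorial(10) == math.factorial(10))     # True
```

The base case is what keeps the function from calling itself forever: every call shrinks n by one until it reaches 1.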

Paul Torres

Data Scientist with a Physicist’s heart. Looking through numbers to tell a story that people will care about.
