Taking a Deeper Look at Gentrifying Census Tracts with Cluster Classification

Born and raised in New York City, I have watched Harlem change from the overwhelmingly minority neighborhood I knew as a child in the mid-to-late ’90s into the gentrified haven of brownstones it is today. …

The Difference Between Alpha & Standardized Alpha

Statistical tests are an integral part of a data scientist's repertoire. Every day, we clean, sort, and model data with the assumption that the differences we find in the numbers actually matter.

Does the salary of Segment A and Segment B of the population really differ enough to matter? Can…

Algorithm Interview Topics, Vol. 2

How to Design an Efficient Search Algorithm

We’ve all done exercises where we had to check whether a string contains a character, or return a Boolean after searching for a numeric value. The keywords are all there: in, contains, if x == value.

But how many of us just starting out truly understand what is…
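The excerpt stops before the details, so here is a minimal sketch of the difference hiding behind a keyword like `in`: on a plain list it scans element by element (linear search), while sorted data lets you halve the search range each step (binary search). The function names `linear_search` and `binary_search` are hypothetical, and binary search is my assumption about what "efficient" points to, not necessarily the approach the full post takes.

```python
def linear_search(items, target):
    """Check every element in order: O(n) comparisons in the worst case."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1  # not found


def binary_search(sorted_items, target):
    """Halve the search range each step: O(log n), but the input must be sorted."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1  # not found


data = [2, 5, 8, 12, 16, 23, 38]
print(linear_search(data, 23))  # 5
print(binary_search(data, 23))  # 5
print(binary_search(data, 7))   # -1
```

Both return the same index; the payoff is that binary search needs at most about log2(n) comparisons, which matters once the list gets large.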

Algorithm Interview Topics, Vol. 1

How to Use Singly & Doubly Linked Lists

A linked list is a linear data structure whose elements are not stored in contiguous memory locations. Instead, a linked list is made up of separate objects known as ‘nodes’, each of which holds its data and a reference to another node in the list.

Some advantages of…
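To make the node idea concrete, here is a minimal Python sketch. The class and method names (`Node`, `SinglyLinkedList`, `DoublyNode`, `prepend`, `append`, `to_list`) are hypothetical choices for illustration, not taken from the full post.

```python
class Node:
    """A singly linked node: a value plus a reference to the next node."""
    def __init__(self, data):
        self.data = data
        self.next = None


class SinglyLinkedList:
    def __init__(self):
        self.head = None  # an empty list has no first node

    def prepend(self, data):
        """O(1): the new node just points at the old head."""
        node = Node(data)
        node.next = self.head
        self.head = node

    def append(self, data):
        """O(n): nodes aren't contiguous, so we must walk to the last one."""
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = node

    def to_list(self):
        """Follow the references from head to tail, collecting values."""
        values, current = [], self.head
        while current is not None:
            values.append(current.data)
            current = current.next
        return values


class DoublyNode(Node):
    """A doubly linked node adds a reference back to the previous node."""
    def __init__(self, data):
        super().__init__(data)
        self.prev = None


lst = SinglyLinkedList()
lst.append(2)
lst.append(3)
lst.prepend(1)
print(lst.to_list())  # [1, 2, 3]

a, b = DoublyNode("a"), DoublyNode("b")
a.next, b.prev = b, a  # link in both directions
print(b.prev.data)     # a
```

The trade-off shows up in the method comments: adding at the head is cheap, but reaching the tail (or any index) means following references one node at a time.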


Creating Working Environments for Data Science Projects

Once I started my journey into data science, I immediately regretted the choice my younger self made when buying my MacBook Pro. I went for storage over memory and processing speed, and it showed.

Ensembles took longer to run, time series with daily observations crashed my kernels, and I couldn’t even…

Predicting Health Insurance Coverage using Machine Learning

Across the country, universal healthcare has been a major talking point for people from all walks of life. …


Feel Like a Hacker in the Terminal

Back when I used my laptop as a netbook/word processor, the Terminal or Command Prompt was something I always avoided.

Slowly but surely, I worked on it until I was climbing through my files like a pro. Then came the git commands, but that is a tale for another…

When A Straight Line Doesn’t Help Your Predictions

Linear regression is an awesome tool for predicting continuous values. If X explains Y in a linear fashion, then linear regression is a safe bet for predicting future values. But the relationship needs to be linear, meaning that if X changes at a steady rate, then Y also needs…
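Here is a quick way to see the problem, as a minimal pure-Python sketch: fit an ordinary least-squares line to data that follows y = x² exactly. The helper names `fit_line` and `r_squared` are hypothetical, and regressing on a transformed feature (x²) is just one common remedy I'm assuming for illustration, since the excerpt cuts off before the post's own fix.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    preds = [slope * x + intercept for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # a purely quadratic relationship

# A straight line in x finds no trend at all: flat line, R^2 of zero.
slope, intercept = fit_line(xs, ys)
print(slope, intercept, r_squared(xs, ys, slope, intercept))  # 0.0 2.0 0.0

# Regressing on the transformed feature x^2 recovers the relationship.
zs = [x ** 2 for x in xs]
slope2, intercept2 = fit_line(zs, ys)
print(slope2, intercept2, r_squared(zs, ys, slope2, intercept2))  # 1.0 0.0 1.0
```

Y clearly depends on X here, but because it rises on both sides of zero, the best straight line is flat and explains nothing.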

How to Quickly Transform, Train, & Model Your Data

When building machine learning models, the cycle of transforming data, training, and then testing can get cumbersome. An ML pipeline is a quick way to code a workflow that handles everything from transforming data to training models.

Using the scikit-learn package in Python, we can write an…
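As a rough sketch of the idea, here is a two-step scikit-learn `Pipeline` that scales the data and then fits a classifier in a single `fit` call. `Pipeline`, `StandardScaler`, and `LogisticRegression` are real scikit-learn classes; the toy data and the step names `"scale"` and `"model"` are made up for illustration and aren't from the full post.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, invented for this example: one feature, two well-separated classes.
X_train = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y_train = [0, 0, 0, 1, 1, 1]

pipe = Pipeline([
    ("scale", StandardScaler()),      # step 1: standardize the feature
    ("model", LogisticRegression()),  # step 2: classify the scaled data
])

# fit() runs fit_transform on the scaler, then fit on the model.
pipe.fit(X_train, y_train)

# predict() reuses the fitted scaler on new data before the model sees it.
predictions = pipe.predict([[1.5], [11.5]])
print(list(predictions))
```

The design win is that the scaler's mean and variance are learned only inside `fit`, so the same transformation is applied consistently at prediction time without any manual bookkeeping.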

Efficient Coding, Vol. 3, CODEX

It’s an Algorithm Inside an Algorithm

A recursive algorithm is one that calls itself on progressively smaller or simpler values or structures until it reaches a base case it can solve directly. A common quick example is writing a factorial function.

If you don’t already know, a factorial is the product of every integer from some given n until…
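Here is a minimal recursive factorial in Python, using the standard definition n! = n × (n − 1) × … × 1 with 0! = 1. The function name and the negative-input check are my own illustrative choices, not necessarily how the full post writes it.

```python
def factorial(n):
    """Recursively compute n! for a non-negative integer n."""
    if n < 0:
        raise ValueError("factorial is undefined for negative integers")
    if n <= 1:
        return 1                 # base case: 0! and 1! are both 1
    return n * factorial(n - 1)  # recursive case: same problem, smaller input


print(factorial(5))  # 120
print(factorial(0))  # 1
```

Each call waits on a smaller call until the base case returns, then the results multiply back up the chain: 5 × 4 × 3 × 2 × 1.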

Paul Torres

Data Scientist with a Physicist’s heart. Looking through numbers to tell a story that people will care about.

