Taking a Deeper Look at Gentrifying Census Tracts with Cluster Classification

Photo by Miltiadis Fragkidis on Unsplash

Born and raised in New York City, I have watched Harlem change from the overwhelmingly minority neighborhood I knew as a child in the mid-to-late ’90s into the gentrified haven of brownstones it is today. Academic papers have examined how neighborhoods change, and one of the more recent, Ellen & Ding (2016), was the benchmark against which I compared my model.

Taking a data science approach, I wanted to ask a very specific question:

Could a machine learning algorithm detect gentrification?

The most important thing to understand about this project is that it is unsupervised learning. This means that the target variable was…


How to Quickly Transform, Train, & Model Your Data

Photo by SELİM ARDA ERYILMAZ on Unsplash

When training machine learning models, the process of transforming, training, and then testing models can get cumbersome. An ML pipeline is a quick way to code a workflow that allows us to do everything from transforming data to training models.

Using the scikit-learn package in Python, we can write automated code that takes in data and returns a trained model.
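As a rough illustration of the idea, here is a minimal sketch of a scikit-learn pipeline that scales the features and then trains a classifier. The synthetic dataset, the choice of `StandardScaler` and `LogisticRegression`, and the step names are all assumptions made for this example, not necessarily the steps used in the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each step is a (name, estimator) pair; the final step is the model itself.
pipeline = Pipeline([
    ("scaler", StandardScaler()),     # transform: standardize each feature
    ("model", LogisticRegression()),  # train: fit a classifier on the scaled data
])

# fit() runs every transform on the training data, then trains the model.
pipeline.fit(X_train, y_train)

# predict() and score() apply the same fitted transforms to new data.
predictions = pipeline.predict(X_test)
accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Because the scaler is fitted inside the pipeline, it only ever sees the training data, which helps avoid leaking information from the test set.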

To build a functioning pipeline that returns the predicted values or score we want, we need to remember our process for creating a machine learning model. Steps like exploratory analysis and…


Efficient Coding, Vol 3, CODEX

It’s an Algorithm Inside an Algorithm

Photo by Tine Ivanič on Unsplash

A recursive algorithm is one that calls itself, each time on simpler and smaller values or structures. A quick example that is often taught is writing a new factorial function.

If you don’t already know, the factorial of a given positive integer n is the product of every integer from n down to 1, and it is denoted by an exclamation point. So 4! = 4 x 3 x 2 x 1 = 24. Most languages already have a function, if not a package containing a function, that deals with factorials. …
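For a concrete picture, a recursive factorial can be sketched in Python as follows. This is one common way to write it, offered as an illustration rather than the exact function built in the article; the name `factorial` and the error handling are choices made for this example.

```python
def factorial(n: int) -> int:
    """Return n! for a non-negative integer n, computed recursively."""
    if n < 0:
        raise ValueError("factorial is undefined for negative integers")
    # Base case: 0! and 1! are both 1, so the recursion stops here.
    if n <= 1:
        return 1
    # Recursive case: n! = n * (n - 1)!, a smaller version of the same problem.
    return n * factorial(n - 1)


print(factorial(4))  # 24
```

Each call hands a smaller value to the next one until the base case is reached. In Python, the standard library's `math.factorial` already covers this for everyday use.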


Differences Between R and Python for New Users, CODEX

An Escape from Python to Learn Data Science’s Other Language

Photo by Fotis Fotopoulos on Unsplash

When you first start your path towards being a data scientist, you may ask yourself — Python or R?

I, like most, chose Python. I want to say that it was because of the ease of use or how adaptable it was in different situations. But really, it was the name. Python spoke to me and I found it really cool. If there were a language called Wolf, I probably would’ve learned that instead.

But as I’ve grown even more comfortable with Python, I’ve noticed a lot of job listings that ask about your comfort with R. …


What to Watch for When Using Pseudo-Labels

Photo by Marvin Esteve on Unsplash

Semi-supervised learning is a technique used by data scientists and large companies when labeled data is hard to come by. It combines the processes you use when attempting a supervised learning project and the uncertainty you have when you must create your own labels for previously unlabeled data.

To learn more about semi-supervised machine learning, we will first discuss the basics of supervised vs unsupervised machine learning.

Supervised vs Unsupervised

Supervised learning models draw their strength from predetermined labels in the data. If the feature you are trying to predict is available in the data, it is supervised learning. Classic examples…


The Next Look Into Efficient Coding

Calculating the Best Way to Write Code

Photo by Harley-Davidson on Unsplash

In my last article, I spoke about algorithms. You can read up on it here. In short, algorithms are just a method of doing something — anything really. However, as in all things, there are better and quicker ways to accomplish a goal. You can carry a crate down the street or you can use a cart. Both methods accomplish the goal but one is faster and more efficient. The same goes with coding.

When running code, the metric most people think about first is time.

… How long is the code going to take?

In the end, that is what…


A First Look Into Efficient Coding, CODEX

Why Code Slows Down As Data Grow

Photo by Jake Givens on Unsplash

When you first start coding, hearing the word “algorithm” can be a bit intimidating. The first thing that came to my mind when someone mentioned designing an algorithm was the most intense coding session.

Nonsense as far as I am concerned

But in reality, it's just designing steps for a process to achieve a task. The simplest algorithm I can think of off the top of my head is how to determine the average of a trio of numbers:

Easiest Algorithm for Average: (3+6+9)/3 = 6
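Written as code, the same steps (add the numbers, then divide by how many there are) might look like this small Python sketch. The function name `average` is just a label chosen for this illustration.

```python
def average(numbers):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # Step 1: add the numbers up. Step 2: divide by how many there are.
    return sum(numbers) / len(numbers)


print(average([3, 6, 9]))  # 6.0
```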

That is an algorithm I just employed to complete a task. Simple, right? But now…


CODEX

Using Machine Learning to Identify, Classify, and Neutralize Hacks

Photo by Franck on Unsplash

This past week I found a job listing for “Expert Threat Data Scientist” at a prestigious company based out of Oregon. The job title immediately intrigued me because I had never seen those two specialties put together.

A data scientist working in cyber security?

What I Imagine Data Scientists look like to Cyber Security Techs

So I immediately did some research — partly to be able to apply to the job (the Pacific Northwest is my favorite region in the country), and partly to see where this rabbit hole took me. Applying data science principles to protecting the data itself…


Analysis of Food Scarcity in New York City Using Python

Photo by gemma on Unsplash

Food scarcity is a well-known problem in New York City. It refers to the fact that some areas lack healthy food options, which are usually provided by supermarkets or fresh food stores. These “food deserts”, as they are colloquially called, often lead to higher rates of obesity and obesity-related illnesses. At the same time, these areas tend to have a high number of fast-food options. Fast-food restaurants are (deservedly) considered unhealthy options because of the ingredients used and the average calorie count of each meal.

Supermarkets are one of the few establishments that offer a…


Continued Training For Snake Charmers

Photo by Priscilla Du Preez on Unsplash

Congratulations!

(If you have no idea what I’m talking about, check out my earlier article where I go over the very basics of Python. Click this link.)

You’ve made it to part two. That means you’ve worked out the basics of Python (like installing it) and done some fun things with the language. In that article, we went over installation, statements, variables, and even data types. That was the groundwork we needed to lay for the next part, where we get our code to do things for us.

In this article, we are going to learn:

  1. Operators
  2. Conditionals
  3. Loops
  4. Functions

Paul Torres

Data Scientist with a Physicist’s heart. Looking through numbers to tell a story that people will care about.
