Born and raised in New York City, I watched Harlem go from an overwhelmingly minority neighborhood in the mid-to-late ’90s, when I was a child, to the gentrified haven of brownstones it is today. Academic papers have examined how neighborhoods change, and one of the more recent, Ellen & Ding (2016), was the benchmark against which I compared my model.
Taking a data science approach, I wanted to ask a very specific question:
The most important feature of this project to understand is that this is unsupervised learning. This means that the target variable was…
We’ve all done exercises where we had to check whether a string contained a character, or return a Boolean after searching for a numeric value. The keywords are all there: in, contains, if x == value.
But how many of us just starting out truly understand what is going on behind the scenes, and what mechanics determine which algorithm is best to use?
This article aims to go over the differences in algorithms like sequential search, binary search, and hashing to better understand what is going on and what to use in certain situations. …
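To make the three approaches concrete, here is a minimal sketch in Python. The function names and the sample list are my own assumptions for illustration, not code from the article; the hashing example simply leans on Python's built-in set, which is hash-based.

```python
def sequential_search(items, target):
    """Check every element in order: O(n) comparisons in the worst case."""
    for item in items:
        if item == target:
            return True
    return False


def binary_search(sorted_items, target):
    """Halve the search range each step: O(log n), but the input must be sorted."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return True
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return False


def hash_search(item_set, target):
    """Membership test on a set: hashed lookup, O(1) on average."""
    return target in item_set


numbers = [3, 8, 15, 23, 42, 57]  # hypothetical sample data, already sorted
print(sequential_search(numbers, 23))  # True
print(binary_search(numbers, 23))      # True
print(hash_search(set(numbers), 23))   # True
```

The trade-off is roughly this: sequential search works on any list, binary search needs sorted input, and hashing needs the extra memory of a set or dictionary built ahead of time.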
A linked list is a linear data structure whose elements are not stored in contiguous memory locations. This means that a linked list is made up of separate units known as ‘nodes’, each of which holds its data and a reference to another node in the list.
Some advantages of a linked list are its non-contiguous storage and a read time that never exceeds O(n). A linked list does not need to be stored contiguously because each node has a built-in reference to the location of the next node in order. If space in your computer’s memory is scarce, a…
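The node-and-reference idea described above can be sketched as follows. This is a minimal singly linked list under my own assumptions: the class and method names (Node, LinkedList, prepend, contains, to_list) are hypothetical and not taken from the article.

```python
class Node:
    """One unit of the list: the stored data plus a reference to the next node."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node


class LinkedList:
    """A singly linked list that only keeps track of its first node."""

    def __init__(self):
        self.head = None

    def prepend(self, data):
        # The new node points at the old head, so no other node has to move.
        self.head = Node(data, self.head)

    def contains(self, target):
        # Follow the references one node at a time: O(n) in the worst case.
        current = self.head
        while current is not None:
            if current.data == target:
                return True
            current = current.next
        return False

    def to_list(self):
        # Collect the data in traversal order, handy for inspection.
        values = []
        current = self.head
        while current is not None:
            values.append(current.data)
            current = current.next
        return values


chain = LinkedList()
for value in (3, 2, 1):
    chain.prepend(value)
print(chain.to_list())    # [1, 2, 3]
print(chain.contains(2))  # True
```

Because each node carries its own reference to the next one, the nodes can live anywhere in memory; the cost is that reaching an element means walking the chain from the head.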
Once I started my journey into data science, I immediately regretted the decision my younger self made when purchasing my MacBook Pro. I went for storage over memory or processing speed, and it showed.
Ensembles took longer to run, daily-observation time series broke my kernels, and I couldn’t even entertain deep learning as a viable option. So when Apple debuted its M1 chip, which was supposedly great for data science, I jumped all over it. I maxed it out (which, considering it was an early model, wasn’t hard) and got ready for some serious machine learning projects…
Across the country, universal healthcare has been a major talking point for people from all walks of life. While there is much debate about the efficacy of such a program, the effects of going without health insurance, both on the uninsured and on the system in general, are still a detriment we can all feel.
My partner on this project, Albert Um, and I decided to build a machine learning model that could predict the health insurance status of a resident of New York State. We chose New York State deliberately. It is a…
Back when I used my laptop as a netbook/word processor, I always avoided using the Terminal or Command Prompt.
Slowly but surely, I worked on it until I was climbing through my files like a pro. Then came the git commands, but that is a tale for another article. Using the command line is an integral and, honestly, cool way of using your computer’s power. It can be used for anything from saving and deleting files, to opening programs, to interacting with online repositories of data.
I am going to show you how to change the look of…
Linear regression is an awesome tool for predicting continuous values. If X explains Y in a linear fashion, then linear regression is a safe bet for predicting future values. But the relationship needs to be linear, meaning that if X changes at a steady rate, then Y also needs to change at a steady rate. I created a data frame with the following code to show this relationship.
And that creates this graph:
When training machine learning models, the process of transforming, training, and then testing models can get cumbersome. An ML pipeline is a quick way to code a workflow that allows us to do everything from transforming data to training models.
Using the scikit-learn package in Python, we can write automated code that takes in data and returns a trained model.
In order to build a functioning pipeline that returns the predicted values or score we want, we need to remember our process when it comes to creating a machine learning model. Steps like exploratory analysis and…
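As a rough illustration of the idea, here is a minimal scikit-learn pipeline. The synthetic data, the choice of StandardScaler and LogisticRegression, and the step names "scaler" and "model" are all assumptions made for this sketch; the article's own steps may differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; swap in your own features and labels).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each step is a (name, estimator) pair; every step but the last transforms the data.
pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),     # transform: standardize each feature
    ("model", LogisticRegression()),  # final step: the model being trained
])

# fit() runs the scaler and then trains the model on the transformed data.
pipeline.fit(X_train, y_train)

# predict() and score() push new data through the same fitted transformations.
predictions = pipeline.predict(X_test)
accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Keeping the scaler inside the pipeline means it is fitted on the training data only, so no information from the test split leaks into the transformation.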
A recursive algorithm is one that calls itself, each time on simpler and smaller values or structures. A very quick example, and one that is often taught, is writing your own factorial function.
If you don’t already know, a factorial is the product of every integer from some given n down to 1, and it’s denoted by an exclamation point. So 4! = 4 x 3 x 2 x 1 = 24. Most languages already have a function, if not a package containing a function, that deals with factorials. …
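The factorial definition above translates almost directly into a recursive function. This is a minimal sketch; the function name and the decision to reject negative input are my own assumptions.

```python
def factorial(n):
    """Return n! by calling the function again on a smaller value each time."""
    if n < 0:
        raise ValueError("factorial is not defined for negative integers")
    if n <= 1:
        # Base case: 0! and 1! are both 1, which stops the recursion.
        return 1
    # Recursive case: n! = n * (n - 1)!
    return n * factorial(n - 1)


print(factorial(4))  # 24
```

In Python, the built-in equivalent is math.factorial, which is the better choice outside of a teaching exercise.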
When you first start down the path toward becoming a data scientist, you may ask yourself: Python or R?
I, like most, chose Python. I want to say that it was because of the ease of use or how adaptable it was in different situations. But really, it was the name. Python spoke to me and I found it really cool. If there were a language called Wolf, I probably would’ve learned that instead.
But as I’ve grown even more comfortable with Python, I’ve noticed a lot of job listings that ask about your comfort with R. …
Data Scientist with a Physicist’s heart. Looking through numbers to tell a story that people will care about.