Cristopher Delgado

Home Value Regression Analysis

GitHub

  • As I continue my journey into data science and keep learning new data analysis methods, I am proud to have completed my second project showcasing my skills. This project is about predicting the values of homes in King County, Washington, for the years 2021-2022.

Methodology:

  • Perform data cleaning, which consisted of converting columns to their appropriate/expected data types.
  • Normalize data and linearize continuous data accordingly.
  • Perform exploratory data analysis to understand the correlations of the features with the price of a home.
  • Take an iterative approach to building prediction models using linear regression.
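
To make the cleaning and linearizing steps more concrete, here is a minimal sketch in pandas. The column names and sample values are hypothetical, and I am assuming that "linearize" means a log transform of the continuous columns, which is a common choice for skewed price data but may not match the exact transform used in the project.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows; the real dataset's columns may differ.
raw = pd.DataFrame({
    "date": ["5/24/2022", "12/13/2021"],
    "price": ["675000", "920000"],      # numbers stored as text
    "sqft_living": [1180, 2770],
    "waterfront": ["NO", "YES"],
})

# Convert each column to the type an analysis would expect.
clean = raw.assign(
    date=pd.to_datetime(raw["date"], format="%m/%d/%Y"),
    price=pd.to_numeric(raw["price"]),
    waterfront=raw["waterfront"].astype("category"),
)

# Assumed linearizing step: log-transform the continuous columns.
clean["log_price"] = np.log(clean["price"])
clean["log_sqft_living"] = np.log(clean["sqft_living"])

print(clean.dtypes)
```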

Data Normalization:

  • I prepared the continuous data by standardizing each distribution so that it could be compared to a z-distribution. This let me remove any outliers more than 3 standard deviations from the sample mean and then convert the remaining values back to their original scale.

[Image: normalization]
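
The outlier step described above can be sketched roughly as follows. The helper name `remove_outliers` and the toy data are my own illustration, not the project's actual code. Because the z-scores are only used to build a filter, the surviving rows keep their original values and no back-conversion is needed.

```python
import pandas as pd

def remove_outliers(df, columns, threshold=3.0):
    """Keep rows whose values in `columns` all lie within
    `threshold` sample standard deviations of the column mean."""
    z_scores = (df[columns] - df[columns].mean()) / df[columns].std()
    # Rows with missing values compare as False and are dropped too.
    keep = (z_scores.abs() <= threshold).all(axis=1)
    return df.loc[keep]

# Hypothetical toy data: 50 ordinary homes plus one extreme price.
homes = pd.DataFrame({
    "price": [400_000 + 1_000 * i for i in range(50)] + [5_000_000],
    "sqft_living": [1_500 + 10 * i for i in range(51)],
})

cleaned = remove_outliers(homes, ["price", "sqft_living"])
print(len(homes), "rows before,", len(cleaned), "rows after")
```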

General Trends:

  • After removing outliers, correcting data types, and adjusting data representations, I was able to begin viewing general trends in the data with 'price' as my target variable. Price in this context refers to a home's sale price.
  • Viewing these general trends allowed me to see which features have the greatest impact on a home's value.

[Image: bath]
[Image: cat]
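
One simple way to rank features by their relationship with price is to sort their correlations with the target. This is only a sketch on made-up numbers with hypothetical column names; the project's own exploration may have used different features and plots.

```python
import pandas as pd

# Hypothetical sample; values are invented for illustration.
homes = pd.DataFrame({
    "price": [300_000, 450_000, 500_000, 720_000, 900_000],
    "sqft_living": [900, 1400, 1600, 2300, 3000],
    "bathrooms": [1.0, 1.5, 2.0, 2.5, 3.5],
    "yr_built": [1995, 1950, 2010, 1978, 1988],
})

# Pearson correlation of every feature with price,
# ordered from strongest to weakest by absolute value.
corr_with_price = (
    homes.corr()["price"]
    .drop("price")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_price)
```

Correlation only captures linear association, so it is a starting point for choosing features rather than proof of impact.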

Best Model:

This model has the best interpretability and the lowest error of all the models I generated for this project. It was achieved using polynomial regression.

[Image: model]
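
For readers unfamiliar with polynomial regression, here is a small sketch of the idea using NumPy. The data is synthetic, the single `sqft` feature and the degree of 2 are assumptions chosen for illustration, and the actual model in the project uses its own features and settings.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Fit a least-squares polynomial and return (coefficients, training RMSE)."""
    coeffs = np.polyfit(x, y, deg=degree)
    predictions = np.polyval(coeffs, x)
    rmse = float(np.sqrt(np.mean((y - predictions) ** 2)))
    return coeffs, rmse

# Synthetic data with a curved price/size relationship.
rng = np.random.default_rng(42)
sqft = np.linspace(800, 4000, 60)
price = 150_000 + 120 * sqft + 0.03 * sqft**2 + rng.normal(0, 20_000, sqft.size)

# Standardize the feature so the fit is numerically well behaved.
sqft_scaled = (sqft - sqft.mean()) / sqft.std()

for degree in (1, 2):
    _, rmse = fit_polynomial(sqft_scaled, price, degree)
    print(f"degree {degree}: training RMSE = {rmse:,.0f}")
```

A higher degree never increases training error, so a fair comparison between models should also look at error on held-out data.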

Conclusions:

  • Location can account for most of a home's value.
  • Home aspects such as bedrooms, bathrooms, and square footage matter.
  • A home's construction and state matter.

Challenges:

  • Part of my data preparation was zip code extraction, so that I could visualize in more depth how location impacts price. Unfortunately, I was not able to develop a clear graph showcasing that impact because there are so many unique zip codes. I still included zip code in my models.
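
As an illustration of the zip code step, the sketch below pulls a five-digit zip code out of an address string and summarizes price by zip code. The address format, the regular expression, and the sample rows are all assumptions on my part. Summarizing by median price and plotting only the highest and lowest few zip codes is one possible way to get a readable chart when there are many unique values.

```python
import pandas as pd

# Hypothetical addresses; the real dataset's format may differ.
homes = pd.DataFrame({
    "address": [
        "10012 Example Ave NE, Seattle, Washington 98125, United States",
        "42 Sample Rd, Bellevue, Washington 98004, United States",
        "7 Demo Ln, Bellevue, Washington 98004, United States",
        "300 Test Blvd, Kent, Washington 98030, United States",
    ],
    "price": [750_000, 1_900_000, 2_100_000, 560_000],
})

# Anchor on the state name so a five-digit house number is not mistaken for a zip code.
homes["zipcode"] = homes["address"].str.extract(r"Washington (\d{5})", expand=False)

# Median sale price per zip code, most expensive first.
median_by_zip = (
    homes.groupby("zipcode")["price"].median().sort_values(ascending=False)
)
print(median_by_zip.head(10))
```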

Learning Outcomes:

  • I created predictive models and learned about the statistics used in data science.
  • I improved my data visualization abilities.
  • I learned to draw insights from regression modeling.

Please feel free to review the entire project in my GitHub repository.
