
Indra Juliansyah Putra


Exploring Laptop Price and Specifications Part 1: A Data Preparation Analysis using Python


Introduction

Welcome to the captivating realm of laptop price and specifications analysis! In this project, we'll take a deep dive into the fascinating world of laptops, uncovering valuable insights about their prices and technical specifications. By harnessing the power of Python and various data analysis techniques, we aim to provide you with a comprehensive understanding of the laptop market.

This project is divided into four exciting parts, each focusing on a crucial aspect of the analysis. Let's take a closer look at what each part entails:

Part 1: Data Preparation using Python
In this initial phase, we'll gather and prepare the laptop dataset for analysis. We'll handle missing values and format the data in a way that suits our analysis needs.

Part 2: Data Cleaning using Python
Building upon the prepared dataset, it's time to roll up our sleeves and tackle the important task of data cleaning. We'll address inconsistencies, deal with outliers, and perform necessary transformations to ensure the data is accurate and reliable. By the end of this part, we'll have a squeaky-clean dataset ready for exploration.

Part 3: Data Visualization using Python
Now, get ready for the visual feast! With our pristine dataset in hand, we'll dive into the world of data visualization. Using awesome Python libraries like Matplotlib, Seaborn, and Plotly, we'll create captivating charts and graphs that bring the laptop prices and specifications to life. Prepare to be amazed as we uncover hidden patterns, spot trends, and unravel the relationships within the data.

Part 4: Exploratory Data Analysis using Python
In the final part of our adventure, we'll embark on a thrilling journey of exploratory data analysis (EDA). Armed with statistical techniques, hypothesis testing, and advanced analytics, we'll dig deeper into the data to unearth meaningful insights. From data-driven recommendations to valuable conclusions about the laptop market, we'll unravel the secrets hidden within the numbers.

Throughout this project, our aim is to make the world of laptop prices and specifications accessible and enjoyable for all. Whether you're a tech enthusiast, a prospective laptop buyer, or simply love exploring data, this project will provide you with a wealth of knowledge about the ever-evolving laptop market.

So, grab your favorite beverage, get comfy, and join us on this exciting journey as we explore the intricacies of laptop prices and specifications. From playful data preparation to groovy data cleaning, vibrant data visualizations, and eye-opening exploratory data analysis, we promise a thrilling adventure that will leave you with valuable insights and a newfound appreciation for laptops.
Let's kick off the fun with Part 1: Data Preparation using Python!


Description of Dataset

Alright, let's take a closer look at the dataset we'll be working with. We've got an exciting collection of laptop data here, ready to be explored! 🎉

This dataset provides information on a whopping 977 laptops, giving us a diverse range of options to delve into. Each laptop comes with a variety of specifications and features, allowing us to analyze and visualize the factors that influence laptop prices.

The dataset comprises 13 columns, each holding specific details about the laptops. Here's a breakdown of the columns and what they represent:

  1. Manufacturer: The name of the laptop manufacturer, showcasing the variety of brands available.
  2. Model Name: The specific model or series name of the laptop.
  3. Category: The category or type of the laptop, such as gaming, ultrabook, or workstation.
  4. Screen Size: The size of the laptop screen in inches, giving us an idea of its display dimensions.
  5. Screen: Additional details about the laptop screen, like its resolution or touchscreen capabilities.
  6. CPU: The central processing unit (CPU) or processor model of the laptop.
  7. RAM: The amount of random access memory (RAM) in the laptop, representing its memory capacity.
  8. Storage: The storage capacity and type of the laptop.
  9. GPU: The graphics processing unit (GPU) or graphics card specifications of the laptop, crucial for graphic-intensive tasks.
  10. Operating System: The operating system pre-installed on the laptop, such as Windows, macOS, or Linux.
  11. Operating System Version: The specific version or edition of the operating system (if available).
  12. Weight: The weight of the laptop, giving us an idea of its portability and convenience.
  13. Price: The price of the laptop in a numerical format (INR), which is the target variable we'll be exploring.

With this diverse set of information, we'll be able to uncover fascinating insights about laptop prices and how various specifications impact them. So, get ready to dive into the data and embark on an exciting journey of laptop price prediction using specifications!
You can access the dataset through the following link: Dataset Link

Let's dive in and uncover the secrets hidden within this dataset! 💻💡


Data Preparation

Now, let's dive into the "Data Preparation" phase where we'll roll up our sleeves and get the dataset ready for analysis by tackling missing values, tidying up columns, and shaping the data in a more organized and workable form.

First things first, we need to import the libraries we'll use to work with the dataset. The main one is pandas, which provides powerful tools for data analysis and manipulation. We also import NumPy, its usual companion for numerical work. So, let's fire up our coding engines and import both:

import pandas as pd
import numpy as np

Next, we'll read the dataset into a pandas DataFrame using the read_csv() function. Assuming you've downloaded the dataset and saved it as a CSV file, here's how you can load it into a DataFrame:

laptop = pd.read_csv('laptops.csv')

Great! Now we have our dataset loaded into the DataFrame named laptop, and we're ready to dive deeper.
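
As a minimal sketch of what read_csv() does, the snippet below loads a tiny in-memory CSV instead of the real file. The two rows and the names sample_csv and sample are made up for illustration; only the column headers mirror the dataset description above.

```python
import io
import pandas as pd

# Hypothetical two-row sample; the real data lives in laptops.csv
sample_csv = io.StringIO(
    "Manufacturer,Model Name,RAM,Price\n"
    "BrandA,Model X,8GB,50000.0\n"
    "BrandB,Model Y,16GB,90000.0\n"
)

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(sample_csv)

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # text columns load as object, Price as float64
```

Passing a file-like object makes it easy to experiment with parsing behavior before pointing the same call at the full file.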

Before we proceed, it's always a good practice to examine the structure and content of the dataset. Let's start by checking the dimensions of the DataFrame using the shape attribute:

print("Dataset dimensions: ", laptop.shape)

This line of code prints the dimensions of the dataset, indicating that it has 977 rows and 13 columns. This information gives us an overview of the dataset's size and structure.
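
Because shape is a plain (rows, columns) tuple, it can be unpacked into separate variables. The sketch below uses a hypothetical three-row frame named toy in place of the laptop data.

```python
import pandas as pd

# Hypothetical toy frame standing in for the laptop data
toy = pd.DataFrame({
    "manufacturer": ["BrandA", "BrandB", "BrandC"],
    "price": [50000.0, 90000.0, 120000.0],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = toy.shape
print(f"{n_rows} rows x {n_cols} columns")
```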

To get a glimpse of the dataset, we can use the head() function to display the first few rows:

laptop.head()

The head() method returns the first five rows by default, giving us a quick preview of the data's content and structure.

Now, let's inspect each column's data type and non-null count using the info() method:

laptop.info()

The info() method prints a summary of the DataFrame: the column names, the count of non-null values in each column, the data type of each column, and the memory usage. From the output, we can observe that the dataset contains 977 entries across 13 columns. The columns consist of a combination of object (string) and float data types.

Additionally, we noticed that the 'Operating System Version' column has only 841 non-null values out of 977, which means 136 entries are missing. To handle these missing values, we can replace them with empty strings (""), indicating that the information for those particular entries is unavailable or unknown. We can use the fillna() method to achieve this:

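# Assign the result back: fillna(inplace=True) on a selected column can fail to update the DataFrame in newer pandas versions
laptop['Operating System Version'] = laptop['Operating System Version'].fillna("")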

Replacing the missing values with empty strings keeps the column free of NaN, so later string operations on it need no special handling for missing entries, and we can continue with our analysis without disruptions.
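
To confirm that a fill like this worked, it helps to compare the missing count before and after. This is a minimal sketch on a hypothetical three-row frame named toy, with invented values rather than the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing version entry
toy = pd.DataFrame({"Operating System Version": ["10", np.nan, "X"]})
col = "Operating System Version"

missing_before = int(toy[col].isna().sum())

# Assign the filled column back to the DataFrame
toy[col] = toy[col].fillna("")

missing_after = int(toy[col].isna().sum())
print(missing_before, missing_after)
print(toy[col].tolist())
```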

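Next, let's standardize the column names:

# Lowercase every column name, then drop the leading space in ' storage'
laptop = laptop.rename(columns=str.lower)
laptop = laptop.rename(columns={" storage": "storage"})
laptop.columns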

We converted the column names to lowercase using laptop = laptop.rename(columns=str.lower) for consistency and ease of access. Uniform lowercase names prevent errors caused by mismatched capitalization and simplify column references in our code. pandas itself does not require any particular casing, but lowercase names are a common convention in Python projects.

In addition, we used laptop = laptop.rename(columns={" storage": "storage"}) to remove the leading space from the original ' Storage' column name, which had become ' storage' after lowercasing. These modifications make the columns easier to reference in later data manipulation and visualization tasks.
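
As an alternative sketch (not the approach used above), stray whitespace and capitalization can be handled for every column in one pass through the string accessor on the column index. The frame named toy below is an empty, hypothetical stand-in that reuses three of the dataset's column names.

```python
import pandas as pd

# Hypothetical empty frame with a leading space in one column name
toy = pd.DataFrame(columns=["Manufacturer", " Storage", "Operating System"])

# Strip surrounding whitespace and lowercase every column name in one pass
toy.columns = toy.columns.str.strip().str.lower()
print(list(toy.columns))
```

This avoids naming each problem column by hand, which may be handy if more than one header carries stray spaces.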


Congratulations on completing Part 1 of our exciting laptop price and specifications analysis project! In this section, we focused on preparing the dataset for further exploration. By leveraging Python's powerful tools and techniques, we laid the foundation for a successful analysis journey.

Now that we have a well-prepared dataset in our hands, we're ready to dive into Part 2: Data Cleaning using Python. In the next phase, we'll address inconsistencies, handle outliers, and perform necessary transformations to make our dataset accurate and reliable.

Stay tuned for more thrilling insights and captivating discoveries as we continue our exploration of laptop prices and specifications. Get ready to unleash the power of Python once again!

Great job on completing Part 1, and let's keep the momentum going as we move forward. Onward to Part 2: Data Cleaning using Python!
