<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Uche Emmanuel</title>
    <description>The latest articles on DEV Community by Uche Emmanuel (@uche_4rm_germany).</description>
    <link>https://dev.to/uche_4rm_germany</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1048962%2Fd847aedf-30ad-4271-8e50-bfc016da931d.jpeg</url>
      <title>DEV Community: Uche Emmanuel</title>
      <link>https://dev.to/uche_4rm_germany</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/uche_4rm_germany"/>
    <language>en</language>
    <item>
      <title>Are you a dev?</title>
      <dc:creator>Uche Emmanuel</dc:creator>
      <pubDate>Mon, 15 Jan 2024 23:53:24 +0000</pubDate>
      <link>https://dev.to/uche_4rm_germany/are-you-a-dev-2cl6</link>
      <guid>https://dev.to/uche_4rm_germany/are-you-a-dev-2cl6</guid>
      <description>&lt;p&gt;Are you a full-stack software engineer looking to join a resilient and young team working on exciting projects for the renewable energy sector? &lt;/p&gt;

&lt;p&gt;Then reach out to me with your CV. &lt;/p&gt;

&lt;p&gt;We are looking to hire for 12 months, with the option to extend if you wish to continue working on more exciting projects with us. &lt;/p&gt;

</description>
    </item>
    <item>
      <title>Site Resource Assessment and Rotor Design</title>
      <dc:creator>Uche Emmanuel</dc:creator>
      <pubDate>Mon, 07 Aug 2023 07:32:10 +0000</pubDate>
      <link>https://dev.to/uche_4rm_germany/site-resource-assessment-and-rotor-design-42af</link>
      <guid>https://dev.to/uche_4rm_germany/site-resource-assessment-and-rotor-design-42af</guid>
      <description>&lt;p&gt;The first step in my blade design process was to assess the wind condition of the site my wind turbine will operate. In this step, my main objective was to estimate the main parameters that will define my wind turbine model, calculate airfoil aerodynamic properties, and most importantly, define the blade geometry. &lt;/p&gt;

&lt;p&gt;One might ask &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;why did you perform a site assessment before designing your turbine blades? &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Site assessment provided me with insight into my blade design and how suitably my blades can operate at the given site, through a clear description of the design tip speed ratio, Weibull probability distribution, power curve, and annual energy production (AEP). &lt;/p&gt;

&lt;p&gt;The AEP is based on the annual wind speed distribution (10-minute averages) at a particular site. In simpler terms, the AEP is a direct measure of how much energy a wind turbine can generate annually while operating at a particular site. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Maximizing wind turbines' annual energy production (AEP) is important for economic reasons.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By knowing the AEP at my given site, I was able to optimize the design of the turbine blades to capture the maximum amount of energy from the prevailing wind conditions. In this process I tailored the blade design parameters, such as the maximum blade length and the aerodynamic profile, to match the specific wind characteristics at the site, thereby maximizing energy capture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mean Wind Speed at Site:&lt;/strong&gt;&lt;br&gt;
The mean wind speed at the 100m hub height is 6.20m/s, obtained from a Weibull scale factor A of 7m/s and a Weibull shape parameter k of 2.15. &lt;/p&gt;
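As a quick cross-check, the mean wind speed follows from the Weibull parameters via the standard relation v_mean = A * Gamma(1 + 1/k). A minimal stdlib-only sketch using the values above:

```python
from math import gamma

# Weibull parameters at the 100 m hub height (from the site assessment above)
A = 7.0    # scale factor in m/s
k = 2.15   # shape parameter (dimensionless)

# mean wind speed of a Weibull distribution: v_mean = A * Gamma(1 + 1/k)
v_mean = A * gamma(1 + 1 / k)
print(f"Mean wind speed: {v_mean:.2f} m/s")  # about 6.20 m/s
```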
&lt;h2&gt;
  
  
  Rotor Sizing
&lt;/h2&gt;

&lt;p&gt;The blade is expected to operate on an IEC type-class IIIb wind turbine with the following calculated sizing parameters: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Maximum Blade Length (Referenced) = 50m&lt;br&gt;
Hub Radius = 1.25m&lt;br&gt;
Hub Height (Referenced) = 100m&lt;br&gt;
Maximum Tip Speed (Referenced) = 75m/s&lt;br&gt;
Rated RPM = 13.9746&lt;br&gt;
Rated Electrical Power = 1.5MW&lt;br&gt;
Mechanical Efficiency = 94.55%&lt;br&gt;
Electrical Efficiency = 94.69%&lt;br&gt;
Total Conversion Efficiency = 47% (the combined aerodynamic, mechanical, and electrical efficiency)&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Rotor Radius = Maximum Blade Length (Referenced) + Hub Radius = 51.25m

Required Wind Power = Rated Electrical Power/Total Conversion Efficiency = 3.19MW

Estimated Rated Wind Speed = 8.58m/s (rounded up to 9m/s), obtained from the required wind power and the swept area at the maximum blade length

Design Tip Speed Ratio (TSR) = Maximum Tip Speed (Referenced)/Estimated Rated Wind Speed = 8.33
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
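The sizing chain in the block above can be reproduced in a few lines of Python. This is a hedged sketch of the calculation, assuming a standard air density of 1.225 kg/m³ and the ideal wind power relation P = ½ρAv³ (both assumptions on my part, not stated in the table):

```python
import math

# referenced inputs from the sizing table above
max_blade_length = 50.0         # m
hub_radius = 1.25               # m
max_tip_speed = 75.0            # m/s
rated_electrical_power = 1.5e6  # W
total_efficiency = 0.47         # total conversion efficiency
rho = 1.225                     # kg/m^3, standard air density (assumption)

rotor_radius = max_blade_length + hub_radius  # 51.25 m
swept_area = math.pi * rotor_radius ** 2      # m^2

# wind power that must be available to deliver rated electrical power
required_wind_power = rated_electrical_power / total_efficiency  # ~3.19e6 W

# invert P = 0.5 * rho * A * v^3 to estimate the rated wind speed
rated_wind_speed = (2 * required_wind_power / (rho * swept_area)) ** (1 / 3)

# design TSR uses the rated wind speed rounded up to 9 m/s
design_tsr = max_tip_speed / math.ceil(rated_wind_speed)

print(f"Rotor radius: {rotor_radius} m")
print(f"Required wind power: {required_wind_power / 1e6:.2f} MW")
print(f"Estimated rated wind speed: {rated_wind_speed:.2f} m/s")
print(f"Design TSR: {design_tsr:.2f}")
```

This reproduces the 51.25 m rotor radius, the 8.58 m/s rated wind speed, and the design TSR of 8.33 from the block above.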



&lt;blockquote&gt;
&lt;p&gt;The TSR is a very important parameter in my blade design process because I used it to determine the Chord Length and Twist Angle of my blade. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Weibull Distribution
&lt;/h2&gt;

&lt;p&gt;Weibull distribution is a statistical method that I employed to model the probability distribution of wind speeds at the 100m turbine hub height at the given site. With this distribution, I could understand the wind speed characteristics and the likelihood of different wind speeds occurring over the lifetime of my turbine. The Weibull distribution is a key step in determining the AEP at the given site.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iMNqQLW7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ptj60f2v2dyby1dp2z19.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iMNqQLW7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ptj60f2v2dyby1dp2z19.png" alt="Weibull Distribution" width="800" height="527"&gt;&lt;/a&gt;&lt;/p&gt;
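For reference, the probability density behind the plot can be written out directly. A minimal sketch with the site parameters above (the normalization check is mine, not part of the original tutorial):

```python
from math import exp

def weibull_pdf(v, A=7.0, k=2.15):
    """Weibull probability density of wind speed v (m/s)."""
    return (k / A) * (v / A) ** (k - 1) * exp(-((v / A) ** k))

# a probability density should integrate to ~1 over the full wind-speed range
dv = 0.01
total = sum(weibull_pdf(i * dv) * dv for i in range(1, 4000))  # 0.01 .. 39.99 m/s
print(f"Integral of the density: {total:.3f}")
```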

&lt;h2&gt;
  
  
  Power Curve
&lt;/h2&gt;

&lt;p&gt;The Power Curve shows the relationship between the power output of my turbine and the wind speed, and it describes the cut-in, rated, and cut-out wind speeds. The cut-in speed is the minimum wind speed at which the turbine starts producing energy. The rated wind speed is the speed at which the turbine reaches its maximum power output, and the cut-out speed is a run to safety: the maximum wind speed at which the turbine shuts down to prevent damage. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5AlQaL__--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ly0fsxeahb1ih4s4uthy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5AlQaL__--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ly0fsxeahb1ih4s4uthy.png" alt="Power Curve" width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The average wind speed at the given site is 6.20m/s and the cut-in speed for my turbine is 3.5m/s.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This power curve is only a theoretical approximation (ideal case). Most power curves, in reality, do not look exactly like this. In practice, the turbine often reaches its rated speed before it reaches its rated power, and it does not shut down abruptly at the cut-out speed as the power curve above suggests, but rather shuts down gradually as the wind speed increases. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Annual Energy Production (AEP):
&lt;/h2&gt;

&lt;p&gt;To calculate the AEP, I multiplied the Weibull probability of each wind-speed bin by the corresponding power output from the power curve and summed over all bins.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# weight the power output of each wind-speed bin by that bin's probability
# (multiply the sum by 8760 hours/year to turn average power into annual energy)
AEP = sum([wind_speed_freq_distribution[i] * power_output_data[i]
           for i in range(len(wind_speed_freq_distribution))])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
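Putting the pieces together, here is a hedged end-to-end sketch of the AEP calculation. The Weibull parameters and rotor size are taken from the sections above; the power curve is the idealized one (cut-in 3.5 m/s, rated near 8.58 m/s), while the air density and the 25 m/s cut-out speed are my assumptions, so the result is only an approximation of the reported 5.53 GWh:

```python
import math

# site and turbine parameters from the sections above;
# air density and the cut-out speed are assumptions
A, k = 7.0, 2.15             # Weibull scale (m/s) and shape parameters
rho = 1.225                  # kg/m^3, standard air density (assumption)
radius = 51.25               # m, rotor radius
area = math.pi * radius ** 2
efficiency = 0.47            # total conversion efficiency
rated_power = 1.5e6          # W, rated electrical power
cut_in, cut_out = 3.5, 25.0  # m/s (cut-out speed is an assumption)

def weibull_pdf(v):
    """Weibull probability density of wind speed v (m/s)."""
    return (k / A) * (v / A) ** (k - 1) * math.exp(-((v / A) ** k))

def power_curve(v):
    """Idealized electrical power output (W) at wind speed v."""
    if v < cut_in or v > cut_out:
        return 0.0
    return min(0.5 * rho * area * v ** 3 * efficiency, rated_power)

# AEP = hours per year x sum over bins of (bin probability x power output)
dv = 0.1
aep_wh = 8760 * sum(weibull_pdf(i * dv) * dv * power_curve(i * dv)
                    for i in range(1, 400))
print(f"Approximate AEP: {aep_wh / 1e9:.2f} GWh")
```

With these assumptions the sketch lands in the same ballpark as the reported net AEP.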



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---48J43SZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gagf4fb7fao3jnahgg9g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---48J43SZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gagf4fb7fao3jnahgg9g.png" alt="Annual Energy Production" width="800" height="519"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In conclusion, my wind turbine is planned to operate at a low-wind-speed site with a total net annual energy production (AEP) of 5.53GWh. &lt;/p&gt;

&lt;p&gt;On a final note, we have now completed our site assessment, determined how much energy we can produce from our site, sized our turbine to match the energy yield, and obtained vital blade design parameters. This information will now be applied in designing the turbine blades, which perform the most important job of harnessing the wind energy at the site. &lt;/p&gt;

&lt;p&gt;To be continued...&lt;/p&gt;

&lt;h2&gt;
  
  
  Acknowledgement
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Design of Wind Energy Masters' Course Tutorial - University of Oldenburg, Germany&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Please note, this episode of my "Blade Design Series" was created as an excerpt of my tutorial project work in my Wind Energy Design Course.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Web Scraping using Python and Selenium</title>
      <dc:creator>Uche Emmanuel</dc:creator>
      <pubDate>Tue, 28 Mar 2023 05:30:21 +0000</pubDate>
      <link>https://dev.to/uche_4rm_germany/web-scraping-using-python-and-selenium-14ha</link>
      <guid>https://dev.to/uche_4rm_germany/web-scraping-using-python-and-selenium-14ha</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Web scraping is the process of extracting data from websites. It can be used for a variety of purposes such as research, data analysis, or automation. In this guide, I will focus on web scraping with Python and Selenium. &lt;/p&gt;

&lt;p&gt;Selenium is a powerful tool for web automation and can be used to automate tasks such as filling out forms and clicking buttons. In this documentation, I will demonstrate how to use Selenium to extract data from a website.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Before we begin, we need to install Selenium. You can install Selenium using pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install selenium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You also need to install a web driver for your browser. You can download the Chrome driver from the following link:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://sites.google.com/a/chromium.org/chromedriver/downloads
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you have downloaded the driver, make sure to add its path to your system's PATH variable.&lt;/p&gt;
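Before launching the browser, you can confirm from Python that the driver is actually discoverable on your PATH. A small stdlib-only check (the executable name `chromedriver` is the default on macOS/Linux; on Windows the lookup also matches `chromedriver.exe`):

```python
import shutil

# shutil.which searches the directories on your PATH, like the shell does
driver_path = shutil.which("chromedriver")
if driver_path:
    print(f"ChromeDriver found at: {driver_path}")
else:
    print("ChromeDriver not found on PATH - add its folder to PATH first.")
```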

&lt;p&gt;Now I will walk you through the entire process in five (5) steps: &lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Launch the browser
&lt;/h3&gt;

&lt;p&gt;The first step is to launch the browser using Selenium. Here's a code snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from selenium import webdriver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Launch Chrome browser&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Note that in this documentation I am using the Google Chrome browser; you could also experiment with other browsers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;browser = webdriver.Chrome()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this code snippet, I first imported the web driver module from Selenium and created an instance of the Chrome driver. This will launch a new Chrome browser window.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Navigate to the website
&lt;/h3&gt;

&lt;p&gt;The second step is to navigate to the website from which you wish to extract data. Here is a code snippet to achieve this: &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Navigate to the website&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;browser.get('https://www.example.com')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above code snippet, I used the get() method of the browser object to navigate to the website. Replace the URL with that of the website you want to extract data from.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Find the element to extract data from
&lt;/h3&gt;

&lt;p&gt;In order to extract data from a website, you need to find the HTML element that contains the data. In Selenium 4, you do this with the find_element() method of the browser object together with a By locator (the older find_element_by_* helpers have been removed). Here's a code snippet:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find element by class name&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from selenium.webdriver.common.by import By

element = browser.find_element(By.CLASS_NAME, 'example-class')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this code snippet, I used find_element(By.CLASS_NAME, ...) to find an element with the class name 'example-class'. You can also use other locator strategies such as By.ID, By.NAME, and By.XPATH to find elements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Extract data from the element
&lt;/h3&gt;

&lt;p&gt;Once you have identified the element that contains the data you want, you can extract its contents using the text attribute. Here's a code snippet:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extract text from element&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text = element.text
print(text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this code snippet, I used the text attribute of the element object to scrape the text contained within the element.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 5: Close the browser
&lt;/h3&gt;

&lt;p&gt;Finally, you need to close the browser window after scraping data. Here's a code snippet:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Close browser&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;browser.quit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this code snippet, I used the quit() method of the browser object to close the browser window.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;In conclusion, web scraping can be a powerful tool for extracting data from websites, and Python and Selenium provide a powerful combination for web scraping and automation. In this guide, I covered the basic steps for extracting data from a website using Python and Selenium. With these tools and techniques, you can automate repetitive tasks and extract valuable data from websites.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Grid and Randomized Hyperparameter Optimization for XGBoost Algorithms</title>
      <dc:creator>Uche Emmanuel</dc:creator>
      <pubDate>Thu, 23 Mar 2023 13:02:29 +0000</pubDate>
      <link>https://dev.to/uche_4rm_germany/grid-and-randomized-hyperparameter-optimization-for-xgboost-algorithms-159k</link>
      <guid>https://dev.to/uche_4rm_germany/grid-and-randomized-hyperparameter-optimization-for-xgboost-algorithms-159k</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Welcome to this guide on Grid and Randomized Hyperparameter Optimization for XGBoost algorithms! In this guide, I explain what hyperparameters are, describe the key parameters for both search methods, and show how to tune hyperparameters for XGBoost algorithms using two methods: Grid Search and Randomized Search.&lt;/p&gt;

&lt;h3&gt;
  
  
  XGBoost Algorithms
&lt;/h3&gt;

&lt;p&gt;XGBoost is a popular open-source software library used for gradient-boosting algorithms. It is used to model and optimize complex data structures, making it a popular choice for machine learning tasks. Hyperparameter optimization is a crucial part of optimizing the performance of any machine learning model.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are Hyperparameters?
&lt;/h3&gt;

&lt;p&gt;Hyperparameters are parameters that are set before training a machine learning model. They are not learned from the data but are set by the user. Examples of hyperparameters include the learning rate, the number of trees, the maximum depth of a tree, and the regularization parameters.&lt;/p&gt;

&lt;p&gt;Hyperparameters play a critical role in the performance of a machine learning model, and tuning them can often lead to significant improvements in the model's accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tuning the Parameters
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;max_depth:&lt;/strong&gt; This parameter specifies the maximum depth of a tree. Increasing max_depth makes the model more complex and can lead to overfitting, while decreasing it can lead to underfitting. Setting max_depth to a high value may result in a longer training time and more memory usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;learning_rate:&lt;/strong&gt; This parameter controls the step size when updating the weights during training. A smaller learning_rate means slower learning and can help prevent overfitting. However, too small of a learning_rate can result in slower convergence and a longer training time. Increasing learning_rate can lead to faster convergence, but it may also result in overfitting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;n_estimators:&lt;/strong&gt; This parameter specifies the number of trees in the model. Increasing n_estimators generally improves the performance of the model, but it also increases the training time and memory usage. It is important to find a balance between performance and training time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;subsample:&lt;/strong&gt; This parameter specifies the fraction of observations to be randomly sampled for each tree. Increasing subsample can improve the model's ability to generalize to new data, but it can also lead to overfitting. Decreasing subsample can reduce overfitting, but it may also result in underfitting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;colsample_bytree:&lt;/strong&gt; This parameter specifies the fraction of features to be randomly sampled for each tree. Increasing colsample_bytree can improve the model's ability to generalize to new data, but it can also lead to overfitting. Decreasing colsample_bytree can reduce overfitting, but it may also result in underfitting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grid Search
&lt;/h3&gt;

&lt;p&gt;Grid Search is a brute-force approach to hyperparameter tuning. It involves defining a grid of hyperparameter values and exhaustively searching through all possible combinations of these values to find the best combination.&lt;/p&gt;

&lt;p&gt;Here's an example of how to perform a grid search for hyperparameter optimization using the scikit-learn library:&lt;/p&gt;

&lt;p&gt;The first step is to &lt;strong&gt;import the necessary libraries&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.model_selection import GridSearchCV
import xgboost as xgb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Define the parameter grid&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;param_grid = {
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 5, 7],
    'subsample': [0.5, 0.7],
    'colsample_bytree': [0.5, 0.7],
    'n_estimators': [100, 200, 300]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Initialize XGBoost model&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;xgb_model = xgb.XGBClassifier()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Perform Grid Search&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;grid_search = GridSearchCV(estimator=xgb_model, param_grid=param_grid, cv=3)
grid_search.fit(X_train, y_train)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Get best hyperparameters&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;best_params = grid_search.best_params_
print(f"Best Hyperparameters: {best_params}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, we define a grid of hyperparameters including learning_rate, max_depth, subsample, colsample_bytree, and n_estimators. We initialize an XGBoost model and then perform Grid Search using the GridSearchCV function from scikit-learn. &lt;/p&gt;

&lt;p&gt;We specify the number of folds for cross-validation using the cv parameter. The fit method performs the Grid Search and returns the best hyperparameters found.&lt;/p&gt;
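To see just how expensive the exhaustive search is, you can count the combinations in the grid defined above: 3 × 3 × 2 × 2 × 3 = 108 candidates, and with cv=3 every candidate is fit three times. A stdlib-only tally:

```python
from math import prod

# the same grid as defined above
param_grid = {
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 5, 7],
    'subsample': [0.5, 0.7],
    'colsample_bytree': [0.5, 0.7],
    'n_estimators': [100, 200, 300],
}

# grid search tries every combination; cross-validation multiplies the fits
n_candidates = prod(len(values) for values in param_grid.values())
n_fits = n_candidates * 3  # cv=3 folds per candidate
print(f"{n_candidates} candidates, {n_fits} model fits")  # 108 candidates, 324 model fits
```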

&lt;h3&gt;
  
  
  Randomized Search
&lt;/h3&gt;

&lt;p&gt;Randomized Search is a more efficient approach to hyperparameter tuning. It involves defining a distribution for each hyperparameter and randomly sampling values from these distributions to find the best combination. This approach can be useful when the search space is large and it is not feasible to perform an exhaustive search.&lt;/p&gt;

&lt;p&gt;Here's an example of how to perform a randomized search for hyperparameter optimization using the scikit-learn library:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Import the necessary libraries&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.model_selection import RandomizedSearchCV
import xgboost as xgb
from scipy.stats import randint, uniform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Define hyperparameter distributions&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;params = {
    'max_depth': randint(3, 10),
    'learning_rate': uniform(0.01, 0.1),
    'n_estimators': randint(100, 1000),
    'subsample': uniform(0.5, 0.5),
    'colsample_bytree': uniform(0.5, 0.5)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
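One subtlety worth noting about the distributions above: scipy's uniform(loc, scale) draws from the interval [loc, loc + scale], so uniform(0.5, 0.5) samples subsample values between 0.5 and 1.0, and randint(3, 10) samples integers from 3 up to and including 9 (the upper bound is exclusive). A quick check:

```python
from scipy.stats import randint, uniform

# draw samples from the same distributions used in the search above
depth_samples = randint(3, 10).rvs(size=1000, random_state=42)
subsample_samples = uniform(0.5, 0.5).rvs(size=1000, random_state=42)

print(depth_samples.min(), depth_samples.max())          # integers in [3, 9]
print(subsample_samples.min(), subsample_samples.max())  # floats in [0.5, 1.0]
```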



&lt;p&gt;&lt;strong&gt;Initialize XGBoost model&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;xgb_model = xgb.XGBClassifier()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Perform Randomized Search&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;random_search = RandomizedSearchCV(estimator=xgb_model, param_distributions=params, cv=3, n_iter=10)
random_search.fit(X_train, y_train)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Get the best hyperparameters&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;print(random_search.best_params_)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the above example, we defined a distribution for each hyperparameter, created an XGBClassifier and a RandomizedSearchCV object, fit the search object to the data, and printed the best hyperparameters found by the algorithm.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Hyperparameter optimization is an important step in optimizing the performance of machine learning models, and grid and randomized hyperparameter optimization are two popular approaches. Grid search involves an exhaustive search over a predefined set of hyperparameters, while randomized search involves randomly sampling hyperparameters from a predefined distribution.&lt;/p&gt;

&lt;p&gt;Both approaches have their pros and cons. Grid search is more thorough and can guarantee that the optimal hyperparameters are found within the search space, but it can be computationally expensive when the search space is large. Randomized search is faster and can be more effective when the search space is large, but there is a chance that the optimal hyperparameters may not be found.&lt;/p&gt;

&lt;p&gt;In conclusion, hyperparameter optimization is a crucial step in building accurate and effective machine learning models, and both grid and randomized hyperparameter optimization are powerful tools to achieve this goal on any machine learning model. By using these techniques, we can identify the best combination of hyperparameters that maximizes the performance of our models.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Pandas Core Data Preprocessing Techniques - A Recap</title>
      <dc:creator>Uche Emmanuel</dc:creator>
      <pubDate>Tue, 21 Mar 2023 08:01:24 +0000</pubDate>
      <link>https://dev.to/uche_4rm_germany/pandas-core-data-preprocessing-techniques-a-recap-2nmh</link>
      <guid>https://dev.to/uche_4rm_germany/pandas-core-data-preprocessing-techniques-a-recap-2nmh</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Data preprocessing is a crucial step in any data analysis/data science/machine learning project. It involves transforming and cleaning raw data into a format that can be easily analyzed and visualized and used for modeling. In this document, I have provided a recap of some core data preprocessing techniques and procedures using Python's Pandas library.&lt;/p&gt;

&lt;p&gt;Pandas is a popular open-source data analysis library for Python. It provides powerful data structures for working with structured data and a wide range of tools for data manipulation, analysis, and visualization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting started
&lt;/h3&gt;

&lt;p&gt;Before we dive into data preprocessing techniques, let's first ensure that Pandas is installed in your environment. You can install Pandas using pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install pandas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once Pandas is installed, you can import it into your Python script or notebook using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Data loading and exploration
&lt;/h3&gt;

&lt;p&gt;The first step in any data analysis, data science, or machine learning project is to load the data and explore its structure and properties. Pandas provides several methods for loading data from various file formats, including CSV, Excel, SQL databases, and more.&lt;/p&gt;

&lt;p&gt;Here's an example of loading a CSV file using Pandas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data = pd.read_csv('data.csv')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Ensure that the data file is in the same directory as your script, or pass its full path to read_csv. Once the data is loaded, we can explore its structure using methods such as head, tail, info, and describe. These methods provide useful information about the data, such as column names, data types, summary statistics, and sample rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# show the first five rows of the data
print(data.head())

# show the last five rows of the data
print(data.tail())

# show information about the data
print(data.info())

# show summary statistics of the data
print(data.describe())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Data cleaning
&lt;/h3&gt;

&lt;p&gt;After exploring the data, the next step is to clean it by handling missing values, duplicate data, outliers, and incorrect data types. Pandas provides several methods for handling these issues.&lt;/p&gt;

&lt;h4&gt;
  
  
  Handling missing values
&lt;/h4&gt;

&lt;p&gt;Missing values are common in real-world datasets and can be problematic for data analysis. Pandas provides several methods for handling missing values, including dropna, fillna, and more.&lt;/p&gt;

&lt;p&gt;Here's an example of dropping rows with missing values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# drop rows with missing values
clean_data = data.dropna()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's an example of filling missing values with a specific value:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# fill missing values with zero
clean_data = data.fillna(0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Handling duplicate data
&lt;/h4&gt;

&lt;p&gt;Duplicate data can skew analysis results and should be removed before analysis. Pandas provides a drop_duplicates method for removing duplicate rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# drop duplicate rows
clean_data = data.drop_duplicates()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Handling outliers
&lt;/h4&gt;

&lt;p&gt;Outliers can also skew analysis results and should be handled appropriately. Pandas provides several methods for handling outliers, including clip and quantile.&lt;/p&gt;

&lt;p&gt;Here's an example of clipping values at a certain threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# clip values at 5th and 95th percentile
clean_data = data.clip(lower=data.quantile(0.05), upper=data.quantile(0.95), axis=1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Data transformation
&lt;/h3&gt;

&lt;p&gt;After cleaning the data, the next step is to transform it into a format that can be easily analyzed and visualized. Pandas provides several methods for transforming data, including groupby, pivot_table, merge, and more.&lt;/p&gt;

&lt;h4&gt;
  
  
  Grouping data
&lt;/h4&gt;

&lt;p&gt;Grouping data is a common operation in data analysis, and Pandas provides a groupby method for this purpose. Here's an example of grouping data by a specific column and calculating the mean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# group data by 'category' column and calculate the mean of 'value' column
grouped_data = data.groupby('category')['value'].mean()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Pivoting data
&lt;/h4&gt;

&lt;p&gt;Pivoting data involves reshaping data from a long format to a wide format. Pandas provides a pivot_table method for pivoting data.&lt;/p&gt;

&lt;p&gt;Here's an example of pivoting data based on two columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pivot data based on 'category' and 'date' columns
pivoted_data = pd.pivot_table(data, values='value', index='category', columns='date')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Merging data
&lt;/h4&gt;

&lt;p&gt;Merging data involves combining data from multiple sources based on a common column. Pandas provides a merge method for merging data.&lt;/p&gt;

&lt;p&gt;Here's an example of merging two data frames based on a common column:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# merge two dataframes based on 'id' column
merged_data = pd.merge(df1, df2, on='id')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
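To make the merge snippet concrete, here is a minimal self-contained example with two hypothetical frames (df1 holding names, df2 holding scores; the column names are illustrative). By default, merge performs an inner join, keeping only ids present in both frames:

```python
import pandas as pd

# two small illustrative frames sharing an 'id' column
df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Ada', 'Ben', 'Cleo']})
df2 = pd.DataFrame({'id': [2, 3, 4], 'score': [88, 92, 75]})

# inner join on 'id': only ids 2 and 3 appear in both frames
merged_data = pd.merge(df1, df2, on='id')
print(merged_data)
```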



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this dev post, I have provided a recap of some core data preprocessing techniques and procedures using Python's Pandas library. I covered data loading and exploration, data cleaning, and data transformation using methods such as dropna, fillna, drop_duplicates, groupby, pivot_table, and merge.&lt;/p&gt;

&lt;p&gt;These are just some of the many techniques and procedures available in Pandas for data preprocessing. By mastering these techniques and combining them with other tools in your data analysis toolkit, you'll be well on your way to becoming a proficient data analyst or data scientist.&lt;/p&gt;

&lt;p&gt;I hope that this dev post has been helpful in expanding your knowledge of Pandas and data preprocessing, and I wish you the best of luck in your future data projects!&lt;/p&gt;

</description>
      <category>python</category>
      <category>pandas</category>
      <category>datascience</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
