While whispers of persistent coughs and distant outbreaks linger, the global narrative of COVID-19 seems to have shifted. Yet the virus's shadow remains, particularly in regions like Kenya. As an MPH student in Epidemiology and Disease Control, I embarked on a data-driven exploration of Kenya's COVID-19 journey using the World Health Organization (WHO) COVID-19 dataset, with data up to December 31, 2023.
Cleaning and Shaping the Data:
Before delving into the Kenyan story, I addressed the messy reality of data. Country names were streamlined for clarity (Tanzania replacing "United Republic of Tanzania"), and missing values were tackled.
I then carved out a dedicated dataframe for Kenya, ready for focused analysis. Some of the code I used to achieve this is shown below:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# df holds the WHO COVID-19 global dataset, loaded earlier (e.g. with pd.read_csv)
# Shorten long official country names for readability
country_names = {
    "Bolivia (Plurinational State of)": "Bolivia", "Democratic Republic of the Congo": "DRC",
    "Iran (Islamic Republic of)": "Iran",
    "Kosovo (in accordance with UN Security Council resolution 1244 (1999))": "Kosovo",
    "Micronesia (Federated States of)": "Micronesia", "Netherlands (Kingdom of the)": "Netherlands",
    "occupied Palestinian territory, including east Jerusalem": "Palestine",
    "Republic of Korea": "South Korea", "Republic of Moldova": "Moldova",
    "Russian Federation": "Russia", "Syrian Arab Republic": "Syria",
    "United Kingdom of Great Britain and Northern Ireland": "UK and Northern Ireland",
    "United Arab Emirates": "UAE", "United Republic of Tanzania": "Tanzania",
    "United States of America": "USA", "United States Virgin Islands": "Virgin Islands",
    "Venezuela (Bolivarian Republic of)": "Venezuela",
}
# Assign the result back: inplace=True on a selected column is chained assignment
df['Country'] = df['Country'].replace(country_names)

# Dedicated dataframe for Kenya; .copy() keeps later edits from touching df
Kenya_Statistics = df[df['Country'] == 'Kenya'].copy()
```
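The missing values mentioned above are worth inspecting before deciding how to handle them. A minimal sketch, assuming the WHO column layout already used here (`Date_reported`, `Country`, `New_cases`, `New_deaths`); `missing_report` is a hypothetical helper and the sample rows are made up. In reporting data a blank count may mean "not reported" rather than "zero cases", so seeing how much is missing, and where, helps decide whether to drop, fill, or leave the gaps.

```python
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and share of missing values per column, most-missing first."""
    counts = df.isna().sum()
    return (
        pd.DataFrame({"missing": counts, "share": counts / len(df)})
        .sort_values("missing", ascending=False)
    )

# Made-up rows in the WHO layout; blanks stand for unreported values
sample = pd.DataFrame({
    "Date_reported": ["2023-10-29", "2023-11-05", "2023-11-12", "2023-11-19"],
    "Country": ["Kenya"] * 4,
    "New_cases": [120, 95, None, None],
    "New_deaths": [1, 0, 0, None],
})
print(missing_report(sample))
```

On the real data, `missing_report(Kenya_Statistics)` would show which columns have gaps before any filling decision is made.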
Unveiling Kenya's COVID-19 Landscape:
The data revealed a captivating story:
Peak Panic: December 26, 2021, saw Kenya grapple with its highest reported caseload, a staggering 19,023 new cases.
Early Echoes: The lowest case numbers were recorded on January 5, 2020, before Kenya confirmed its first case in March 2020; this likely reflects the pandemic's nascent stage and limited detection efforts.
Spikes and Silences: The data displayed periods of worrying spikes interspersed with quieter stretches. However, no values were reported after November 11, 2023, leaving a gap that hinders further analysis and likely reduces the accuracy of predictions.
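Findings like these can be read directly off the series. A minimal sketch, assuming one row per reporting date with the WHO `Date_reported` and `New_cases` columns; `key_dates` is a hypothetical helper and the sample numbers are made up, not real Kenyan figures. When several dates tie at the minimum (for instance, many weeks with zero cases), pandas' `idxmin` returns the first occurrence, which after sorting is the earliest date.

```python
import pandas as pd

def key_dates(df, date_col="Date_reported", value_col="New_cases"):
    """Return peak, lowest and last-reported dates for one country's series."""
    # Index the counts by parsed date, in chronological order
    s = (
        df.assign(**{date_col: pd.to_datetime(df[date_col])})
        .set_index(date_col)[value_col]
        .sort_index()
    )
    last = s.last_valid_index()  # last date with a non-missing value
    return {
        "peak_date": s.idxmax(),      # NaN values are skipped
        "peak_value": s.max(),
        "lowest_date": s.idxmin(),    # earliest date if several tie at the minimum
        "last_reported": last,
        "unreported_after_last": int((s.index > last).sum()),
    }

# Made-up weekly sample for illustration only
sample = pd.DataFrame({
    "Date_reported": pd.date_range("2021-12-05", periods=6, freq="7D").strftime("%Y-%m-%d"),
    "New_cases": [0, 50, 900, 0, None, None],
})
print(key_dates(sample))
```

Calling `key_dates(Kenya_Statistics)` would return the peak, the lowest point, and where reporting stops, in one place.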
Predicting the Future with Prophet:
Despite the data gap, I ventured into prediction using Prophet, a simple yet powerful open-source forecasting library. The model projected zero cases for later periods, which highlights the limitations of training on incomplete data. This serves as a stark reminder: accurate models rely on robust and comprehensive data.
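```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from prophet import Prophet

# Chronological split: shuffle=False keeps the last 20% of rows as the test set
train_data, test_data = train_test_split(Kenya_Statistics, test_size=0.2, shuffle=False)

# Prophet fits on exactly two columns: 'ds' (dates) and 'y' (values to forecast)
train_prophet = pd.DataFrame({
    'ds': pd.to_datetime(train_data['Date_reported']),
    'y': train_data['New_cases'],
})

prophet_model = Prophet()
prophet_model.fit(train_prophet)

# 5 monthly steps past the END OF THE TRAINING DATA, i.e. into the test period
# ('M' = month end; recent pandas versions warn and prefer 'ME')
future = prophet_model.make_future_dataframe(periods=5, freq='M')
forecast = prophet_model.predict(future)

# Only New_cases was modelled, so label the plot accordingly
prophet_model.plot(forecast, xlabel='Date', ylabel='New cases', figsize=(15, 6))
plt.title('Forecast: New COVID-19 cases in Kenya, 5 months past the training data')
plt.legend()
plt.show()
```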
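Prophet fits on a frame with exactly two columns, `ds` and `y`. A minimal pandas-only sketch of that preparation plus a chronological split, assuming the WHO column names used above; `to_prophet_frame` and `chronological_split` are hypothetical helpers, and the split mirrors `train_test_split(..., shuffle=False)` by holding out the last 20% of rows (rounded up). Missing weeks are kept as `NaN` rather than filled with 0: Prophet's documentation says it copes with missing `y` values, whereas zeros would be fitted as genuine observations.

```python
import math
import pandas as pd

def to_prophet_frame(df, date_col="Date_reported", value_col="New_cases"):
    """Build the two-column (ds, y) frame Prophet fits on, sorted by date."""
    out = pd.DataFrame({
        "ds": pd.to_datetime(df[date_col]),
        "y": pd.to_numeric(df[value_col], errors="coerce"),  # unparseable values become NaN
    })
    # Missing counts stay NaN instead of being filled with 0
    return out.sort_values("ds").reset_index(drop=True)

def chronological_split(frame, test_fraction=0.2):
    """Hold out the last test_fraction of rows (rounded up) as the test set, no shuffling."""
    n_test = math.ceil(len(frame) * test_fraction)
    n_train = len(frame) - n_test
    return frame.iloc[:n_train], frame.iloc[n_train:]

# Made-up weekly data, listed newest first to show the sort
raw = pd.DataFrame({
    "Date_reported": pd.date_range("2023-01-01", periods=10, freq="7D").strftime("%Y-%m-%d")[::-1],
    "New_cases": [9, 8, 7, None, 5, 4, 3, 2, 1, 0],
})
train, test = chronological_split(to_prophet_frame(raw))
print(len(train), len(test))  # → 8 2
```

The `train` frame can be passed straight to `Prophet().fit(...)`, and `test` keeps the same `ds`/`y` shape for scoring later.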
This points to the need to test the validity and reliability of data when developing models.
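One way to test a model's reliability is to score it on the held-out `test_data`, which the split above creates but does not use. A minimal sketch, assuming the actuals are renamed to `ds`/`y` like the training data and that forecasts come from calling `predict` on the test dates (e.g. `prophet_model.predict(test_prophet[['ds']])`, where `test_prophet` is a hypothetical name); `forecast_errors` is also hypothetical. It reports mean absolute error (MAE) and root mean squared error (RMSE) over dates that have a reported value.

```python
import numpy as np
import pandas as pd

def forecast_errors(actual: pd.DataFrame, predicted: pd.DataFrame) -> dict:
    """MAE and RMSE of predicted['yhat'] against actual['y'] on matching ds dates."""
    merged = actual[["ds", "y"]].merge(predicted[["ds", "yhat"]], on="ds", how="inner")
    merged = merged.dropna(subset=["y"])  # skip dates with no reported value
    err = merged["yhat"] - merged["y"]
    return {
        "n": int(len(merged)),
        "mae": float(err.abs().mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
    }

# Made-up numbers for illustration
dates = pd.to_datetime(["2023-01-01", "2023-01-08", "2023-01-15"])
actual = pd.DataFrame({"ds": dates, "y": [10.0, 20.0, np.nan]})
predicted = pd.DataFrame({"ds": dates, "yhat": [12.0, 16.0, 5.0]})
print(forecast_errors(actual, predicted))
```

Comparing these errors against a trivial baseline (such as repeating the last observed value) is a quick check on whether the model adds anything.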
Beyond the Numbers:
This exploration offers valuable takeaways:
Data matters: Highlighting the importance of data quality and completeness for reliable predictions.
Machine learning's potential: Demonstrating the power of machine learning tools like Prophet in healthcare decision-making.
Addressing data gaps: Emphasizing the need for continuous data collection and filling existing gaps for accurate analysis.
Machine learning models can help many industries forecast future outcomes. A production facility could use historical data to predict the future output of a process, and health agencies could use the same approach to anticipate health events such as epidemics.
The Road Ahead:
My short journey through Kenya's COVID-19 data is just the beginning. Further research is needed to address data gaps, refine models, and provide reliable predictions for informed decision-making. As we navigate the pandemic's evolving landscape, let's remember that high-quality data is our compass, and machine learning tools can be powerful allies in charting a safer future.
The code I used for my models and exploratory data analysis can be found on my GitHub.