<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: soul-o mutwiri</title>
    <description>The latest articles on DEV Community by soul-o mutwiri (@o_mutwiri).</description>
    <link>https://dev.to/o_mutwiri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1028668%2F9d5a8311-5d5f-44fd-ae5b-3dce7d4a81f1.png</url>
      <title>DEV Community: soul-o mutwiri</title>
      <link>https://dev.to/o_mutwiri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/o_mutwiri"/>
    <language>en</language>
    <item>
      <title>The Role of Data Engineers - AWS</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Wed, 02 Jul 2025 05:50:31 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/the-role-of-data-engineers-aws-3g6j</link>
      <guid>https://dev.to/o_mutwiri/the-role-of-data-engineers-aws-3g6j</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wi33l1x9r8jp9xt4xxj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wi33l1x9r8jp9xt4xxj.png" alt="Image description" width="800" height="222"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Build and manage data infrastructure and platforms: databases and cloud data warehouses - Amazon S3, AWS Glue, Amazon Redshift etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ingest data from various sources: use tools like AWS Glue jobs or AWS Lambda functions to ingest data from databases, applications, files and streaming devices into a centralized data platform.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Prepare ingested data for analytics: use AWS Glue, Apache Spark or Amazon EMR to clean, transform and enrich the data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Catalog and document curated datasets: use AWS Glue crawlers to determine the format and schema and group data into tables, then write metadata to the AWS Glue Data Catalog. Use metadata tagging in the Data Catalog for data governance, compliance and discoverability.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automate regular data workflows and pipelines: simplify and accelerate data processing using services like AWS Glue Workflows, AWS Lambda or AWS Step Functions.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
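&lt;p&gt;The ingest-centralize-catalog flow above can be sketched locally with the Python standard library. This is a minimal stand-in, not the real AWS APIs: sqlite3 plays the role of the warehouse, and the hard-coded rows, table and column names are invented for illustration.&lt;/p&gt;

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source application (in practice this might sit in S3).
raw = "order_id,amount\n1,19.99\n2,5.00\n"

# Ingest: parse the source file (an AWS Glue job or Lambda function would do this at scale).
rows = list(csv.DictReader(io.StringIO(raw)))

# Centralize: load into a queryable store (standing in for Redshift / a data lake).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [(r["order_id"], r["amount"]) for r in rows])

# Catalog: record schema and table metadata, as a Glue crawler would.
catalog = {"orders": {"columns": ["order_id", "amount"], "rows": len(rows)}}

total = db.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(round(total, 2))
```

&lt;p&gt;The same shape holds in practice: a job parses the source, the warehouse stores it, and the Data Catalog records the schema so analysts can discover and query the table.&lt;/p&gt;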

&lt;p&gt;The data engineer builds the system that delivers usable data to the data analyst, who queries and analyzes the data to gain business insights, reports and visualizations.&lt;/p&gt;

&lt;p&gt;Before a data engineer begins, these questions must be answered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which data should be analyzed? What is its value to the business or organization?&lt;/li&gt;
&lt;li&gt;Who owns the data? Where is it located?&lt;/li&gt;
&lt;li&gt;Is the data usable in its current state? What transformations are required?&lt;/li&gt;
&lt;li&gt;Who needs to see the data?&lt;/li&gt;
&lt;li&gt;After the data is curated and ready for consumption, how should it be presented?&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>MODEL TRAINING AND EVALUATION</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Tue, 01 Jul 2025 03:08:44 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/model-training-and-evaluation-55kn</link>
      <guid>https://dev.to/o_mutwiri/model-training-and-evaluation-55kn</guid>
      <description>&lt;h2&gt;
  
  
  Model Training
&lt;/h2&gt;

&lt;p&gt;Model training is a big part of machine learning. It is important to ensure a proper division between training and evaluation efforts.&lt;/p&gt;

&lt;p&gt;It is important to evaluate the model to estimate the quality of its predictions on data the model has not been trained on.&lt;br&gt;
But as a starting point, you cannot check the accuracy of predictions for future instances, as this is supervised learning. So you need to use some of the data you already know the answers for as a proxy for future data.&lt;br&gt;
Instead of evaluating with the same data that was used for training, a common strategy is to split all available labeled data into a training set, a validation set and a test set, often in an 80:10:10 or 70:15:15 ratio.&lt;/p&gt;
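&lt;p&gt;The 80:10:10 split can be sketched with scikit-learn (which a later post in this series also uses); the 100-example toy dataset below is invented purely to show the mechanics:&lt;/p&gt;

```python
from sklearn.model_selection import train_test_split

# Toy labeled dataset: 100 examples with binary labels (illustrative only).
X = [[i] for i in range(100)]
y = [i % 2 for i in range(100)]

# First carve out 20% of the data, then split that half-and-half
# into validation and test sets, giving the 80:10:10 ratio.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 80 10 10
```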

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5tnzt7wc1z5inbjyqr8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn5tnzt7wc1z5inbjyqr8.png" alt="Image description" width="800" height="323"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Evaluation
&lt;/h2&gt;

&lt;p&gt;After the model has performed well on unseen test data, we can deploy it to production and monitor it to ensure the business problem is indeed being addressed.&lt;/p&gt;

&lt;p&gt;Its ability to predict skills more accurately would reduce the number of transfers a customer experiences, resulting in a better customer experience. Model evaluation is used to verify that the model is performing accurately.&lt;/p&gt;
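&lt;p&gt;As a sketch of that verification step, accuracy on a held-out test set can be computed with scikit-learn; the skill labels and predictions below are made up for illustration:&lt;/p&gt;

```python
from sklearn.metrics import accuracy_score

# Hypothetical true skills vs. model predictions on a held-out test set.
y_test = ["billing", "tech", "tech", "billing", "sales"]
y_pred = ["billing", "tech", "sales", "billing", "sales"]

# 4 of 5 calls would have been routed to the right skill the first time.
print(accuracy_score(y_test, y_pred))  # 0.8
```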

&lt;h2&gt;
  
  
  MODEL TUNING AND FEATURE ENGINEERING
&lt;/h2&gt;

&lt;p&gt;Once we have evaluated our model, we begin the process of iterative tweaks to the model and our data. We can adjust how fast or slow the model learns to reach an optimal value.&lt;br&gt;
Then we move to feature engineering.&lt;br&gt;
Feature engineering tries to answer questions like: what was the time of a customer's most recent order? What was that most recent order? We feed these features into the model training algorithm; it can only learn from exactly what we show it.&lt;/p&gt;
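&lt;p&gt;Those two example features can be derived with pandas. The order history and column names below are hypothetical, invented to show the recency-feature idea:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical order history (all values are made up).
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "order_time": pd.to_datetime(
        ["2025-06-01 09:00", "2025-06-20 17:30", "2025-06-15 12:00"]),
    "product": ["router", "modem", "laptop"],
})

# Per-customer features: when the most recent order happened and what it was.
features = (orders.sort_values("order_time")
                  .groupby("customer_id")
                  .last()
                  .rename(columns={"order_time": "most_recent_order_time",
                                   "product": "most_recent_product"}))

print(features.loc[1, "most_recent_product"])  # modem
```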

&lt;h2&gt;
  
  
  MODEL DEPLOYMENT
&lt;/h2&gt;

&lt;p&gt;Deploying the model means solving the business need and meeting expectations, such as directing a customer to the correct agent the first time. Imagine a company with endless types of products: a customer can be sent to a generalist or even the wrong specialist, who must then figure out what the customer needs before sending them to an agent with the right skills. For a company handling millions of customer calls, this is inefficient and costs money and time:&lt;br&gt;
customer calls get connected to the wrong department, then non-technical support, then finally the correct agent.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>DATA PROCESSING PHASES</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Tue, 01 Jul 2025 02:45:42 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/data-processing-phases-f83</link>
      <guid>https://dev.to/o_mutwiri/data-processing-phases-f83</guid>
      <description>&lt;p&gt;Once a business plan and formulation of the problem to be solved in data analysis or data engineering or machine learning solutions, the next phase is about data collection and preparation.&lt;br&gt;
Data processing steps include data collection and intergration, data preprocesssing and data visualization, and feature engineering.&lt;/p&gt;

&lt;p&gt;Example: how to route customers to agents with the right skills, thus reducing call transfers.&lt;br&gt;
How can we predict which skill would solve a customer call?&lt;/p&gt;

&lt;p&gt;Data collection and integration ensures the raw data is in one centrally accessible place.&lt;br&gt;
Data preprocessing and data visualization involve transforming raw data into an understandable format; this includes data cleaning and exploratory data analysis.&lt;br&gt;
At this stage we exclude unnecessary labels and entirely inaccurate labels, and even combine similar labels, so as to simplify our model.&lt;br&gt;
Data visualization helps give a quick sense of feature and label summaries, which helps us better understand the data.&lt;/p&gt;
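&lt;p&gt;The combine-similar-labels and exclude-unusable-labels step can be sketched with pandas; the skill labels below are invented for illustration:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical skill labels logged for past calls (values are made up).
calls = pd.Series(["billing", "Billing", "bill", "tech", "unknown", "tech"])

# Combine similar labels (case variants, abbreviations) and drop unusable ones.
merged = calls.str.lower().replace({"bill": "billing"})
cleaned = merged[merged != "unknown"]

print(cleaned.value_counts().to_dict())  # {'billing': 3, 'tech': 2}
```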

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcltwtlt6mck507ip7em.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcltwtlt6mck507ip7em.png" alt="Image description" width="800" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feature engineering is the process of creating and extracting variables from data.&lt;/p&gt;

&lt;p&gt;Example: we want to base predictions on past data from customer service calls, thus supervised learning.&lt;br&gt;
We train our model on historical data that includes the correct labels (agent skills); then the model can make its own predictions on similar data going forward. The data we need comes from asking questions that help establish our features.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>kenyan debt</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Sat, 21 Jun 2025 12:10:49 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/kenyan-debt-26li</link>
      <guid>https://dev.to/o_mutwiri/kenyan-debt-26li</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqyt5sncgsaotwsa51cbi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqyt5sncgsaotwsa51cbi.png" alt="Image description" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>data</category>
      <category>opendata</category>
      <category>discuss</category>
    </item>
    <item>
      <title>DATA CLEANING: Common data issues and their solutions</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Wed, 21 May 2025 15:50:00 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/data-cleaning-common-data-issues-and-their-solutions-5656</link>
      <guid>https://dev.to/o_mutwiri/data-cleaning-common-data-issues-and-their-solutions-5656</guid>
      <description>&lt;p&gt;Data cleaning is a useful process to articulate the desired state of data before being ingested and used for insights and visualization. Data driven decisions often depend on accuracy of the data being presented. &lt;/p&gt;

&lt;h2&gt;
  
  
  Some of the common data issues are
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Missing data &lt;/li&gt;
&lt;li&gt;Incorrect data &lt;/li&gt;
&lt;li&gt;Outliers&lt;/li&gt;
&lt;li&gt;Duplication &lt;/li&gt;
&lt;li&gt;Irrelevant data &lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Potential techniques for fixing them
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Missing data - imputation (mean, median, mode); dropping rows/columns with excessive missing values&lt;/li&gt;
&lt;li&gt;Incorrect data - validate against an external reference; standardize formats; manual correction by a domain expert/review&lt;/li&gt;
&lt;li&gt;Outliers - delete or retain based on domain review&lt;/li&gt;
&lt;li&gt;Duplication - fuzzy matching; use unique identifiers&lt;/li&gt;
&lt;li&gt;Irrelevant data - use feature importance scores and correlation analysis to find and remove features with low variance or no contribution to the target variable.&lt;/li&gt;
&lt;/ol&gt;
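&lt;p&gt;As a sketch of the fuzzy-matching idea for duplicates, Python's standard-library difflib can score string similarity; the two records and the 0.9 threshold below are illustrative, not a recommendation:&lt;/p&gt;

```python
from difflib import SequenceMatcher

# Two hypothetical customer records that a unique-identifier check would miss.
a = "jon smith, nairobi"
b = "john smith, nairobi"

# Fuzzy matching: treat records as duplicates above a similarity threshold.
ratio = SequenceMatcher(None, a, b).ratio()
print(ratio > 0.9)  # True
```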

&lt;h2&gt;
  
  
  Pandas based solutions
&lt;/h2&gt;

&lt;pre&gt;&lt;code&gt;import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier

# Step 1: Load your dataset
df = pd.read_csv('your_data.csv')

# Step 2: Handle missing data
imputer = SimpleImputer(strategy='mean')
df['A'] = imputer.fit_transform(df[['A']]).ravel()
df = df.dropna(axis=0, thresh=df.shape[1] // 2)  # Drop rows with more than 50% missing

# Step 3: Handle incorrect data (standardize formats)
df['A'] = pd.to_numeric(df['A'], errors='coerce')  # keep 'A' numeric for the outlier step below
df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Step 4: Handle outliers (IQR method)
Q1 = df['A'].quantile(0.25)
Q3 = df['A'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['A'] &amp;gt;= (Q1 - 1.5 * IQR)) &amp;amp; (df['A'] &amp;lt;= (Q3 + 1.5 * IQR))]

# Step 5: Remove duplicates
df = df.drop_duplicates()

# Step 6: Remove irrelevant features (based on feature importance)
X = df.drop(columns=['target'])
y = df['target']
model = RandomForestClassifier()
model.fit(X, y)
feature_importances = pd.Series(model.feature_importances_, index=X.columns)
important_features = feature_importances[feature_importances &amp;gt; 0.01].index
df = df[important_features.tolist() + ['target']]

# Now df is cleaned and ready for analysis or modeling.
&lt;/code&gt;&lt;/pre&gt;

</description>
    </item>
    <item>
      <title>Features that orchestration offers</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Sat, 17 May 2025 17:35:44 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/features-that-orchestration-offers-3lp5</link>
      <guid>https://dev.to/o_mutwiri/features-that-orchestration-offers-3lp5</guid>
      <description>&lt;ul&gt;
&lt;li&gt;High availability - no downtime&lt;/li&gt;
&lt;li&gt;Scalability - high performance&lt;/li&gt;
&lt;li&gt;Disaster recovery - backup and restore &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;K8s architecture&lt;br&gt;
A cluster is made of at least one master node (control plane) and worker nodes, each of which has kubelet running on it. kubelet makes it possible for the nodes to communicate.&lt;/p&gt;

&lt;p&gt;On the master node: api server (entry point to the k8s cluster), controller manager (keeps track of what's happening in the cluster, DESIRED STATE vs ACTUAL STATE), scheduler (decides pod placement), etcd (status data, snapshots and data recovery), virtual network (turns all the nodes into one machine).&lt;/p&gt;

&lt;p&gt;K8s components&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;node - a virtual machine&lt;/li&gt;
&lt;li&gt;deployment - a blueprint of my-app pods, an abstraction of pods. A DB cannot be replicated using a deployment; use a statefulset for stateful apps, to reduce read/write duplication and improve consistency.&lt;/li&gt;
&lt;li&gt;pod - creates a running-environment abstraction around a container. Each pod gets an IP address and is ephemeral (data is lost on restart).&lt;/li&gt;
&lt;li&gt;service - its IP address stays even when a pod dies; it is also a load balancer.&lt;/li&gt;
&lt;li&gt;ingress - does the forwarding to allow external communication.&lt;/li&gt;
&lt;li&gt;configmap - configuration of the application; no need to build a new image, just change the configmap.&lt;/li&gt;
&lt;li&gt;secret - confidential information is stored here, e.g. credentials, passwords, certificates. Reference the secret in a deployment/pod.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;volume - storage plugged into kubernetes to aid data persistence.&lt;/p&gt;

&lt;p&gt;Every object has METADATA, a SPECIFICATION and a STATUS: k8s continuously compares the status against the specification (e.g. the number of replicas), and etcd holds the current status of the cluster.&lt;/p&gt;

&lt;p&gt;KUBECTL&lt;br&gt;
is the CLI for interacting with the cluster's api-server to submit commands to create or delete components; worker processes make it happen, that is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;create pods, create services.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Make it an external service&lt;br&gt;
type - the service type&lt;br&gt;
default = ClusterIP, an internal service&lt;br&gt;
NodePort = exposes the service on each node's IP at a static port for external communication&lt;/p&gt;
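&lt;p&gt;A minimal Service manifest of type NodePort might look like the following sketch; the name, label and port numbers are illustrative, not from a real cluster:&lt;/p&gt;

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  type: NodePort          # default would be ClusterIP (internal only)
  selector:
    app: my-app           # forwards traffic to pods with this label
  ports:
    - port: 80            # service port inside the cluster
      targetPort: 8080    # container port on the pod
      nodePort: 30080     # static port opened on every node
```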

&lt;p&gt;configmap and secret must exist before deployments.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Kubernetes foundation</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Thu, 15 May 2025 08:10:22 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/kubernetes-foundation-48jf</link>
      <guid>https://dev.to/o_mutwiri/kubernetes-foundation-48jf</guid>
      <description>&lt;p&gt;Gitops Tools like Flux and ArgoCD are pull based approached... continously monitor the Git repository for changes and pull those changes to update the kubernetes cluster.&lt;/p&gt;

&lt;p&gt;Ingress objects define how external traffic should be routed to different services within the cluster. They expose services to external networks without each service needing its own external IP address.&lt;/p&gt;

&lt;p&gt;Pipelines help automate the build, test and deployment of an application.&lt;/p&gt;

&lt;p&gt;Cloud Native Orchestration &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;High level of automation
From development to deployment: CI/CD pipelines with minimal human involvement, backed by a version control system like git. Easier disaster recovery.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Self healing&lt;br&gt;
Failure is expected; health checks monitor applications and restart them, so some parts can stop working while others continue.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scalable&lt;br&gt;
Handles more load, scaling based on metrics like memory to ensure the performance of services.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost efficient&lt;br&gt;
Usage-based pricing, optimized infrastructure usage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Easy to maintain&lt;br&gt;
Microservices keep applications small and portable, easier to test &amp;amp; distribute across teams.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Secure by default&lt;br&gt;
Zero trust computing: users and processes are authenticated.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;running containers&lt;br&gt;
to start containers  -  docker run nginx&lt;/p&gt;

&lt;p&gt;building container images&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;containers are a metaphor for shipping containers.&lt;/li&gt;
&lt;li&gt;there is a standard format for a shipping container to make it easy to stack on a container ship, unload, and load onto a truck, no matter what is inside.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;container images - are what make containers portable and easy to reuse.&lt;br&gt;
Docker describes a container image as follows:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Dockerfile&lt;br&gt;
Images can be built by reading instructions from a build file called a Dockerfile.&lt;/p&gt;

&lt;p&gt;The instructions are the same as the ones one would use to install an application on a server.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Every container image starts with a base image.
# This could be your favorite linux distribution
FROM ubuntu:20.04

# Run commands to add software and libraries to your image
# Here we install python3 and the pip package manager
RUN apt-get update &amp;amp;&amp;amp; \
    apt-get -y install python3 python3-pip

# The COPY command can be used to copy your code to the image
# Here we copy a script called "my-app.py" to the container's filesystem
COPY my-app.py /app/

# Defines the workdir in which the application runs
# From this point on everything will be executed in /app
WORKDIR /app

# The process that should be started when the container runs
# In this case we start our python app "my-app.py"
CMD ["python3", "my-app.py"]
&lt;/code&gt;&lt;/pre&gt;

&lt;h2&gt;
  
  
  To build the image
&lt;/h2&gt;

&lt;p&gt;docker build -t my-python-image -f Dockerfile .&lt;/p&gt;

&lt;p&gt;-t my-python-image - specifies the name tag of your image.&lt;br&gt;
-f Dockerfile - specifies the Dockerfile to be used.&lt;/p&gt;

&lt;p&gt;To distribute these images we use a container registry, where you can upload and download images with the push and pull commands:&lt;/p&gt;

&lt;p&gt;docker push my-registry.com/my-python-image&lt;br&gt;
docker pull my-registry.com/my-python-image&lt;/p&gt;

&lt;p&gt;Some of the public registries are Docker Hub and Quay.&lt;/p&gt;

&lt;p&gt;4Cs OF CLOUD NATIVE SECURITY&lt;br&gt;
CLOUD&lt;br&gt;
CLUSTER&lt;br&gt;
CONTAINER&lt;br&gt;
CODE&lt;/p&gt;

&lt;h1&gt;
  
  
  These are the layers that need to be protected when using containers
&lt;/h1&gt;

&lt;p&gt;CONTAINER ORCHESTRATION FUNDAMENTALS&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is harder to manage a lot of containers, so you need a system for managing them.
Container orchestration provides a way to build a cluster of multiple servers and host containers on top. Most systems have a control plane for management of the containers, and worker nodes that host the containers. One of the most common systems for orchestrating containers is kubernetes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Problems to be solved through container orchestration systems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;providing computing resources, like VMs, for containers to run on&lt;/li&gt;
&lt;li&gt;schedule containers to servers efficiently&lt;/li&gt;
&lt;li&gt;allocate resources like CPU and memory&lt;/li&gt;
&lt;li&gt;scale containers based on the load&lt;/li&gt;
&lt;li&gt;provide networking to connect containers together&lt;/li&gt;
&lt;li&gt;provide storage if containers need to persist data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Networking&lt;br&gt;
to make an application accessible to the outside - containers have the ability to map a port from the container to a port on the host system.&lt;br&gt;
to communicate between containers across hosts - an overlay network is spanned across the host systems. The overlay network manages IP addresses: which container gets which IP address, and how traffic has to flow to reach individual containers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffinc37iqyw7dz92l6n78.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffinc37iqyw7dz92l6n78.png" alt="CNI" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;CNI standard - guides writing and configuring network plugins and makes it easy to swap plugins in various orchestration platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  SERVICE DISCOVERY and DNS
&lt;/h2&gt;

&lt;p&gt;no need to remember the IP addresses of important systems&lt;br&gt;
1000s of containers have individual IP addresses&lt;br&gt;
different hosts in different data centers and locations&lt;br&gt;
information about containers is removed when they are deleted&lt;br&gt;
SERVICE REGISTRY - all this information is put in a service registry&lt;br&gt;
SERVICE DISCOVERY - the process of finding other services in the network and requesting information from them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approaches to Service Discovery
&lt;/h2&gt;

&lt;p&gt;DNS servers - have a service API that registers new services as they are created.&lt;br&gt;
KEY-VALUE store - a datastore to store information about services. Popular choices for clustering are &lt;strong&gt;etcd, Consul and Apache ZooKeeper&lt;/strong&gt;.&lt;br&gt;
They are highly available systems with strong failover mechanisms.&lt;/p&gt;

&lt;p&gt;SERVICE MESH&lt;br&gt;
Helps with monitoring, access control and encryption of network traffic when containers communicate with each other. A service mesh adds a proxy server to every container in your architecture. A proxy is software used to manage network traffic: it sits between client and server and modifies/filters traffic before it reaches the server. Popular proxies are &lt;strong&gt;nginx, HAProxy and Envoy&lt;/strong&gt;.&lt;br&gt;
In a mesh, the proxies handle communication between services; traffic is routed through the proxies instead. Popular service meshes are &lt;strong&gt;Istio&lt;/strong&gt; and &lt;strong&gt;Linkerd&lt;/strong&gt;. The proxies in a service mesh form the data plane, shaping traffic flow; the rules and configs applied to the proxies are managed centrally in the control plane of the service mesh.&lt;br&gt;
The Service Mesh Interface (SMI) is the standard.&lt;/p&gt;

&lt;p&gt;STORAGE&lt;br&gt;
containers are ephemeral&lt;br&gt;
container images are read-only, so a read-write layer is added to allow writing files.&lt;br&gt;
if data needs to persist on the host, a volume can be used to achieve that:&lt;br&gt;
directories on the host are passed through into the container filesystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  CONTROL PLANE NODES
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;kube-apiserver - the centerpiece; all components interact with the api-server, and users access the cluster through it.&lt;/li&gt;
&lt;li&gt;etcd - a db that holds the state of the cluster.&lt;/li&gt;
&lt;li&gt;kube-scheduler - when a new workload is scheduled, it chooses a worker node that fits.&lt;/li&gt;
&lt;li&gt;kube-controller-manager - manages the state of the cluster, e.g. keeping the desired number of application replicas available at all times.&lt;/li&gt;
&lt;li&gt;cloud-controller-manager - interacts with cloud provider APIs for load balancers, storage or security groups.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  WORKER NODES
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;container runtime - responsible for running the containers on the worker node; historically Docker, now containerd.&lt;/li&gt;
&lt;li&gt;kubelet - the agent that runs on every worker node in the cluster; it talks to the api-server and the container runtime to handle the stages of starting containers.&lt;/li&gt;
&lt;li&gt;kube-proxy - a network proxy that handles inside and outside communication, using the networking capabilities of the underlying OS.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;kubernetes namespaces are used to divide a cluster into virtual clusters, organize objects and manage which users have access to which resources.&lt;/p&gt;

&lt;p&gt;setting up a test cluster: minikube, kind, MicroK8s&lt;br&gt;
setting up a prod-grade cluster: kubeadm, kops, kubespray&lt;br&gt;
cloud providers: EKS, GKE, AKS&lt;/p&gt;

&lt;p&gt;Kubernetes API: authentication (X.509 digitally signed certificates), authorization (RBAC), admission control (Open Policy Agent can manage admission control externally).&lt;br&gt;
Through the API, a user or service can create, delete and retrieve resources in k8s.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpm2sfibwb1b89sm7npxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpm2sfibwb1b89sm7npxg.png" alt="k8s api" width="800" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;RUNNING CONTAINERS IN KUBERNETES&lt;br&gt;
Unlike on a local machine, where you start containers directly, in K8s they run inside PODS.&lt;/p&gt;

&lt;p&gt;kubelet &amp;lt;--&amp;gt; CRI (containerd plugin) &amp;lt;--&amp;gt; containers&lt;/p&gt;

&lt;p&gt;NETWORKING&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;container-to-container communication {pod}&lt;/li&gt;
&lt;li&gt;pod-to-pod communication {overlay network}&lt;/li&gt;
&lt;li&gt;pod-to-service communication {kube-proxy and packet filter on the node}&lt;/li&gt;
&lt;li&gt;external-to-service communication {kube-proxy and packet filter on the node}&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When implementing networking, choose a network vendor: Calico, Cilium, Weave.&lt;/p&gt;

&lt;p&gt;Every pod gets its own IP address dynamically. CoreDNS, a DNS server, is used for service discovery and name resolution in the cluster.&lt;/p&gt;

&lt;p&gt;Network policies: cluster-internal firewalls; with the help of a selector they specify the traffic allowed to and from the pods that match the selector.&lt;/p&gt;
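&lt;p&gt;As a sketch, a NetworkPolicy that only allows traffic into matching pods from one other set of pods could look like this; all names and labels are illustrative:&lt;/p&gt;

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend          # the policy applies to pods with this label
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend # only frontend pods may connect
```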

&lt;h2&gt;
  
  
  SCHEDULING
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The process of choosing the right worker node to run a containerized workload on.
The scheduling process starts when a pod is created:
the scheduler selects a node, and the pod actually gets started there by kubelet and the container runtime.
The scheduler uses information about the application's requirements to filter for nodes that fit them; if multiple nodes fit, the node with the least amount of pods is chosen. If scheduling fails, the scheduler keeps trying.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  KUBERNETES objects vs WORKLOAD OBJECTS
&lt;/h2&gt;

&lt;p&gt;Objects are described in the data serialization language YAML and sent to the api-server, where they get validated before being created.&lt;br&gt;
Every object declares the version (apiVersion), the kind of object to be created (kind), metadata (unique data that can be used to identify it) and spec (the specification of the object).&lt;/p&gt;
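&lt;p&gt;A minimal object description showing those four fields, here for a Pod (the name and image tag are illustrative):&lt;/p&gt;

```yaml
apiVersion: v1            # the version
kind: Pod                 # the kind of object to be created
metadata:
  name: my-app            # unique data used to identify the object
spec:                     # the specification of the object
  containers:
    - name: my-app
      image: nginx:1.25
```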

&lt;ul&gt;
&lt;li&gt;K8s cluster like a factory has many components...security, storage, management of pods. &lt;/li&gt;
&lt;li&gt;K8s objects control how pods are deployed, scaled and managed....they are like different parts of the factory.&lt;/li&gt;
&lt;li&gt;Configuration management, cross-node networking, routing of external traffic, load balancing or scaling of the pods.&lt;/li&gt;
&lt;li&gt;Workload Objects - they actually build and manage the applications. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Breakdown of the analogy using a factory:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Kubernetes Concept&lt;/th&gt;&lt;th&gt;Factory Analogy&lt;/th&gt;&lt;th&gt;Purpose&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Pod&lt;/td&gt;&lt;td&gt;A single worker assembling a product&lt;/td&gt;&lt;td&gt;Runs one or more containers (the "product").&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;ReplicaSet&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;A controller object that ensures the desired number of Pods is running at any given time.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Deployment&lt;/td&gt;&lt;td&gt;An automated assembly line for mass production&lt;/td&gt;&lt;td&gt;Ensures many copies of a Pod run smoothly by defining the lifecycle and managing ReplicaSets; for stateless applications in k8s.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;StatefulSet&lt;/td&gt;&lt;td&gt;A specialized assembly line for custom orders&lt;/td&gt;&lt;td&gt;Manages Pods that need unique identities: a stable name, persistent storage and graceful handling of updates and scaling (e.g., stateful applications like databases).&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;DaemonSet&lt;/td&gt;&lt;td&gt;Maintenance crew in every section of the factory&lt;/td&gt;&lt;td&gt;Runs a Pod on every node (e.g., log collectors, monitoring and other infrastructure-related workloads).&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Job&lt;/td&gt;&lt;td&gt;A temporary worker hired for a one-time task&lt;/td&gt;&lt;td&gt;Runs a Pod to completion (e.g., a backup job).&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;CronJob&lt;/td&gt;&lt;td&gt;A scheduled task (like a nightly cleanup crew)&lt;/td&gt;&lt;td&gt;Runs Jobs at specific times.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Service&lt;/td&gt;&lt;td&gt;The shipping department (delivers products)&lt;/td&gt;&lt;td&gt;Exposes Pods to the network.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;ConfigMap/Secret&lt;/td&gt;&lt;td&gt;Blueprints &amp;amp; security documents&lt;/td&gt;&lt;td&gt;Stores configuration &amp;amp; sensitive data.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;PersistentVolume&lt;/td&gt;&lt;td&gt;Warehouse storage&lt;/td&gt;&lt;td&gt;Provides long-term storage for Pods.&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Namespace&lt;/td&gt;&lt;td&gt;Different factory departments (e.g., "Production" vs. "R&amp;amp;D")&lt;/td&gt;&lt;td&gt;Isolates resources in the cluster.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;




&lt;p&gt;Key Takeaway&lt;br&gt;
Workload objects = Assembly lines (they handle the actual work of running apps).&lt;/p&gt;

&lt;p&gt;Other Kubernetes objects = Supporting roles (security, storage, networking, etc.).&lt;/p&gt;
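&lt;p&gt;As a sketch, a minimal Deployment manifest tying the workload objects together (all names here are illustrative):&lt;/p&gt;

```yaml
apiVersion: apps/v1
kind: Deployment            # the automated "assembly line"
metadata:
  name: web-deploy          # hypothetical name
spec:
  replicas: 3               # desired number of Pod copies, kept by a managed ReplicaSet
  selector:
    matchLabels:
      app: web
  template:                 # the Pod "worker" to mass-produce
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25
```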

&lt;p&gt;&lt;strong&gt;KUBECTL&lt;/strong&gt; is the official command-line client for Kubernetes.&lt;/p&gt;

&lt;p&gt;kubectl api-resources -- lists the available objects&lt;/p&gt;

&lt;p&gt;kubectl explain pod -- gets more info about an object&lt;/p&gt;

&lt;p&gt;kubectl explain pod.spec -- more about the Pod spec &lt;/p&gt;

&lt;p&gt;kubectl create -f .yaml -- creates an object in K8s from a YAML file &lt;/p&gt;

&lt;h2&gt;
  
  
  HELM
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Helm is the package manager for Kubernetes: it is used to create templates and package K8s objects, allowing easier updates and interaction with objects. &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  HELM Charts
&lt;/h2&gt;

&lt;p&gt;Helm packages K8s objects in charts, which can be shared with others via a registry (ArtifactHub).&lt;/p&gt;

&lt;h2&gt;
  
  
  POD CONCEPT
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A collection of one or more containers that share namespaces and cgroups. &lt;/li&gt;
&lt;li&gt;&lt;p&gt;It is the smallest deployable unit in K8s. &lt;br&gt;
A Pod allows the combination of multiple processes that are interdependent. All containers in a Pod share an IP address and even a filesystem, even if they are built from different images. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sidecar container - supports the main application with things like logging, monitoring, security or proxying within the same Pod.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;initContainers - run containers before the main application starts (containers: ... initContainers: ...).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
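&lt;p&gt;A sketch of a Pod combining an initContainer with a logging-style sidecar (illustrative names and images):&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-demo        # hypothetical name
spec:
  initContainers:           # run to completion before the main containers start
    - name: init-setup
      image: busybox:1.36
      command: ["sh", "-c", "echo preparing && sleep 2"]
  containers:
    - name: app             # the main application container
      image: nginx:1.25
    - name: log-sidecar     # sidecar supporting the main app (e.g. logging)
      image: busybox:1.36
      command: ["sh", "-c", "tail -f /dev/null"]
```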

&lt;ol&gt;
&lt;li&gt;Resources - set a resource request and a maximum limit for CPU and memory.&lt;/li&gt;
&lt;li&gt;LivenessProbe - configures a health check that periodically checks if the container is alive; if the check fails, the container is restarted. &lt;/li&gt;
&lt;li&gt;SecurityContext - sets user and group settings and kernel capabilities. &lt;/li&gt;
&lt;/ol&gt;
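&lt;p&gt;The three container settings above, sketched together in one hypothetical Pod spec:&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo          # hypothetical name
spec:
  containers:
    - name: web
      image: nginx:1.25
      resources:
        requests:           # resource request
          cpu: 100m
          memory: 128Mi
        limits:             # maximum limit
          cpu: 250m
          memory: 256Mi
      livenessProbe:        # periodic health check; the container restarts on failure
        httpGet:
          path: /
          port: 80
        periodSeconds: 10
      securityContext:      # user/group settings and kernel capabilities
        runAsUser: 1000
        runAsGroup: 3000
        capabilities:
          drop: ["ALL"]
```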

&lt;h2&gt;
  
  
  Pod lifecycle phases
&lt;/h2&gt;

&lt;p&gt;Pending -- the Pod is in the cluster but no container is set up and ready to run; it is waiting to be scheduled or downloading images over the network.&lt;br&gt;
Running -- the Pod is bound to a node and all containers are created, with at least one container running, starting or restarting.&lt;br&gt;
Succeeded -- all containers have terminated successfully and are not restarted.&lt;br&gt;
Failed -- all containers in the Pod have terminated, and at least one container exited with a non-zero status. &lt;br&gt;
Unknown -- the state of the Pod could not be obtained.&lt;/p&gt;

&lt;h2&gt;
  
  
  NETWORKING OBJECTS
&lt;/h2&gt;

&lt;p&gt;These objects define and abstract networking. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Service objects - used to expose a set of Pods as a network service.&lt;/li&gt;
&lt;li&gt;Ingress objects&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  service types
&lt;/h2&gt;

&lt;p&gt;ClusterIP - a round-robin load balancer providing a single endpoint for a set of Pods; the most common service type.&lt;br&gt;
NodePort - extends ClusterIP by adding simple routing rules: opens a port on every node in the cluster and maps it to the ClusterIP, which allows external traffic into the cluster.&lt;br&gt;
LoadBalancer - extends NodePort by deploying an external load-balancer instance; needs a cloud API such as GCP or AWS.&lt;br&gt;
ExternalName - uses the K8s internal DNS server to create a DNS alias; resolves hostnames.&lt;br&gt;
Headless Services - behavior depends on whether the service has selectors defined (e.g., for StatefulSet containers).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7eqsnrt64go5932pgya9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7eqsnrt64go5932pgya9.png" alt="Service Types" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;
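&lt;p&gt;A sketch of a ClusterIP Service, the most common type (illustrative names and ports):&lt;/p&gt;

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-svc             # hypothetical name
spec:
  type: ClusterIP           # single endpoint, round-robin to the selected Pods
  selector:
    app: web                # forwards to Pods labeled app: web
  ports:
    - port: 80              # port exposed by the service
      targetPort: 8080      # port the container listens on
```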

&lt;p&gt;Ingress Objects&lt;br&gt;
Provide a means to expose HTTP and HTTPS routes from outside the cluster to a service within the cluster. Its routing rules are implemented by an Ingress controller. &lt;/p&gt;
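&lt;p&gt;A sketch of an Ingress routing an HTTP path to a service inside the cluster (hostname and service name are illustrative):&lt;/p&gt;

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress         # hypothetical name
spec:
  rules:
    - host: example.local   # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-svc   # in-cluster service the rule routes to
                port:
                  number: 80
```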

&lt;ul&gt;
&lt;li&gt;Ingress and egress rules are defined in NetworkPolicies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;kubectl expose --help -- exposes a resource as a new Kubernetes service&lt;br&gt;
kubectl get service -- lists the exposed services&lt;br&gt;
kubectl scale --help -- scales to the number of replicas indicated&lt;/p&gt;

&lt;p&gt;Configuration Objects&lt;br&gt;
- storing configuration in the environment&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It is bad practice to include configuration in the container build.&lt;/li&gt;
&lt;li&gt;ConfigMap - stores configuration files as key-value pairs.&lt;/li&gt;
&lt;li&gt;Mount a ConfigMap as a volume in a Pod.&lt;/li&gt;
&lt;li&gt;Map variables from a ConfigMap to environment variables in a Pod.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example of mounting a ConfigMap as a volume:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;volumes:
  - name: nginx-conf
    configMap:
      name: nginx-conf
&lt;/code&gt;&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;To store passwords, keys and other sensitive information, Secret objects are used; secrets are base64 encoded.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;HashiCorp Vault is a secret management tool for cloud environments.&lt;/p&gt;

&lt;p&gt;AUTOSCALING Objects&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Horizontal Pod Autoscaler (HPA) - the most used; a second or third Pod gets scheduled when a capacity threshold is met (relies on the metrics-server). &lt;/li&gt;
&lt;li&gt;Cluster Autoscaler - if cluster capacity is fixed, new worker nodes are added to the cluster on demand.&lt;/li&gt;
&lt;li&gt;Vertical Pod Autoscaler - Pod resource requests and limits are increased dynamically; a newer concept.&lt;/li&gt;
&lt;/ol&gt;
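&lt;p&gt;A sketch of a Horizontal Pod Autoscaler scaling a Deployment on CPU (illustrative names and thresholds; relies on the metrics-server):&lt;/p&gt;

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa             # hypothetical name
spec:
  scaleTargetRef:           # workload whose replica count is adjusted
    apiVersion: apps/v1
    kind: Deployment
    name: web-deploy        # hypothetical Deployment name
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80  # capacity threshold that triggers scheduling another Pod
```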

&lt;p&gt;CLOUD NATIVE APPLICATION DELIVERY&lt;br&gt;
Continuous Integration - building and testing of code; version control and collaboration of teams.&lt;br&gt;
Continuous Delivery - automates deployment of pre-built software; deployed to development or staging environments or systems before production.&lt;br&gt;
CI/CD tools: ArgoCD, Jenkins, GitLab&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Principles of GitOps and how it integrates with Kubernetes:
GitOps integrates the provisioning and change process of infrastructure with version-control operations and manages infrastructure changes.
Pull-based: an agent watches the Git repository for changes; if changes are detected, they are applied to the running infrastructure state (e.g., Flux and ArgoCD). &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;CLOUD NATIVE OBSERVABILITY&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allows analysis of the collected data to understand the system and react to error states.&lt;/li&gt;
&lt;li&gt;Metrics - quantitative measurements taken over time, e.g. error rate.&lt;/li&gt;
&lt;li&gt;Logs - messages presenting error, warning or debug information.&lt;/li&gt;
&lt;li&gt;Traces - the progress of a request as it passes through a service; in a distributed system, how long a request took when being processed by each service.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;docker logs nginx -- views the logs of the nginx container&lt;br&gt;
kubectl logs nginx&lt;br&gt;
kubectl logs -p -c ruby web-1 -- views the logs of the previously terminated ruby container from pod web-1&lt;/p&gt;

&lt;p&gt;kubectl logs -f -c ruby web-1 -- streams the logs &lt;/p&gt;

&lt;p&gt;Fluentd or Filebeat can ship and store logs (e.g., to be visualized in Grafana).&lt;br&gt;
Prometheus - an open-source monitoring system to collect metrics.&lt;/p&gt;

&lt;p&gt;Prometheus collects the data, and Grafana helps build dashboards for the collected metrics, with Prometheus as one of the data sources.&lt;/p&gt;

&lt;p&gt;Counter - a value that only increases (e.g., error count).&lt;br&gt;
Gauge - a value that can increase or decrease (e.g., memory usage).&lt;br&gt;
Logging approaches: node-level logging, logging via a sidecar container, and application-level logging, which pushes logs directly from the application running in the cluster.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>docker and dockerfiles</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Wed, 30 Apr 2025 03:30:41 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/docker-and-dockerfiles-4dk</link>
      <guid>https://dev.to/o_mutwiri/docker-and-dockerfiles-4dk</guid>
      <description>&lt;p&gt;Containers help with:&lt;br&gt;
 Dependency management of applications&lt;br&gt;
 Writing secure application code&lt;br&gt;
 Efficient use of hardware resources &lt;br&gt;
Open container initiative runtime specifications (OCI)&lt;br&gt;
Run basic containers&lt;br&gt;
Docker desktop&lt;br&gt;
Docker version&lt;br&gt;
Docker hub&lt;br&gt;
 Building container images is based on iso 668 standardization&lt;br&gt;
 Dockerfile contains instructions on how to build container images with docker.&lt;br&gt;
 Processes are isolated using namespaces and cgroups. &lt;br&gt;
 Container images make containers portable and easy to reuse. They contain everything needed to run an application: the code, runtime, system tools, system libraries and settings. &lt;br&gt;
 DOCKERFILE:&lt;/p&gt;

&lt;p&gt;Container image example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;FROM ubuntu:20.04                      # this is the base image
RUN apt-get update &amp;amp;&amp;amp; apt-get -y install python3 python3-pip   # RUN commands add software and libraries
COPY my-app.py /app/                   # COPY copies code into the image filesystem
WORKDIR /app                           # defines the workdir in which the app runs
CMD ["python3", "my-app.py"]           # the process started when the container runs:
                                       # here we run our python app "my-app.py"
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To build this image: &lt;br&gt;
docker build -t my-python-image -f Dockerfile .&lt;br&gt;
-t my-python-image = specifies a name tag for the image&lt;br&gt;
-f Dockerfile = specifies where your Dockerfile can be found&lt;br&gt;
(the trailing dot is the build context)&lt;br&gt;
To distribute the image, use a container registry:&lt;br&gt;
docker push my-registry.com/my-python-image &lt;br&gt;
docker pull my-registry.com/my-python-image &lt;/p&gt;

&lt;p&gt;CONTAINER ORCHESTRATION &lt;br&gt;
With large numbers of containers, one needs a system that helps with the management of these containers.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Providing compute resources like virtual machines where containers can run on&lt;/li&gt;
&lt;li&gt; Schedule containers to servers in an efficient way&lt;/li&gt;
&lt;li&gt; Allocate resources like CPU and Memory to containers&lt;/li&gt;
&lt;li&gt; Manage the availability of containers and replace them if they fail&lt;/li&gt;
&lt;li&gt; Scale if load increases&lt;/li&gt;
&lt;li&gt; Provide networking to connect containers together&lt;/li&gt;
&lt;li&gt; Provision storage if containers need to persist data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In most cases, a container orchestration system consists of a control plane and worker nodes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Control plane – responsible for the management of the containers&lt;/li&gt;
&lt;li&gt; Worker nodes – host the containers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Kubernetes is the standard system to orchestrate containers.&lt;/p&gt;

&lt;p&gt;Networking&lt;br&gt;
Container networking implementation is based on the Container Network Interface (CNI). It guides network plugins and how they can be swapped between different orchestration platforms. &lt;br&gt;
Network namespaces allow each container to have its own IP address. &lt;br&gt;
You need to map a port from the container to a port on the host system to open access from outside the host system.&lt;br&gt;
Overlay network – puts containers across hosts in a virtual network that is spanned across the host systems.&lt;br&gt;
The host network may be 172.16.4.x while the container network is 192.168.8.x: servers in this network may have 172.16.4.11, 172.16.4.12, 172.16.4.13, and the containers inside each of these servers may derive IPs from a wider range, e.g. 192.168.1.1, 192.168.1.4.&lt;/p&gt;

&lt;p&gt;Service Discovery and DNS&lt;br&gt;
In container orchestration platforms there are thousands of containers with individual IP addresses,&lt;br&gt;
and containers are deployed on different hosts, in different data centers and even geolocations.&lt;br&gt;
Using IPs to communicate is nearly impossible, so DNS is used instead.&lt;br&gt;
All this information is automated through the use of a service registry. &lt;br&gt;
Finding other services in the network and requesting information about them is service discovery.&lt;br&gt;
Approaches to service discovery:&lt;br&gt;
DNS - register new services as they are created&lt;br&gt;
Key-value store - data stores like etcd, Consul or Apache ZooKeeper&lt;/p&gt;

&lt;p&gt;Service Mesh&lt;br&gt;
A service mesh describes how traffic in container platforms is handled by proxies (SMI - Service Mesh Interface).&lt;br&gt;
A proxy is a server application that sits between a client and a server, used to manage network traffic.&lt;br&gt;
Popular proxies: Nginx, HAProxy or Envoy.&lt;br&gt;
A service mesh adds a proxy server to every container that you have in the architecture.&lt;br&gt;
Therefore, it helps manage complex and opaque networking, and implement monitoring, access control and encryption of network traffic as containers communicate with each other. &lt;br&gt;
When service meshes are used, traffic is routed through the proxies instead of applications talking to each other directly.&lt;br&gt;
Istio and Linkerd are popular service meshes.&lt;br&gt;
The proxies in a service mesh form the data plane, where rules centrally managed in the control plane of the service mesh are implemented and shape traffic flow.&lt;br&gt;
Config files are written and uploaded to the control plane to enforce new rules, e.g. service A and service B should always communicate encrypted. &lt;/p&gt;

&lt;p&gt;Storage &lt;br&gt;
Containers are ephemeral:&lt;br&gt;
they are read-only, and the read-write layer is lost when the container is stopped or deleted. &lt;br&gt;
To persist container data, a volume is used.&lt;br&gt;
Often multiple containers are started on different host systems, or a container restarted on a different host still needs to access its volume. &lt;br&gt;
This requires a robust storage system that is attached to the host servers; storage is provisioned via a storage system, so server A and server B can share a volume to read and write data. &lt;br&gt;
The Container Storage Interface (CSI) offers a uniform interface which allows attaching different storage systems, no matter whether they are on-premise or in the cloud.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>KUBERNETES CONTAINERIZATION ORCHESTRATION</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Tue, 29 Apr 2025 16:26:41 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/kubernetes-containerization-orchestration-39hf</link>
      <guid>https://dev.to/o_mutwiri/kubernetes-containerization-orchestration-39hf</guid>
      <description>&lt;p&gt;Kubernetes and Cloud Native Essentials LFS250&lt;br&gt;
Containers are a standardized way to package and ship modern software. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  IMAGE: defines how to build and package container images&lt;/li&gt;
&lt;li&gt;  Runtime: defines configs, execution environment and lifecycle of containers.
Open source standards worth considering:&lt;/li&gt;
&lt;li&gt;  OCI specifies image, runtime and distribution&lt;/li&gt;
&lt;li&gt;  CNI – Container Network Interface&lt;/li&gt;
&lt;li&gt;  CRI – Container Runtime Interface &lt;/li&gt;
&lt;li&gt;  CSI – Container Storage Interface&lt;/li&gt;
&lt;li&gt;  SMI – Service Mesh Interface
KUBERNETES = container orchestration system, an open-source platform for managing containerized workloads in the IT sector.
Developers have a singular view of the application, unlike the operations team.
An application may have the following items in a Node.js application...&lt;/li&gt;
&lt;li&gt;  Frontend service- end user accesses the frontend&lt;/li&gt;
&lt;li&gt;  Db access service -  where the frontend stores the data,&lt;/li&gt;
&lt;li&gt;  Backend service – accesses the database &lt;/li&gt;
&lt;li&gt;  The services are working together and exposed to the end user&lt;/li&gt;
&lt;li&gt;  Master node manages the applications running in the compute resources &lt;/li&gt;
&lt;li&gt;  Within the containers there is:
o   The application itself 
o   Requirements
o   Dependencies for the underlying operating system
o   Application runtime
o   Etc.&lt;/li&gt;
&lt;li&gt;  From an operations team perspective, there are several issues to consider with regard to the compute resources in the orchestration platform when running the application in production.
o   Deploying on the master 
 COMPUTE worker node – 4 vCPUs 
 Scaling – where each node hosts the containers, maybe having one or two frontends, 3 backends, 3 databases
 Network – exposing the services to each other and maybe an end user, load balancing 
 Insights – Prometheus and the ability to see the entire service mesh; self-healing and configuration management. 
What is the difference between a VM and a container?
Virtual machine: host OS, hypervisor, then (OS, libs, runtime, application).
Container: host OS, runtime engine (Docker Engine), then the actual container with the libraries, which will be scaled.
Containers usually start with a manifest, and if a third-party service is introduced it is easily scalable, as the services are not running on the same host. Cloud native design is modular and portable.
Cloud native architecture &lt;/li&gt;
&lt;li&gt;  Optimize cost, reliability and faster time to market through a high level of automation (CI/CD pipelines help rebuild the system in case of disaster, and accommodate incremental changes, testing and deployment of applications)&lt;/li&gt;
&lt;li&gt;  These are design patterns that help build and run scalable applications in modern, dynamic environments such as hybrid, private and public clouds, when under a lot of load.&lt;/li&gt;
&lt;li&gt;  Instead of a monolithic approach, cloud native architectural design means we are looking at:
o   Containers and microservices 
o   Service mesh
o   Immutable infrastructure - self-healing, health checks 
o   Declarative APIs 
Scaling services that have a lot of load, like the shopping cart and checkout. Despite its advantages, a microservices architecture is complex to integrate. 
Traditionally, once you are inside a zone you can access every system inside it. Patterns like zero-trust computing mitigate that by requiring authentication from every user and process.
Autoscaling 
= configure the minimum and maximum limit of instances and the metric that triggers scaling. On-demand pricing models are especially desirable for autoscaling; it improves resilience and service availability.
It means the resources are dynamically adjusted based on current demand; metrics like CPU and memory can decide when to scale based on an increase or decrease in load.
Horizontal scaling - spawning new compute resources, e.g. new racks and hardware A, B, C. 
Vertical scaling - changing the size of the hardware, like adding more CPU or RAM slots in server A.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Serverless&lt;br&gt;
= abstracts the underlying infrastructure &lt;br&gt;
= based on the idea of scaling and provisioning driven by events, like incoming requests or event data across services, platforms etc.&lt;br&gt;
No need to prepare and configure resources like load balancers, EC2 instances, the OS and the network to run an application. &lt;br&gt;
Let the cloud provider choose the right environment; just provide the application code. &lt;br&gt;
Ideal for: event or data streams, scheduled tasks, business logic and batch processing.&lt;/p&gt;

&lt;p&gt;FUNDAMENTALS OF CONTAINER ORCHESTRATION&lt;br&gt;
Container Orchestration&lt;br&gt;
Introduction&lt;br&gt;
Container Orchestration&lt;br&gt;
Use of Containers&lt;br&gt;
Container Basics&lt;br&gt;
Running Containers&lt;br&gt;
Demo: Running Containers&lt;br&gt;
Building Container Images&lt;br&gt;
Demo: Building Container Images&lt;br&gt;
Security&lt;br&gt;
Container Orchestration Fundamentals&lt;br&gt;
Networking&lt;br&gt;
Service Discovery &amp;amp; DNS&lt;br&gt;
Service Mesh&lt;br&gt;
Storage&lt;/p&gt;

</description>
    </item>
    <item>
      <title>INTRODUCTION TO POSTGRES AND DBEAVER</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Wed, 09 Apr 2025 21:46:02 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/introduction-to-postgres-and-dbeaver-1lma</link>
      <guid>https://dev.to/o_mutwiri/introduction-to-postgres-and-dbeaver-1lma</guid>
<description>&lt;p&gt;localhost may keep losing the connection, so you need to go to DBeaver and invalidate/reconnect.&lt;br&gt;
You can also replace localhost with 127.0.0.1 or with the assigned IP address. &lt;/p&gt;

&lt;p&gt;This will help when there is no localhost and you need to use the database within the local network or host. &lt;/p&gt;

&lt;p&gt;The port is where the DB is running;&lt;br&gt;
by default Postgres uses 5432.&lt;/p&gt;

&lt;p&gt;The assignment connection details are: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Host: 172.178.131.221&lt;/li&gt;
&lt;li&gt;Port: 5432&lt;/li&gt;
&lt;li&gt;Database: warehouse&lt;/li&gt;
&lt;li&gt;User: luxds&lt;/li&gt;
&lt;li&gt;Password: 1234&lt;/li&gt;
&lt;li&gt;Schema: dataanalytics&lt;/li&gt;
&lt;li&gt;Table: international_debt
&lt;a href="https://neon.tech/postgresql/postgresql-administration/psql-commands" rel="noopener noreferrer"&gt;https://neon.tech/postgresql/postgresql-administration/psql-commands&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1tkkge7a6so66miciz9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1tkkge7a6so66miciz9.png" alt="Image description" width="800" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;select * from dataanalytics.international_debt;&lt;br&gt;
select count(*) from dataanalytics.international_debt;&lt;/p&gt;

&lt;p&gt;There is a GitHub repository containing the SQL scripts plus an article for the intro to SQL for data analytics...&lt;/p&gt;

&lt;p&gt;harun mbaabu&lt;br&gt;
22:27&lt;br&gt;
SQL STATEMENT TO CHECK NUMBER OF ROWS?&lt;br&gt;
Elijah Mwangi&lt;br&gt;
22:28&lt;br&gt;
Select count(*);&lt;br&gt;
dancan Kombo&lt;br&gt;
22:28&lt;br&gt;
select count(*) from dataanalytics.international_debt&lt;br&gt;
You&lt;br&gt;
22:29&lt;br&gt;
select count(*) as row_counts from analytics.internationa_debt&lt;br&gt;
harun mbaabu&lt;br&gt;
22:29&lt;br&gt;
1). What is the total amount of debt owed by all countries in the dataset?&lt;br&gt;
harun mbaabu&lt;br&gt;
22:31&lt;br&gt;
2). How many distinct countries are recorded in the dataset?&lt;/p&gt;

&lt;p&gt;3). What are the distinct types of debt indicators, and what do they represent?&lt;/p&gt;

&lt;p&gt;4). Which country has the highest total debt, and how much does it owe?&lt;/p&gt;

&lt;p&gt;5). What is the average debt across different debt indicators?&lt;/p&gt;

&lt;p&gt;6). Which country has made the highest amount of principal repayments?&lt;/p&gt;

&lt;p&gt;7). What is the most common debt indicator across all countries?&lt;/p&gt;

&lt;p&gt;8). Identify any other key debt trends and summarize your findings&lt;/p&gt;

&lt;p&gt;markdown, push the changes to github, &lt;/p&gt;

</description>
      <category>postgres</category>
      <category>database</category>
      <category>sql</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>INTRODUCTION TO PYTHON</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Wed, 09 Apr 2025 21:45:37 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/introduction-to-python-3fah</link>
      <guid>https://dev.to/o_mutwiri/introduction-to-python-3fah</guid>
      <description></description>
      <category>python</category>
      <category>tutorial</category>
      <category>learning</category>
      <category>beginners</category>
    </item>
    <item>
      <title>INTRODUCTION TO SQL</title>
      <dc:creator>soul-o mutwiri</dc:creator>
      <pubDate>Wed, 09 Apr 2025 21:45:18 +0000</pubDate>
      <link>https://dev.to/o_mutwiri/introduction-to-sql-i71</link>
      <guid>https://dev.to/o_mutwiri/introduction-to-sql-i71</guid>
<description>&lt;p&gt;SQL databases are used in digital spaces, for instance in commerce websites, social media and other online spaces. &lt;br&gt;
They help store and manipulate data; using SQL queries, analysts can derive insights from the data stored in the database. &lt;/p&gt;

&lt;p&gt;Let's dive in and create a table, where the data is stored.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;customers table showcasing table fields.&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;user_ID|firstname|lastname|age|city&lt;/p&gt;

&lt;p&gt;The SQL query to select all the data from this table looks like this:&lt;br&gt;
select * from customers&lt;/p&gt;

&lt;p&gt;It is important to note that SQL is case insensitive, so &lt;br&gt;
select = SELECT&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A SELECT statement is used to select data from a table.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SQL query to find the name and salary columns from the Employees table:&lt;/p&gt;

&lt;p&gt;select name, salary&lt;br&gt;
from Employees;&lt;/p&gt;

&lt;p&gt;SQL query to sort customers by their age and limit by the 2 oldest customers. in this query, where is used to filter out the rows which have no age value. Desc will be used to rank the oldest to the youngest.&lt;/p&gt;

&lt;p&gt;select * from customers&lt;br&gt;
where age is not null&lt;br&gt;
order by age desc&lt;br&gt;
limit 2&lt;/p&gt;

&lt;p&gt;SQL query that selects the 3rd to 5th rows from Employees, ordered by the salary column in descending order:&lt;/p&gt;

&lt;p&gt;select * from employees&lt;br&gt;
order by salary desc&lt;br&gt;
limit 3 offset 2&lt;/p&gt;

&lt;p&gt;STRING FUNCTIONS&lt;br&gt;
there are some useful functions in sql to work with on text data &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
we can use concat function to combine text from multiple columns. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;select concat(firstname, ' ', lastname) as name&lt;br&gt;
from customers&lt;br&gt;
order by lastname desc;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Lower and upper functions&lt;br&gt;
These convert the text in the column to lowercase or uppercase respectively.&lt;br&gt;
select lower(firstname) from customers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Substring function&lt;br&gt;
It allows you to extract part of the text in a column; it takes the starting position and the number of characters we want to extract.&lt;br&gt;
To take the first 3 characters of the firstname:&lt;br&gt;
select substring(firstname, 1, 3)&lt;br&gt;
from customers;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>sql</category>
      <category>tutorial</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
