DEV Community: NgetichB

RAG for Dummies

NgetichB — Sat, 13 Sep 2025 21:04:33 +0000

Retrieval-Augmented Generation (RAG) is a machine learning technique that enhances Large Language Models (LLMs) by letting them draw on an external knowledge base, so they can provide more accurate and up-to-date responses.

How RAG works:

(i) The user asks the Large Language Model (LLM) a question.

(ii) Retrieval: the RAG system uses the question to search an external knowledge base for relevant information. This step relies on three techniques: chunking, embedding and a vector database. In chunking, the information in the knowledge base is broken into smaller pieces for efficient searching. In embedding, the chunks are converted into numerical representations (vectors) that capture their meaning. Finally, the system searches the vector database to find the chunks most similar to the question.

(iii) Augmentation: the most relevant information from the retrieval step is added to the original question to form an ‘augmented prompt’.

(iv) Generation: the LLM receives the augmented prompt and uses the original question and the retrieved context to generate a more comprehensive and accurate response.
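
The four steps above can be sketched in a few lines of Python. This is only a toy illustration, not a production pipeline: the `embed` function is a simple word-count vector standing in for a real embedding model, the "vector database" is a plain list, and the knowledge base, function names and prompt layout are all made up for the example. The last step (sending the prompt to an LLM) is left out because it depends on whichever model you use.

```python
import math
import re
from collections import Counter

def chunk(text, size=20):
    """Chunking: split text into pieces of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: word counts (a stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, index, k=1):
    """Retrieval: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_chunks):
    """Augmentation: add the retrieved context to the original question."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Hypothetical knowledge base, invented for this example.
knowledge_base = [
    "BM25 is a sparse retrieval function that ranks documents by term overlap.",
    "K-Means partitions observations into clusters around the nearest mean.",
]

# The "vector database" here is just a list of (chunk, vector) pairs.
index = [(c, embed(c)) for doc in knowledge_base for c in chunk(doc)]

question = "What does BM25 rank documents by?"
prompt = build_prompt(question, retrieve(question, index, k=1))
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

In a real system the word-count vectors would be replaced by embeddings from a trained model and the list by an actual vector database, but the flow of question, retrieval, augmentation and generation stays the same.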

Models used in RAG

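(i) Retrieval models: these act as a detective that gathers relevant documents from the external knowledge base before the LLM generates an answer. The two types of retriever are sparse retrievers, which match on keywords (examples are BM25 and TF-IDF), and dense retrievers, which compare learned embeddings (an example is DPR, Dense Passage Retrieval). Frameworks such as LlamaIndex and Haystack are not retrievers themselves, but they provide ready-made components for building retrieval pipelines.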

(ii) Language models (LLMs): the generation component takes the user's original prompt and the retrieved information and uses its learned knowledge to create a coherent, natural-language response. Examples are Transformer-based models such as GPT-2, GPT-3, BART (Bidirectional and Auto-Regressive Transformers) and Flan-T5.
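
To make the sparse retrieval in (i) more concrete, here is a minimal sketch of BM25 scoring in plain Python. It uses the common form of the formula with the usual default parameters `k1=1.5` and `b=0.75`; the whitespace tokenizer, the function names and the three documents are assumptions made for illustration, and real libraries differ in details such as tokenization and the exact IDF variant.

```python
import math
from collections import Counter

def tokenize(text):
    """Assumption: lowercase whitespace tokenization is enough for this toy example."""
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency, dampened and normalised by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical documents, invented for this example.
docs = [
    "rag retrieves relevant documents before generation",
    "k-means groups points around the nearest mean",
    "excel stores data in rows and columns",
]
scores = bm25_scores("relevant documents", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

A dense retriever would follow the same pattern of scoring every document and keeping the best, but it would compare embedding vectors instead of counting shared keywords.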

RAG is applied in medical AI, chatbots, chat engines and legal assistance. It bridges the gap between the static information a model was trained on and knowledge that keeps changing, which reduces ambiguity and increases precision, transparency and accuracy.

UNSUPERVISED LEARNING: Clustering…

NgetichB — Sun, 31 Aug 2025 01:51:53 +0000

Unsupervised learning is a type of machine learning where algorithms analyze and cluster unlabeled datasets to identify hidden patterns, similarities and structures without predefined categories or even human intervention.
You may ask yourself what the difference between supervised and unsupervised learning is. Well, it is the objective of each. Supervised learning needs concrete, ground-truth labels in order to train models that answer highly specific questions, whereas unsupervised learning analyzes only the features and sorts them into groups without a specific task.
How Unsupervised Learning works:
(i) Unlabeled data input: the algorithm is provided with a large dataset without predefined labels.
(ii) Pattern discovery: the algorithm processes the raw data, identifying relationships, patterns and structures.
(iii) Inference and organization: the algorithm infers rules and organizes the data through clustering, association rules and dimensionality reduction.
(iv) Insight generation: the result is a model that offers fresh perspectives, such as spotting obscure subgroups in the data or identifying linkages that might have been missed.
The three main types of model used in unsupervised learning are clustering models, dimensionality reduction models and association rule models.
Let's have a look at the clustering model:
Clustering is a technique in unsupervised learning that divides a collection of data points into groups according to how similar they are. It uses unlabeled data to find underlying structures and patterns in the dataset. The techniques of clustering are listed below:
(i) K-Means clustering is an unsupervised machine learning technique that partitions a set of observations into k clusters, where each observation belongs to the cluster with the nearest mean.
(ii) Hierarchical clustering is an unsupervised machine learning technique that creates a hierarchy of nested clusters by combining related data points into clusters. Unlike K-Means clustering, hierarchical clustering does not require the number of clusters to be predetermined. The two types of hierarchical clustering are agglomerative and divisive clustering.
Clustering is an important technique with practical applications in various fields. It is key to choose the clustering method based on the characteristics of the data rather than on the process itself.
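
As a rough illustration of the "nearest mean" idea behind K-Means, here is a minimal sketch in plain Python. It alternates between assigning each point to its closest centroid and moving each centroid to the mean of its points. The six data points are invented, and starting from the first k points is a simplification chosen to keep the example predictable; real implementations usually pick starting centroids more carefully.

```python
import math

def kmeans(points, k, iterations=10):
    """Toy K-Means: returns a cluster label per point and the final centroids."""
    # Assumption: use the first k points as starting centroids (kept simple on purpose).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical 2-D observations forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 1.5), (9.5, 8.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Note that k has to be chosen up front here, which is exactly the requirement that hierarchical clustering avoids.
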

Supervised Learning: Classification

NgetichB — Sat, 23 Aug 2025 22:31:47 +0000

Supervised learning is a type of machine learning where an algorithm learns from a labeled dataset that consists of input and output pairs.
It is used for regression (predicting continuous numerical values) and classification (sorting data into discrete classes).

How does Classification work in machine learning?

(i) Data preparation: a dataset containing input features and their corresponding class labels is prepared, and part of it is set aside as testing data.

(ii) Model training: a classification algorithm (e.g., Logistic Regression, Support Vector Machines, Decision Trees, K-Nearest Neighbors) is chosen and trained on the labeled training data. Here the algorithm learns the relationship between the input features and the target class labels.

(iii) Model evaluation: the performance of the trained model is evaluated on the testing data. The metrics used are accuracy, precision, recall and F1-score.

(iv) Prediction: once the model is trained and evaluated, it can be used to predict the class label of new data.
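
The workflow above can be sketched with K-Nearest Neighbors, one of the algorithms listed in step (ii), in plain Python. The tiny dataset, the labels and the `predict` helper are all invented for illustration. K-Nearest Neighbors is also a slightly unusual case, because its "training" is simply keeping the labeled examples around.

```python
import math
from collections import Counter

# (i) Data preparation: hypothetical (features, label) pairs, split into train and test.
train = [((1.0, 1.2), "small"), ((0.8, 1.0), "small"), ((1.3, 0.9), "small"),
         ((5.0, 5.5), "large"), ((5.2, 4.8), "large"), ((4.7, 5.1), "large")]
test = [((1.1, 1.0), "small"), ((5.1, 5.0), "large"), ((0.9, 1.4), "small")]

# (ii) Model training: for K-Nearest Neighbors the stored training data is the model.
def predict(features, training_data, k=3):
    """Return the most common label among the k closest training examples."""
    neighbours = sorted(training_data, key=lambda row: math.dist(features, row[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# (iii) Evaluation: compare predictions with the known labels of the test data.
predictions = [predict(x, train) for x, _ in test]
accuracy = sum(p == y for p, (_, y) in zip(predictions, test)) / len(test)
print("accuracy:", accuracy)

# (iv) Prediction: classify a new, unlabeled observation.
new_label = predict((4.9, 5.3), train)
print("new point:", new_label)
```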

What models are used for classification?
-Logistic Regression
-Decision Trees
-Support Vector Machines (SVMs)
-Naive Bayes
-K-Nearest Neighbors

While precision, recall and accuracy are useful metrics for evaluating classifiers, determining an appropriate balance between false positives and false negatives depends on the particular problem application and the relative costs of each.
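
To see how these metrics relate to false positives and false negatives, here is a small sketch that computes them from a list of true and predicted labels. The labels are made up, and the function name and the choice of 1 as the positive class are assumptions for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 4 actual positives and 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here there is one false positive and one false negative. If missing a positive case is the costlier mistake, recall is the number to watch; if a false alarm is costlier, precision matters more.
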
Machine learning is important because it automates tasks and improves accuracy, efficiency and decision making, all of which are key in the industries that support everyday human life.

How Excel is used in Real-World Data Analysis

NgetichB — Wed, 11 Jun 2025 08:53:09 +0000

Excel is a spreadsheet program developed by Microsoft that features calculation capabilities, graphing tools, pivot tables and VBA, a macro programming language. Excel organizes data in rows and columns.

Excel is a useful tool in real-world data analysis! It is used for business analysis, where various types of data are used to support business decisions. Excel is also used for financial planning and reporting: one can generate financial statements and balance sheets. Financial reporting is essential in predicting future financial performance and assessing potential investments and the risks they may involve. Additionally, Excel is used to store and manage data, making it easy to access and extract information when needed.

The features in Excel are:
1. Data entry and management: Excel lets one enter data in rows and columns. The data can then be used in calculations by applying Excel formulas such as SUM, AVERAGE and IF.
2. Formulas: Excel can perform calculations with built-in functions. For instance, SUM adds the values in a range of cells, AVERAGE calculates the mean of the values in a range, and VLOOKUP searches for a value in the first column of a table and returns the matching value from another column in the same row.
3. Operators: Excel has various operators such as +, -, *, < and >, among others. These are used to perform calculations and comparisons during data analysis.
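
For readers who think more easily in code, here is a rough Python sketch of what SUM, AVERAGE, IF and an exact-match VLOOKUP do. It is not Excel itself, only an analogy: the small table, the cell references in the comments and the `vlookup` helper are invented for illustration.

```python
# Hypothetical sheet: column A holds the product, column B the units sold.
rows = [("pens", 120), ("books", 45), ("bags", 30)]

units = [u for _, u in rows]
total = sum(units)                                     # like =SUM(B1:B3)
average = total / len(units)                           # like =AVERAGE(B1:B3)
flags = ["high" if u > 50 else "low" for u in units]   # like =IF(B1>50,"high","low") per row

def vlookup(value, table, col_index):
    """Exact-match lookup: find `value` in the first column, return column `col_index` (1-based)."""
    for row in table:
        if row[0] == value:
            return row[col_index - 1]
    return None  # Excel would show #N/A when nothing matches

print(total, average, flags)
print(vlookup("books", rows, 2))
```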

After learning Excel, my perception of it has really changed. It is a simple tool that is interesting and powerful at the same time. I can't wait to explore more of it and apply it!