DEV Community: Njeri Kimaru

Does ramalama make AI boring?? Running AI models with Ramalama.

Njeri Kimaru — Mon, 30 Mar 2026 16:48:27 +0000

What is RamaLama?

RamaLama is an open source command-line tool that makes running AI models locally simple by treating them like containers.
RamaLama runs models with Podman or Docker, and no configuration is needed.
It is GPU-optimized, so it uses hardware acceleration to improve performance.
It is compatible with llama.cpp, OpenVINO, vLLM, whisper.cpp and many more.

Installing RamaLama

RamaLama is easy to install.
After installing, check the version you are using.

sudo dnf install python3-ramalama

ramalama version

RamaLama supports multiple model registries (transports):

1. Ollama

It is the quickest and easiest registry.
Here are a few AI models I ran using Ollama.

ramalama run granite moe3

ramalama run ollama://llama4:scout

2. Hugging Face

Some Hugging Face models require you to log in.
Here are some that don't require logging in:

ramalama run huggingface://instructlab/granite-7b-lab-Q4_K_M.gguf

ramalama run huggingface://microsoft/Phi-3-mini-4k-instruct-q4.gguf

3. ModelScope

ModelScope worked quite well too, but I had to upgrade RamaLama first.

sudo dnf upgrade ramalama

Here is a ModelScope model I used:

ramalama run modelscope://Qwen/Qwen2.5-7B-Instruct-GGUF/qwen2.5-7b-instruct-q4_k_m.gguf

4. OCI registries

Let's start with what OCI is.
OCI (Open Container Initiative) is a standard, or specification, that defines how containers and their images should be packaged and distributed.
There are several OCI registries:

quay.io
docker.io
GitHub Container Registry (ghcr.io). On GitHub I had to log in first and then get an authentication token. Afterwards, I pushed a model and then accessed it using ghcr.io:

ramalama convert ollama://mistral oci://ghcr.io/njeri-kimaru/mistral:gguf

ramalama run oci://ghcr.io/njeri-kimaru/mistral:gguf

Google Container Registry (gcr.io)
Amazon Elastic Container Registry (ECR)
RamaLama Container Registry (rlcr.io)

5. URL-based sources

RamaLama also supports loading models directly from URLs instead of registries.

They include:

https:// → download from the internet
file:// → load from your local machine

6. Hosted API

A hosted model such as OpenAI's requires a secret key, which you get from the OpenAI API keys page, and you'll then have to pay for the model to run successfully.
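
The prefixes used throughout this post (ollama://, huggingface://, oci:// and so on) are what tell RamaLama where a model comes from. As a rough illustration only, and not RamaLama's actual code, here is a small Python sketch that splits a model reference into its transport and the rest. The helper name split_model_reference, the KNOWN_TRANSPORTS tuple and the choice of ollama as the fallback are assumptions made for this example.

```python
# Illustrative sketch only: this is NOT how RamaLama is implemented.
# KNOWN_TRANSPORTS and DEFAULT_TRANSPORT are assumptions for the example.
KNOWN_TRANSPORTS = ("ollama", "huggingface", "modelscope", "oci", "https", "file")
DEFAULT_TRANSPORT = "ollama"  # assumed fallback when no prefix is given


def split_model_reference(reference: str) -> tuple[str, str]:
    """Return (transport, remainder) for a reference like 'oci://ghcr.io/x/y:tag'."""
    scheme, separator, remainder = reference.partition("://")
    if separator and scheme in KNOWN_TRANSPORTS:
        return scheme, remainder
    # No recognised prefix: treat the whole string as a model name
    return DEFAULT_TRANSPORT, reference


examples = [
    "ollama://llama4:scout",
    "huggingface://microsoft/Phi-3-mini-4k-instruct-q4.gguf",
    "oci://ghcr.io/njeri-kimaru/mistral:gguf",
]
for ref in examples:
    print(split_model_reference(ref))
```

Reading a reference this way makes it easier to see at a glance which registry a given ramalama run command will pull from.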

Docling CLI to parse PDFs and export them to multiple formats

Njeri Kimaru — Sat, 28 Mar 2026 03:22:25 +0000

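What is Docling?

Docling is an open source document processing library that converts various document formats into structured outputs.
Docling plays an important part in a RAG (retrieval-augmented generation) pipeline, where documents have to be turned into structured text before they can be indexed.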

I'll be taking you through the process of parsing PDFs into structured formats.

Step 1: Set up

Create the project structure in your terminal:

mkdir docling_cli
cd docling_cli

Create your virtual environment and activate it.

Step 2: Installing Docling

Install Docling, then check its version:

pip install docling
docling --version

Step 3: Creating input and output folders

Create a folder called data where you will store your PDFs.
Create a new folder named outputs, then inside it create folders called markdown_outputs, html_outputs and json_outputs.
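
If you prefer to script the folder setup, here is a minimal Python sketch of the same layout. The function name create_project_folders is made up for this example, and the underscore folder names are assumed from the output paths used in the commands later in the post. The demo runs inside a temporary directory so that it does not touch your real files.

```python
import tempfile
from pathlib import Path

# Folder names assumed from the --output paths used later in this post
OUTPUT_FOLDERS = ("markdown_outputs", "html_outputs", "json_outputs")


def create_project_folders(root: Path) -> list[Path]:
    """Create data/ and outputs/<format>_outputs/ under root and return them."""
    folders = [root / "data"]
    folders += [root / "outputs" / name for name in OUTPUT_FOLDERS]
    for folder in folders:
        # parents=True also creates outputs/; exist_ok=True makes reruns safe
        folder.mkdir(parents=True, exist_ok=True)
    return folders


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as scratch:
    created = create_project_folders(Path(scratch))
    all_exist = all(folder.is_dir() for folder in created)
    print([folder.name for folder in created], all_exist)
```

To use it for real, call create_project_folders(Path(".")) from inside your docling_cli folder.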

Step 4: Default options and HTML format

Start by running Docling with its default options, which convert the PDF into Markdown format:

docling your-pdf

To change the PDFs into HTML format instead, run:

docling --to html *.pdf --output ~/Documents/docling_cli/outputs/html_outputs

Step 5: Changing the PDFs into other formats

1. Markdown

docling --to md *.pdf --output ~/Documents/docling_cli/outputs/markdown_outputs

2. JSON

docling --to json *.pdf --output ~/Documents/docling_cli/outputs/json_outputs

3. Plain text

docling --to text *.pdf --output ~/Documents/docling_cli/outputs/plaintext_outputs

4. YAML

docling --to yaml *.pdf --output ~/Documents/docling_cli/outputs/yaml_outputs

5. html_split_page

docling --to html_split_page *.pdf --output ~/Documents/docling_cli/outputs/html_split_page_outputs

6. DocTags

docling --to doctags *.pdf --output ~/Documents/docling_cli/outputs/doctags_outputs

7. VTT

docling --to vtt *.pdf --output ~/Documents/docling_cli/outputs/vtt_outputs
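
The commands above differ only in the --to value and the output folder, so they can be generated in a loop. Here is a minimal Python sketch that builds the argument lists without running them. It only uses the --to and --output flags shown in this post; the function name build_docling_commands and the file name report.pdf are placeholders invented for the example.

```python
from pathlib import Path

# Output format -> folder name, taken from the commands in this post
FORMATS = {
    "md": "markdown_outputs",
    "html": "html_outputs",
    "json": "json_outputs",
    "text": "plaintext_outputs",
    "yaml": "yaml_outputs",
    "html_split_page": "html_split_page_outputs",
    "doctags": "doctags_outputs",
    "vtt": "vtt_outputs",
}


def build_docling_commands(pdfs: list[str], output_root: Path) -> list[list[str]]:
    """Build one docling command (as an argument list) per output format."""
    return [
        ["docling", "--to", fmt, *pdfs, "--output", str(output_root / folder)]
        for fmt, folder in FORMATS.items()
    ]


# report.pdf is a hypothetical input file
commands = build_docling_commands(["report.pdf"], Path("outputs"))
for command in commands:
    print(" ".join(command))
```

To actually run them, each list could be passed to subprocess.run(command, check=True) from a folder where Docling is installed.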

Step 6: Analyzing the results

I used three types of PDFs: one with tables, one with text and images, and one with tables and paragraphs. Here are my key findings:

1. PDF with tables

In HTML, the rows and columns came out better than they were in the original PDF.
Markdown output was good too, as it wrote the tables in Markdown format without losing anything.
JSON broke everything down into nested objects.
Plain text was good too, but not as good as Markdown.

2. PDF with text and images

HTML lost the color of the images.

3. PDF with tables and paragraphs

Paragraphs came out nicely as text in all formats.

Dealing with unstructured, scanned multilingual pdfs?? Here's how to parse them using OCR engines with docling CLI.

Njeri Kimaru — Wed, 25 Mar 2026 22:23:04 +0000

What is Docling OCR?

Docling is an open-source document processing library developed by IBM Research. Its OCR support is designed to parse and convert complex multilingual documents into JSON or Markdown. Most documents need to be parsed before translating.

What is OCR?

OCR stands for Optical Character Recognition.
It allows you to extract text from:

images
scanned pdfs
multilingual documents.

I'll be taking you through the steps of parsing scanned multilingual documents.

Creating a folder and a virtual environment

mkdir your_foldername
cd your_foldername

The first step is always to create a folder where you'll store your files, then cd into that folder. Finally, create a virtual environment and activate it.

Install Docling and EasyOCR using the Python package manager

NB: Docling takes some time to install.

pip install docling
pip install easyocr

Check the versions of both Docling and EasyOCR:

docling --version
pip show easyocr

Look for a scanned multilingual or non-English PDF.

I would recommend this site here for your scanned multilingual documents.
Create an account, log in, and search for the documents you would like to use.
Finally, download them in PDF format and save them into a folder.

Converting a document into HTML or Markdown using Docling

Start by creating a folder where you will save your output files.

mkdir your_output_folder
cd your_output_folder

Then run the following command using the Docling CLI:

# The first argument is your PDF. --ocr enables OCR since the document is scanned,
# --ocr-engine picks the OCR engine, --ocr-lang sets the language (Hindi in my case),
# --to md selects Markdown output and --output is where the result is saved.
docling original_scanned_pdfs/hindu_scanned.pdf \
  --ocr \
  --ocr-engine easyocr \
  --ocr-lang hi \
  --to md \
  --output ./markdown_output/

Here's the output

Here are some of the language codes you can pass to --ocr-lang when using EasyOCR:

English: en
Hindi: hi
French: fr
German: de
Spanish: es
Portuguese: pt
Italian: it
Dutch: nl
Russian: ru
Chinese (Simplified): ch_sim
Chinese (Traditional): ch_tra
Japanese: ja
Korean: ko
Arabic: ar

Now let's try other OCR engines:

1. RapidOCR

As the name itself says, it is very fast.
It converts the documents into HTML or Markdown really fast. Steps:
Install rapidocr using pip:

pip install rapidocr
pip show rapidocr #to get the version

You must also install onnxruntime:

pip install onnxruntime

Then run the command to get your output, e.g. for my Arabic PDF:

# Docling CLI with OCR forced on, using the rapidocr engine and Markdown output.
# --output is the folder to save in; the last argument is the Arabic PDF.
docling \
  --ocr \
  --force-ocr \
  --ocr-engine rapidocr \
  --to md \
  --output ./markdown_outputs_rapidocr \
  ./original_scanned_pdfs/arabic_scanned.pdf

Outputs

2. Tesseract

Install the OCR engine and check the version:

pip install tesseract
pip show tesseract

Tesseract requires you to install a language data file for every language, e.g. for Arabic:

curl -L https://github.com/tesseract-ocr/tessdata/raw/main/ara.traineddata -o ~/Documents/docling_ocr/tessdata/ara.traineddata

Then run the parsing command and save the output in a folder:

# Docling CLI with OCR forced on, using the tesseract engine and Markdown output.
# --output is the folder to save in; the last argument is the Arabic PDF.
docling \
  --ocr \
  --force-ocr \
  --ocr-engine tesseract \
  --to md \
  --output ./markdown_outputs_tesserocr \
  ./original_scanned_pdfs/arabic_scanned.pdf

Here are the outputs;

3. Tesserocr

Install the OCR engine and check the version:

pip install tesserocr
pip show tesserocr

It also requires you to install each language package:

wget -P /directory-to-where-you-want-to-store/tessdata \
  https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata \
  https://github.com/tesseract-ocr/tessdata/raw/main/swa.traineddata

Parse the documents and save them in a tesserocr folder:

# Docling CLI with OCR forced on, using the tesserocr engine and Markdown output.
# --output is the folder to save in; the last argument is the Arabic PDF.
docling \
  --ocr \
  --force-ocr \
  --ocr-engine tesserocr \
  --to md \
  --output ./markdown_outputs_tesserocr \
  ./original_scanned_pdfs/arabic_scanned.pdf

Here's the output

Let's analyse these different OCR engines.

1. EasyOCR

It is easy to use: you just install easyocr, run your parsing command, and that's it.
However, it takes a lot of time to parse your document; it is really slow.

2. RapidOCR

You have to install onnxruntime.
As the name itself says, it is really fast, and installation is not complex at all.

3. Tesseract

You have to install a lot more packages, so the installation is really complex.
For every language you must download its own package.
It gives really good results.

4. Tesserocr

Uses Tesseract internally.
A bit complex to install.

5. Ocrmac

Requires a Mac to install.
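
The speed comparison above is based on my impression while running the commands. If you want numbers, one option is to time each engine on the same PDF. Below is a minimal Python sketch of such a timing harness. The helper names time_call and ocr_command are invented for this example, the flags are only the ones already used in this post, and the demo times a harmless Python expression instead of Docling so that it runs anywhere.

```python
import time

ENGINES = ["easyocr", "rapidocr", "tesseract", "tesserocr"]


def ocr_command(engine: str, pdf: str, output_dir: str) -> list[str]:
    """Build the docling argument list used in this post for one OCR engine."""
    return ["docling", "--ocr", "--force-ocr", "--ocr-engine", engine,
            "--to", "md", "--output", output_dir, pdf]


def time_call(action) -> float:
    """Run action() once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


# One command per engine, all pointing at the same scanned PDF
commands = {
    engine: ocr_command(engine, "./original_scanned_pdfs/arabic_scanned.pdf",
                        f"./markdown_outputs_{engine}")
    for engine in ENGINES
}

# Demo: time a trivial action rather than a real OCR run
elapsed = time_call(lambda: sum(range(1000)))
print(f"demo action took {elapsed:.6f} s")
```

With Docling installed, a real measurement would look like time_call(lambda: subprocess.run(commands["easyocr"], check=True)) after importing subprocess. A single run is noisy, so repeating it a few times gives a fairer comparison.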

Result Findings

Apart from some character differences in the French document outputs, the OCR engines gave almost the same results for the scanned PDFs.
easy ocr combines only the listed languages
using html

Wondering how to join outreachy? Look no further. A clear onboarding guide for outreachy application process.

Njeri Kimaru — Tue, 24 Mar 2026 19:35:53 +0000

What is Outreachy?
Outreachy is a community that provides internships in open source.
The internships are paid and remote.
Outreachy provides internships to anyone from any background who faces underrepresentation, systemic bias, or discrimination in the technical industry where they are living.

What is open source?
Open source refers to software whose source code is publicly available for anyone to view, use, modify, and share.

Outreachy application process:

1. The initial application

This is the first step in the Outreachy application process.
Please check out the initial application stage guidelines.
Here are some tips I would give the 2026 Outreachy applicants for the December round:

Apply early. Applying early increases your chances of getting your initial application approved, as there are a lot of applicants.
Have your four essays ready before the initial application stage opens. I wrote mine two weeks before the portal opened, so that once it opened I just copied, pasted and submitted.
When answering the essay questions, please be clear and do not use AI. Be authentic and give your personal stories.
Be alert for the date the Outreachy initial stage opens. Outreachy usually has a timeline for when the initial application stage opens. For the specific date, please follow their social media handles to get the information once they open.

2. The contribution phase

I am currently in my contribution phase (I am really hoping to join this amazing community), but so far I have the following tips:

Start contributing early.
Look carefully at the projects and choose one that you are passionate about.
There's a final application form that you must fill in, and you can only do so once you've made a contribution.
Contribute as much as you can, and also be active and help others.
You can check out this blog from a past Outreachy intern.

3. The intern selection phase

This is the last stage, where interns are selected. Afterwards, they start their three-month internship.

Have you been underrepresented in the tech industry? Here's your chance to get involved with a community that supports diversity.

I'll also be posting more blogs about Outreachy.
Look out for my next blog on frequently asked questions about Outreachy, and another one on how to stand out during the contribution phase.

Fedora linux not fedora hats, a beginner's guide to fedora.

Njeri Kimaru — Tue, 24 Mar 2026 08:29:24 +0000

What is fedora?

When I mention Fedora, some might think I'm referring to fedora hats 😂.
Let me introduce you to Fedora Linux. This term might be new to you, and that's okay; initially, it was new to me too. Fedora is a free and open source operating system based on Linux. However, Fedora is more than just software: it's a community project that is open to everyone.
In this article I'll be writing an introduction to the following:

The Fedora project
Fedora linux
The Fedora community

The Fedora Project

The Fedora Project is a global community of users and developers who collaborate to build Fedora Linux, an open-source operating system. What makes Fedora stand out? For one, it's completely free and open-source, with new releases every six months and updates for 13 months. The project places strong emphasis on detailed documentation, ensuring users have clear guides on installation and usage. Unlike many Linux distributions, Fedora follows a liberal updates policy, balancing frequent improvements with minimal disruption. Backed by an active and diverse community, it evolves rapidly.

Fedora Linux.

Fedora Linux is a free and open-source Linux distribution developed by the Fedora Project. It was originally developed in 2003 as a continuation of the Red Hat Linux project, and it aims to be on the leading edge of open-source technologies. It is now the upstream source for CentOS Stream and Red Hat Enterprise Linux.

The Fedora community

The Fedora community is an online community aimed at improving people's lives through free software. It was formed in 2003 as a partnership between Red Hat and volunteers from around the world, and has grown to tens of thousands of project members.
Some of the Fedora community's initiatives include:

Fedora Week of Diversity (FWD): An annual event celebrating the diverse individuals within the Fedora Community.
Contributor Stories: A recognition initiative that highlights individual contributors who have positively impacted others during their time in Fedora.

Conclusion

If you want to get involved in Fedora, begin at the Fedora Project website, which has resources for new contributors, including a mailing list, forums, and chat channels for connecting with other Fedora enthusiasts.

Also look out for my next blog, where I'll be writing a guide on how to install Fedora and its packages using dnf.

BAYESIAN AND FREQUENTISTS

Njeri Kimaru — Thu, 16 Oct 2025 13:09:02 +0000

Bayesian and frequentist are two different approaches to statistical inference, differing primarily in how they define and use probability to interpret uncertainty. The frequentist approach considers probability as the long-run frequency of an event and views population parameters as fixed but unknown. In contrast, the Bayesian approach treats probability as a degree of belief and considers parameters to be random variables that can be updated with new evidence using prior beliefs and observed data.

Frequentist approach

Probability: Views probability as the long-run frequency of an event if an experiment were repeated many times.
Parameters: Treats parameters of a model as fixed, but unknown, values.
Key output: Focuses on estimating parameters based on the observed data, often using methods like maximum likelihood estimation. It provides a single best estimate for the parameter.
Uncertainty: Quantifies uncertainty through confidence intervals, which describe the range that would contain the true parameter in a high percentage of repeated experiments.
Example: When testing a coin, the frequentist approach would ask, "What is the probability of getting this result, given a fair coin?" The probability is a property of the data, not the hypothesis itself.

Bayesian approach

Probability: Views probability as a degree of belief or certainty about an unknown event or parameter.
Parameters: Treats parameters as random variables with their own probability distributions.
Key output: Updates the probability distribution of a parameter based on new evidence, combining prior beliefs with observed data through Bayes' theorem.
Uncertainty: Quantifies uncertainty through a posterior distribution, which is a probability distribution of the parameter after considering the data.
Example: When testing a coin, the Bayesian approach would ask, "What is the probability that the coin is biased, given the results of my experiment?" It starts with a prior belief about the coin and updates it with each flip.
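
To make the coin example concrete, here is a minimal Python sketch of both views on the same data. The numbers (8 heads in 10 flips), the uniform Beta(1, 1) prior and the function names are assumptions chosen for illustration. The frequentist part computes an exact two-sided binomial p-value under a fair coin; the Bayesian part uses the standard Beta-binomial conjugate update.

```python
from math import comb


def binomial_two_sided_p_value(heads: int, flips: int, p_fair: float = 0.5) -> float:
    """Probability, under a fair coin, of all outcomes no more likely than the observed one."""
    def pmf(k: int) -> float:
        return comb(flips, k) * p_fair**k * (1 - p_fair) ** (flips - k)

    observed = pmf(heads)
    # Small tolerance so equally likely outcomes are not lost to rounding
    return sum(pmf(k) for k in range(flips + 1) if pmf(k) <= observed * (1 + 1e-9))


def beta_posterior(heads: int, flips: int, prior_a: float = 1.0, prior_b: float = 1.0):
    """Beta(a, b) prior plus coin-flip data gives Beta(a + heads, b + tails)."""
    return prior_a + heads, prior_b + (flips - heads)


heads, flips = 8, 10  # hypothetical experiment

# Frequentist: how surprising is this data if the coin is fair?
p_value = binomial_two_sided_p_value(heads, flips)

# Bayesian: what do we now believe about the chance of heads?
a, b = beta_posterior(heads, flips)
posterior_mean = a / (a + b)

print("p-value under a fair coin:", p_value)
print("posterior: Beta(%g, %g), mean %.3f" % (a, b, posterior_mean))
```

The p-value is a statement about the data given the fair-coin hypothesis, while the posterior is a distribution over the coin's bias given the data, which is the difference the two lists above describe.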

ANOVA

Njeri Kimaru — Thu, 16 Oct 2025 12:37:45 +0000

Types of ANOVA

There are 3 main types of ANOVA, depending on the number of independent variables and interactions involved:

One-Way ANOVA (with only two groups, it is equivalent to a two-sample t-test)

What it compares: One independent variable (factor) with 2 or more groups.

Example: Comparing test scores between 3 teaching methods.

Assumption: Groups are independent and data is normally distributed.

Python function: scipy.stats.f_oneway()

from scipy.stats import f_oneway

# Three independent groups to compare
campaign_A = [12, 15, 14, 10, 13, 15, 11, 14, 13, 16]
campaign_B = [18, 17, 16, 15, 20, 19, 18, 16, 17, 19]
campaign_C = [10, 9, 11, 10, 12, 9, 11, 8, 10, 9]

# f_oneway returns the F statistic and the p-value
f_stats, p_value = f_oneway(campaign_A, campaign_B, campaign_C)
print(f_stats, p_value)
alpha = 0.05  # significance level

if p_value < alpha:
    print("reject the null hypothesis")
else:
    print("fail to reject null hypothesis")
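
To see what the F statistic measures, here is a minimal sketch that computes it by hand with only the standard library, using the same three campaign lists. It is the textbook one-way ANOVA formula (variation between groups divided by variation within groups), not SciPy's internal code, and the function name one_way_f_statistic is made up for this example.

```python
from statistics import mean


def one_way_f_statistic(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


campaign_A = [12, 15, 14, 10, 13, 15, 11, 14, 13, 16]
campaign_B = [18, 17, 16, 15, 20, 19, 18, 16, 17, 19]
campaign_C = [10, 9, 11, 10, 12, 9, 11, 8, 10, 9]

f_value, df_between, df_within = one_way_f_statistic(campaign_A, campaign_B, campaign_C)
print(f_value, df_between, df_within)
```

A large F means the group means differ by much more than the spread inside each group would explain, which is why a large F goes with a small p-value.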

Two-Way ANOVA

What it compares: Two independent variables, possibly with interaction.

Example: Test scores by teaching method and gender (2 factors).

You can also test: Interaction effect — whether the effect of one factor depends on the other.

📍 Usually implemented using statsmodels with a formula:

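from statsmodels.formula.api import ols
import statsmodels.api as sm

# df is assumed to be a pandas DataFrame with columns Score, Method and Gender
# C(...) marks a column as categorical; C(Method):C(Gender) is the interaction term
model = ols('Score ~ C(Method) + C(Gender) + C(Method):C(Gender)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)  # Type II sums of squares
print(anova_table)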

Repeated Measures ANOVA

What it compares: Same subjects measured under different conditions or times.

Example: Blood pressure before, during, and after treatment on same patients.

Use when: Data is not independent, i.e., repeated measures from same subjects.

Python: statsmodels or pingouin library.

PARAMETRIC AND NON-PARAMETRIC TESTS

Njeri Kimaru — Thu, 09 Oct 2025 12:43:34 +0000

Parametric Tests

Assume your data follows a specific distribution — usually a normal distribution (bell-shaped curve).

Key assumptions:

The data is normally distributed
The sample size is large enough
Data is measured on interval or ratio scale
Homogeneity of variance (similar spread in groups)

Examples:

t-test -- Compare means between 2 groups
ANOVA -- Compare means across 3+ groups
Pearson correlation -- Relationship between two variables
Linear regression -- Predicting outcomes based on predictors

Non-Parametric Tests

Don’t assume any specific distribution of data. These are more flexible, especially for:

Skewed data
Ordinal data
Small sample sizes

Examples:

Mann–Whitney U test -- Non-parametric alternative to t-test
Kruskal–Wallis test -- Alternative to ANOVA
Wilcoxon signed-rank -- Paired samples (like paired t-test)
Spearman correlation -- Non-parametric correlation
Chi-square test -- Categorical data (e.g., frequencies)
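
To show why rank-based tests cope well with skewed data, here is a minimal Python sketch of the Mann–Whitney U statistic computed directly from its definition (counting pairs). The data and the function name mann_whitney_u are made up for illustration, and the sketch computes only the statistic, not a p-value.

```python
def mann_whitney_u(sample_a, sample_b):
    """U for sample_a: each pair with a > b counts 1, each tie counts 0.5."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )


# Hypothetical data: group_b contains one extreme outlier (80)
group_a = [3, 4, 2, 6, 2, 5]
group_b = [9, 7, 5, 10, 6, 80]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

# Replacing the outlier with 11 keeps the ordering, so U does not change
u_a_no_outlier = mann_whitney_u(group_a, [9, 7, 5, 10, 6, 11])

print(u_a, u_b, u_a_no_outlier)
```

Because U depends only on which values are larger, the outlier has no extra influence, whereas it would pull the mean used by a t-test a long way.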

Degrees of Freedom and their role in Statistics

Njeri Kimaru — Thu, 02 Oct 2025 11:26:56 +0000

You’ve seen it in t-tests, chi-square, and regression: Degrees of Freedom (DoF). But what does it really mean?

🎓 A Simple Definition

Degrees of Freedom refers to the number of independent values in a calculation that are free to vary.

Imagine you have 3 numbers that must add up to 100. If you choose two freely, the third is fixed. So you have 2 degrees of freedom.
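
The same idea in a few lines of Python, as a small sketch with made-up numbers: once the total is fixed, the last value is no longer a free choice.

```python
TOTAL = 100

# Two values picked freely (hypothetical numbers)
free_choices = [30, 45]

# The third value is forced by the constraint that everything sums to TOTAL
last_value = TOTAL - sum(free_choices)
values = free_choices + [last_value]

# n values minus 1 constraint
degrees_of_freedom = len(values) - 1

print(values, degrees_of_freedom)
```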

🧮 Why Do They Matter?

In statistics, DoF adjust for the fact that we estimate parameters (like the mean) from our sample data.

Examples:

1. Sample Variance

import numpy as np
data = [4, 7, 9]
print(np.var(data)) # population variance (DoF = N)
print(np.var(data, ddof=1)) # sample variance (DoF = N-1)

Turning Two Lists into a Dictionary

Njeri Kimaru — Thu, 02 Oct 2025 11:25:03 +0000

Ever had two lists — one with keys and one with values — and wondered how to merge them into a dictionary?

Method 1: Dictionary comprehension

Let’s say you have:

people = [ "Alice", "Bob", "Charlie", "Diana", "Ethan", "Fiona", "George", "Hannah", "Isaac", "Julia", "Kevin", "Laura", "Michael", "Nina", "Oscar" ]
heights = [ 165, 178, 172, 160, 185, 170, 182, 158, 174, 169, 180, 162, 176, 168, 181 ]

Now turning the two lists into a dictionary using a dictionary comprehension:

people_heights = {people[i]: heights[i] for i in range(len(people))}
people_heights

outcome:

{'Alice': 165, 'Bob': 178, 'Charlie': 172, 'Diana': 160, 'Ethan': 185, 'Fiona': 170, 'George': 182, 'Hannah': 158, 'Isaac': 174, 'Julia': 169, 'Kevin': 180, 'Laura': 162, 'Michael': 176, 'Nina': 168, 'Oscar': 181}

Method 2: Using zip

Let's say you have:

pumpkin = ["a","b","c","d","e","f"]
weights = [19,14,15,9,10,17]

Now combining the lists into a dictionary:

pumpkin_dict = dict(zip(pumpkin,weights))
pumpkin_dict

Outcome:

{'a': 19, 'b': 14, 'c': 15, 'd': 9, 'e': 10, 'f': 17}

List Comprehension vs. Dictionary Comprehension

Njeri Kimaru — Thu, 02 Oct 2025 11:23:28 +0000

Differences between list and dictionary comprehensions in python

Python makes it easy to write clean and compact code using comprehensions. But what's the difference between list comprehension and dictionary comprehension?

📝 List Comprehension

Used to build lists from iterables.

squares = [x**2 for x in range(5)]

output:

print(squares) # [0, 1, 4, 9, 16]

Dictionary Comprehension

Used to create a dictionary by applying expressions to generate keys and values.

squares_dict = {x: x**2 for x in range(5)}
print(squares_dict)

Output:

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

With a condition:

even_squares_dict = {x: x**2 for x in range(10) if x % 2 == 0}

Output:

{0: 0, 2: 4, 4: 16, 6: 36, 8: 64}
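
For comparison, here is a short sketch of the same result built with an ordinary loop, which is what the comprehension is shorthand for. The variable name even_squares_loop is just for this example.

```python
# Explicit loop version
even_squares_loop = {}
for x in range(10):
    if x % 2 == 0:
        even_squares_loop[x] = x**2

# Comprehension version from the post
even_squares_dict = {x: x**2 for x in range(10) if x % 2 == 0}

print(even_squares_loop == even_squares_dict)
```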

Understanding Skewness and Kurtosis: A Friendly Guide for Data Enthusiasts

Njeri Kimaru — Thu, 02 Oct 2025 11:21:38 +0000

If you've ever looked at a histogram and thought, "Hmm... this looks weirdly stretched or tilted," you're not alone. What you're noticing might be skewness or kurtosis — two important concepts in statistics that describe the shape of a distribution.

📈 What is Skewness?

Skewness tells us about the asymmetry of a distribution.

Positively skewed:

Tail stretches more on the right. Typically, Mean > Median.

Negatively skewed:

Tail stretches more on the left. Typically, Mean < Median.

Zero skewness:

Perfectly symmetrical (like a normal distribution).

💡 Example in Python:

import scipy.stats as stats
import numpy as np
data = np.random.exponential(scale=2, size=1000)
print("Skewness:", stats.skew(data))

📊 What is Kurtosis?

Kurtosis is a statistical measure that describes the “tailedness” of a distribution — in other words:

How heavy or light the tails of your data are compared to a normal distribution.

Types of Kurtosis:

Mesokurtic

Normal distribution (reference standard).
Its kurtosis is equal to 3.

Leptokurtic

Heavy tails (more outliers); sharper peak.
Its kurtosis is more than 3.

Platykurtic

Light tails (fewer outliers); flatter, wider peak.
Its kurtosis is less than 3.

🔹 In Python (e.g., scipy.stats.kurtosis()), the default subtracts 3 (so normal = 0).

This is called excess kurtosis.
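
To show where these numbers come from, here is a minimal Python sketch that computes skewness and excess kurtosis from central moments using only the standard library. These are the simple (biased) moment formulas, the small datasets are made up for illustration, and the function names are my own.

```python
from statistics import fmean


def central_moment(data, order):
    """Average of (x - mean) ** order over the data."""
    mu = fmean(data)
    return fmean([(x - mu) ** order for x in data])


def skewness(data):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Fourth central moment over squared variance, minus 3 (normal = 0)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]         # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical data with a long right tail

print("skewness (symmetric):", skewness(symmetric))
print("skewness (right-skewed):", skewness(right_skewed))
print("excess kurtosis (symmetric):", excess_kurtosis(symmetric))
```

The symmetric list has zero skewness and a negative excess kurtosis (light tails, platykurtic), while the list with the long right tail has positive skewness.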