<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Fawole Joshua</title>
    <description>The latest articles on DEV Community by Fawole Joshua (@fawole_joshua_c92c794ea50).</description>
    <link>https://dev.to/fawole_joshua_c92c794ea50</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3661680%2F37e24ad4-5b61-4e77-be20-7f4e7e5383be.png</url>
      <title>DEV Community: Fawole Joshua</title>
      <link>https://dev.to/fawole_joshua_c92c794ea50</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fawole_joshua_c92c794ea50"/>
    <language>en</language>
    <item>
      <title>Why Signature-Based Security Is No Longer Enough to Detect Cyber Attacks, and How UEBA Hunts Vicious Threats</title>
      <dc:creator>Fawole Joshua</dc:creator>
      <pubDate>Fri, 20 Mar 2026 13:18:20 +0000</pubDate>
      <link>https://dev.to/fawole_joshua_c92c794ea50/why-signature-based-security-is-no-longer-enough-to-detect-cyber-attacks-and-how-ueba-hunts-vicious-3np8</link>
      <guid>https://dev.to/fawole_joshua_c92c794ea50/why-signature-based-security-is-no-longer-enough-to-detect-cyber-attacks-and-how-ueba-hunts-vicious-3np8</guid>
      <description>

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Imagine a national museum that holds thousands of historic antiquities. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftspmr09un5vpmst1y33e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftspmr09un5vpmst1y33e.png" alt=" " width="800" height="520"&gt;&lt;/a&gt;&lt;br&gt;
The artifacts are precious, worth billions, and essential to preserving that nation's cultural heritage.&lt;/p&gt;

&lt;p&gt;The security department has the job of preventing unwanted persons from entering the museum. To do this job effectively, security personnel keep lists and mugshots of criminals. At every entrance, guards check visitors against a book of mugshots, actively searching for known criminals, troublemakers and persons of interest.&lt;/p&gt;

&lt;p&gt;The identity of anyone entering the museum is verified. For centuries this worked because a criminal looked like a criminal: they wore suspicious clothes, walked around aimlessly, carried odd bags and lost their temper at the slightest questioning by security guards.&lt;/p&gt;

&lt;p&gt;However, criminals and people with malicious intentions evolved. Today's thieves don't look like thieves. They dress sharply, walk with confidence, and never once glance nervously at a security camera. They've done their homework. They know which employees have access to the restricted wings. They've studied their mannerisms, their routines, their faces. And on the day of the heist, they walk straight through the front gate wearing the face of a trusted curator.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8v0ic6q0l8ed09gnnwsv.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8v0ic6q0l8ed09gnnwsv.webp" alt=" " width="800" height="932"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Mugshots became useless, wanted posters grew steadily less effective, and thieves walked straight through the gate into the museum with little or no resistance.&lt;/p&gt;

&lt;p&gt;This is the problem with modern cybersecurity. Your network is the museum attackers are trying to access, and your databases and servers are the precious, invaluable artifacts, while your firewalls, antivirus software and intrusion detection systems are the security guards with mugshots, lists of criminals and catalogues of expected attacks.&lt;/p&gt;

&lt;p&gt;They check everything against a list of known-bad signatures, malicious IPs and known attack patterns. This approach is no longer effective. Attackers no longer look like attackers: they use stolen credentials, avoid irrational and suspicious movements, slip stealthily past your rules with dynamic, ever-changing techniques, and head straight for your data.&lt;/p&gt;

&lt;p&gt;This article makes a single argument: &lt;strong&gt;signature-based security alone is insufficient&lt;/strong&gt;. In a world where attackers constantly change their tools, malware architecture, IP addresses and overall techniques, manually writing rules to detect every type of attack is not just laborious but impractical and deeply inefficient.&lt;/p&gt;

&lt;p&gt;New malware variants appear by the thousands every month; no antivirus product can keep up. Living-off-the-land attacks use legitimate tools and leave no bad signature to match. Zero-day exploits are even worse: they are ghosts, leaving almost no signature at all.&lt;/p&gt;

&lt;p&gt;This means attackers have an easy way into networks. Worse, once they get hold of an authorized account they can move mountains, and a signature-based security system will never flag it.&lt;/p&gt;

&lt;p&gt;The cat-and-mouse game of &lt;strong&gt;find-the-bad-activities&lt;/strong&gt; is a losing battle. The only sustainable defence is to know what good activity looks like. The defence system needs to know normal behaviour so thoroughly that bad activity cannot hide, no matter what form it takes.&lt;/p&gt;

&lt;p&gt;This is the philosophical shift at the heart of modern threat hunting. It is the direct application of machine learning and artificial intelligence to security. This is the security agency that does not need a list of known attacks before it can flag one.&lt;/p&gt;

&lt;p&gt;If Jude from the HR department of a museum in California suddenly logs in on a Sunday evening from Dubai and starts downloading 678 gigabytes of customer data, we don't need to debate whether the IP is malicious or whether the download tool matches a signature. The activity is plainly unusual: nothing like it has happened in the five years Jude has worked at the museum, so it is flagged as an anomaly.&lt;/p&gt;
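&lt;p&gt;The Jude scenario can be sketched in a few lines of Python. This is a toy illustration, not a real UEBA engine: the baseline volumes, locations, hours and score weights are all invented for the example.&lt;/p&gt;

```python
# Hypothetical sketch: scoring one session against a user's learned baseline.
# Every number and name here is illustrative.
import statistics

# Historical daily download volumes (MB) observed for this user
baseline_downloads = [120, 95, 140, 110, 130, 105, 125]
known_locations = {"California"}
known_login_hours = set(range(8, 18))  # 8am-6pm working hours

def score_session(location, login_hour, download_mb):
    """Return a simple additive anomaly score for one session."""
    score = 0
    mean = statistics.mean(baseline_downloads)
    stdev = statistics.stdev(baseline_downloads)
    z = (download_mb - mean) / stdev
    if z > 3:                          # volume far outside historical range
        score += 50
    if location not in known_locations:
        score += 30                    # never-before-seen geolocation
    if login_hour not in known_login_hours:
        score += 20                    # login outside normal working hours
    return score

# Jude: Sunday 9pm from Dubai, 678 GB expressed in MB
print(score_session("Dubai", 21, 678_000))  # -> 100
```

&lt;p&gt;Each weak signal adds to the score; a routine session from California at 10 AM with an ordinary volume scores zero.&lt;/p&gt;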
&lt;h2&gt;
  
  
  &lt;strong&gt;User and Entity Behaviour Analytics (UEBA) Detection: How it Learns.&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The government of the country realizes the critical issue at the museum and decides to introduce a special task force to help the security department. This is where UEBA comes in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User and Entity Behaviour Analytics&lt;/strong&gt; studies everyone and everything (humans, their instruments and other entities) in order to establish a ground truth, what we might otherwise call normal activity. Anything outside this normal activity is a potential threat.&lt;/p&gt;

&lt;p&gt;Imagine a new security guard named Owen &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2nt7wabdw0bu3ahm9q5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2nt7wabdw0bu3ahm9q5.jpg" alt=" " width="800" height="450"&gt;&lt;/a&gt;(UEBA) assigned to the Museum. For his first three months Owen does nothing but to carefully observe every employee, every visitor, every delivery person. Their times of resumption, exit, levels of access and general mode of conducting their activities.&lt;/p&gt;

&lt;p&gt;Owen is not just memorizing facts; he is building a robust baseline against which all future activity will be evaluated. This is what UEBA does with your data. It consumes logs from countless sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Authentication logs&lt;/strong&gt; (VPN, Active Directory)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Network flows&lt;/strong&gt; (NetFlow, DNS queries)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Endpoint logs&lt;/strong&gt; (process creation, file access)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Application logs&lt;/strong&gt; (database queries, web server access)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cloud service logs&lt;/strong&gt; (Office 365, AWS, Salesforce)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fathmq5se5wq2p1chk8sn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fathmq5se5wq2p1chk8sn.png" alt=" " width="800" height="741"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="n"&gt;Assuming&lt;/span&gt; &lt;span class="n"&gt;you&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ve loaded the data, imported the libraries and performed data cleaning and preprocessing 

print(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[3] Feature Engineering for Behavioral Profiles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)

df_behavior = df.copy()
df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;is_attack&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] = (df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;Label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] == &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;DDoS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;).astype(int)

# replace inf values first
df_behavior = df_behavior.replace([np.inf, -np.inf], np.nan)

# 1. Packet rate features - add small epsilon to avoid division by zero

df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;packets_per_second&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] = df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;Total&lt;/span&gt; &lt;span class="n"&gt;Fwd&lt;/span&gt; &lt;span class="n"&gt;Packets&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] / (df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;Flow&lt;/span&gt; &lt;span class="n"&gt;Duration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] + 1e-10)

df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;bytes_per_packet&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] = df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;Total&lt;/span&gt; &lt;span class="n"&gt;Length&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;Fwd&lt;/span&gt; &lt;span class="n"&gt;Packets&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] / (df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;Total&lt;/span&gt; &lt;span class="n"&gt;Fwd&lt;/span&gt; &lt;span class="n"&gt;Packets&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] + 1e-10)

# 2. Flag ratios

df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;syn_ack_ratio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] = df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;SYN&lt;/span&gt; &lt;span class="n"&gt;Flag&lt;/span&gt; &lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] / (df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;ACK&lt;/span&gt; &lt;span class="n"&gt;Flag&lt;/span&gt; &lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] + 1e-10) \ (df_behavior[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="n"&gt;ACK&lt;/span&gt; &lt;span class="n"&gt;Flag&lt;/span&gt; &lt;span class="n"&gt;Count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] &amp;gt; 0).astype(int)...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[3] Feature Engineering for Behavioral Profiles
Created new behavioral features:
['packets_per_second', 'bytes_per_packet', 'syn_ack_ratio', 'flag_diversity', 'fwd_bwd_ratio', 'packet_size_variation', 'iat_cv']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From this raw data, UEBA extracts behavioural features and a ground truth for every employee and user. It learns that Jonathan from the accounting department usually logs in at 7:35 AM, opens his spreadsheets, and has never once attempted to open the organization's source code repository.&lt;/p&gt;

&lt;p&gt;That series of observations is the &lt;strong&gt;ground truth&lt;/strong&gt;; any deviation from it is a potential breach. The concept of ground truth is perhaps the strongest asset a defender can possess.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How UEBA Works and Prevents Spamming of False Positives.&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;If UEBA relies on an established ground truth, does that mean it flags everything that deviates from it?&lt;/p&gt;

&lt;p&gt;Not exactly. It flags everything, but it does not report everything. It classifies signals according to a predefined, domain-specific scale of seriousness (a risk scoring system). This ensures that SOC analysts are not drowned in threat reports that later turn out to be insignificant or entirely benign.&lt;/p&gt;
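&lt;p&gt;A risk scoring system of this kind can be sketched as a simple weighted accumulator. The signal names, weights and reporting threshold below are illustrative assumptions, not taken from any real product.&lt;/p&gt;

```python
# Illustrative sketch of tiered risk scoring: every anomaly is recorded,
# but only accounts crossing a severity threshold reach the SOC queue.
SIGNAL_WEIGHTS = {
    "off_hours_login": 15,
    "failed_restricted_access": 25,
    "first_time_resource": 20,
    "mass_exfiltration": 60,
}
REPORT_THRESHOLD = 80

def risk_score(signals):
    """Sum the weights of all observed signals for one account."""
    return sum(SIGNAL_WEIGHTS[s] for s in signals)

def should_report(signals):
    """Report only when the accumulated score crosses the threshold."""
    return risk_score(signals) >= REPORT_THRESHOLD

# A single late-night login is logged, not reported
print(should_report(["off_hours_login"]))  # -> False

# Correlated signals push the score past the threshold
print(should_report(["off_hours_login", "failed_restricted_access",
                     "first_time_resource", "mass_exfiltration"]))  # -> True
```

&lt;p&gt;A lone weak signal stays below the threshold; the correlation of several weak signals is what triggers a report.&lt;/p&gt;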

&lt;p&gt;Say, on the 23rd of June, 2025, a thief manages to compromise the account of a young employee in the museum's maintenance department. Let's call this account “&lt;strong&gt;IamCareless600&lt;/strong&gt;”.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 1&lt;/strong&gt;: The account logs in on a Saturday at 10:45 PM. This is unusual; the owner has never done this. Owen sees it but doesn't react (maybe it's an emergency, or he just needs to fetch something).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Day 2&lt;/strong&gt;: The account logs in on Monday, but instead of heading to the maintenance department it goes to the restoration lab (a place it has never visited) and is denied entrance. It then heads to the administrative block and is denied again, and finally makes its way to the server room. Access denied once more.&lt;/p&gt;

&lt;p&gt;Owen now has weak signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An anomalous late night entry&lt;/li&gt;
&lt;li&gt;Multiple failed access attempts to restricted areas&lt;/li&gt;
&lt;li&gt;A pattern of wandering that does not match any employee's normal behaviour&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Day 3:&lt;/strong&gt; IamCareless600 logs in and immediately tries to access the database. This time it succeeds and starts transferring 500 GB of files to an external IP in a foreign country. The combination of these activities gives Owen strong probable cause.&lt;/p&gt;

&lt;p&gt;Owen's machine-learning-powered brain correlates the signals: [&lt;strong&gt;Anomalous entry time&lt;/strong&gt;] + [&lt;strong&gt;Multiple failed access attempts to restricted areas&lt;/strong&gt;] + [&lt;strong&gt;First-time server access&lt;/strong&gt;] + [&lt;strong&gt;Massive data exfiltration&lt;/strong&gt;] = &lt;strong&gt;COMPROMISED ACCOUNT&lt;/strong&gt;. Owen doesn't raise a generic alarm. He runs to the security team with a precise report.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyk20wz4q384bvo7uhmdo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyk20wz4q384bvo7uhmdo.png" alt=" " width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The thief is caught in the act, halfway through stealing the museum's most precious records. This is threat hunting. This is the difference between waiting for an alarm and actively watching out for strange activities.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;[5] Detecting Anomalous Ports with Isolation Forest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;port_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_flows&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_flow_duration&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_packet_size&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_packet_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;syn_ack_ratio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;packet_size_std&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;X_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;port_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;port_features&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
&lt;span class="c1"&gt;# Handle any remaining infinite or NaN values
&lt;/span&gt;&lt;span class="n"&gt;X_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_port&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inf&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;X_port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X_port&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_port&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RobustScaler&lt;/span&gt;
&lt;span class="n"&gt;scaler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RobustScaler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;X_scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_port&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;expected_contamination&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;port_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;is_malicious_port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expected contamination: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;expected_contamination&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.ensemble&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IsolationForest&lt;/span&gt;
&lt;span class="n"&gt;iso_forest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;IsolationForest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_estimators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contamination&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;expected_contamination&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bootstrap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Disable bootstrap to avoid issues
&lt;/span&gt;&lt;span class="n"&gt;iso_forest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_scaled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;port_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anomaly_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;iso_forest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decision_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_scaled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;port_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;predicted_anomaly&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iso_forest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_scaled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Top 20 most anomalous ports:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;anomalous&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;port_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anomaly_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;anomalous&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_flows&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attack_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anomaly_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attack_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anomaly_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;is_malicious_port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="nf"&gt;to_string&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;strong&gt;Practical Implementation of UEBA Using CIC-IDS-2017 Dataset as a Case Study&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We conducted a practical implementation of UEBA on a collection of network traffic containing both normal activity and DDoS attacks.&lt;/p&gt;

&lt;p&gt;Owen found the needle in the haystack using an Isolation Forest to detect anomalies. We discovered that port 80, the web server, was drowning in attack traffic: 136,951 flows, 93% of them malicious. The flow volume was 4.1 standard deviations above normal, and the packet sizes were 3.4 standard deviations above normal. The probability of this happening by chance is less than 1 in 35 million.&lt;/p&gt;
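&lt;p&gt;As a back-of-the-envelope check, one can compute how unlikely such multi-sigma deviations are under a normal assumption. This sketch is purely illustrative: real traffic is not Gaussian, and the exact joint probability depends on independence assumptions not spelled out here.&lt;/p&gt;

```python
# Tail probability of a multi-sigma deviation under a standard normal model.
import math

def upper_tail_probability(z):
    """P(Z is above z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_volume = upper_tail_probability(4.1)  # flow volume 4.1 sigma above normal
p_size = upper_tail_probability(3.4)    # packet sizes 3.4 sigma above normal

print(f"P(volume): {p_volume:.2e}")
print(f"P(size):   {p_size:.2e}")

# Treating the two deviations as independent makes their joint
# occurrence astronomically unlikely.
print(f"joint: about 1 in {1 / (p_volume * p_size):,.0f}")
```

&lt;p&gt;Individually each deviation is already rare; jointly they are effectively impossible by chance, which is why the combination is treated as an attack rather than noise.&lt;/p&gt;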

&lt;p&gt;Owen does not need to know how the attack was carried out or what it was called; all he cares about is that “this is unusual activity, dangerous activity, and it must be stopped immediately”. Here is the link to the full code: &lt;a href="https://github.com/Akanji102/DDoS-anomaly-detection-using-Isolation-forest" rel="noopener noreferrer"&gt;https://github.com/Akanji102/DDoS-anomaly-detection-using-Isolation-forest&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Classification Report&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                precision    recall  f1-score   support

Normal Port          1.00      1.00      1.00        19
Malicious Port       1.00      1.00      1.00         1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Accuracy&lt;/strong&gt;: 100%&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ROC-AUC&lt;/strong&gt;: 1.00&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Matthews Correlation Coefficient&lt;/strong&gt;: 1.00&lt;/p&gt;

&lt;p&gt;Although the dataset was generated in a constrained environment and was built essentially for educational purposes, it clearly demonstrates how Isolation Forest, other unsupervised algorithms and deep learning networks can detect strange activity and any form of attack, old or new.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Tools and Techniques UEBA Utilizes to Hunt Threats.&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Owen isn't a single model; he is an ensemble of different algorithms and tools, all working together to classify behaviours and detect anomalies. Some of them include:&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;(1). Temporal analysis:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Temporal analysis (time series analysis) models every user and entity's activity as a pattern over time. It doesn't just track what they do, but when they do it and in what sequence. It detects unusual login times, modified work patterns and sequence violations.&lt;/p&gt;

&lt;p&gt;The algorithms here range from statistical methods like &lt;strong&gt;Seasonal ARIMA&lt;/strong&gt; (which captures weekly patterns) to deep learning approaches like &lt;strong&gt;LSTMs&lt;/strong&gt; (which excel at learning sequences).&lt;/p&gt;
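&lt;p&gt;The simplest possible form of temporal baselining can be sketched without any heavy modelling: record which hours a user normally logs in, then flag hours that have never been observed. Seasonal ARIMA and LSTMs generalize this idea; the login history below is invented for illustration.&lt;/p&gt;

```python
# Minimal temporal baseline: frequency of past login hours per user.
from collections import Counter

history = [9, 9, 10, 8, 9, 17, 9, 10, 8, 9]  # past login hours (24h clock)
profile = Counter(history)

def is_unusual_hour(hour):
    """Flag a login hour that has never appeared in this user's history."""
    return profile[hour] == 0

print(is_unusual_hour(9))   # -> False: a routine morning login
print(is_unusual_hour(2))   # -> True: 2 AM has never been observed
```

&lt;p&gt;A real system would also weight by frequency and day of week rather than using a binary seen/unseen check.&lt;/p&gt;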

&lt;h2&gt;
  
  
  &lt;strong&gt;(2). Graph Analysis:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;UEBA as a defence system does not see each account or machine as an isolated node to be studied independently; it sees them all as a giant, dynamic web of connections.&lt;/p&gt;

&lt;p&gt;It detects data exfiltration, lateral movement and insider collusion. When a clerk suddenly starts interacting with and sending huge data files to businessmen in the Middle East, or a group of accounts keeps moving in concert with malicious intent, they are studied as a whole. UEBA traces the underlying relationships between entities in order to detect fraud.&lt;/p&gt;

&lt;p&gt;The magic here lies in algorithms like &lt;strong&gt;Community Detection&lt;/strong&gt; (which automatically finds groups that normally work together) and &lt;strong&gt;Graph Neural Networks&lt;/strong&gt; (which learn to spot structural anomalies).&lt;/p&gt;
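&lt;p&gt;At its core, the graph view is a set of known relationships, and an edge that has never existed before is itself a signal. The entities below are hypothetical, and real systems use community detection or graph neural networks rather than a hard-coded edge set.&lt;/p&gt;

```python
# Toy sketch of graph-based detection: model who normally talks to whom,
# then flag interactions (edges) that have never existed before.
normal_edges = {
    ("clerk", "accounting_db"),
    ("clerk", "hr_portal"),
    ("editor", "media_server"),
}

def is_new_relationship(src, dst):
    """True when this (source, destination) edge is absent from the baseline graph."""
    return (src, dst) not in normal_edges

print(is_new_relationship("clerk", "hr_portal"))       # -> False: routine
print(is_new_relationship("clerk", "foreign_server"))  # -> True: anomalous edge
```

&lt;p&gt;Community detection automates building the baseline edge set; GNNs go further and score how structurally unusual a new edge is.&lt;/p&gt;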

&lt;h2&gt;
  
  
  &lt;strong&gt;(3). Statistical analysis:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The volume detective tracks quantities: data volumes, file counts, action frequencies. It builds a picture of what “normal” should look like. A marketing intern might download an average of 200 MB of data per day, while a video editor might go as far as 10 GB.&lt;/p&gt;

&lt;p&gt;It detects massive downloads (a video editor should not normally download the salary dataset, let alone 500 GB of data). It uses models such as the &lt;strong&gt;Gaussian distribution&lt;/strong&gt;, &lt;strong&gt;moving averages&lt;/strong&gt; and &lt;strong&gt;exponential smoothing&lt;/strong&gt;.&lt;/p&gt;
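&lt;p&gt;The exponential smoothing mentioned above can be sketched in a few lines. The daily volumes, smoothing factor and alert multiplier are illustrative assumptions.&lt;/p&gt;

```python
# Sketch of a volume baseline with exponential smoothing: each user's
# expected daily download is updated gradually, and a day far above the
# smoothed expectation is flagged.
def smoothed_baseline(volumes, alpha=0.3):
    """Exponentially weighted estimate of a user's typical daily volume."""
    level = volumes[0]
    for v in volumes[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

daily_mb = [180, 210, 195, 205, 190]  # a marketing intern, roughly 200 MB/day
baseline = smoothed_baseline(daily_mb)

def is_excessive(download_mb, multiplier=10):
    """Flag a day whose volume dwarfs the smoothed baseline."""
    return download_mb > multiplier * baseline

print(is_excessive(220))       # -> False: an ordinary day
print(is_excessive(500_000))   # -> True: a 500 GB exfiltration attempt
```

&lt;p&gt;The smoothing factor controls how quickly the baseline adapts: small values make it stable, large values let it track gradual role changes.&lt;/p&gt;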

&lt;h2&gt;
  
  
  &lt;strong&gt;(4). Unsupervised Learning:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This is the instrument for establishing the final ground truth. It sorts all of the data into clusters in order to find the odd ones out, defining what normal looks like and what should be investigated.&lt;/p&gt;

&lt;p&gt;It includes models like &lt;strong&gt;K-Means&lt;/strong&gt;, &lt;strong&gt;DBSCAN&lt;/strong&gt;, &lt;strong&gt;Hierarchical Clustering&lt;/strong&gt; and, a particularly effective one, &lt;strong&gt;Isolation Forest&lt;/strong&gt;.&lt;/p&gt;
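&lt;p&gt;As a toy stand-in for the clustering idea (a real deployment would normalise the features and use K-Means, DBSCAN or Isolation Forest; the users and numbers below are made up):&lt;/p&gt;

```python
import math

# One behaviour vector per user: (logins/day, MB downloaded, distinct hosts)
# In practice features would be normalised first; raw numbers keep the toy simple.
users = {
    "ada":     (20, 200, 3),
    "bayo":    (22, 210, 4),
    "chike":   (19, 190, 3),
    "mallory": (21, 50_000, 40),  # behaves very differently from the rest
}

# Centroid (mean vector) of all users
dims = len(next(iter(users.values())))
centroid = [sum(v[i] for v in users.values()) / len(users) for i in range(dims)]

def distance(v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, centroid)))

# The user farthest from the centroid is the prime suspect
ranked = sorted(users, key=lambda u: distance(users[u]), reverse=True)
print(ranked[0])  # -> mallory
```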

&lt;h2&gt;
  
  
&lt;strong&gt;Things to note before implementing UEBA&lt;/strong&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;(1). Data Quality:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;UEBA as a defence system is only as good as the data on which it was built. The data should be clean, realistic and gathered through a solid pipeline. Bad data automatically means bad UEBA.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;(2). Cold Start:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;As previously explained, UEBA needs a significant amount of time to gather knowledge and establish its baseline. In the first few weeks after setup, the signals will therefore be noisy and the results unreliable, but accuracy will improve over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;(3). Concept Drift:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Companies are not static: roles change and so do policies. It is therefore recommended that UEBA models be retrained after major drifts in order to maintain accuracy.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;(4). Signature-based and human-in-the-loop:&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;UEBA doesn't replace signature-based threat detection; it enriches it. Nor does the system replace human analysts; it empowers them. Human hunters investigate, confirm or refute, and provide feedback that closes the loop and improves future detection. This symbiosis is essential.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;There are two options for the hypothetical scenario presented in this article: the museum can add more guards and keep its rule-based system, or it can fundamentally rethink its approach, shifting from reactive detection to proactive behavior monitoring.&lt;/p&gt;

&lt;p&gt;The same applies to the security of your company's data: UEBA offers flexibility, dynamic response and cautious proactiveness. It is currently one of the strongest defence mechanisms against cyber attacks.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cybersecurity</category>
      <category>machinelearning</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>How AI Learns: Gradient Descent Explained Through a Midnight Smoky Jollof Adventure</title>
      <dc:creator>Fawole Joshua</dc:creator>
      <pubDate>Tue, 16 Dec 2025 07:59:24 +0000</pubDate>
      <link>https://dev.to/fawole_joshua_c92c794ea50/how-ai-learns-gradient-descent-explained-through-a-midnight-smoky-jollof-adventure-3ggh</link>
      <guid>https://dev.to/fawole_joshua_c92c794ea50/how-ai-learns-gradient-descent-explained-through-a-midnight-smoky-jollof-adventure-3ggh</guid>
      <description>&lt;p&gt;Many aspects of the modern world are now powered by artificial intelligence, and this has significantly accelerated human civilization.&lt;/p&gt;

&lt;p&gt;From faster disease detection to automated decision-making. From breakthroughs in medical imaging to the quiet and rapid adoption of artificial intelligence in law firms and the entire judicial system. Artificial intelligence is actively reshaping the future of agriculture and its impact can be felt across nearly all sectors.&lt;/p&gt;

&lt;p&gt;Yet, despite this tremendous progress, many people do not actually understand where artificial intelligence gets its brilliance from. AI's ability to identify errors and iteratively improve is certainly amazing.&lt;/p&gt;

&lt;p&gt;This article will gently hold you by the hand and explain the true superpower behind AI and machine learning.&lt;/p&gt;

&lt;p&gt;The answer lies in a simple mathematical algorithm called Gradient Descent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is Gradient Descent?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Gradient descent can be explained as a general-purpose mathematical algorithm capable of finding good solutions to a very wide range of problems. In machine learning, it works by iteratively updating parameters to minimize a loss (or cost) function.&lt;/p&gt;

&lt;p&gt;In very simple terms, Gradient descent helps AI figure out how wrong it is and how to quickly become less wrong.&lt;/p&gt;

&lt;p&gt;To explicitly understand what gradient descent is, its complexities and purpose, we can look under the hood and reason like an AI model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Midnight Smoky Jollof Adventure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Say you went for Thanksgiving and your mom cooked a special and very taste-bud-pleasing Nigerian jollof. Thanksgiving was perfect, you reconnected with your siblings and then everyone went to bed. But in the middle of the night, your brain and tongue just kept craving more, the smoky jollof rice was so tantalising that you could smell it several feet away.&lt;/p&gt;

&lt;p&gt;You resisted the feeling but it got the better of you, so you stood up and started making your way to the kitchen. But here is the problem: the lights are off and you can't see a thing. You don't want to get caught, nor do you want to trip over something.&lt;/p&gt;

&lt;p&gt;Imagine the house floor as a graph paper.&lt;/p&gt;

&lt;p&gt;X-axis = left-right position&lt;/p&gt;

&lt;p&gt;Y-axis = forward-backward position&lt;/p&gt;

&lt;p&gt;Your location = coordinates (X, Y)&lt;/p&gt;

&lt;p&gt;You are currently at point (1, 1)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Loss Function&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We need a way to measure how close we are to the jollof rice. Say the pot sits in the kitchen at coordinates (3, 4).&lt;/p&gt;

&lt;p&gt;Normal distance formula:&lt;/p&gt;

&lt;p&gt;Distance = √((x - 3)² + (y - 4)²)&lt;/p&gt;

&lt;p&gt;Let's just use squared distance (dropping the square root does not change where the minimum is, and it is easier to differentiate):&lt;/p&gt;

&lt;p&gt;Loss(x, y) = (x - 3)² + (y - 4)²&lt;/p&gt;

&lt;p&gt;This loss function is important: it will be our compass to the jollof rice, showing how far off (how wrong) we are.&lt;/p&gt;

&lt;p&gt;The higher the loss, the more wrong we are (i.e. the farther we are from the kitchen). Therefore, our goal is to reduce the loss function until we reach the kitchen and the jollof rice.&lt;/p&gt;

&lt;p&gt;At starting point (1, 1):&lt;/p&gt;

&lt;p&gt;Loss = (1 - 3)² + (1 - 4)² = (-2)² + (-3)² = 4 + 9 = 13. This means we are very far from the kitchen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing Directions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Then let's tweak the parameters a little:&lt;/p&gt;

&lt;p&gt;From (1, 1) to (1.001, 1)&lt;/p&gt;

&lt;p&gt;New loss: (1.001 - 3)² + (1 - 4)² = (-1.999)² + (-3)² = 3.996 + 9 = 12.996&lt;/p&gt;

&lt;p&gt;The old loss was 13, now the new loss is 12.996 (decreased by 0.004, we are making progress!)&lt;/p&gt;

&lt;p&gt;Then let's say we tweak the parameters even more. From (1, 1) to (1, 1.001):&lt;/p&gt;

&lt;p&gt;New loss: (1 - 3)² + (1.001 - 4)² = (-2)² + (-2.999)² = 4 + 8.994 = 12.994 (getting closer)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Mathematical Shortcut&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of testing each direction, we can take a mathematical shortcut (find the derivative):&lt;/p&gt;

&lt;p&gt;For loss = (x - 3)² + (y - 4)²&lt;/p&gt;

&lt;p&gt;How loss changes with x:&lt;/p&gt;

&lt;p&gt;If we change x by ∆x, loss changes by approximately:&lt;/p&gt;

&lt;p&gt;2 * (x - 3) * ∆x&lt;/p&gt;

&lt;p&gt;Why? Because the derivative of (x - 3)² is 2(x - 3).&lt;/p&gt;

&lt;p&gt;So at x = 1:&lt;/p&gt;

&lt;p&gt;2 * (1 - 3) = 2 * (-2) = -4&lt;/p&gt;

&lt;p&gt;This means that for every tiny step right, loss decreases by 4 times that step size.&lt;/p&gt;

&lt;p&gt;How loss changes with y:&lt;/p&gt;

&lt;p&gt;2 * (y - 4) * ∆y&lt;/p&gt;

&lt;p&gt;At y = 1:&lt;/p&gt;

&lt;p&gt;2 * (1 - 4) = 2 * (-3) = -6&lt;/p&gt;

&lt;p&gt;For every step forward, loss decreases by 6 times that tiny step size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Gradient Vector&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We put these together into a gradient vector:&lt;/p&gt;

&lt;p&gt;Gradient = [-4, -6]^T&lt;/p&gt;

&lt;p&gt;To update our position, we need to choose a step size, otherwise called a learning rate (η = 0.1).&lt;/p&gt;

&lt;p&gt;The learning rate must not be too small (we don't want to take forever) nor too large (we don't want to overshoot or fall over something).&lt;/p&gt;

&lt;p&gt;Now our movement formula will be:&lt;/p&gt;

&lt;p&gt;New position = old position - η * Gradient&lt;/p&gt;

&lt;p&gt;x-new = 1 - 0.1 * (-4) = 1 + 0.4 = 1.4&lt;/p&gt;

&lt;p&gt;y-new = 1 - 0.1 * (-6) = 1 + 0.6 = 1.6.&lt;/p&gt;

&lt;p&gt;We just moved from (1, 1) to (1.4, 1.6).&lt;/p&gt;

&lt;p&gt;Old loss at (1, 1) = 13&lt;/p&gt;

&lt;p&gt;New loss at (1.4, 1.6) = (1.4 - 3)² + (1.6 - 4)² = (-1.6)² + (-2.4)² = 2.56 + 5.76 = 8.32&lt;/p&gt;

&lt;p&gt;We just improved from a loss of 13 to only 8.32. This is great progress, and we are getting close to the kitchen now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next Iterations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As our little journey continues we compute the next gradients:&lt;/p&gt;

&lt;p&gt;Now at (1.4, 1.6):&lt;/p&gt;

&lt;p&gt;For x: 2 * (1.4 - 3) = 2 * (-1.6) = -3.2&lt;/p&gt;

&lt;p&gt;For y: 2 * (1.6 - 4) = 2 * (-2.4) = -4.8&lt;/p&gt;

&lt;p&gt;Gradient = [-3.2, -4.8]^T&lt;/p&gt;

&lt;p&gt;x-new = 1.4 - 0.1 * (-3.2) = 1.4 + 0.32 = 1.72&lt;/p&gt;

&lt;p&gt;y-new = 1.6 - 0.1 * (-4.8) = 1.6 + 0.48 = 2.08&lt;/p&gt;

&lt;p&gt;Loss at (1.72, 2.08): (-1.28)² + (-1.92)² = 1.6384 + 3.6864 = 5.3248&lt;/p&gt;

&lt;p&gt;Loss dropped from 8.32 to 5.32. Congratulations, you are now at the kitchen door!&lt;/p&gt;

&lt;p&gt;With a couple more iterations, you will have reached the global optimum. This is certain because your loss function is convex, so gradient descent with a suitable learning rate is guaranteed to converge (your goal: the lowest loss, little to no error).&lt;/p&gt;

&lt;p&gt;In essence, gradient descent measures the local gradient of the error function with respect to the parameter vector θ and steps in the direction of the descending gradient. Once the gradient is zero, you have reached a minimum (or, more precisely, a critical point, which could be a minimum, maximum, or saddle point).&lt;/p&gt;
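&lt;p&gt;The midnight walk above can be sketched in a few lines of Python, using the same loss, the same starting point (1, 1) and the same learning rate of 0.1 (an illustrative snippet, not library code):&lt;/p&gt;

```python
def loss(x, y):
    return (x - 3) ** 2 + (y - 4) ** 2

def gradient(x, y):
    # Partial derivatives of the loss with respect to x and y
    return 2 * (x - 3), 2 * (y - 4)

x, y, lr = 1.0, 1.0, 0.1  # start at (1, 1), learning rate η = 0.1
for step in range(50):
    gx, gy = gradient(x, y)
    x, y = x - lr * gx, y - lr * gy

print(round(x, 3), round(y, 3))  # -> 3.0 4.0, right at the jollof pot
```

&lt;p&gt;The first two iterations of this loop reproduce the hand-computed moves to (1.4, 1.6) and then (1.72, 2.08); the rest carry you the remaining way to (3, 4).&lt;/p&gt;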

&lt;p&gt;&lt;strong&gt;In Real Machine Learning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of 2 parameters (x, y), there are millions or billions (weights in a neural network).&lt;/p&gt;

&lt;p&gt;Instead of "squared distance," they use losses like Cross-Entropy or Mean Squared Error.&lt;/p&gt;

&lt;p&gt;Instead of one perfect pot, they navigate a complex, multi-dimensional "loss landscape" with hills, valleys, and plateaus.&lt;/p&gt;

&lt;p&gt;But the core algorithm, the relentless optimization engine, remains Gradient Descent and its smarter variants (Adam, RMSProp).&lt;/p&gt;

&lt;p&gt;This is exactly how gradient descent works and how artificial intelligence can learn patterns and improve its predictions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Types of Gradient Descent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Batch Gradient Descent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the process whereby all training examples are used to compute the gradient, after which a single update step is taken:&lt;/p&gt;

&lt;p&gt;θ_new = θ_old - η * (1/m) * Σ(∇L(θ, x_i, y_i))&lt;/p&gt;

&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;m = total number of training examples&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;η = learning rate&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;∇L = gradient of the loss for example i&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
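&lt;p&gt;A minimal illustration of the batch update, fitting a single weight w so that y ≈ w·x on a tiny invented dataset:&lt;/p&gt;

```python
# Tiny invented dataset where the true relationship is y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w, lr = 0.0, 0.05
m = len(xs)
for epoch in range(40):
    # Average the gradient of the squared error over ALL m examples...
    grad = (2 / m) * sum((w * x - y) * x for x, y in zip(xs, ys))
    # ...then take a single update step
    w = w - lr * grad

print(round(w, 4))  # -> 2.0, the true weight
```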

&lt;p&gt;&lt;strong&gt;Stochastic Gradient Descent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Batch gradient descent uses the whole training set to compute the gradient at every step, which greatly slows computation on large datasets. Stochastic gradient descent, on the other hand, picks a random instance from the training set at every step and computes the gradient based on that single instance.&lt;/p&gt;

&lt;p&gt;This makes the algorithm much faster but also noisier. That stochastic noise can help SGD escape some local minima, and it will end up very close to the global optimum; however, with a constant learning rate it oscillates around the minimum rather than converging exactly.&lt;/p&gt;

&lt;p&gt;For each random example i:&lt;/p&gt;

&lt;p&gt;θ = θ - η * ∇L(θ, x_i, y_i)&lt;/p&gt;
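&lt;p&gt;The same toy problem with stochastic updates, one random example at a time (note the smaller learning rate, since single-example gradients are noisier):&lt;/p&gt;

```python
import random

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

random.seed(0)  # fixed seed so the run is repeatable
w, lr = 0.0, 0.02
for step in range(400):
    i = random.randrange(len(xs))           # pick one random instance
    grad = 2 * (w * xs[i] - ys[i]) * xs[i]  # gradient from that instance alone
    w = w - lr * grad

print(round(w, 3))  # -> 2.0 (exact here because the toy data is noise-free)
```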

&lt;p&gt;&lt;strong&gt;Mini-Batch Gradient Descent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here a small batch of examples, typically 16 or 32, is used to compute the gradient before each update. It is the sweet spot between SGD and batch gradient descent.&lt;/p&gt;

&lt;p&gt;For each batch B of size b:&lt;/p&gt;

&lt;p&gt;∇L_batch = (1/b) * Σ ∇L(θ, x_i, y_i) for i in B&lt;/p&gt;

&lt;p&gt;θ = θ - η * ∇L_batch&lt;/p&gt;
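&lt;p&gt;And the mini-batch version of the same toy problem, averaging the gradient over small chunks (batch size 2 here, chosen purely for illustration):&lt;/p&gt;

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w, lr, b = 0.0, 0.02, 2
data = list(zip(xs, ys))
for epoch in range(60):
    for start in range(0, len(data), b):
        batch = data[start:start + b]
        # Average gradient over the b examples in this mini-batch
        grad = (2 / b) * sum((w * x - y) * x for x, y in batch)
        w = w - lr * grad

print(round(w, 3))  # -> 2.0
```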

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Understanding how gradient descent works is profound: it shows that artificial intelligence and its way of learning isn't about being perfect from the very beginning; it's about having a reliable method to steadily become less wrong.&lt;/p&gt;

&lt;p&gt;This is how AI learns, and it may say something about how humans function as well: psychologists often note that people benefit from reflective or meditative time to reason about what went wrong and how to fix it. In that sense, gradient descent is a small link between artificial intelligence and the human race.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>deeplearning</category>
      <category>python</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
