<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anshuman Ojha</title>
    <description>The latest articles on DEV Community by Anshuman Ojha (@anshuman_ojha_).</description>
    <link>https://dev.to/anshuman_ojha_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3118385%2F6253c977-00fa-429d-834d-44b861d71b0d.jpg</url>
      <title>DEV Community: Anshuman Ojha</title>
      <link>https://dev.to/anshuman_ojha_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anshuman_ojha_"/>
    <language>en</language>
    <item>
      <title>The Evolution of Large Language Models: From Rule-Based Systems to Modern AI</title>
      <dc:creator>Anshuman Ojha</dc:creator>
      <pubDate>Tue, 29 Jul 2025 12:11:51 +0000</pubDate>
      <link>https://dev.to/anshuman_ojha_/the-evolution-of-large-language-models-from-rule-based-systems-to-modern-ai-2bnn</link>
      <guid>https://dev.to/anshuman_ojha_/the-evolution-of-large-language-models-from-rule-based-systems-to-modern-ai-2bnn</guid>
      <description>&lt;p&gt;The journey of Large Language Models (LLMs) is a fascinating narrative of continuous innovation in Machine Learning (ML) and Deep Learning (DL). It's a story of moving from rigid rules to nuanced understanding, powered by breakthroughs at every level, from fundamental algorithms to grand architectures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 1: The Foundations – Rule-Based Systems &amp;amp; Early Statistical Methods
&lt;/h2&gt;

&lt;p&gt;Before the deep learning revolution, language processing was a meticulous craft, often requiring manual engineering.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule-Based Systems (1950s-1980s):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; These systems used hand-coded rules to interpret and generate language. Think of them as elaborate flowcharts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; ELIZA, a famous early chatbot, would respond to keywords with pre-programmed phrases. If you typed "I am sad," it might reply, "Why are you sad?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; Demonstrated the potential for human-computer interaction, but lacked flexibility and scalability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; Imagine a flowchart with decision diamonds and action boxes for every possible linguistic pattern.&lt;/p&gt;

&lt;h3&gt;
  
  
  Statistical Models (1980s-Early 2000s):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; Instead of rules, these models learned probabilities from data. N-grams were dominant, predicting the next word from the N-1 preceding words (e.g., a trigram predicts the next word from the two preceding words).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Algorithms:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;N-gram Probability Calculation: Counting word sequences in a large corpus to estimate likelihoods (e.g., P(word3 | word1, word2)).&lt;/p&gt;
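&lt;p&gt;A minimal sketch of this counting approach, using a made-up toy corpus:&lt;/p&gt;

```python
from collections import defaultdict

# Minimal trigram model: estimate P(w3 | w1, w2) by counting sequences.
corpus = "the cat sat on the mat the cat ran".split()

bigram_counts = defaultdict(int)
trigram_counts = defaultdict(int)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    bigram_counts[(w1, w2)] += 1
    trigram_counts[(w1, w2, w3)] += 1

def trigram_prob(w1, w2, w3):
    # P(w3 | w1, w2) = count(w1, w2, w3) / count(w1, w2)
    return trigram_counts[(w1, w2, w3)] / bigram_counts[(w1, w2)]

# "the cat" appears twice as a context, once followed by "sat".
print(trigram_prob("the", "cat", "sat"))  # 0.5
```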

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; More robust to variations in language than rule-based systems, enabling early machine translation and speech recognition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt; Suffered from the "curse of dimensionality" (too many unique sequences) and couldn't capture long-range dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; A chain of words, with arrows indicating probabilities of transitions between them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 2: The Neural Network Dawn – Understanding Sequence and Context
&lt;/h2&gt;

&lt;p&gt;The emergence of neural networks brought the ability to learn complex patterns and representations from data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Early Neural Networks (Perceptrons, MLPs - 1980s-1990s):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; Simple interconnected nodes that learn mappings from input to output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; Laid the groundwork for more complex neural architectures, but insufficient for sequential language data on their own.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recurrent Neural Networks (RNNs - 1990s onwards):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; Designed for sequential data, RNNs have a "memory" that allows information to persist from one step to the next. They process words one by one, updating a hidden state that encapsulates previous information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Algorithms:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Backpropagation Through Time (BPTT): The method for training RNNs by unfolding the network over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; First true sequential models for language, enabling some understanding of context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Limitations:&lt;/strong&gt; Prone to vanishing/exploding gradients, making it hard to learn very long-range dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; A chain of connected boxes, each representing a time step (word), with a loop indicating information feeding back into itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Long Short-Term Memory (LSTMs) &amp;amp; Gated Recurrent Units (GRUs - 1997, 2014):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; Enhancements to RNNs that introduce "gates" (forget, input, output) to control the flow of information, mitigating the vanishing gradient problem. GRUs are a simpler variant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; Revolutionized sequence modeling, making tasks like machine translation and speech recognition practical for longer sequences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; An RNN cell, but with internal "gates" that control information flow through a "cell state" (a long-term memory component).&lt;/p&gt;

&lt;h3&gt;
  
  
  Word Embeddings (Word2Vec, GloVe - 2013-2014):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; Instead of representing words as discrete IDs, word embeddings map words to dense, continuous vectors in a high-dimensional space. Words with similar meanings are located closer together in this space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Algorithms:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Skip-gram/CBOW (Word2Vec): Learning word embeddings by predicting context words from a target word, or vice versa.&lt;/p&gt;

&lt;p&gt;Cosine Similarity: A common metric to measure the semantic similarity between two word vectors. The cosine of the angle between two vectors (ranging from -1 to 1, where 1 means identical direction) determines their similarity.&lt;/p&gt;

&lt;p&gt;Euclidean Distance: Another metric to measure the "distance" between two word vectors in space. Shorter distances imply greater similarity.&lt;/p&gt;
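&lt;p&gt;Both metrics are simple to compute directly. The tiny 3-dimensional vectors below are made-up illustrative values, not trained embeddings:&lt;/p&gt;

```python
import math

# Toy 3-dimensional "embeddings" (illustrative values only).
king  = [0.8, 0.65, 0.1]
queen = [0.78, 0.7, 0.15]
apple = [0.1, 0.2, 0.9]

def cosine_similarity(a, b):
    # Cosine of the angle between vectors a and b: dot product over norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line distance between the two points in embedding space.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Related words: cosine close to 1, small distance.
print(cosine_similarity(king, queen), euclidean_distance(king, queen))
# Unrelated words: lower cosine, larger distance.
print(cosine_similarity(king, apple), euclidean_distance(king, apple))
```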

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; Captured semantic relationships between words, providing a much richer input representation for neural networks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; A 2D or 3D scatter plot where each point is a word, and semantically similar words (like "king" and "queen") are clustered together. Arrows showing vector arithmetic (e.g., "king" - "man" + "woman" = "queen").&lt;/p&gt;

&lt;h3&gt;
  
  
  Sequence-to-Sequence (Seq2Seq) with Attention (2014):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; An encoder-decoder architecture where an encoder processes an input sequence into a context vector, and a decoder generates an output sequence from that vector. The attention mechanism allows the decoder to "look back" at relevant parts of the input sequence at each step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; Significantly improved performance in tasks like machine translation by allowing models to focus on important parts of the input, rather than compressing everything into a single context vector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; Two connected RNNs (encoder and decoder). The encoder reads the input. At each step, the decoder draws attention lines to relevant parts of the encoder's output, with varying strengths (thicker lines mean more attention).&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 3: The Transformer Revolution – Parallelism and Scalability
&lt;/h2&gt;

&lt;p&gt;The Transformer architecture marked a fundamental shift, moving away from recurrence and embracing parallelism.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Attention Is All You Need" (The Transformer - 2017):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; The Transformer completely replaced recurrence with multiple layers of self-attention and feed-forward networks. Each word can directly attend to every other word in the sequence, no matter how far apart, computing "attention scores" that determine their relevance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Algorithms:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Multi-Head Self-Attention: Computes attention multiple times in parallel, allowing the model to focus on different aspects of relationships within the sequence.&lt;/p&gt;

&lt;p&gt;Positional Encoding: Added to word embeddings to retain information about word order since self-attention is permutation-invariant.&lt;/p&gt;
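&lt;p&gt;The core computation can be sketched in a few lines of NumPy. This is a single attention head over random toy matrices, not a full Transformer layer:&lt;/p&gt;

```python
import numpy as np

# Scaled dot-product self-attention for a 4-token, 8-dimensional toy sequence.
rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q = rng.normal(size=(seq_len, d_k))  # queries
K = rng.normal(size=(seq_len, d_k))  # keys
V = rng.normal(size=(seq_len, d_k))  # values

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
scores = Q @ K.T / np.sqrt(d_k)   # every token scores every other token
weights = softmax(scores)          # each row is a distribution summing to 1
output = weights @ V               # weighted mixture of value vectors

print(weights.shape, output.shape)  # (4, 4) (4, 8)
```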

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Parallelization: Enabled much faster training on GPUs by processing all words simultaneously.&lt;/p&gt;

&lt;p&gt;Long-Range Dependencies: Captured complex relationships over long distances far more effectively than recurrent models.&lt;/p&gt;

&lt;p&gt;Scalability: Paved the way for models with billions of parameters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; An encoder-decoder block. Inside the encoder: multi-head self-attention and a feed-forward layer. Inside the decoder: masked multi-head self-attention, encoder-decoder attention, and a feed-forward layer. Lines crisscross between words, showing attention weights.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 4: The Era of Large Pre-trained Models – Emergent Abilities
&lt;/h2&gt;

&lt;p&gt;With Transformers, the paradigm shifted to pre-training massive models on vast amounts of unlabelled text, followed by fine-tuning for specific tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  BERT (Bidirectional Encoder Representations from Transformers - 2018):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; A bidirectional Transformer encoder pre-trained on two tasks: Masked Language Modeling (MLM) (predicting masked words in context) and Next Sentence Prediction (NSP).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; Set new benchmarks across diverse NLP tasks by understanding context from both left and right simultaneously. Introduced the power of pre-training on general language understanding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; A single Transformer encoder block. Text with some words [MASKED]. Lines showing attention flowing both left and right.&lt;/p&gt;

&lt;h3&gt;
  
  
  GPT Series (Generative Pre-trained Transformers - 2018 onwards):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; Unidirectional (causal) Transformer decoders pre-trained on causal language modeling (predicting the next word in a sequence).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; Demonstrated remarkable generative abilities, producing coherent and contextually relevant text. Scaling up these models (GPT-3, GPT-4) revealed emergent abilities like in-context learning, where the model can perform new tasks given only a few examples in the prompt, without explicit fine-tuning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; A single Transformer decoder block. Text flowing from left to right. Attention lines only point to previous words, never future words.&lt;/p&gt;
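&lt;p&gt;The causal constraint is commonly implemented as a lower-triangular mask applied to the attention scores before the softmax; a minimal illustration:&lt;/p&gt;

```python
import numpy as np

# Causal (lower-triangular) mask for a 4-token sequence: position i may
# attend only to positions 0..i, never to future tokens.
seq_len = 4
mask = np.tril(np.ones((seq_len, seq_len)))
print(mask)

# Applied before the softmax, disallowed positions get -inf so their
# attention weight becomes exactly zero after normalization.
scores = np.zeros((seq_len, seq_len))          # toy pre-softmax scores
masked = np.where(mask == 1, scores, -np.inf)
```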

&lt;h3&gt;
  
  
  T5 (Text-to-Text Transfer Transformer - 2019):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; Framed all NLP tasks as text-to-text problems (e.g., "translate English to German: hello" -&amp;gt; "hallo"). Uses a Transformer encoder-decoder architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; Unified diverse NLP tasks under a single framework, simplifying model development and achieving strong performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 5: The Refinement and Expansion – Safety, Alignment, and Multimodality
&lt;/h2&gt;

&lt;p&gt;As LLMs became more powerful and ubiquitous, focus shifted to making them safe, useful, and capable of handling more than just text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reinforcement Learning from Human Feedback (RLHF - 2022 onwards):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; A critical step for aligning LLMs with human preferences and values. It involves:&lt;/p&gt;

&lt;p&gt;Supervised Fine-Tuning (SFT): Initially fine-tuning a pre-trained LLM on a dataset of high-quality human-written prompt-response pairs. This makes the model follow instructions better.&lt;/p&gt;

&lt;p&gt;Reward Model Training: Training a separate "reward model" to predict human preference scores for different model outputs, based on human rankings.&lt;/p&gt;

&lt;p&gt;Reinforcement Learning (PPO): Using the reward model to guide the LLM's learning (often with Proximal Policy Optimization - PPO) to maximize the reward, thereby generating outputs that humans prefer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; Greatly improved the helpfulness, harmlessness, and honesty of LLMs, reducing undesirable outputs and aligning them with user intent. This is why models like ChatGPT feel so "conversational" and "helpful."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; A loop: LLM generates responses -&amp;gt; Humans rate responses -&amp;gt; Reward model learns from ratings -&amp;gt; LLM is updated using RL to maximize reward.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multimodal LLMs (2023 onwards - e.g., Gemini, GPT-4o):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; LLMs that can process and generate information across multiple modalities – text, images, audio, video.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; Opens up new applications like image captioning, visual question answering, and speech-to-text/text-to-speech interaction, bringing AI closer to human-like perception.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; A single core LLM, with input modules for different data types (image encoder, audio encoder) feeding into it, and output modules for generating different data types.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advanced Reasoning and Agency (Current Research):
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concept:&lt;/strong&gt; Developing LLMs that can perform complex multi-step reasoning, break down problems, and even plan and execute actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Algorithms/Techniques:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Chain-of-Thought (CoT) Prompting: Guiding the model to show its step-by-step reasoning.&lt;/p&gt;

&lt;p&gt;Tree-of-Thought: Exploring multiple reasoning paths.&lt;/p&gt;

&lt;p&gt;Tool Use: Enabling LLMs to call external tools (like search engines, calculators, code interpreters) to augment their capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contribution:&lt;/strong&gt; Moving LLMs beyond just language generation to become more capable problem-solvers and intelligent agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Diagram Concept:&lt;/strong&gt; An LLM engaging in a multi-step process, potentially interacting with external APIs or knowledge bases at various stages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The evolution of LLMs is a vibrant testament to continuous research and engineering. Each stage built upon the last, leveraging fundamental mathematical concepts like vector spaces for embeddings, probability for statistical models, and advanced optimization for neural networks. The journey continues, with ongoing efforts to make LLMs even more intelligent, reliable, and universally accessible.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
    <item>
      <title>My Summer of Bitcoin 2025 Journey: Building, Simulating, and Securing Bitcoin Protocols</title>
      <dc:creator>Anshuman Ojha</dc:creator>
      <pubDate>Sat, 10 May 2025 19:54:23 +0000</pubDate>
      <link>https://dev.to/anshuman_ojha_/my-summer-of-bitcoin-2025-journey-building-simulating-and-securing-bitcoin-protocols-1i6</link>
      <guid>https://dev.to/anshuman_ojha_/my-summer-of-bitcoin-2025-journey-building-simulating-and-securing-bitcoin-protocols-1i6</guid>
      <description>&lt;p&gt;&lt;strong&gt;Author&lt;/strong&gt;: Anshuman Ojha&lt;br&gt;
&lt;strong&gt;Email&lt;/strong&gt;: &lt;a href="mailto:anshumanojha91@gmail.com"&gt;anshumanojha91@gmail.com&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/H-ario-m" rel="noopener noreferrer"&gt;H-ario-m&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;LinkedIn&lt;/strong&gt;: &lt;a href="https://www.linkedin.com/in/anshuman-ojha-05ux/" rel="noopener noreferrer"&gt;Anshuman Ojha&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I participated in the Summer of Bitcoin 2025 program, where I explored Bitcoin deeply at the protocol level. The experience was hands-on, rigorous, and focused on both Bitcoin Core and Lightning Network internals. I contributed to two distinct projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Improving the &lt;strong&gt;fee estimation model&lt;/strong&gt; in LND’s Sweeper subsystem.&lt;/li&gt;
&lt;li&gt;Fixing a critical &lt;strong&gt;key handover exploit&lt;/strong&gt; in the Coinswap protocol.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Through the bootcamp and these contributions, I developed technical skills in Bitcoin scripting, descriptor parsing, multisig construction, mempool validation, fee dynamics, and protocol security.&lt;/p&gt;


&lt;h2&gt;
  
  
  Bootcamp Learnings
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Week 1 – RPC Interaction and OP_RETURN Transaction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt;: Interact with Bitcoin Core RPC, send a transaction with a payment and &lt;code&gt;OP_RETURN&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Approach&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Launched a Bitcoin node in &lt;code&gt;regtest&lt;/code&gt; mode using Docker.&lt;/li&gt;
&lt;li&gt;Created a wallet using &lt;code&gt;createwallet&lt;/code&gt; and generated addresses.&lt;/li&gt;
&lt;li&gt;Used &lt;code&gt;generatetoaddress&lt;/code&gt; to mine blocks and earn regtest coins.&lt;/li&gt;
&lt;li&gt;Built a raw transaction using &lt;code&gt;createrawtransaction&lt;/code&gt; and &lt;code&gt;fundrawtransaction&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Added a secondary output with an &lt;code&gt;OP_RETURN&lt;/code&gt; opcode, embedding the string: &lt;code&gt;We are all Satoshi!!&lt;/code&gt; as binary.&lt;/li&gt;
&lt;li&gt;Signed and broadcasted the transaction using &lt;code&gt;signrawtransactionwithwallet&lt;/code&gt; and &lt;code&gt;sendrawtransaction&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
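&lt;p&gt;For illustration, the &lt;code&gt;OP_RETURN&lt;/code&gt; output script described above can be built by hand. This sketch shows the raw script bytes only, not the RPC calls themselves:&lt;/p&gt;

```python
# Build the raw script for an OP_RETURN output embedding "We are all Satoshi!!".
# A "data" output's scriptPubKey is OP_RETURN followed by a pushdata of the payload.
OP_RETURN = 0x6a

payload = b"We are all Satoshi!!"                    # 20 bytes
script = bytes([OP_RETURN, len(payload)]) + payload  # opcode, push length, data

print(script.hex())  # starts with "6a14": OP_RETURN, then a 20-byte push
```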

&lt;p&gt;&lt;strong&gt;Learnings&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bitcoin Core's RPC system is highly programmable and allows complete control over node behavior.&lt;/li&gt;
&lt;li&gt;Embedding metadata in transactions using &lt;code&gt;OP_RETURN&lt;/code&gt; provides provably unspendable outputs.&lt;/li&gt;
&lt;li&gt;Understanding UTXO funding and fee mechanics is essential for crafting valid transactions manually.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Week 2 – P2SH-P2WSH 2-of-2 Multisig Transaction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt;: Manually construct a transaction spending from a P2SH-wrapped P2WSH address.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Approach&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Used two private keys and a given redeem script.&lt;/li&gt;
&lt;li&gt;Constructed the &lt;code&gt;scriptPubKey&lt;/code&gt; by hashing the redeem script and wrapping it in a P2SH.&lt;/li&gt;
&lt;li&gt;Built an unsigned transaction and computed the BIP143-style sighash.&lt;/li&gt;
&lt;li&gt;Signed using both private keys to produce ECDSA signatures.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Constructed the &lt;code&gt;scriptSig&lt;/code&gt; and &lt;code&gt;witness&lt;/code&gt; fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;scriptSig&lt;/code&gt;: contained the serialized redeem script.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;witness&lt;/code&gt;: included both signatures and the serialized witness script.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Serialized the transaction into hex and verified structure manually.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Learnings&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Differentiated between &lt;code&gt;scriptSig&lt;/code&gt; (legacy) and &lt;code&gt;witness&lt;/code&gt; (SegWit).&lt;/li&gt;
&lt;li&gt;Understood how P2SH acts as a compatibility wrapper for newer SegWit scripts.&lt;/li&gt;
&lt;li&gt;Practiced signature validation at the byte level and verified redeem script correctness.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Week 3 – Simulated Block Mining
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt;: Mine a valid block from a mempool of transactions, respecting dependencies and fee rates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Approach&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parsed JSON-encoded transactions with dependencies.&lt;/li&gt;
&lt;li&gt;Topologically sorted them to preserve input-output ordering.&lt;/li&gt;
&lt;li&gt;Calculated fee-to-weight ratios and selected optimal transactions under the block size limit.&lt;/li&gt;
&lt;li&gt;Created a Merkle root from the selected transactions.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Built a block header including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Previous block hash&lt;/li&gt;
&lt;li&gt;Merkle root&lt;/li&gt;
&lt;li&gt;Timestamp, difficulty bits&lt;/li&gt;
&lt;li&gt;Nonce satisfying the PoW target: &lt;code&gt;0000ffff...&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Created a valid coinbase transaction with a dummy miner address.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
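&lt;p&gt;The Merkle-root step can be sketched as pairwise double-SHA256 hashing up the tree. This simplified version omits Bitcoin's byte-order (endianness) conventions:&lt;/p&gt;

```python
import hashlib

def double_sha256(b):
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(txids):
    # Simplified sketch: hex txids in, hash pairs repeatedly until one remains.
    level = [bytes.fromhex(t) for t in txids]
    while len(level) != 1:
        if len(level) % 2 == 1:          # odd count: duplicate the last hash
            level.append(level[-1])
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0].hex()

txids = ["aa" * 32, "bb" * 32, "cc" * 32]  # dummy 32-byte txids
print(merkle_root(txids))                  # 64 hex characters
```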

&lt;p&gt;&lt;strong&gt;Learnings&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built a minimal Bitcoin miner simulator.&lt;/li&gt;
&lt;li&gt;Explored real transaction dependency resolution and mempool structure.&lt;/li&gt;
&lt;li&gt;Learned how PoW constraints interact with block content.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Week 4 – Descriptor Parsing and Balance Aggregation with Esplora
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Goal&lt;/strong&gt;: Parse a descriptor, derive addresses, and aggregate balance by querying Esplora.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Approach&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parsed the descriptor
&lt;code&gt;wpkh(tpub.../*)&lt;/code&gt;
using &lt;code&gt;bitcoinlib&lt;/code&gt; and BIP32 derivation.&lt;/li&gt;
&lt;li&gt;Derived child public keys and corresponding Bech32 addresses.&lt;/li&gt;
&lt;li&gt;Queried each address’s transactions using the local Esplora API (on port &lt;code&gt;8094&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Continued deriving until 10 unused addresses were found (gap limit).&lt;/li&gt;
&lt;li&gt;Summed all confirmed UTXOs and output balance in BTC.&lt;/li&gt;
&lt;/ul&gt;
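&lt;p&gt;The gap-limit scan can be sketched as follows; &lt;code&gt;derive_address&lt;/code&gt; and &lt;code&gt;address_has_history&lt;/code&gt; are hypothetical stand-ins for BIP32 derivation and an Esplora query:&lt;/p&gt;

```python
# Gap-limit scan sketch: keep deriving until a run of `gap_limit`
# consecutive unused addresses is seen.
def scan_with_gap_limit(derive_address, address_has_history, gap_limit=10):
    used, gap, index = [], 0, 0
    while gap != gap_limit:
        addr = derive_address(index)
        if address_has_history(addr):
            used.append(addr)
            gap = 0              # reset the gap counter on any used address
        else:
            gap += 1
        index += 1
    return used

# Toy run: pretend indices 0, 1, and 3 have transaction history.
history = {0, 1, 3}
found = scan_with_gap_limit(lambda i: f"addr{i}",
                            lambda a: int(a[4:]) in history)
print(found)  # ['addr0', 'addr1', 'addr3']
```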

&lt;p&gt;&lt;strong&gt;Learnings&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gained familiarity with hierarchical deterministic wallets.&lt;/li&gt;
&lt;li&gt;Understood the descriptor format and gap limit behavior.&lt;/li&gt;
&lt;li&gt;Practiced using REST APIs to interact with Bitcoin indexers like Esplora.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Project 1 – Dynamic Fee Function Optimization in LND Sweeper
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Problem
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;Sweeper&lt;/strong&gt; in LND handles time-sensitive transactions such as HTLC timeouts. It currently uses a linear fee increase over time, which:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Performs poorly during fee spikes.&lt;/li&gt;
&lt;li&gt;Wastes satoshis in low-fee periods.&lt;/li&gt;
&lt;li&gt;Cannot be tuned to the urgency of a given use case.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  My Contribution
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Researched and modeled multiple fee functions&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Characteristics&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Linear&lt;/td&gt;
&lt;td&gt;Default; simple but inflexible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exponential&lt;/td&gt;
&lt;td&gt;Slow start, steep climb&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stepped&lt;/td&gt;
&lt;td&gt;Discrete jumps; easier to predict&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sigmoid&lt;/td&gt;
&lt;td&gt;Smooth S-shaped urgency growth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic&lt;/td&gt;
&lt;td&gt;Based on mempool fee estimate feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
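&lt;p&gt;As a rough illustration, the first four curves might be modeled like this (parameter names and shapes are illustrative, not LND's actual implementation; the dynamic model additionally needs live mempool feedback, so it is omitted):&lt;/p&gt;

```python
import math

# Each function maps elapsed blocks t in [0, deadline] to a fee rate
# between start_fee and max_fee.
def linear_fee(t, deadline, start_fee, max_fee):
    return start_fee + (max_fee - start_fee) * t / deadline

def exponential_fee(t, deadline, start_fee, max_fee):
    # Slow start, steep climb toward the deadline.
    return start_fee * (max_fee / start_fee) ** (t / deadline)

def stepped_fee(t, deadline, start_fee, max_fee, steps=4):
    # Discrete jumps at fixed fractions of the deadline.
    step = math.floor(t / deadline * steps)
    return start_fee + (max_fee - start_fee) * min(step, steps) / steps

def sigmoid_fee(t, deadline, start_fee, max_fee, steepness=10.0):
    # Smooth S-shaped urgency growth centered at the deadline's midpoint.
    x = steepness * (t / deadline - 0.5)
    return start_fee + (max_fee - start_fee) / (1.0 + math.exp(-x))

for f in (linear_fee, exponential_fee, stepped_fee, sigmoid_fee):
    print(f.__name__, [round(f(t, 10, 1.0, 100.0), 1) for t in (0, 5, 10)])
```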

&lt;p&gt;&lt;strong&gt;2. Simulation and Evaluation&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Built a test framework in Go that simulated mempool congestion scenarios. Metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average fee paid&lt;/li&gt;
&lt;li&gt;Confirmation reliability&lt;/li&gt;
&lt;li&gt;Average time to confirm&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Avg Fee&lt;/th&gt;
&lt;th&gt;Conf. Rate&lt;/th&gt;
&lt;th&gt;Time to Confirm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Linear&lt;/td&gt;
&lt;td&gt;62.7&lt;/td&gt;
&lt;td&gt;91.5%&lt;/td&gt;
&lt;td&gt;10.6 blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic&lt;/td&gt;
&lt;td&gt;53.9&lt;/td&gt;
&lt;td&gt;97.4%&lt;/td&gt;
&lt;td&gt;8.2 blocks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;3. Implementation Design&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Made fee functions modular and injectable.&lt;/li&gt;
&lt;li&gt;Exposed configuration via RPC.&lt;/li&gt;
&lt;li&gt;Built test cases simulating fast confirmation and cost sensitivity scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Learnings
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Learned about fee estimation challenges in Lightning’s context.&lt;/li&gt;
&lt;li&gt;Practiced writing performance-aware Go code.&lt;/li&gt;
&lt;li&gt;Designed empirical evaluation frameworks for fee dynamics.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Project 2 – Coinswap Protocol: Fixing the Private Key Handover Exploit
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Vulnerability
&lt;/h3&gt;

&lt;p&gt;In the original Coinswap settlement flow, the &lt;strong&gt;maker sends their private key before the taker sends theirs&lt;/strong&gt;, based only on receiving the hash preimage. This enabled the taker to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Obtain the maker’s key.&lt;/li&gt;
&lt;li&gt;Withhold their own key.&lt;/li&gt;
&lt;li&gt;Complete the swap with the final party while skipping the fees owed to intermediate makers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Exploit Example
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Taker -&amp;gt; Maker: sends preimage  
Maker -&amp;gt; Taker: sends private key (vulnerable)  
Taker disconnects (never sends key back)  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Solution
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;New Message Flow&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Taker -&amp;gt; Maker: RespHashPreimage  
Maker -&amp;gt; Taker: HashPreimageAcknowledged  
Taker -&amp;gt; Maker: RespPrivKeyHandover  
Maker: Verifies key  
Maker -&amp;gt; Taker: RespPrivKeyHandover  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
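&lt;p&gt;The ordering guarantee can be sketched as a small state machine; the enforcement logic below is illustrative, not the actual Rust implementation:&lt;/p&gt;

```python
# Revised settlement handshake as an ordered message sequence.
# Message names follow the flow above.
EXPECTED_ORDER = [
    ("taker", "RespHashPreimage"),
    ("maker", "HashPreimageAcknowledged"),
    ("taker", "RespPrivKeyHandover"),   # taker reveals their key first now
    ("maker", "RespPrivKeyHandover"),   # maker replies only after verifying
]

def run_handshake(messages):
    # Abort unless every message arrives from the right party, in order.
    if messages != EXPECTED_ORDER[:len(messages)]:
        return "abort"
    if len(messages) == len(EXPECTED_ORDER):
        return "settled"
    return "pending"

# A taker that disconnects after the acknowledgment has learned nothing:
# the maker's key travels only in the final message.
print(run_handshake(EXPECTED_ORDER[:2]))   # pending
print(run_handshake(EXPECTED_ORDER))       # settled
```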



&lt;p&gt;&lt;strong&gt;Implementation&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Added &lt;code&gt;HashPreimageAcknowledged&lt;/code&gt; enum in protocol message types.&lt;/li&gt;
&lt;li&gt;Updated &lt;code&gt;maker/handlers.rs&lt;/code&gt; to store intermediate state and delay key sharing.&lt;/li&gt;
&lt;li&gt;Implemented key verification before final handover.&lt;/li&gt;
&lt;li&gt;Modified &lt;code&gt;taker/api.rs&lt;/code&gt; to wait for acknowledgment and send keys before receiving any.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Wrote full unit and integration tests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Malicious taker detection&lt;/li&gt;
&lt;li&gt;Cross-maker chain simulation&lt;/li&gt;
&lt;li&gt;Failure injection&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Learnings
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Identified and fixed a multi-hop atomic swap vulnerability.&lt;/li&gt;
&lt;li&gt;Gained experience in secure multi-party message sequencing.&lt;/li&gt;
&lt;li&gt;Implemented state machines in Rust with resilience to adversarial behavior.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Reflections
&lt;/h2&gt;

&lt;p&gt;Summer of Bitcoin was more than just building code—it was about engaging with Bitcoin’s foundations, identifying real-world challenges, and proposing secure, scalable, and efficient solutions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technically, I learned to&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write and interpret low-level Bitcoin transactions.&lt;/li&gt;
&lt;li&gt;Simulate and optimize mempool-based fee strategies.&lt;/li&gt;
&lt;li&gt;Secure privacy protocols from adversarial takers.&lt;/li&gt;
&lt;li&gt;Use Bitcoin Core, Esplora, Rust, Go, and Docker at a production level.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Personally, I learned to&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Think adversarially about security.&lt;/li&gt;
&lt;li&gt;Move from abstract specs to secure implementations.&lt;/li&gt;
&lt;li&gt;Collaborate with maintainers and propose upstream fixes.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Connect With Me
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/H-ario-m" rel="noopener noreferrer"&gt;H-ario-m&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Email: &lt;a href="mailto:anshumanojha91@gmail.com"&gt;anshumanojha91@gmail.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/anshuman-ojha-05ux/" rel="noopener noreferrer"&gt;Anshuman Ojha&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




</description>
    </item>
  </channel>
</rss>
