Costs, DoS Risks, and Instance vs Persistent Data Types in Soroban

While developing the Soroswap.Finance protocol, we had to make critical decisions about which data types to use in our Smart Contracts. The choice of storage type can expose a contract to Denial of Service (DoS), increase the cost of every contract interaction, or both.

This article explores the advantages and drawbacks of five design patterns for storing an ever-growing amount of information on the Soroban blockchain, along with the costs associated with each one.

Context: Soroswap.Finance is an Automated Market Maker on the Soroban Smart Contract Platform of the Stellar Blockchain, developed by the PaltaLabs 🥑 team.

Follow the code here: Instance-Persistent-Dos-Soroban

TL;DR

This article demonstrates that:

Instance data types share a common Ledger Entry, leading to contract failure if the total stored information reaches 64kb.
Instance storage Ledger Entry is independent of contract size, with a fixed 64kb reserved regardless of contract dimensions.
Instance data types are read on every interaction, making any contract interaction more costly.
Unbounded data storage in Vectors or Mappings risks reaching 64kb and becoming vulnerable to DoS attacks.
The variable DataKey technique is the recommended approach for storing unbounded data.

Instance and Persistent Data Types.

From Soroban documentation, we know that:

Instance storage has its data limit determined by the ledger entry size (Source).
All instance storage is kept in a single contract instance LedgerEntry with a 64kb size (Source).
The Ledger Entry Size is capped at 64kb (Source).
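The difference between the two storage kinds can be sketched with a toy model in plain Rust. This is not the Soroban SDK: `StorageModel`, `set_instance` and `set_persistent` are hypothetical names, and sizes count payload bytes only, ignoring key and serialization overhead. It only illustrates that instance keys share one 64kb budget while each persistent key gets its own.

```rust
use std::collections::HashMap;

/// Ledger entry size cap quoted above (64 KB).
const MAX_ENTRY_BYTES: usize = 64 * 1024;

/// Toy model of one contract's storage (hypothetical, not the Soroban SDK).
/// Sizes are payload bytes only; real entries carry extra overhead.
#[derive(Default)]
pub struct StorageModel {
    /// Instance storage: every key lives in ONE shared ledger entry.
    instance: HashMap<String, Vec<u8>>,
    /// Persistent storage: each key is its own ledger entry.
    persistent: HashMap<String, Vec<u8>>,
}

impl StorageModel {
    /// Total bytes held by the single instance entry.
    pub fn instance_bytes(&self) -> usize {
        self.instance.values().map(Vec::len).sum()
    }

    /// Write to instance storage; fails if the shared entry would exceed the cap.
    pub fn set_instance(&mut self, key: &str, value: Vec<u8>) -> Result<(), &'static str> {
        let current = self.instance.get(key).map_or(0, Vec::len);
        let new_total = self.instance_bytes() - current + value.len();
        if new_total > MAX_ENTRY_BYTES {
            return Err("ResourceLimitExceeded");
        }
        self.instance.insert(key.to_string(), value);
        Ok(())
    }

    /// Write to persistent storage; only this key's own entry is checked.
    pub fn set_persistent(&mut self, key: &str, value: Vec<u8>) -> Result<(), &'static str> {
        if value.len() > MAX_ENTRY_BYTES {
            return Err("ResourceLimitExceeded");
        }
        self.persistent.insert(key.to_string(), value);
        Ok(())
    }
}
```

In this model, a 60 KB value under one instance key makes an 8 KB write to a different instance key fail, while the same two writes to persistent storage both succeed.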

The Challenge: Store Unbounded Information

The example challenge involves creating a Smart Contract that stores two collections of an increasing amount of 32-byte addresses, representing buyers and sellers. Two separate collections are used, as the choice of data type can impact one collection when the other grows.

Externally, interactions with this contract are intended to follow this pattern for each collection:

Call a function to store a new address (number n).
Retrieve the address for a given number n.
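The two-step pattern above could be captured by an interface like the following. It is a plain-Rust sketch, not the contract code from the repository: the trait `AddressCollection`, the `InMemory` type and the `roundtrip_ok` helper are hypothetical names introduced only for illustration.

```rust
/// A 32-byte address, as in the challenge description.
pub type Address = [u8; 32];

/// Hypothetical interface for one collection (buyers or sellers).
pub trait AddressCollection {
    /// Store a new address and return its index n.
    fn push(&mut self, address: Address) -> u32;
    /// Retrieve the address stored at index n, if any.
    fn get(&self, n: u32) -> Option<Address>;
}

/// Simplest possible in-memory implementation, used only to exercise the trait.
#[derive(Default)]
pub struct InMemory(Vec<Address>);

impl AddressCollection for InMemory {
    fn push(&mut self, address: Address) -> u32 {
        self.0.push(address);
        (self.0.len() - 1) as u32
    }

    fn get(&self, n: u32) -> Option<Address> {
        self.0.get(n as usize).copied()
    }
}

/// A functional check that any implementation can pass.
/// It says nothing about storage limits or the cost of a call.
pub fn roundtrip_ok<C: AddressCollection>(c: &mut C) -> bool {
    let n = c.push([7u8; 32]);
    c.get(n) == Some([7u8; 32])
}
```

A check like `roundtrip_ok` only verifies behaviour for a handful of entries, which is why it cannot tell a safe design from an unsafe one.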

The tests for each design are the same, and they all pass! So be aware, dev: passing your tests does not mean that your code is safe!

The following sections explain five design patterns. Only one of them proves to be DoS-free without increasing the cost of interacting with the contract.

Design 1: Store a Vector in an Instance Data Type. The Case of a Light Smart Contract.

With this technique, a vector is stored in an instance storage slot. Each time a new element is pushed, the current vector value is read, the new element is added, and the updated vector is written back to the same instance storage slot. The code looks like this:

let mut vector: Vec<Address> = env.storage().instance()
            .get(&VECTOR_A).unwrap_or(Vec::new(&env)); // If no value set, assume an empty vector.

// Push the current contract address in the vector
vector.push_back(env.current_contract_address().clone());

// Save the updated vector to instance storage
env.storage().instance().set(&VECTOR_A, &vector);

A DoS attack simulation reveals that this design causes the contract to fail with a ResourceLimitExceeded error after the sum of the collections reaches 64kb, equivalent to 818 pushes of addresses on each collection. Further details, including code, simulation, and results, can be found in the repository. Additionally, this design increases the cost of every call to the smart contract, even for unrelated functions.
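The order of magnitude of that limit can be reproduced with a small plain-Rust calculation. It is only a sketch: `pushes_per_collection` and `instance_bytes_loaded` are hypothetical helpers, and the 40 bytes per stored address used in the note below is an assumption (32 bytes of payload plus an assumed 8 bytes of overhead), not a measured value.

```rust
/// Ledger entry cap from the Soroban documentation cited earlier (64 KB).
const MAX_ENTRY_BYTES: usize = 64 * 1024;

/// Counts how many pushes each collection accepts when `collections` vectors
/// share one ledger entry and are pushed to in turn.
/// `bytes_per_address` is an ASSUMED serialized size, not a measured one.
pub fn pushes_per_collection(collections: usize, bytes_per_address: usize) -> usize {
    let per_round = collections * bytes_per_address;
    if per_round == 0 {
        return 0; // nothing is stored, so there is nothing to count
    }
    let mut used = 0; // bytes currently held by the shared entry
    let mut rounds = 0; // completed pushes on every collection
    // One round = one push on each collection.
    while used + per_round <= MAX_ENTRY_BYTES {
        used += per_round;
        rounds += 1;
    }
    rounds
}

/// Bytes of instance data loaded by ANY call after `rounds` rounds of pushes,
/// since the whole instance entry is read on every interaction.
pub fn instance_bytes_loaded(
    collections: usize,
    bytes_per_address: usize,
    rounds: usize,
) -> usize {
    collections * bytes_per_address * rounds
}
```

Under the 40-byte assumption, `pushes_per_collection(2, 40)` gives 819, close to the 818 pushes observed in the simulation. The model ignores other overhead in the entry, so an exact match is not expected.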

Design 2: Store a Vector in an Instance Data Type. The Case of a Heavy Smart Contract.

This design stores data the same way as Design 1, but in a heavier contract. It demonstrates that instance storage lives in a LedgerEntry that is independent of the Smart Contract code itself, so the size of the contract does not change the limit. As in Design 1, the cost of calling the contract increases with every push operation, even for unrelated functions, and the contract eventually fails with a ResourceLimitExceeded error.

Design 3: Store a Vector in a Persistent Data Type.

This design involves storing a vector in a persistent data type.

let mut vector: Vec<Address> = env.storage().persistent()
            .get(&VECTOR_A).unwrap_or(Vec::new(&env)); // If no value set, assume an empty vector.
vector.push_back(env.current_contract_address().clone());
env.storage().persistent().set(&VECTOR_A, &vector);

While this avoids sharing storage space with other variables, each vector is still limited to 64kb of information. The attack simulation shows that it allows reaching twice the number of entries compared to the previous examples. On the positive side, the cost of calling the contract does not increase for unrelated functions, because persistent data is not retrieved every time the contract is called.
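The effect on unrelated functions can be illustrated with a toy read-footprint model in plain Rust. `ContractModel` and its methods are hypothetical, the byte sizes are illustrative, and no real fee schedule is modeled. The only rule encoded is the one stated in this article: instance data is read on every call, persistent data only when a function asks for it.

```rust
/// Toy contract state: sizes in bytes of what each storage kind holds.
/// Hypothetical model; all numbers are illustrative.
pub struct ContractModel {
    /// Size of the single instance entry, loaded on every invocation.
    pub instance_bytes: usize,
    /// Size of the persistent entry holding the vector, loaded only on demand.
    pub persistent_vector_bytes: usize,
}

impl ContractModel {
    /// Designs 1 and 2: the vector lives in instance storage.
    pub fn with_instance_vector(entries: usize, bytes_per_address: usize) -> Self {
        ContractModel {
            instance_bytes: entries * bytes_per_address,
            persistent_vector_bytes: 0,
        }
    }

    /// Design 3: the vector lives in its own persistent entry.
    pub fn with_persistent_vector(entries: usize, bytes_per_address: usize) -> Self {
        ContractModel {
            instance_bytes: 0,
            persistent_vector_bytes: entries * bytes_per_address,
        }
    }

    /// Bytes read by a function that never touches the vector.
    pub fn unrelated_call_read(&self) -> usize {
        self.instance_bytes
    }

    /// Bytes read by a push: the instance entry plus the stored vector.
    pub fn push_call_read(&self) -> usize {
        self.instance_bytes + self.persistent_vector_bytes
    }
}
```

With 800 entries of an assumed 40 bytes each, an unrelated call reads 32,000 bytes when the vector is in instance storage and 0 bytes when it is in persistent storage. A push still reads the whole vector in both cases.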

Design 4: Use Variable DataKeys with Persistent Data Type.

This design introduces the Variable DataKey technique, where the name of the storage slot depends on a parameter. The DataKey is defined as follows:

#[contracttype]
pub enum DataKey {
    StoredAddressesA(u32),
}

This technique is commonly used to store an increasing amount of data, as seen in the token contract for storing user balances. The information is stored as follows:

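// Read the counter from the same (persistent) storage it is written to below.
let mut count: u32 = env.storage().persistent().get(&COUNTER_A).unwrap_or(0);

// Store the address in its own slot, keyed by the current count.
env.storage().persistent().set(&DataKey::StoredAddressesA(count), &env.current_contract_address().clone());

// Increment and save the counter.
count += 1;
env.storage().persistent().set(&COUNTER_A, &count);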

This design, including the storage of a COUNTER to track the number of stored addresses, is considered the best option. It successfully avoids DoS attacks and minimizes the cost of reading the contract.
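Why this design does not hit the 64kb limit can be sketched with a plain-Rust model. This is not the Soroban SDK: `PersistentModel` and its methods are hypothetical, a `HashMap` stands in for persistent storage, and only the 32-byte payload of each entry is counted. Each address lands in its own entry, so no single entry grows as the collection grows.

```rust
use std::collections::HashMap;

/// Ledger entry cap from the Soroban documentation cited earlier (64 KB).
const MAX_ENTRY_BYTES: usize = 64 * 1024;

/// Mirrors the article's key: one storage slot per index.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum DataKey {
    StoredAddressesA(u32),
}

/// Toy persistent storage (hypothetical): each key maps to its own entry.
#[derive(Default)]
pub struct PersistentModel {
    entries: HashMap<DataKey, [u8; 32]>,
    counter_a: u32,
}

impl PersistentModel {
    /// Store an address in a fresh slot and bump the counter.
    /// Returns the index that was used.
    pub fn push(&mut self, address: [u8; 32]) -> u32 {
        let n = self.counter_a;
        self.entries.insert(DataKey::StoredAddressesA(n), address);
        self.counter_a = n + 1;
        n
    }

    /// Retrieve the address stored at index n, if any.
    pub fn get(&self, n: u32) -> Option<[u8; 32]> {
        self.entries.get(&DataKey::StoredAddressesA(n)).copied()
    }

    /// Number of addresses stored so far.
    pub fn count(&self) -> u32 {
        self.counter_a
    }

    /// True if every individual entry fits within the per-entry cap.
    pub fn all_entries_within_cap(&self) -> bool {
        self.entries.values().all(|v| v.len() <= MAX_ENTRY_BYTES)
    }
}
```

In this model, pushing 5,000 addresses still leaves every entry at 32 bytes, far more entries than a single 64kb vector could hold at that size.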

Using u32 provides 2^32 different storage slots for StoredAddressesA, which is more than sufficient. Even if an attacker were to call the push function 2^32 times, the associated cost is calculated to be around 439,289 stroops (0.044 XLM). This cost limitation makes it practically infeasible for an attacker, even with the entire Total Supply of XLM (50,001,787,051), to add more than 2,196,523,503 entries (about 2^31).

Design 5: Use Variable DataKeys with Instance Data Type.

Lastly, using Variable DataKeys as a solution with instance data type is a mistake. This design is as inefficient as using a vector because storing in different slots shared in a single Ledger Entry is almost equivalent to storing everything in one slot, similar to a vector. This situation can lead to a DoS attack and result in a ResourceLimitExceeded error.

Contact us:

Did you like the article? Contact us in our Discord https://discord.gg/HFkBquZNNg or reach us in https://paltalabs.io

