DEV Community

CertiK


Analyzing the HamsterWheel: An Advanced State Management Vulnerability in Sui's Blockchain Architecture

CertiK’s Skyfall team identified and disclosed a series of denial-of-service vulnerabilities in the Sui blockchain. Among them, a new type of bug stood out because of its critical severity: it could leave the Sui network unable to process new transactions, effectively causing a total network shutdown. Unlike previously known attacks, this one lets an attacker induce an infinite loop in a validator node simply by submitting a payload of approximately 100 bytes. Moreover, the attack causes persistent damage that endures even after the validator network reboots. We've dubbed this unique type of attack “HamsterWheel.”

Upon discovery, we responsibly reported this vulnerability to Sui through their bug bounty program. Sui's response was prompt and efficient. They confirmed the critical severity of the vulnerability and took steps to address the issue before the network’s mainnet launch. In addition to fixing this particular vulnerability, Sui also implemented preventative mitigations to lessen the potential damage caused by exploitation.

In appreciation of the responsible disclosure, Sui awarded the CertiK Skyfall team a $500,000 bug bounty.

In this blog post, we will disclose the technical aspects of this critical vulnerability, shedding light on its root cause and potential impact.


The Vulnerability In Detail

The Critical Role of Verifiers in Sui

In Move-based blockchains such as Sui and Aptos, protection against malicious payloads rests heavily on static verification. Verifiers inspect user-submitted payloads before a contract is published or upgraded on chain, and the Move runtime presumes that these checks are sound, that is, that they guarantee both structural and semantic correctness.

Figure 1: The threats of malicious payloads on Move-based chains

Sui implements a memory model that is distinct from the original Move design and deploys a customized version of the Move VM for contract development. Sui also goes the extra mile to strengthen its defenses against malformed payloads: it introduces additional, customized verifiers, dubbed Sui verifiers, which cater to Sui's unique features such as object safety and global storage access safety.

Figure 2: Sui's sequence of checks against the payload

Most verifiers perform structural checks on the CompiledModule, the runtime representation of a user-provided contract payload. For instance, the duplication checker ensures that no section contains duplicate entries, while the limits checker enforces upper bounds on the number of entries allowed in each section. Guaranteeing the semantic soundness of untrusted payloads, however, requires more complex static analyses.
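As a rough illustration of what such structural checks look like, here is a minimal sketch of a duplication check and a limits check over a single section modeled as a slice. The function names and the MAX_ENTRIES bound are hypothetical; the real checkers operate on CompiledModule tables with their own limits.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical upper bound on entries per section (not Sui's real limit).
const MAX_ENTRIES: usize = 4;

/// Duplication check: reject a section that contains the same entry twice.
fn check_no_duplicates<T: Eq + Hash>(section: &[T]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for (i, entry) in section.iter().enumerate() {
        // `insert` returns false if the entry was already present.
        if !seen.insert(entry) {
            return Err(format!("duplicate entry at index {}", i));
        }
    }
    Ok(())
}

/// Limits check: reject a section with more entries than allowed.
fn check_limits<T>(section: &[T]) -> Result<(), String> {
    if section.len() > MAX_ENTRIES {
        return Err(format!(
            "{} entries exceed the limit of {}",
            section.len(),
            MAX_ENTRIES
        ));
    }
    Ok(())
}

fn main() {
    let identifiers = ["new", "delete", "new"];
    println!("{:?}", check_no_duplicates(&identifiers));
    println!("{:?}", check_limits(&identifiers));
}
```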

Understanding Move's Abstract Interpreter: Linear and Iterative Analysis

The Abstract Interpreter, provided by Move, is a framework designed for performing complex security analyses on bytecode through abstract interpretation. It enables a more refined and accurate verification process, and each verifier can define its own abstract state for the analysis.

To begin, the Abstract Interpreter constructs Control Flow Graphs (CFGs) from compiled modules. Each Basic Block in these CFGs maintains two states: a pre-state and a post-state. The pre-state is a snapshot of the program before the Basic Block executes, while the post-state captures the program after the Basic Block executes.

When the Control Flow Graph contains no backedge (that is, no loop), the Abstract Interpreter performs a simple linear analysis. Each Basic Block is analyzed in sequence, and its pre-state and post-state are computed from the semantics of each instruction in the block. The result is an accurate snapshot of the program's state at each point of execution, which helps verify the program's safety properties.
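The CFG construction and backedge detection described above can be sketched over a toy instruction set. This is a minimal sketch under stated assumptions: Instr is a hypothetical stand-in for Move bytecode, blocks are split at branch targets and after branches (the usual "leaders" approach), and an edge that jumps to an earlier or equal offset is treated as a backedge, which is a simplification rather than Move's exact loop detection.

```rust
use std::collections::BTreeSet;

/// Hypothetical toy instruction set (not Move's real Bytecode enum).
#[derive(Debug, Clone, Copy)]
enum Instr {
    Nop,
    Branch(usize), // unconditional jump to an offset
    BrTrue(usize), // conditional jump; falls through otherwise
    Ret,
}

/// Start offsets of basic blocks: the entry, every branch target,
/// and every instruction following a branch or return.
fn block_starts(code: &[Instr]) -> Vec<usize> {
    let mut starts = BTreeSet::new();
    starts.insert(0);
    for (pc, instr) in code.iter().enumerate() {
        match instr {
            Instr::Branch(t) | Instr::BrTrue(t) => {
                starts.insert(*t);
                if pc + 1 < code.len() {
                    starts.insert(pc + 1);
                }
            }
            Instr::Ret if pc + 1 < code.len() => {
                starts.insert(pc + 1);
            }
            _ => {}
        }
    }
    starts.into_iter().collect()
}

/// CFG edges as (source block start, target block start).
fn edges(code: &[Instr], starts: &[usize]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (i, &start) in starts.iter().enumerate() {
        let end = starts.get(i + 1).copied().unwrap_or(code.len()); // exclusive
        match code[end - 1] {
            Instr::Branch(t) => out.push((start, t)),
            Instr::BrTrue(t) => {
                out.push((start, t));
                if end < code.len() {
                    out.push((start, end)); // fall-through edge
                }
            }
            Instr::Ret => {}
            Instr::Nop => {
                if end < code.len() {
                    out.push((start, end));
                }
            }
        }
    }
    out
}

/// Simplification: an edge to an earlier-or-same block is a backedge.
fn backedges(edges: &[(usize, usize)]) -> Vec<(usize, usize)> {
    edges.iter().copied().filter(|&(src, dst)| dst <= src).collect()
}

fn main() {
    // Offsets 0: entry, 2: loop header, 4: body jumping back to 2, 6: exit.
    let code = [
        Instr::Nop, Instr::Branch(2), Instr::Nop, Instr::Branch(4),
        Instr::Nop, Instr::BrTrue(2), Instr::Ret,
    ];
    let starts = block_starts(&code);
    let es = edges(&code, &starts);
    println!("starts {:?}, edges {:?}, backedges {:?}", starts, es, backedges(&es));
}
```

For this code the blocks start at offsets 0, 2, 4, and 6, and the single backedge runs from the block at offset 4 back to the block at offset 2.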

Figure 3: Move Abstract Interpreter’s workflow

However, when loops are present in the control flow, the process becomes more complex. Backedges, which signal the presence of loops, require the Abstract Interpreter to carefully manage the merging of states. This is due to the interdependence between the state at the start of the loop (the pre-state of the loop header) and the state at the end of the loop (the post-state of the loop footer).

When handling a backedge, the Abstract Interpreter merges (joins) the post-state of the current Basic Block into the pre-state of the target Basic Block. If the join reports that the target's pre-state has changed, the Abstract Interpreter initiates a reanalysis, beginning from the target Basic Block with the updated merged state.

This iterative analysis process continues until the loop's pre-state stabilizes. In other words, the process is repeated until the pre-state of the loop header no longer changes between iterations, indicating that a fixed point has been reached and the analysis of the loop is complete.
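The linear and iterative cases can both be captured by one fixed-point loop. The sketch below is a generic worklist formulation, not Move's actual implementation (which, as described above, restarts from the target block of a changed backedge); AbstractDomain, JoinResult, and Facts here are simplified, hypothetical versions. The key point is that a block is reanalyzed only when join reports JoinResult::Changed, so termination depends on that report being accurate.

```rust
use std::collections::{BTreeSet, VecDeque};

#[derive(Debug, PartialEq, Eq)]
enum JoinResult {
    Changed,
    Unchanged,
}

/// Simplified, hypothetical domain trait (not Move's real signature).
trait AbstractDomain: Clone {
    fn join(&mut self, other: &Self) -> JoinResult;
}

/// Example domain: the set of facts that may hold. Join is set union.
#[derive(Clone, Debug)]
struct Facts(BTreeSet<&'static str>);

impl AbstractDomain for Facts {
    fn join(&mut self, other: &Self) -> JoinResult {
        let before = self.0.len();
        self.0.extend(other.0.iter().copied());
        // Accurate report: Changed only if the set actually grew.
        if self.0.len() != before { JoinResult::Changed } else { JoinResult::Unchanged }
    }
}

/// Runs to a fixed point; returns each block's pre-state and the number
/// of block analyses performed. Block 0 is the entry.
fn analyze<S: AbstractDomain>(
    succs: &[Vec<usize>],
    entry_state: S,
    transfer: impl Fn(usize, &S) -> S,
) -> (Vec<Option<S>>, usize) {
    let mut pre: Vec<Option<S>> = vec![None; succs.len()];
    pre[0] = Some(entry_state);
    let mut worklist = VecDeque::from([0usize]);
    let mut visits = 0;
    while let Some(block) = worklist.pop_front() {
        visits += 1;
        // Post-state = transfer function applied to the pre-state.
        let post = transfer(block, pre[block].as_ref().unwrap());
        for &succ in &succs[block] {
            if pre[succ].is_none() {
                // First visit: the successor's pre-state is this post-state.
                pre[succ] = Some(post.clone());
                worklist.push_back(succ);
            } else if pre[succ].as_mut().unwrap().join(&post) == JoinResult::Changed
                && !worklist.contains(&succ)
            {
                // The merged pre-state changed, so reanalyze the successor.
                worklist.push_back(succ);
            }
        }
    }
    (pre, visits)
}

fn main() {
    // Blocks: 0 -> 1 -> 2, with a backedge 2 -> 1 and an exit 2 -> 3.
    let succs: Vec<Vec<usize>> = vec![vec![1], vec![2], vec![1, 3], vec![]];
    let transfer = |block: usize, state: &Facts| {
        let mut out = state.clone();
        match block {
            0 => { out.0.insert("init"); }
            1 => { out.0.insert("header"); }
            2 => { out.0.insert("body"); }
            _ => {}
        }
        out
    };
    let (pre, visits) = analyze(&succs, Facts(BTreeSet::new()), transfer);
    println!("{} block analyses; loop header pre-state: {:?}", visits, pre[1]);
}
```

Here block 1's pre-state gains "body" only after the backedge join, which triggers one more pass over blocks 1 and 2 before every join reports Unchanged.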

Sui’s IDLeak Verifier: Customized Abstract Interpretation Analysis

Sui's blockchain platform introduces a unique object-centric global storage model, which differentiates it from the original Move design. A distinctive feature of this model is that any struct with the 'key' ability must begin with an id field of type UID, which wraps the object's ID, as in the example below. Because each object must have a unique ID, the ID is immutable and cannot be transferred to other objects. To enforce this, Sui uses a custom analysis built on the Abstract Interpreter.

/// Sui object identifiers
module sui::object {
    ...
    struct UID has store {
        id: ID,
    }
    struct ID has copy, drop, store {
        bytes: address
    }
}

/// Sui object example
struct A has key {
    id: UID,
    b: B,
}
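The layout rule above (a 'key' struct must start with an id field of type UID) can be illustrated with a minimal checker. This is a sketch over a hypothetical StructDef type that compares names as plain strings; a real check would need to resolve the field's type to sui::object::UID rather than match a name.

```rust
/// Hypothetical, simplified struct definition; field types are plain names.
struct StructDef {
    name: &'static str,
    has_key: bool,
    fields: Vec<(&'static str, &'static str)>, // (field name, type name)
}

/// Enforces the rule that a struct with `key` must begin with `id: UID`.
fn check_key_struct(s: &StructDef) -> Result<(), String> {
    if !s.has_key {
        return Ok(()); // the rule only applies to objects
    }
    match s.fields.first() {
        Some(&("id", "UID")) => Ok(()),
        Some(&(name, ty)) => Err(format!(
            "{}: first field is `{}: {}`, expected `id: UID`",
            s.name, name, ty
        )),
        None => Err(format!("{}: key struct has no fields", s.name)),
    }
}

fn main() {
    let a = StructDef { name: "A", has_key: true, fields: vec![("id", "UID"), ("b", "B")] };
    let bad = StructDef { name: "Bad", has_key: true, fields: vec![("b", "B"), ("id", "UID")] };
    println!("{:?}", check_key_struct(&a));
    println!("{:?}", check_key_struct(&bad));
}
```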

The IDLeak Verifier, also known as the id_leak_verifier, works in concert with the Abstract Interpreter to conduct its analysis. It defines its own abstract domain, AbstractState, which tracks the status of each local variable with an AbstractValue indicating whether the local holds a fresh ID. Whenever a struct is packed, the IDLeak Verifier checks that only a fresh ID is packed into it. By tracing the data flow through local variables, it ensures that no existing ID is transferred to another object.
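The core IDLeak rule can be sketched as an abstract transfer function over straight-line code. This is a minimal model under stated assumptions: Instr is a hypothetical reduced instruction set in which NewId stands in for a call to object::new and each object has a single ID field, and branches (and therefore joins) are left out entirely.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractValue {
    Fresh,
    Other,
}

/// Hypothetical toy instructions standing in for the relevant Move bytecode.
enum Instr {
    NewId,        // models a call to object::new: pushes a fresh ID
    StLoc(u8),    // pops the stack top into a local
    MoveLoc(u8),  // moves a local onto the stack
    PackObject,   // packs an object whose only field is the ID on the stack
    UnpackObject, // unpacks an object, pushing its (no longer fresh) ID
}

struct State {
    stack: Vec<AbstractValue>,
    locals: BTreeMap<u8, AbstractValue>,
}

/// Abstract transfer function for one instruction.
fn step(state: &mut State, instr: &Instr) -> Result<(), String> {
    match instr {
        Instr::NewId => state.stack.push(AbstractValue::Fresh),
        Instr::StLoc(i) => {
            let v = state.stack.pop().ok_or("stack underflow")?;
            state.locals.insert(*i, v);
        }
        Instr::MoveLoc(i) => {
            let v = state.locals.remove(i).ok_or("local unavailable")?;
            state.stack.push(v);
        }
        Instr::PackObject => {
            // The IDLeak rule: only a fresh ID may be packed into an object.
            if state.stack.pop() != Some(AbstractValue::Fresh) {
                return Err("ID leak: packing a non-fresh ID".to_string());
            }
            state.stack.push(AbstractValue::Other); // the packed object itself
        }
        Instr::UnpackObject => {
            state.stack.pop().ok_or("stack underflow")?;
            state.stack.push(AbstractValue::Other); // the extracted ID is not fresh
        }
    }
    Ok(())
}

/// Verifies straight-line code only; branches and joins are omitted.
fn verify(code: &[Instr]) -> Result<(), String> {
    let mut state = State { stack: Vec::new(), locals: BTreeMap::new() };
    for instr in code {
        step(&mut state, instr)?;
    }
    Ok(())
}

fn main() {
    use Instr::*;
    println!("{:?}", verify(&[NewId, PackObject]));
    println!("{:?}", verify(&[NewId, PackObject, UnpackObject, PackObject]));
}
```

Repacking an ID taken out of an existing object is rejected, which is exactly the transfer of an existing ID that the verifier is meant to prevent.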

The State Management Inconsistency in Sui IDLeak Verifier

The IDLeak Verifier integrates with the Move Abstract Interpreter by implementing the AbstractState::join function, which plays an integral role in state management by merging and updating state values. AbstractState::join in turn relies on AbstractValue::join, so let's examine both functions in detail to understand how they operate.

enum AbstractValue {
    Fresh,
    Other,
}

pub(crate) struct AbstractState {
    locals: BTreeMap<LocalIndex, AbstractValue>,
}

impl AbstractDomain for AbstractState {
    /// attempts to join state to self and returns the result
    fn join(
        &mut self,
        state: &AbstractState,
        _meter: &mut impl Meter,
    ) -> Result<JoinResult, PartialVMError> {
        let mut changed = false;
        for (local, value) in &state.locals {
            let old_value = *self.locals.get(local).unwrap_or(&AbstractValue::Other);
            changed |= *value != old_value;  // determines the return value.
            self.locals.insert(*local, value.join(&old_value));  // update state value.
        }
        if changed {
            Ok(JoinResult::Changed)
        } else {
            Ok(JoinResult::Unchanged)
        }
    }
}

AbstractState::join takes another AbstractState as input and merges its locals into the current object's locals. For each local variable in the incoming state, it compares the incoming value with the variable's current value (defaulting to AbstractValue::Other if the local is absent). If the two values differ, it sets the changed flag. In either case, it then stores the result of AbstractValue::join as the variable's new value.

impl AbstractValue {
    pub fn join(&self, value: &AbstractValue) -> AbstractValue {
        if self == value {
            *value
        } else {
            AbstractValue::Other
        }
    }
}

AbstractValue::join compares self with another AbstractValue. If they are equal, it returns that value; otherwise, it returns AbstractValue::Other.

However, combining these two functions leads to an inconsistency. AbstractState::join may report a change (JoinResult::Changed) because the old and incoming values differ, even though the stored value is the same after the update. This anomaly is caused by the order of operations: AbstractState::join computes the changed flag by comparing the incoming value with the old value before AbstractValue::join is applied, so the flag does not reflect the value that is actually stored. Moreover, AbstractValue::Other dominates the result of AbstractValue::join. Thus, if the old value is AbstractValue::Other and the incoming value is AbstractValue::Fresh, the stored value remains AbstractValue::Other even though the join reports a change.
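The inconsistency can be reproduced in isolation. The sketch below copies the logic of the two join functions shown above but drops the Move trait plumbing (AbstractDomain, Meter, PartialVMError); buggy_join is a hypothetical free-function stand-in for the pre-fix AbstractState::join, with locals keyed by u8 for simplicity.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractValue {
    Fresh,
    Other,
}

impl AbstractValue {
    // Same logic as AbstractValue::join above: Other dominates.
    fn join(&self, value: &AbstractValue) -> AbstractValue {
        if self == value { *value } else { AbstractValue::Other }
    }
}

#[derive(Debug, PartialEq, Eq)]
enum JoinResult {
    Changed,
    Unchanged,
}

/// Pre-fix AbstractState::join logic as a free function.
fn buggy_join(
    locals: &mut BTreeMap<u8, AbstractValue>,
    incoming: &BTreeMap<u8, AbstractValue>,
) -> JoinResult {
    let mut changed = false;
    for (local, value) in incoming {
        let old_value = *locals.get(local).unwrap_or(&AbstractValue::Other);
        changed |= *value != old_value; // compares the *incoming* value
        locals.insert(*local, value.join(&old_value)); // stores the *joined* value
    }
    if changed { JoinResult::Changed } else { JoinResult::Unchanged }
}

fn main() {
    let mut pre_state = BTreeMap::from([(2u8, AbstractValue::Other)]);
    let post_state = BTreeMap::from([(2u8, AbstractValue::Fresh)]);
    let before = pre_state.clone();
    let result = buggy_join(&mut pre_state, &post_state);
    // Reports Changed even though the stored state equals the old one.
    println!("{:?}, state unchanged: {}", result, pre_state == before);
}
```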

Figure 4: A concrete example illustrating the inconsistency in State Join

This inconsistency, where a Basic Block's state is marked as changed even though its value stays the same, has serious consequences. To see why, recall how the Abstract Interpreter behaves when the Control Flow Graph (CFG) contains a loop: it iteratively joins the current block's post-state into the pre-state of the backedge target, and whenever the join reports a change, it triggers a reanalysis.

However, if the join operation falsely flags the state as changed, when in reality no change in value has occurred, it could lead to an endless cycle of reanalyses.

Beyond the Inconsistency: Triggering Infinite Analysis in Sui IDLeak Verifier

Leveraging this inconsistency, an attacker can craft a malicious control flow graph that traps the IDLeak verifier in an infinite loop. The crafted control flow graph consists of three basic blocks, BB1, BB2, and BB3, with an intentional backward edge from BB3 to BB2.

Figure 5: Malicious CFG and state construction that can cause a deadloop inside the IDLeak verifier

The process starts with BB2, where the AbstractValue for a particular local variable is set to ::Other. After executing BB2, the flow moves to BB3, where the same variable is set to ::Fresh. At the end of BB3, there's a backward edge that leads back to BB2.

Here is where the inconsistency comes into play. When the backward edge is processed, the Abstract Interpreter attempts to join the post-state of BB3 (where the variable is 'Fresh') with the pre-state of BB2 (where the variable is 'Other'). AbstractState::join notices the difference and sets the changed flag, signaling that BB2 must be reanalyzed. However, because 'Other' dominates in AbstractValue::join, the variable's actual state remains 'Other'.

The verifier therefore reanalyzes BB2 along with all its successors (BB3 in this case), reaches the same backedge with the same states, and is again told that the state has changed. Once initiated, this loop continues indefinitely. The verifier consumes all available CPU cycles and transaction processing stalls, a situation that persists even after a validator reboots. This carefully crafted attack, which we've termed "HamsterWheel," effectively brings the Sui validator to a halt.
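To see why the spurious flag never clears, the following sketch replays the Figure 5 loop for a single local using the pre-fix join logic. It is a deliberately reduced model, assuming BB2 leaves the local untouched and BB3 always stores a fresh ID; replay_loop and its cap parameter are hypothetical, and the cap exists only so that the sketch terminates.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum V {
    Fresh,
    Other,
}

/// AbstractValue::join logic: equal values stay, otherwise Other.
fn value_join(a: V, b: V) -> V {
    if a == b { a } else { V::Other }
}

/// Pre-fix per-local logic: returns (stored value, reported changed flag).
fn buggy_join_local(old: V, incoming: V) -> (V, bool) {
    (value_join(incoming, old), incoming != old)
}

/// Replays the BB2 -> BB3 -> BB2 loop for one local. Returns how many
/// times BB2 was analyzed (stopping at `cap`) and BB2's final pre-state.
fn replay_loop(cap: usize) -> (usize, V) {
    let mut bb2_pre = V::Other; // the local is Other on entry to BB2
    let mut analyses = 0;
    loop {
        analyses += 1;
        // Analyzing BB2 then BB3: BB3 stores a fresh ID into the local.
        let bb3_post = V::Fresh;
        // Backedge BB3 -> BB2: join BB3's post-state into BB2's pre-state.
        let (stored, changed) = buggy_join_local(bb2_pre, bb3_post);
        bb2_pre = stored;
        if !changed || analyses >= cap {
            return (analyses, bb2_pre);
        }
    }
}

fn main() {
    let (analyses, state) = replay_loop(1_000);
    println!("BB2 analyzed {} times; local is still {:?}", analyses, state);
}
```

Every iteration joins Fresh into Other, so the flag reports a change while the stored value stays Other, and only the artificial cap ends the loop.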

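With the attack scenario outlined and the payload confirmed to pass all the verifier checks that run beforehand, we successfully demonstrated the exploit with a concrete example built from the following Move bytecode. Note that the blocks in this listing are numbered from zero: BB0, BB1, and BB2 here correspond to BB1, BB2, and BB3 in Figure 5, and the backedge is the Branch(6) at the end of BB2.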

code = vec![
    // BB0: offset 0    locals = () 
    CopyLoc(1), 
    Call(FunctionHandleIndex::new(2)),      // new_id, stack.top = ::Fresh
    Pack(StructDefinitionIndex::new(0)),    // consume the fresh ID
    Unpack(StructDefinitionIndex::new(0)),  // unpacked id, stack.top = ::Other
    StLoc(2),                               // locals (2 => ::Other)
    Branch(6), 

    // BB1: offset 6
    MoveLoc(2), 
    Call(FunctionHandleIndex::new(1)),   // delete_id, discard local 2
    CopyLoc(1), 
    Call(FunctionHandleIndex::new(2)),   // prepare tmp_object for BB transition
    Pack(StructDefinitionIndex::new(0)), 
    StLoc(3),
    Branch(13),  

    // BB2: offset 13
    MoveLoc(3),
    Unpack(StructDefinitionIndex::new(0)),  // unpacked id, stack.top = ::Other
    Call(FunctionHandleIndex::new(1)),      // delete id, discard the tmp_object
    CopyLoc(1), 
    Call(FunctionHandleIndex::new(2)),      // new id,  stack.top = ::Fresh
    StLoc(2),                               // locals (2 => ::Fresh)
    Branch(6) 
];

The demonstration shows the vulnerability in practice: by carefully crafting the bytecode, an attacker can trigger an infinite loop in the IDLeak verifier with a payload of only about 100 bytes, consuming all available CPU cycles, blocking the processing of new transactions, and causing a denial of service on the Sui validator.

The HamsterWheel Attack Causes Persistent Damage to the Blockchain Network

According to the bug bounty conditions set by Sui, for a vulnerability to reach the level of critical severity, it must exhibit a capacity to bring the entire network to a halt, impeding new transaction confirmations and requiring a hard fork for resolution. Lesser impacts, such as partial denial of service, can at best qualify as medium or high severity based on the parameters of the bug bounty program.

The HamsterWheel attack discovered by the CertiK Skyfall team poses an immense threat because it can shut down the entire Sui network, an impact that earns it a critical severity classification. To understand the gravity of this flaw, we need to look at the architecture of Sui's backend, specifically the steps that lead up to a package being published or upgraded on chain.

Figure 6: Outline of interactions to commit a transaction in Sui

Initially, user transactions are sent to the Sui authorities through a frontend RPC and a pre-validation process. The Sui authorities are responsible for validating incoming transactions. Once the user's signature has been successfully validated, the transactions are turned into transaction certificates.

These certificates, which are integral to the functioning of the network, are then propagated across the validator nodes, which check them for validity before a package can be published or upgraded on chain. It is during this crucial verification stage that the 'deadloop' vulnerability can be exploited.

When the flaw is triggered, it leads to an indefinite hang in the verification process. It effectively hampers the ability of the system to process new transactions, causing a total network shutdown. The criticality of this vulnerability is further magnified as it persists even after a validator reboot, meaning conventional mitigations are insufficient. The exploitation of this vulnerability hence leads to a 'persistent damage' scenario, leaving a lasting impact on the entire Sui network.

Sui’s Approach to Mitigation

In response, Sui confirmed the vulnerability in a timely manner and released a fix for the critical flaw. The fix makes the changed flag consistent with the actual state change, eliminating the critical impact of the HamsterWheel attack.

// sui-verifier/src/id_leak_verifier.rs
impl AbstractDomain for AbstractState {
    /// attempts to join state to self and returns the result
    fn join(
        &mut self,
        state: &AbstractState,
-       _meter: &mut impl Meter,
+       meter: &mut impl Meter,
    ) -> Result<JoinResult, PartialVMError> {
        let mut changed = false;
        for (local, value) in &state.locals {
            let old_value = *self.locals.get(local).unwrap_or(&AbstractValue::Other);
-           changed |= *value != old_value;
-           self.locals.insert(*local, value.join(&old_value));
+           let new_value = value.join(&old_value);
+           changed |= new_value != old_value;
+           self.locals.insert(*local, new_value);
        }
     ...
    }
}

To eliminate this inconsistency, Sui's fix made a minor but crucial adjustment to the AbstractState::join function. Instead of computing the changed flag before AbstractValue::join runs, the fix performs AbstractValue::join first and then sets the changed flag by comparing its result with old_value. This way, the changed flag correctly reflects whether the stored state value actually changed.
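Because AbstractValue has only two variants, the fix can be checked exhaustively. The sketch below compares the pre-fix and post-fix flag logic over every (old, incoming) pair and lists the pairs where the reported flag disagrees with whether the stored value actually changed. The function names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum V {
    Fresh,
    Other,
}

/// AbstractValue::join logic: equal values stay, otherwise Other.
fn value_join(a: V, b: V) -> V {
    if a == b { a } else { V::Other }
}

/// Pre-fix: the flag compares the incoming value with the old one.
fn buggy(old: V, incoming: V) -> (V, bool) {
    (value_join(incoming, old), incoming != old)
}

/// Post-fix: the flag compares the joined value with the old one.
fn fixed(old: V, incoming: V) -> (V, bool) {
    let new = value_join(incoming, old);
    (new, new != old)
}

/// Lists (old, incoming) pairs where the reported flag disagrees with
/// whether the stored value actually changed.
fn inconsistent_pairs(join: fn(V, V) -> (V, bool)) -> Vec<(V, V)> {
    let all = [V::Fresh, V::Other];
    let mut bad = Vec::new();
    for &old in &all {
        for &incoming in &all {
            let (stored, changed) = join(old, incoming);
            if changed != (stored != old) {
                bad.push((old, incoming));
            }
        }
    }
    bad
}

fn main() {
    println!("buggy: {:?}", inconsistent_pairs(buggy));
    println!("fixed: {:?}", inconsistent_pairs(fixed));
}
```

Only (Other, Fresh), the exact case exploited by HamsterWheel, is inconsistent before the fix, and no case is after it. In this two-valued domain an accurate flag also bounds the work: a reported change can only move a stored local from Fresh to Other, and the join never moves it back, so the number of reanalyses is bounded.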

In addition to fixing this specific vulnerability, Sui also deployed mitigations to reduce the impact of future verifier vulnerabilities. According to Sui's reply to the bug report, the mitigation relies on a feature called the Denylist.

“However, validators have a node config file that allows them to temporarily denylist certain classes of transactions. This config can be used to temporarily disable processing publishing and package upgrades. Since the bug happens while running the Sui verifier before signing a publish or package upgrade tx, and denylisting will stop the verifier from being run + drop the malicious tx on the floor temporarily denylisting these tx types is a 100% effective mitigation (though it will temporarily interrupt service for folks attempting to publish or upgrade code).

As a side note, we have had this tx denylist config file for awhile, but we also added one for certificates as a follow up item to the "validator deadloop" bug you previously reported. With this mechanism in place, we would be much more resilient to that attack: we would use the certificate denylist config to make validators forget about the bad cert (breaking the crash loop), and the tx denylist config to disable publishing/upgrades and thus prevent creation of new transactions of death. Thanks for making us think about this!”

Sui further explained: “Validators have a finite number of "ticks" (different from gas) to spend on bytecode verification before signing a transaction, if all bytecode being published in a transaction cannot be verified in that many ticks, the validator will refuse to sign the transaction, preventing it from being executed on the network. Previously, metering only applied to a chosen set of complicated verifier passes. In response to this issue, we extended metering to every verifier pass, to guarantee a bound on the work a validator performs during verification per tick. We also fixed the underlying infinite loop bug in the ID Leak verifier pass.”
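Metering bounds the damage even when a verifier pass contains a bug like this one. The sketch below charges a hypothetical tick cost per loop iteration against a fixed budget and aborts verification once the budget is exhausted; TickMeter, the 10-tick cost, and the 1,000-tick budget are illustrative assumptions, not Sui's actual meter API or limits.

```rust
#[derive(Debug, PartialEq)]
enum VerifyError {
    OutOfTicks,
}

/// Hypothetical tick meter; Sui's real meter API and tick costs differ.
struct TickMeter {
    remaining: u64,
}

impl TickMeter {
    fn charge(&mut self, ticks: u64) -> Result<(), VerifyError> {
        if ticks > self.remaining {
            return Err(VerifyError::OutOfTicks);
        }
        self.remaining -= ticks;
        Ok(())
    }
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum V {
    Fresh,
    Other,
}

/// Re-runs the backedge join with the pre-fix logic, charging ticks per
/// iteration. Without the meter this loop would never exit.
fn verify_loop(meter: &mut TickMeter) -> Result<(), VerifyError> {
    let mut header_pre = V::Other;
    loop {
        meter.charge(10)?; // hypothetical cost of analyzing the loop body once
        let incoming = V::Fresh; // loop body's post-state for the local
        let changed = incoming != header_pre; // pre-fix flag
        // AbstractValue::join logic: Other dominates.
        header_pre = if incoming == header_pre { incoming } else { V::Other };
        if !changed {
            return Ok(());
        }
    }
}

fn main() {
    let mut meter = TickMeter { remaining: 1_000 };
    println!("{:?}", verify_loop(&mut meter));
}
```

Instead of spinning forever, the pass fails after 100 iterations, and under the policy quoted above the validator would refuse to sign the transaction.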

To summarize, the Denylist enables validators to temporarily circumvent the verifier routine by disabling the publishing or upgrading processes, effectively preventing some potential disruptions from problematic transactions. Given the Denylist mitigation in place, validators should remain operational with only a portion of their functionality disabled when facing verifier panics.
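The Denylist mitigation amounts to a gate in front of the verifier. The sketch below models it with a hypothetical TxDenylist type and TxKind enum; the real validator config file has its own format and field names, which the bug report excerpt does not show.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum TxKind {
    Transfer,
    Publish,
    Upgrade,
}

/// Hypothetical stand-in for the validator's transaction denylist config.
struct TxDenylist {
    disabled_kinds: HashSet<TxKind>,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Rejected, // dropped before the bytecode verifier ever runs
    Verify,   // publish/upgrade payloads go through the Sui verifier
    Process,  // other transaction kinds proceed normally
}

fn admit(cfg: &TxDenylist, kind: TxKind) -> Decision {
    if cfg.disabled_kinds.contains(&kind) {
        Decision::Rejected
    } else if matches!(kind, TxKind::Publish | TxKind::Upgrade) {
        Decision::Verify
    } else {
        Decision::Process
    }
}

fn main() {
    let cfg = TxDenylist {
        disabled_kinds: HashSet::from([TxKind::Publish, TxKind::Upgrade]),
    };
    for kind in [TxKind::Transfer, TxKind::Publish, TxKind::Upgrade] {
        println!("{:?} -> {:?}", kind, admit(&cfg, kind));
    }
}
```

With publishing and upgrades denylisted, such transactions are dropped before the verifier runs while other transaction kinds keep flowing, matching the partial loss of functionality described above.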

Timeline

  • April 27, 2023: CertiK reports the vulnerability to Sui.
  • April 28, 2023: Sui confirms the vulnerability; severity pending confirmation.
  • April 28, 2023: Sui patches the vulnerability in commit 7915de5.
  • April 30, 2023: Sui confirms the severity as CRITICAL.
  • May 16, 2023: Sui pays the bug bounty reward.

Conclusion

In this blog post, we examined the technical aspects of the HamsterWheel attack identified by the CertiK Skyfall team. We explained how this novel attack exploits a critical vulnerability to cause a complete shutdown of the Sui blockchain network. We also looked at Sui's timely response to this critical issue and its approach to protecting the Sui blockchain against future threats.
