Danilo Poccia for AWS


Automated reasoning and generative AI: Harness creativity with formal verifications

The term "artificial intelligence" (AI) has an interesting origin story. In the summer of 1956, a group of researchers gathered at Dartmouth College for what would become known as the Dartmouth Summer Research Project. John McCarthy, then a young assistant professor, had chosen the term "artificial intelligence" for their grant proposal, though some of his colleagues initially found it too bold. The story goes that McCarthy preferred "artificial intelligence" over "machine intelligence" or "complex information processing" because he wanted to stake out the ambitious goal of genuine machine intelligence while avoiding association with cybernetics, which was then a popular field. That summer gathering, which brought together brilliant minds like Marvin Minsky, Claude Shannon, W.S. McCulloch, Arthur Samuel, and John Nash, helped establish AI as a distinct field of study. If I'd get access to a time machine, visiting this workshop would be on my top list.

Now, nearly seven decades later, artificial intelligence has evolved in ways those early pioneers might not have imagined. In today's rapidly evolving landscape, ensuring the reliability and trustworthiness of AI systems has become a key challenge. While generative AI models, particularly Large Language Models (LLMs), have demonstrated remarkable capabilities in generating human-like text, code, and creative content, they also face significant challenges related to accuracy and reliability. One of the most pressing concerns is the phenomenon of "hallucinations", where AI models generate plausible but factually incorrect information.

The term "hallucinations" is an anthropomorphic way to classify output that is not true. LLMs generate text using a statistical approach, and some of those possible combinations are not true. Even if "hallucination" can be misleading, it serves a useful purpose to clearly communicate a possible outcome.

As organizations increasingly deploy AI solutions in critical applications, from customer service to healthcare and financial services, the ability to guarantee the accuracy and reliability of AI-generated responses becomes more and more important. This is where automated reasoning, a field with deep roots in computer science and formal logic, offers a powerful complement to modern machine learning approaches.

This article explores the synergy between automated reasoning and generative AI, with a particular focus on how this combination is being implemented in practical applications. We'll examine the fundamental differences between machine learning and automated reasoning approaches, understand how they can work together to create more reliable AI systems, and look at a concrete example through Amazon's implementation of automated reasoning checks in Bedrock Guardrails.

Understanding Automated Reasoning and Machine Learning

To appreciate how automated reasoning can enhance generative AI systems, we must first understand the fundamental differences between these two approaches. Machine learning, particularly in the context of large language models, relies on pattern recognition in vast amounts of training data. These models learn by analyzing examples, identifying patterns and relationships in the training data. A language model, for instance, develops its understanding of grammar, context, and domain knowledge by analyzing millions of text documents.

The probabilistic nature of machine learning allows it to handle complex, unstructured data and generalize to new situations with remarkable flexibility. These models excel at creative and open-ended tasks, working effectively even with ambiguous or incomplete information. However, this approach comes with inherent challenges. Models can be susceptible to biases in their training data, produce hallucinations, and often lack explainability in their decision-making process. Perhaps most importantly, they cannot provide formal guarantees of correctness.

Automated reasoning takes a fundamentally different approach. Instead of learning from data, these systems employ formal logic and mathematical principles to verify statements and properties. Through the use of formal methods, automated reasoning systems can provide mathematical guarantees of correctness within a given logical framework: as long as the underlying assumptions hold, the results are accurate and sound. Their results are also transparent and explainable, making them particularly valuable for rule-based systems and formal verification.

However, automated reasoning also has its limitations. It requires formal specification of rules and properties, making it less flexible than machine learning approaches. These systems may struggle to scale with very complex problems and cannot directly handle ambiguity or uncertainty in the way that machine learning models can.

The key insight is that these approaches are not mutually exclusive but complementary. Consider a customer service chatbot: the machine learning component can understand natural language queries and generate human-like responses, while the automated reasoning component verifies that these responses align with company policies, factual information, and business rules. Together, they create a system that is both flexible and reliable.
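To make the division of labor concrete, here is a minimal, hypothetical sketch in Rust of such a pipeline. The trait names, the canned generator, and the single hard-coded refund rule are invented for illustration; a real system would put an LLM behind the generator and a formal policy model behind the verifier.

```rust
// Hypothetical sketch: a generative component proposes an answer,
// a reasoning component checks it against formalized business rules,
// and only verified answers reach the customer.

trait ResponseGenerator {
    fn generate(&self, query: &str) -> String;
}

trait PolicyVerifier {
    // Returns Ok(()) if the draft complies with policy, or an explanation if not.
    fn verify(&self, draft: &str) -> Result<(), String>;
}

struct CannedGenerator; // stand-in for an LLM call
impl ResponseGenerator for CannedGenerator {
    fn generate(&self, _query: &str) -> String {
        "You can get a full refund at any time.".to_string()
    }
}

struct RefundPolicyVerifier; // stand-in for a formal policy model
impl PolicyVerifier for RefundPolicyVerifier {
    fn verify(&self, draft: &str) -> Result<(), String> {
        // Toy rule, invented for this example: no unconditional refunds.
        if draft.contains("full refund at any time") {
            Err("Refunds are only allowed within 30 days of purchase.".to_string())
        } else {
            Ok(())
        }
    }
}

fn answer(query: &str, gen: &impl ResponseGenerator, ver: &impl PolicyVerifier) -> String {
    let draft = gen.generate(query);
    match ver.verify(&draft) {
        Ok(()) => draft,
        Err(reason) => format!("I need to double-check that. ({reason})"),
    }
}

fn main() {
    let reply = answer("Can I get a refund?", &CannedGenerator, &RefundPolicyVerifier);
    println!("{reply}");
}
```

The point of the sketch is the shape of the loop, not the string matching: the generator stays free to phrase answers however it likes, while the verifier has the final word on whether an answer ships.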

AWS's Journey with Automated Reasoning

Before exploring how automated reasoning enhances AI systems, it's worth understanding how AWS has already been putting these mathematical verification techniques to work. In fact, automated reasoning has been quietly powering critical aspects of AWS infrastructure for years, helping ensure the reliability and security of core services that millions of users depend on daily.

Within AWS, automated reasoning serves as a mathematical guardian for essential service areas including storage, networking, virtualization, identity management, and cryptography. Take for instance cryptographic implementations. AWS uses automated reasoning to formally verify their correctness, an approach that not only improves security but also accelerates development. By employing mathematical proofs to verify system behavior, AWS can catch potential issues that might slip past traditional testing methods.

The real-world applications of automated reasoning across AWS services tell an interesting story of how mathematical verification can solve practical problems. AWS IAM Access Analyzer uses automated reasoning to help administrators manage account permissions at scale, ensuring that access policies work exactly as intended. For network administrators, Amazon VPC Reachability Analyzer applies these same principles to help visualize and understand complex network configurations, making it easier to troubleshoot connectivity issues.

Security teams benefit from automated reasoning through Amazon S3 Block Public Access, which uses mathematical verification to guarantee that objects stored in S3 buckets never have public access: not just now, but also in the future. Similarly, Amazon Verified Permissions brings automated reasoning to application-level permissions, helping developers define and verify fine-grained access controls for their users.

These verification techniques have proven particularly valuable in areas where absolute certainty is required. When dealing with storage systems that must never publicly share data, or networking protocols that must maintain security under all conditions, automated reasoning provides the mathematical guarantees needed to build with confidence. What's fascinating about AWS's approach is how it applies these formal methods at scale, using mathematical verification as a practical tool in real-world systems rather than just a theoretical exercise.

Modern software verification employs various approaches, each offering different trade-offs between ease of use and strength of guarantees. AWS contributes to the open source program verification tools used in the previous examples. Dafny and Kani represent two powerful approaches to program verification. Let's see how they work in practice before connecting the dots between automated reasoning and generative AI.

Dafny: Program Verification Through Proof

Dafny is a verification-aware programming language that allows developers to write both code and specifications in the same language. Think of it as a programming language with built-in support for mathematical reasoning.

Here's a practical example of how Dafny helps verify code correctness for a function looking for the minimum value in an array:

method FindMin(a: array<int>) returns (min: int)
  requires a.Length > 0  // Precondition: array must not be empty
  ensures forall k :: 0 <= k < a.Length ==> min <= a[k]  // Postcondition: min is less than or equal to all elements
  ensures exists k :: 0 <= k < a.Length && min == a[k]   // Postcondition: min exists in the array
{
  min := a[0];
  var i := 1;
  while i < a.Length
    invariant 1 <= i <= a.Length
    invariant forall k :: 0 <= k < i ==> min <= a[k]
    invariant exists k :: 0 <= k < i && min == a[k]
  {
    if a[i] < min {
      min := a[i];
    }
    i := i + 1;
  }
}

This example demonstrates several key features of formal verification:

  1. Preconditions (requires): The array must be non-empty
  2. Postconditions (ensures): The returned value must be the minimum
  3. Loop invariants: Properties that hold before and after each loop iteration
  4. Quantifiers: Mathematical statements about all or some elements

Dafny automatically verifies that the implementation satisfies these specifications. If we made a mistake while trying to optimize the function for speed, like forgetting to update min in some edge cases, Dafny would point out the error during verification.

Beyond Unit Tests: From Fuzzing to Formal Methods

Before diving into the next tool, it's worth understanding the progression of testing approaches.

Fuzzing is like throwing random data at your program to see what breaks. It's particularly good at finding edge cases and security vulnerabilities.

Property testing takes this a step further by letting you specify properties your code should maintain. Instead of writing specific test cases, you describe what should be true for a range of inputs. However, it is often practically impossible to test all possible inputs.
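As a hedged illustration, here is what property testing can look like in Rust using the proptest crate (my choice of library, not one mentioned in the article), applied to a simple absolute value function of the same shape as the one verified with Kani below. The property is checked on many randomly sampled inputs rather than on all of them:

```rust
// Property testing with the proptest crate (add `proptest` as a dev-dependency).
// Instead of hand-picked test cases, we state a property and let the framework
// sample many inputs. Unlike model checking, this does not cover every input.
use proptest::prelude::*;

fn absolute_value(x: i32) -> i32 {
    if x < 0 { -x } else { x }
}

proptest! {
    // Property: for every sampled input (excluding i32::MIN, which would overflow),
    // the result is non-negative.
    #[test]
    fn abs_is_non_negative(x in (i32::MIN + 1)..=i32::MAX) {
        prop_assert!(absolute_value(x) >= 0);
    }
}
```

The same property reappears below as a Kani proof harness, where it is checked for every possible input rather than a random sample.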

Model checking, where Kani operates, goes even further. Rather than testing with specific values, it mathematically explores all possible states of your program. This can find subtle bugs that might be difficult to surface with fuzzing and testing.

Kani: Model Checking for Rust

Kani is a model checker for Rust that can prove properties about your code by exploring all possible execution paths. Here's a practical example:

pub fn absolute_value(x: i32) -> i32 {
    if x < 0 {
        return -x;
    }
    x
}

#[cfg(kani)]
#[kani::proof]
fn verify_absolute_value_is_positive() {
    let x: i32 = kani::any();

    // Skip the overflow case for i32::MIN
    kani::assume(x > i32::MIN);

    let abs_x = absolute_value(x);
    assert!(abs_x >= 0);
}

#[cfg(kani)]
#[kani::proof]
fn verify_absolute_value_preserves_positive() {
    let x: i32 = kani::any();

    // Consider only positive inputs
    kani::assume(x > 0);

    let abs_x = absolute_value(x);
    assert!(abs_x == x);
}

This example shows how Kani can verify important properties:

  1. The proof harness verify_absolute_value_is_positive verifies that the absolute value is always non-negative
  2. verify_absolute_value_preserves_positive checks that positive numbers remain unchanged
  3. kani::any() tells Kani to consider all possible values
  4. kani::assume helps focus verification on relevant cases

Kani is particularly valuable for systems programming where correctness is crucial. It can find subtle bugs that might escape traditional testing:

pub fn divide(x: i32, y: i32) -> i32 {
    x / y  // Potential division by zero!
}

#[cfg(kani)]
#[kani::proof]
fn verify_divide_safety() {
    let x: i32 = kani::any();
    let y: i32 = kani::any();

    // Demonstrate how Kani finds division by zero
    let result = divide(x, y);
}

Running Kani on this code would immediately identify the potential division by zero error, even though traditional testing might miss it if we didn't explicitly test with a zero divisor.
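For completeness, here is one hedged way such a finding might be addressed (the function and harness names are my own, not from the article): using Rust's standard checked_div, so that the proof harness can pass for every possible pair of inputs.

```rust
// A sketch of a verifiable alternative: checked_div returns None instead of
// panicking, both for a zero divisor and for the i32::MIN / -1 overflow case.
pub fn checked_divide(x: i32, y: i32) -> Option<i32> {
    x.checked_div(y)
}

#[cfg(kani)]
#[kani::proof]
fn verify_checked_divide_never_panics() {
    let x: i32 = kani::any();
    let y: i32 = kani::any();

    // Kani explores all inputs; the proof passes because no path can panic.
    let _ = checked_divide(x, y);
}
```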

The combination of these verification approaches, from Dafny's proof-based verification to Kani's model checking, provides developers with powerful tools to ensure code correctness. While they require more upfront effort than traditional testing, they can provide mathematical certainty about critical properties of your code.

Combining Automated Reasoning with Generative AI

The integration of automated reasoning with generative AI represents a significant advancement in building more reliable and trustworthy AI systems. At the heart of this combination lies a solution to one of the most pressing challenges faced by Large Language Models: hallucinations. These occur when models generate content that seems plausible but is factually incorrect, internally inconsistent, or contradictory to known facts or business rules. This issue becomes particularly critical in business contexts where accuracy and reliability are paramount, such as customer support systems, legal document analysis, healthcare applications, and financial services.

Automated reasoning addresses this challenge through formal verification, structured validation, and explainable results. The process begins by translating business rules and policies into mathematical logic, which can then be used to verify that AI-generated responses conform to these rules. This approach provides not just validation but mathematical proof of correctness, along with clear reasoning for why a response is valid or invalid.

The implementation typically follows a systematic process that begins with policy creation, where business rules and policies are documented and converted into formal logic. This is followed by integration, where LLM output is connected to the automated reasoning system and validation checks are implemented. The final step involves thorough testing with real-world scenarios and continuous refinement based on results.

Real-world applications of this combined approach can span various domains. In customer service, the system can verify responses against company policies. Financial services can use it to ensure regulatory compliance, while legal applications can verify that responses conform to specific laws and regulations.

The combination of automated reasoning and generative AI delivers several crucial advantages. Beyond enhanced reliability through reduced hallucinations and increased accuracy, organizations benefit from improved trust through verifiable results and clear explanation of reasoning. The approach also drives operational efficiency through automated validation and faster response times, while maintaining scalability through consistent application of rules and easy updates to validation policies.

Preventing Factual Errors with Amazon Bedrock Guardrails and Automated Reasoning

A few weeks back, AWS introduced a preview of Automated Reasoning checks in Amazon Bedrock Guardrails, bringing mathematical validation to its AI offerings and addressing one of the most pressing challenges in generative AI development: ensuring the accuracy of generated responses.

At its heart, Bedrock Guardrails serves as a comprehensive guardian for AI applications, watching over content generation like a thoughtful editor. The system can gracefully filter out unwanted content, carefully protect personal information through redaction, and enhance both safety and privacy. Think of it as a sophisticated safety net that catches not just obvious errors, but also subtle inconsistencies that might otherwise slip through.

When a company wants to ensure their AI stays within specific guidelines, they can use this new capability and upload their documents (things like company policies, procedures, or rules) to the Amazon Bedrock console. The system reads through these documents and translates them into a mathematical format that can be used to verify responses. It's like teaching the system to think with the precision of a mathematician while still speaking in human terms.

While behind the scenes everything is expressed in formal logic, users don't need to be mathematicians to work with it. The system presents everything in natural language, making it accessible to anyone who understands the business rules they're trying to enforce. Variables and rules are described in plain English, yet they carry the weight of mathematical certainty.

To see this in action, imagine an airline's customer service system. The airline might upload their ticket change policies, which include rules like requiring email submissions within 24 hours of purchase for name corrections. When a customer asks about changing a name on their ticket, the AI first generates a response, but before sending it, Bedrock Guardrails steps in to verify that the answer aligns perfectly with the airline's policies.
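As a purely illustrative sketch (the data model and rule encoding here are my assumptions, not how Bedrock Guardrails represents policies internally), the airline's rule could be formalized as a predicate over structured facts implied by a draft response:

```rust
// Hypothetical encoding of one airline policy rule as logic over structured facts.
// A real automated reasoning system derives such rules from the uploaded policy
// documents rather than having a developer write them by hand.

struct NameCorrectionRequest {
    hours_since_purchase: u32,
    submitted_by_email: bool,
}

// Rule: name corrections must be submitted by email within 24 hours of purchase.
fn name_correction_allowed(req: &NameCorrectionRequest) -> Result<(), String> {
    if !req.submitted_by_email {
        return Err("Name corrections must be submitted by email.".to_string());
    }
    if req.hours_since_purchase > 24 {
        return Err("Name corrections are only accepted within 24 hours of purchase.".to_string());
    }
    Ok(())
}

fn main() {
    // Facts implied by a draft AI answer such as
    // "You can call us within two days to fix the name on your ticket."
    let implied = NameCorrectionRequest { hours_since_purchase: 48, submitted_by_email: false };
    match name_correction_allowed(&implied) {
        Ok(()) => println!("Draft response is consistent with the policy."),
        Err(reason) => println!("Draft response rejected: {reason}"),
    }
}
```

The value of the formal encoding is that the verdict comes with a reason, which is exactly what surfaces in the explanations described below.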

Testing these policies becomes an intuitive process through a testing playground. Here, users can try out various scenarios just as their customers might, seeing how the system handles different questions and answers. When the system finds an issue, it doesn't just say "no": it explains why something doesn't work and suggests what would make it valid. It's like having a knowledgeable colleague review every response, ensuring nothing goes out that shouldn't.

The system plays well with others, too. It can work alongside existing AI techniques like Retrieval-Augmented Generation (RAG), adding an extra layer of mathematical certainty to those powerful approaches. Developers can connect it to their applications through APIs or manage it through the console, making it flexible enough to fit into various workflows.

Getting the best results from the system is an ongoing journey of refinement. Users often start with basic descriptions and gradually enhance them as they learn how their customers actually phrase questions. For instance, a simple rule about employment status might evolve from "works more than 20 hours per week" to include various ways people might describe their situation such as full-time, part-time, or regular hours. This kind of iterative improvement helps the system better understand and validate real-world conversations.

The end result is a system that brings together the precision of mathematics with the nuance of human language. It helps ensure that AI systems can be both powerful and trustworthy, creative yet constrained by facts and policies. In a world where AI accuracy is increasingly crucial, this combination of automated reasoning and generative AI represents a thoughtful step toward more reliable artificial intelligence.

Future Implications and Conclusion

The integration of automated reasoning with generative AI represents more than just a technical solution to improve AI reliability. It establishes a new paradigm for building systems that are both innovative and trustworthy. Implementations like Amazon Bedrock Guardrails demonstrate that this approach is not only theoretically sound but practically viable.

The future of AI lies not just in making systems more powerful, but in making them more reliable and trustworthy. What's particularly fascinating is how this combination of automated reasoning and generative AI represents a convergence of AI's historical approaches. From the field's earliest days, AI has developed along three main branches: symbolic AI (based on rules and logic), probabilistic AI (based on statistical methods), and connectionist AI (based on neural networks). The integration of automated reasoning with modern machine learning beautifully bridges these approaches – combining the logical rigor of symbolic AI, the statistical foundations of probabilistic AI, and the neural architectures of connectionist AI.

This convergence is more than just theoretical. By applying automated reasoning to neural network outputs, we're effectively combining symbolic logic with connectionist and probabilistic approaches. When an LLM generates text using its neural networks and probability distributions, and automated reasoning then verifies this output against logical rules, we're seeing all three branches of AI working in harmony. This synthesis provides a clear path toward building AI systems that can be both creative and verifiable. As we look to the future, this combination has the potential to play a crucial role in shaping the next generation of AI applications.

To get more information on these technologies:
