The value of a skilled developer is shifting toward the ability to effectively review code. Although generating code now is easier than ever, evaluating it for proper decomposition, correctness, efficiency, and security is still important. To see why it's important to understand generated code and to recognize what lies beneath a program's elegant syntax, let's look at a small project called markus, created using Claude Opus.
The markus project
While searching for an AI-generated project, I came across markus.
This is a single-header library for parsing text and converting it to HTML. It was created using Claude Opus 4.5.
I liked that the library is tiny. You can browse through it and even dig into individual code snippets. When a project greets you with a mountain of code, you don't even feel like starting to look into it.
The library consists of a single file, markus.h, containing 6,484 lines of C++20 code. In reality, there's even less code than that: some algorithms use lookup tables for optimization, and four of these tables are stretched out to one value per line. For example:
static constexpr uint8_t kWhitespaceTable[256] = {
0,
0,
0,
0,
... // and so on for 256 lines
Excluding the tables, the project has about 1,000 fewer lines of code. The PVS-Studio analyzer didn't spot anything unusual in the project, but the check gave me a reason to take a closer look at some code snippets. So now, as an expert, I can highlight the points worth discussing.
Vibe coding issues
Using AI for code generation can seem appealing, especially to an inexperienced programmer who hasn't worked on large legacy projects.
Anyone who has dealt with this knows that the main goal isn't just to write code; it's also to make sure that the code is easy to maintain and develop. Robert Martin put it really well:
When left on its own, AI just piles code on top of code. It cannot hold the big picture in its mind. It doesn't really even understand the concept of a big picture. Architecture is likely beyond its capacity.
Learning the principles of decomposition and knowing how to correctly formulate problems are becoming increasingly important skills for programmers.
Still, I'd like to focus on another aspect of vibe coding that is rarely discussed: the fascination with the code-generation process and, as a result, an excessive reliance on it.
Once people see that AI can handle simple tasks just as well as they can, if not better, they tend to lose their skepticism. It starts to seem that even in large-scale projects the quality will be high and the code well optimized.
Here's an example. Out of curiosity, I asked an AI to write code that quickly counts the number of set bits in large arrays. I was impressed by the algorithms it used: I didn't know some of those clever solutions, even though I had once read a book covering a whole range of such algorithms.
Looking at the code generated for counting bits, one might think, "Isn't it time to take up farming?" I wouldn't have written code like that myself. In fact, I wouldn't have even thought to look for it since I didn't know such algorithms existed.
This fascination extends to all AI-generated code. At first glance, I thought there was no point in analyzing the markus code—it looked clean, well-structured, and full of optimized solutions:
- fine-tuned compiler optimization using branch prediction hints like [[unlikely]];
- conditional compilation using if constexpr;
- optimized table-driven algorithms;
- generation of SIMD-friendly helper functions;
- containers from the std::pmr namespace that control memory allocation strategies.
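To make the list concrete, here's a hypothetical sketch, not taken from markus, of what two of these techniques look like in code: a branch-prediction hint and compile-time dispatch with if constexpr:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical illustration of [[unlikely]] and if constexpr;
// the function and its logic are my own, not markus code.
template <typename Char>
std::size_t CountNulls(const Char* data, std::size_t len) {
  static_assert(std::is_integral_v<Char>, "character-like type expected");
  std::size_t nulls = 0;
  for (std::size_t i = 0; i < len; ++i) {
    if (data[i] == 0) [[unlikely]] {  // hint: embedded nulls are rare
      ++nulls;
    }
  }
  if constexpr (sizeof(Char) == 1) {
    // a byte-wide specialization could branch to a table-driven path
    // here, resolved at compile time with no runtime cost
  }
  return nulls;
}
```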
The code looks good, so what's there to check? Well, there's actually a catch. The impression that the code is optimized is false. Or rather, some parts are really well optimized, while others are pretty bad.
That said, the project author deserves praise for his thoughtful and measured approach to describing the project. Ryan clearly warns:
DO NOT USE IN PRODUCTION. This is completely vibe coded and has not undergone any reviews for memory safety. Its performance is also 2-3x slower than cmark.
So, what issue do I want to highlight in this article? The code may appear pedantic, optimized, and secure, but it's not.
Fortunately, we can compare the performance of the markus project with that of another solution and see that markus isn't very fast. But what if there's nothing to compare it to? An expert opinion is required. The problem is that many people now believe that experts and code reviews aren't necessary.
Is an expert needed?
Or maybe an expert code review isn't really necessary, after all?
You created some code. It looks nice and neat. It feels optimized and works well. Could this be enough?
I think you can skip the review if all four conditions are met:
- No one will get hurt or lose money if the code doesn't work as intended due to bugs or a misinterpretation of specifications.
- The code's security is irrelevant. No one will try to break it, because no one needs to.
- The code performance is irrelevant.
- The project is expected to have a short lifespan. It won't require decades of development, ongoing maintenance, or adjustments for other platforms, etc.
Otherwise, it's still the same—all code quality requirements still apply, and they are even more important now. The same practices described in the SSDLC should apply to vibe code.
Why am I so strict about AI?
I'm not strict, but I'm demanding. The project is written in C++. The idea behind using C++ is to develop fast and efficient applications that can use memory in a smart way. If you don't need a fast program, don't use C++.
Since C++ was chosen for building a fast application, the goal is to write fast code. It doesn't matter whether a human developer writes it or an AI. If the code isn't fast, the task isn't solved.
My personal interest lies in exploring what vibe code is and how static analyzers can help ensure its reliability. Like human-written code, AI-generated code is prone to anti-patterns, though different ones. It makes sense to develop the PVS-Studio analyzer with these peculiarities in mind.
Well, since I'm looking at these projects anyway, I might as well write articles about them. I'm not trying to criticize vibe code or vibe coding in general. That would be silly—it's just a tool. Unlike AI evangelists, however, I'm not here to promote these technologies. If I see that the code is garbage, I'll say so.
SIMD-friendly code
The comment for the IsSpanBlank function states that the code is designed to streamline the vectorization process for the compiler, for example by using SSE instructions.
The key to optimization is simultaneously handling multiple elements to determine if they contain spaces.
// Check if a span contains only blank characters (space, tab, \r, \n)
// SIMD-friendly: processes 8 bytes at a time
inline bool IsSpanBlank(const char* data, size_t len) {
size_t i = 0;
// Process 8 bytes at a time
for (; i + 8 <= len; i += 8) {
// For each byte, check if it's NOT a blank character
// Blank chars: space(0x20), tab(0x09), \n(0x0A), \r(0x0D)
bool all_blank = true;
for (size_t j = 0; j < 8; ++j) {
unsigned char c = static_cast<unsigned char>(data[i + j]);
bool is_blank = (c == ' ' || c == '\t' || c == '\n' || c == '\r');
all_blank = all_blank && is_blank;
}
if (!all_blank) return false;
}
// Handle remaining bytes
for (; i < len; ++i) {
unsigned char c = static_cast<unsigned char>(data[i]);
if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
return false;
}
}
return true;
}
The code makes it seem like the AI knows what it's doing. The inner loop handles eight characters at a time and can be optimized using SIMD instructions. A separate simple loop handles the remaining characters.
I found this code suspicious because the bytes are actually handled sequentially, and the nested loop doesn't do anything useful. I suspect this code is no better than the simplest way to implement it.
inline bool IsSpanBlank_Simple(const char* data, size_t len) {
for (size_t i = 0; i < len; ++i) {
unsigned char c = static_cast<unsigned char>(data[i]);
if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
return false;
}
}
return true;
}
I'd write the code this way. Had I known this was a bottleneck and I absolutely had to optimize the code as much as possible, I would've looked for an intrinsic-based solution.
I wouldn't try any other option, such as one with a nested loop. I know a thing or two about optimization, but not enough to dare write fancy code just to give the compiler a hint.
One could say, "You're weak! Claude Opus did it—it came up with the smarter version!"
Now, let's compare performance. My colleague Phillip conducted a small experiment with GCC and Clang:
- We create the same dataset for both benchmarks: 1,000,000 strings of random lengths, from 1 to 127 characters, with arbitrary content. The measurements don't include dataset generation.
- We feed the dataset to the algorithm and determine whether each string consists only of blank characters. We measure the total time.
In both cases, the code is compiled with aggressive -O3 optimizations. With both compilers, the simple function performs better than the one with the nested loop.
Is there a magic optimization flag that would make the vibe code faster? No such flag exists. And even if it did, the compiler already generates an equally efficient version from the simple single-loop code, without any hints.
Along with my colleague, I also experimented with the two previously explored options and three new algorithm variants. I used the MSVC compiler. Let's take a look at the measurement results.
First, someone suggested improving the code by replacing the original || with |. In our case, short-circuit evaluation does more harm than good. Here's the improved check:
bool is_blank = (c == ' ' | c == '\t' | c == '\n' | c == '\r');
The full function code
inline bool IsSpanBlank_BinOR(const char* data, size_t len) {
size_t i = 0;
// Process 8 bytes at a time
for (; i + 8 <= len; i += 8) {
// For each byte, check if it's NOT a blank character
// Blank chars: space(0x20), tab(0x09), \n(0x0A), \r(0x0D)
bool all_blank = true;
for (size_t j = 0; j < 8; ++j) {
unsigned char c = static_cast<unsigned char>(data[i + j]);
bool is_blank = (c == ' ' | c == '\t' | c == '\n' | c == '\r');
all_blank = all_blank && is_blank;
}
if (!all_blank) return false;
}
// Handle remaining bytes
for (; i < len; ++i) {
unsigned char c = static_cast<unsigned char>(data[i]);
if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
return false;
}
}
return true;
}
Second, I asked DeepSeek. It suggested an optimization approach based on bitwise operations:
inline bool IsSpanBlank_Deepseek(const char* data, size_t len) noexcept {
// Creating a bitmask for valid characters
// Using a 128-bit mask for ASCII
static constexpr uint64_t blank_mask_low = []() {
uint64_t mask = 0;
mask |= 1ULL << ' ';
mask |= 1ULL << '\t';
mask |= 1ULL << '\n';
mask |= 1ULL << '\r';
return mask;
}();
for (size_t i = 0; i < len; ++i) {
unsigned char uc = static_cast<unsigned char>(data[i]);
if (uc >= 64) return false; // All blank characters in ASCII < 64
if (!((blank_mask_low >> uc) & 1)) return false;
}
return true;
}
Finally, I asked DeepSeek for an optimized function that truly takes advantage of vectorization. Here's what I got:
inline bool IsSpanBlank_Deepseek_SSE(const char* data, size_t len) noexcept {
// Creating vectors with valid characters
const __m128i space = _mm_set1_epi8(' ');
const __m128i tab = _mm_set1_epi8('\t');
const __m128i newline = _mm_set1_epi8('\n');
const __m128i carriage = _mm_set1_epi8('\r');
size_t i = 0;
// Processing 16 bytes at a time
for (; i + 15 < len; i += 16) {
__m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
// Checking that each byte matches one of the valid values
__m128i eq_space = _mm_cmpeq_epi8(chunk, space);
__m128i eq_tab = _mm_cmpeq_epi8(chunk, tab);
__m128i eq_nl = _mm_cmpeq_epi8(chunk, newline);
__m128i eq_cr = _mm_cmpeq_epi8(chunk, carriage);
__m128i is_blank = _mm_or_si128(_mm_or_si128(eq_space, eq_tab),
_mm_or_si128(eq_nl, eq_cr));
// Obtaining the mask and checking it
int mask = _mm_movemask_epi8(is_blank);
if (mask != 0xFFFF) return false;
}
// Processing the remainder
for (; i < len; ++i) {
char c = data[i];
if (c != ' ' && c != '\t' && c != '\n' && c != '\r') return false;
}
return true;
}
Now, the results. Although 32-bit builds are no longer relevant, I ran one just out of curiosity. Testing was performed on a 212-megabyte array filled with random whitespace characters. Thread execution is bound to a single preheated core. If you're not sure what I'm talking about, check out the article "An eternal question of timing".
Win32:
1. The simplest option with a single loop: 1.39 s.
2. The original "optimized code" with a nested loop: 1.86 s.
3. The original version but the || operators were replaced with | : 0.351 s.
4. Deepseek's implementation with bitmask optimization: 0.817 s.
5. Deepseek's implementation with SSE2 intrinsic optimization: 0.0449 s.
Win64:
1. The simplest option with a single loop: 0.175 s.
2. The original "optimized code" with a nested loop: 0.309 s.
3. The original version but the || operators were replaced with | : 0.253 s.
4. Deepseek's implementation with bitmask optimization: 0.207 s.
5. Deepseek's implementation with SSE2 intrinsic optimization: 0.0396 s.
Clearly, the 64-bit code is much more efficient, except when SSE2 is involved. Even then, though, everything runs quickly.
The code Claude Opus generated for the markus project is the worst of all the options.
Not only is the simplest implementation with a regular loop faster, it's also shorter. The extra code lines only made things worse.
Although intrinsic code is in the lead, this comes at the cost of limited portability.
Some might say I'm worrying for nothing. Yes, Claude Opus chose a poor implementation, but DeepSeek offered a perfectly good, fast, and portable one.
I agree. However, we see some poorly written code in the project. It looks optimized, but it isn't. The issue remains unresolved. This case requires an expert who understands it and can come up with a different solution.
The issue will only get worse in the future. There will be more and more generated code, but the number of people who can review it will remain the same. Then, the systems will be trained using the same "optimized" code. It'll be a wild ride.
String comparison: the beginning
This PVS-Studio warning caught my attention: V590 [CWE-571] Consider inspecting this expression. The expression is excessive or contains a misprint. markus.h 5987
The code does contain a redundant check:
BlockNode ParseParagraph() {
....
auto trimmed = detail::TrimLeft(line); // trimmed has std::string_view type
....
if (trimmed.size() >= 2 && trimmed[1] == '!' && trimmed.size() >= 9) {
....
}
It's really no big deal—just an extra check that the compiler will remove anyway. For the sake of readability, I'd suggest tweaking the code a bit, but no more than that:
if (trimmed.size() >= 9 && trimmed[1] == '!') // now we're talking
However, I noticed the code below the if statement. Now, this redundancy is worth a closer look:
std::pmr::string upper;
for (size_t j = 0; j < 9; ++j) {
upper += static_cast<char>(
std::toupper(static_cast<unsigned char>(trimmed[j])));
}
if (upper.starts_with("<!DOCTYPE")) {
goto html_interrupt;
}
A check is performed here to ensure that the string begins with "<!doctype", regardless of whether lowercase or uppercase letters are used. To ignore the case, a temporary uppercase string is created and compared with the pattern.
It's not a very efficient solution. Even with the use of std::pmr::string, creating a temporary string is a resource-intensive operation. After all, the characters still need to be copied into memory one by one. By the way, simple std::string would work just as well here, since the string is less than 15 characters long and the Small String Optimization applies.
It's better to write a function that performs case-insensitive comparisons. For example, this one:
bool starts_with_insensitive(std::string_view str,
std::string_view prefix_lower_case)
{
if (str.size() < prefix_lower_case.size()) return false;
const auto pred = [](unsigned char lhs, unsigned char rhs)
{
return lhs == std::tolower(rhs);
};
return std::equal(prefix_lower_case.begin(), prefix_lower_case.end(),
str.begin(),
pred);
}
This is just an example. We can implement the algorithm differently or create istring_view by writing our own char_traits, and so on. Most importantly, we avoid creating a temporary string and simplify the code that performs the check:
if (starts_with_insensitive(trimmed, "<!doctype")) {
goto html_interrupt;
}
In the next chapter, we'll see even more clearly why we need to create a comparison function. As they say, this is just the beginning; the worst is yet to come.
String comparison: the worst
While looking through other PVS-Studio warnings, I noticed this string vector:
std::optional<HtmlBlock> TryParseHtmlBlock() {
....
static const std::pmr::vector<std::pmr::string> type1_tags = {
"script", "pre", "style", "textarea"
};
....
}
The analyzer doesn't like it: V1096 The 'type1_tags' variable with static storage duration is declared inside the inline function with external linkage. This may lead to ODR violation. markus.h 4781
It makes no sense to dwell on this warning in the article. What matters far more is how this vector is handled. An analyzer can't grasp abstract concepts or recognize when something is conceptually off, but a human can see right through it and call out the nonsense.
The code below checks whether these tags appear in the string. But the tag still has the < angle bracket, so the AI creates temporary strings containing the tag with the bracket:
for (const auto& tag : type1_tags) {
std::pmr::string open_tag = "<" + tag;
Next, we check whether this tag is in the string, ignoring case. Well, let's create another temporary string for this. Generating vibe code is a piece of cake.
std::pmr::string lower;
for (size_t j = 0; j < open_tag.size(); ++j) {
lower += static_cast<char>(
std::tolower(static_cast<unsigned char>(trimmed[j])));
}
All this mess looks like this:
static const std::pmr::vector<std::pmr::string> type1_tags = {
"script", "pre", "style", "textarea"
};
for (const auto& tag : type1_tags) {
std::pmr::string open_tag = "<" + tag;
if (trimmed.size() >= open_tag.size()) {
std::pmr::string lower;
for (size_t j = 0; j < open_tag.size(); ++j) {
lower += static_cast<char>(
std::tolower(static_cast<unsigned char>(trimmed[j])));
}
if (lower == open_tag) {
// ....
}
}
}
In the worst-case scenario (when the tag isn't in the text), eight temporary strings are created! The only silver lining is that the strings are small, so the Small String Optimization in std::pmr::string eliminates dynamic allocations.
It's really easy to make this code run faster. First, we can create the necessary prefixes right away and store them in a vector. Second, we can simply replace the vector with an array of std::string_view, completely removing dynamic allocations.
using namespace std::literals;
static constexpr std::array type1_tags = {
"<script"sv, "<pre"sv, "<style"sv, "<textarea"sv
};
Third, we can now find the prefix using the starts_with_insensitive function created earlier. As a result, all the messy code shown above is shortened to:
using namespace std::literals;
static constexpr std::array type1_tags = {
"<script"sv, "<pre"sv, "<style"sv, "<textarea"sv
};
for (auto tag : type1_tags) {
if (starts_with_insensitive(trimmed, tag)) {
The code is now half as long and N times faster.
There's still room for improvement, though. For example, instead of creating a string with a tag enclosed in brackets like end_condition = "</" + tag + ">";, we can take a prebuilt one from the array.
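As a quick sketch of that last idea (the names here are illustrative, not from markus), the array can hold both forms up front:

```cpp
#include <array>
#include <string_view>

using namespace std::literals;

// Precompute both the opening prefix and the full closing tag once,
// so nothing is concatenated at parse time.
struct TagPair {
  std::string_view open;
  std::string_view close;
};
inline constexpr std::array<TagPair, 4> kType1Tags{{
  { "<script"sv,   "</script>"sv   },
  { "<pre"sv,      "</pre>"sv      },
  { "<style"sv,    "</style>"sv    },
  { "<textarea"sv, "</textarea>"sv },
}};
```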
The price of beauty
When I first skimmed through the project, the code for serializing an object array into a string really impressed me. Everything looked right: various functions use the visitor pattern to iterate over arrays of variants and build the necessary strings. The code is error-proof and concise.
There are four serialization functions: GetAltTextFromIds, GetAltText, RenderInlines, and DebugAst. For this example, however, I'll use the shortest one:
using InlineNode = std::variant<....>;
std::pmr::string GetAltText(const std::pmr::vector<InlineNode>& nodes) {
std::pmr::string result;
for (const auto& node : nodes) {
std::visit(
[&result, this](auto&& arg) {
using T = std::decay_t<decltype(arg)>;
if constexpr (std::is_same_v<T, Text>) {
result += arg.content;
} else if constexpr (std::is_same_v<T, Code>) {
result += arg.content;
} else if constexpr (std::is_same_v<T, SoftBreak>) {
result += ' ';
} else if constexpr (std::is_same_v<T, HardBreak>) {
result += ' ';
} else if constexpr (std::is_same_v<T, Emphasis>) {
result += GetAltTextFromIds(arg.children);
} else if constexpr (std::is_same_v<T, Strong>) {
result += GetAltTextFromIds(arg.children);
} else if constexpr (std::is_same_v<T, Link>) {
result += GetAltTextFromIds(arg.children);
} else if constexpr (std::is_same_v<T, Image>) {
result += arg.alt_text;
}
},
node);
}
return result;
}
It looks elegant. At first, because of this code, I even considered giving the article a less provocative title. After reviewing the flaws, I wanted to use it as a positive example instead: look, the AI wrote better code here than an inexperienced C++ programmer would. And overall, some parts did turn out better than others.
Naturally, before writing the praise part, I had to take a closer look at the code and figure out what was going on. Why did I ever do that... I could've stayed charmed by the patterns, the constexpr keyword, and the code neatly formatted in a column.
It turns out things aren't great here either. Let me tell you why.
The task is to store an array of elements that can be serialized into strings in different ways.
A simple approach would be to create a base class and have different entities inherit from it. The container will then store pointers to it. Serialization requires calling different virtual functions in different situations. The result would be something like the following class hierarchy:
using namespace std::literals;
struct Node
{
virtual ~Node() = default;
virtual std::string StrAltText() = 0;
virtual std::string StrDebug() = 0;
};
struct SoftBreak : public Node {
std::string StrAltText() override { return " "s; }
std::string StrDebug() override { return " "s; }
};
struct HtmlBlock : public Node {
std::string content;
std::string StrAltText() override { return content; }
std::string StrDebug() override { return std::format("HtmlBlock = {}", content); }
};
An array of such objects would turn into a string like this:
std::string GetAltText(const std::vector<Node *>& nodes) {
std::string result;
for (const auto& node : nodes)
result += node->StrAltText();
return result;
}
The code above requires manual memory management and is unsafe in general. We can fix this by storing smart pointers, such as std::unique_ptr, in the vector. Another option is to use std::any.
However, another issue remains: the serialization code is spread out among many different classes. You can put the processing in one place by determining the object type in the array using dynamic_cast. That's slow.
So, the standard approach is to store the object identifier in the base class so that its type can be quickly determined:
enum NodeType
{
kSoftBreak,
kHtmlBlock
};
struct Node
{
const NodeType type;
Node(NodeType t) : type(t) {}
virtual ~Node() = default;
};
struct SoftBreak : public Node {
SoftBreak() : Node(kSoftBreak) {}
};
struct HtmlBlock : public Node {
std::string content;
HtmlBlock() : Node(kHtmlBlock) {}
};
Now, we can put the processing in one place, even if it's not very elegant:
std::string GetAltText(const std::vector<std::unique_ptr<Node>>& nodes)
{
std::string result;
for (const auto& node : nodes)
switch(node->type)
{
case kSoftBreak:
{
// There is no need to retrieve the object itself.
// SoftBreak &n = *static_cast<SoftBreak *>(node);
result += ' ';
break;
}
case kHtmlBlock:
{
const HtmlBlock &n = static_cast<const HtmlBlock &>(*node);
result += n.content;
break;
}
}
return result;
}
To achieve a more elegant solution, we can use the std::variant data type, as Claude Opus did. The std::variant type is a smart, type-safe version of union. There's just one catch, though, and a really big one.
For starters, the AI created a strange, redundant entity. It introduced the NodeType enumeration, which is only needed in the base-class scheme we discussed above:
enum class NodeType {
....
kHtmlBlock,
kBlockQuote,
kList,
kListItem,
kText,
kSoftBreak,
....
};
It added the meaningless numeric identifier, kType, to the structures:
struct SoftBreak {
static constexpr NodeType kType = NodeType::kSoftBreak;
};
struct HtmlBlock {
static constexpr NodeType kType = NodeType::kHtmlBlock;
std::pmr::string content;
int block_type = 0; // CommonMark HTML block type (1-7)
};
The enumeration and the identifier are redundant in these structures. kType isn't used anywhere; the only consolation is that, being static, it doesn't take up memory in every object.
The enumeration is used only once, in the NodeTypeToString function:
// Convert NodeType to string for debugging
inline std::string_view NodeTypeToString(NodeType type) {
switch (type) {
case NodeType::kDocument:
return "Document";
case NodeType::kParagraph:
return "Paragraph";
case NodeType::kHeading:
return "Heading";
case NodeType::kThematicBreak:
return "ThematicBreak";
....
case NodeType::kHtmlInline:
return "HtmlInline";
}
return "Unknown";
}
It's unclear how this function would ever be used, or why. What's it for? How can it help? It doesn't seem like it will help anyone with anything. All in all, you could remove all of this and shorten the project code by another 100 lines.
It's just some redundant code, so it's no big deal. The worst is literally yet to come.
A key feature of the std::variant type is that it must be large enough to hold its largest alternative. In our case, the variant holds the following classes:
using InlineNode = std::variant<Text, SoftBreak, HardBreak, Code, Emphasis,
Strong, Link, Image, HtmlInline>;
These classes vary greatly in size, starting with the smallest ones, such as SoftBreak:
struct SoftBreak {
static constexpr NodeType kType = NodeType::kSoftBreak;
};
This class is 1 byte in size. Even though the class is empty, it still has to occupy at least one byte.
The largest class is Image:
struct Image {
static constexpr NodeType kType = NodeType::kImage;
std::pmr::string destination;
std::pmr::string title;
std::pmr::string alt_text;
};
In a 64-bit program, it can occupy 120 bytes: std::pmr::string, for example, takes up 40 bytes in the STL implementation used. Do you already see what that means?
The variant also stores the index of the alternative it currently holds, so the total size of a single InlineNode object is 128 bytes!
Overall, the program works with arrays where each element is 128 bytes in size. The memory usage "efficiency" is ridiculous.
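The size penalty is easy to demonstrate with simplified stand-ins for the node types (plain std::string instead of std::pmr::string; the exact numbers are implementation-specific):

```cpp
#include <string>
#include <variant>

// Simplified stand-ins for two markus node types.
struct SoftBreak {};          // empty, but still occupies 1 byte
struct Image {
  std::string destination;
  std::string title;
  std::string alt_text;
};
using InlineNode = std::variant<SoftBreak, Image>;

// The variant must reserve storage for its largest alternative plus a
// discriminator, so even a SoftBreak element pays for Image's footprint.
static_assert(sizeof(SoftBreak) == 1);
static_assert(sizeof(InlineNode) >= sizeof(Image));
```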
Even just storing empty classes that represent separators and usually convert to spaces requires so much memory:
} else if constexpr (std::is_same_v<T, SoftBreak>) {
result += ' ';
Knowing where to print a space requires 128 bytes.
While memory overuse may not be a significant issue by itself, it raises performance concerns. The processor cache is used less efficiently when working with such a spread-out array.
One might argue that using smart pointers adds an extra layer of indirection when accessing an object, which also harms performance. It's also necessary to check the object type (see the example with switch above).
Not really. Indirect access via a pointer isn't a big deal, considering there will be many memory accesses later for reading and concatenating strings. Memory overuse sounds like a much more serious issue. And the object type check is still there; it's just hidden inside std::variant.
Just imagine how painful it is to extend the vector if the next element no longer fits in the reserved buffer. Here's an example:
std::pmr::vector<InlineNode> link_content;
for (size_t i = opener->pos + 1; i < result.size(); ++i) {
link_content.push_back(std::move(result[i]));
}
When the vector is expanded, objects are moved to new locations. Yikes... The version using smart pointers to the base class std::vector<std::unique_ptr<Node>> is much lighter.
Once again, we have bad code wrapped up in a pretty package.
I'm not saying that smart pointers or std::any are the best solution here. There are other options as well. When the goal is to write efficient code, it's important to consider how to store objects and iterate over them for serialization.
What else?
I don't know. I've already spent more time reviewing the code and writing this article than I had planned.
This once again proves the growing value of reviewers. Generating code is getting easier and easier. My question is, where does one find the energy to review it and determine whether it turned out as expected? Who will take responsibility for the outcome? After all, this is C++, and the whole point of using it is efficiency. If someone doesn't need performance at all, what are they even doing here?
I'm finishing up the detailed review, but before concluding the article, I'll go over a few PVS-Studio warnings.
Forgetfulness
The analyzer warning: V560 [CWE-571] A part of conditional expression is always true: !input.empty(). markus.h 2228
explicit LineBuffer(std::string_view input) {
if (input.empty()) [[unlikely]] {
return;
}
....
// input is not modified
....
if (input.size() > line_start ||
(!input.empty() && (input.back() == '\n' || input.back() == '\r'))) {
line_offsets_.push_back({line_start, input.size() - line_start});
}
....
}
The AI meticulously added [[unlikely]] to alert the compiler that the event is unlikely. However, after some time, it forgets that the string is never empty and ends up performing an unnecessary recheck with !input.empty().
You're only as good as the company you keep
The analyzer warning: V1048 [CWE-1164] The 'prev_line_had_content' variable was assigned the same value. markus.h 4090
void ExtractLinkReferences(Document& doc) {
....
bool prev_line_had_content = false;
....
if (prev_line_had_content) {
prev_line_had_content = true; // This line also has content
Advance();
continue;
}
....
}
The true value is assigned to the prev_line_had_content variable, even though the check already confirms that this value is stored there. The analyzer is right—we're looking at a meaningless line of code. But the warning isn't exciting: I wouldn't have paid it any attention if it weren't for the comment.
Have you heard stories about programmers writing accurate but useless comments? It seems the AI had some "good" teachers. This is exactly the case where the comment simply describes the performed action without adding any new information.
See also "Terrible tip N53. Answer the question "what?" in code comments" in the book "60 terrible tips for a C++ developer."
Creating temporary proxy objects
The analyzer warning: V823 Decreased performance. Object may be created in-place in the 'result' container. Consider replacing methods: 'push_back' -> 'emplace_back'. markus.h 2597
std::pmr::vector<InlineNodeId> ParseInlines() {
std::pmr::vector<InlineNode> result;
....
result.push_back(Text(text_.substr(text_start, pos_ - text_start)));
....
}
A temporary object of the Text type is created and placed in a container. The first idea is to use emplace_back:
result.emplace_back(Text(text_.substr(text_start, pos_ - text_start)));
However, switching to emplace_back in this case won't improve performance at all, and it will also put extra load on the compiler, which has to instantiate the emplace_back function template. The temporary object will be perfectly forwarded into it and then moved via the move constructor.
More sophisticated code is needed to construct the content directly inside the container. This is a container of std::variant, so we need to tell it which alternative we want to create and pass it the std::string_view.
Here's what it'll look like:
result.emplace_back(std::in_place_type<Text>, text.substr(....));
In this case, we can perfectly pass the hint about the constructed type and its constructor arguments. Take a look at the fifth overload of the std::variant constructor.
With this change, you'll get rid of one call to the move constructor.
This isn't a big deal, but if code contains many similar fragments, it's best to get it right from the start.
A copy instead of a move
The analyzer warning: V820 The 'content' variable is not used after copying. Copying can be replaced with move/swap for optimization. markus.h 4670
struct Heading {
static constexpr NodeType kType = NodeType::kHeading;
int level = 1; // 1-6
std::pmr::vector<InlineNodeId> children;
std::pmr::string raw_content; // Temporary storage for inline parsing
};
std::optional<Heading> TryParseAtxHeading() {
....
std::pmr::string content(trimmed.substr(i));
....
Heading heading;
heading.level = level;
heading.raw_content = content; // <=
return heading;
}
The string in the content variable is copied, even though it could be moved, since it's no longer needed. We can write it like this:
heading.raw_content = std::move(content);
Or we could skip the temporary heading object altogether and go with something like this:
return Heading { .level = level,
.raw_content = std::move(content)
};
Again, this is a minor issue, but little things like this accumulate over time.
Manual sunset
The analyzer warning: V837 The 'insert' function does not guarantee that arguments will not be copied or moved if there is no insertion. Consider using the 'try_emplace' function. markus.h 5274
using LinkRefMap =
std::pmr::unordered_map<std::pmr::string,
std::pair<std::pmr::string, std::pmr::string>>;
std::optional<BlockQuote> TryParseBlockQuote() {
....
LinkRefMap* parent_link_refs_ = nullptr;
....
for (const auto& [label, dest_title] : nested_doc.link_references) {
if (parent_link_refs_->find(label) == parent_link_refs_->end()) {
parent_link_refs_->insert({label, dest_title});
}
}
....
}
PVS-Studio warns about inefficiency when an element with the same key is already in the container. In our case, protection against overwriting was implemented manually, but inefficiently: the code performs two lookups.
- First, the element is looked up by key to check the condition.
- Then the insertion repeats the same search to find the right slot in the hash table.
In this case, calling try_emplace performs one search and inserts the element if it isn't already there:
for (const auto& [label, dest_title] : nested_doc.link_references) {
parent_link_refs_->try_emplace(label, dest_title);
}
Conclusion
The conclusions are the same as those in the previous article, "Let's dig into some vibe code." No matter how you write your code, there are ways to ensure that your software is high-quality and reliable.
- It's important to value architects and developers who know how to break down and design modular architectures.
- Remember that security is an integral part of the system from the moment its design begins.
- Praise developers who can distinguish between efficient and slow code, as well as between safe and dangerous code. Praise those who can determine whether the created code is suitable for inclusion in the project.
- Use static code analyzers.
- Use dynamic code analyzers and DAST tools for front-end applications.
- Use software composition analysis tools. This is particularly relevant if you use AI, as its hallucinations can be exploited to attack supply chains.
- Use coding standards. Nothing is worse than disorganized code.
- Apply other SSDLC practices as needed.