Anastasiia Ogneva

Posted on Nov 1, 2023

A deep look into YTsaurus. Availability, reliability, open source

#cpp #programming #opensource

We have checked YTsaurus with the PVS-Studio analyzer — let's see the results of this check and the errors found. It's been over half a year since YTsaurus, a powerful BigData system, became open source. It has been developed to assist companies in expanding their IT infrastructure and tackling business initiatives. These days, YTsaurus is a frequent topic for discussion. According to GitHub statistics, the project is still growing in popularity. That is why we are here to inspect it.

Meet YTsaurus
In fact, YTsaurus is frequently and extensively discussed at various conferences which our team attends. For example, we recently attended the HighLoad++ and TeamLead Conf conferences, and YTsaurus-related topics were everywhere.

Considering the popularity of the project and my interests in related fields, I started exploring it. YTsaurus, being an open-source solution available to every user, happens to have a complete documentation, and I turned to it. The home page has both a brief description of the project and a full-fledged Overview section. It explains the YTsaurus purpose, describes key features of the platform and the structure of the YTsaurus system.

So, if you're curious, read the section in full — it's going to be even more interesting. Well, I will not say anything new about the project, except for the results of the code check, of course. By the way, the project source code is available here.

Even before I finished checking the project, GitHub was stuffed with new commits. This is a clear indication that the project is under active development.

The commit I cloned the project on was 55cd85f.

This article is not intended to devalue the labor of the YTsaurus developers. The aim is to popularize static code analyzers, which can be of great assistance even for high-quality and successful projects.

Difficulties we faced
We didn't, actually. "Why have such a catchy headline?", you may ask.

I would like to express my special thanks to the developers of the project! What detailed instructions YTsaurus has! And how simple YTsaurus is to deploy. I do think that even children would be able to deploy YTsaurus if they wanted to.

Well, let's go straight to what we are all looking forward to — code analysis.

In the beginning, there was nothing...

...but a bunch of different warnings.

If the project is large enough, the analyzer may find many suspicious code fragments. This may spoil the first impression of the tool. A scared user will immediately uninstall the analyzer from their machine.

How can we avoid it?

It is necessary to fine-tune the analyzer before starting the analysis. For example, enable the most relevant diagnostic groups and disable the less relevant ones. There is no point in enabling MISRA/AUTOSAR diagnostic groups if your code is not required to comply with these standards.
After you start the analyzer for the first time, you can display the most interesting warnings. This will help you start working with the report much faster and easier.
All these warnings must be handled in a certain way to get all the benefits of regular code analysis. There are several solutions. Our team provides a mechanism for suppressing warnings with suppress files. Once all warnings have been suppressed, the next time the analysis is run, the report will have no warnings.
Developers do not need to fully analyze the entire project on their machines, as there are usually hundreds of files being changed. The incremental analysis mode is a great way to analyze only the files that have been changed. In this mode, the analyzer issues warnings only for newly written code, and they are few. If the warning is false, it can be suppressed directly.
Full project analysis can be integrated into CI during nightly builds. If you employ the pull/merge request model in project development, you can also integrate the analysis into your development process.
Sometimes it's worth revisiting old warnings, pulling them out of suppress files, and fixing the code.

When checking projects from time to time, we usually don't go through all of these steps. However, we do perform some sort of configuration before starting the analysis: we exclude third-party libraries, enable only General Analysis and micro-optimizations diagnostic groups. Sometimes even the finely tuned analyzer may issue a lot of warnings, but it didn't take me long to analyze the errors found in YTsaurus.

While reviewing and examining the analysis results, I found many curious and not-so-curious code snippets that I want to share with you.

Analysis results

Fragment N1

TAsyncSignalsHandler *SIGNALS_HANDLER = nullptr;

void SetAsyncSignalHandler(int signum, TAutoPtr<TEventHandler> handler)
{
  static TAdaptiveLock lock;

  if (Y_UNLIKELY(SIGNALS_HANDLER == nullptr))       // N1
  {
    TGuard dnd(lock);

    if (SIGNALS_HANDLER == nullptr)                 // N2
    {
      // NEVERS GETS DESTROYED
      SIGNALS_HANDLER = new TAsyncSignalsHandler(); // N3
    }
  }

  SIGNALS_HANDLER->Install(signum, handler);        // N4
}

The analyzer warning: V547 Expression 'SIGNALS_HANDLER == nullptr' is always true. async_signals_handler.cpp:200

Although the analyzer issued the V547 warning (it didn't get the custom locking mechanism quite right), I detected a curious example of the Double-checked locking pattern.

What can go wrong? Firstly, we have a small chance that the compiler may cache the value of the SIGNALS_HANDLER variable in a register and not read the new value again. At least if unusual locking is used. The best solution is to declare the variable volatile or use std::atomic.

That's not all. Let's figure out what's wrong in this case. Let us assume that the A and B threads operate with this code.

The A thread is the first to reach the N1 point. The pointer is null, so the control flow reaches the first if and catches the lock object.

The A thread reaches the N2 point. The pointer is also null, so the control flow reaches the second if.

Then, a little magic can happen at the N3 point. The pointer initialization can be roughly represented as follows:

auto tmp = (TAsyncSignalsHandler *) malloc(sizeof(TAsyncSignalsHandler));
new (tmp) TAsyncSignalsHandler();
SIGNALS_HANDLER = tmp;

A clever processor can reorder the instructions, and SIGNALS_HANDLER will be initialized before placement new is called.

And then the B thread joins the battle — exactly when the A thread hasn't yet called placement new. The control flow enters the N1 point, detects the initialized pointer, and moves to the N4 point.

In the B thread, the Install member function is called on the object that has not yet started its lifetime. Undefined behavior...

How to fix this pattern? We need to make reading and writing to SIGNALS_HANDLER be executed as atomic operations:

std::atomic<TAsyncSignalsHandler *> SIGNALS_HANDLER { nullptr };

void SetAsyncSignalHandler(int signum, TAutoPtr<TEventHandler> handler)
{
  static TAdaptiveLock lock;

  auto tmp = SIGNALS_HANDLER.load(std::memory_order_acquire); // <=

  if (Y_UNLIKELY(tmp == nullptr))
  {
    TGuard dnd(lock);

    tmp = SIGNALS_HANDLER.load(std::memory_order_relaxed);    // <=
    if (tmp == nullptr)
    {
      // NEVERS GETS DESTROYED
      tmp = new TAsyncSignalsHandler();
      SIGNALS_HANDLER.store(tmp, std::memory_order_release);  // <=
    }
  }

  tmp->Install(signum, handler);
}

Fragment N2

TSimpleTM& TSimpleTM::Add(EField f, i32 amount)
{
  ....
  case F_YEAR:
  {
    i32 y = amount + (i32)Year;
        y = ::Min<i32>(Max<i32>(y, 0), 255 /*max year*/);

    // YDay may correspond to different MDay if
    // it's March or greater and the years have different leap status
    if (Mon > 1)
    {
      YDay += (i32)LeapYearAD(RealYear()) - (i32)LeapYearAD(RealYear()); // <= 
    }

    Year = y;
    IsLeap = LeapYearAD(RealYear());
    return RegenerateFields();
  }
  ....
}

The analyzer warning: V501 There are identical sub-expressions '(i32) LeapYearAD(RealYear())' to the left and to the right of the '-' operator. datetime.cpp:154:1

It seems that left and right expressions should not be identical. But they are, and it results in YDay += 0 always occurring.

Sure, we don't know the whole context and therefore can't say with certainty what should be going on in that line and what code should be there. But in this case, the analyzer is right.

I guess the developer got sidetracked while coding.

Fragment N3

bool operator == (const TCompositePendingJobCount& lhs,
                  const TCompositePendingJobCount& rhs)
{
  if (lhs.DefaultCount != rhs.DefaultCount)
  {
    return false;
  }

  if (lhs.CountByPoolTree.size() != lhs.CountByPoolTree.size()) // <=
  {
    return false;
  }

  for (const auto& [tree, lhsCount] : lhs.CountByPoolTree)
  {
    ....
  }
  return true;
}

The analyzer warning: V501 There are identical sub-expressions 'lhs.CountByPoolTree.size()' to the left and to the right of the '!=' operator. structs.cpp:191

This is another error caused by distraction. The size of the same container is being compared, which is why the expression is always false.

The correct code looks like this:

if (lhs.CountByPoolTree.size() != rhs.CountByPoolTree.size()){....}

Actually, comparisons often run into errors. Not all of these errors detected by the analyzer are described in this article. Errors in comparisons are thoroughly described in my colleague's article: "The evil within the comparison functions". I strongly recommend you read it.

Fragment N4

template <class TOptions>
class TSimpleOperationCommandBase
  : public virtual TTypedCommandBase<TOptions>
{
....
public:
  TSimpleOperationCommandBase()
  {
    ....
    if (!OperationId.IsEmpty() &&  OperationAlias.operator bool() ||
         OperationId.IsEmpty() && !OperationAlias.operator bool()  )
    {
      ....
    }
    ....
  }
};

The analyzer warning: V728 – Message: An excessive check can be simplified. The '(A && !B) || (!A && B)' expression is equivalent to the 'bool(A) != bool(B)' expression. scheduler_commands.h:36:1

It may be easier for the programmer who wrote the code to perceive the code above. Basically, this expression implies the use of the mutually exclusive OR operator for two Boolean operands. In C and C++, in addition to the bitwise mutually exclusive OR operator, there is also an implicit logical comparison. This comparison is implemented via the "!=" operator, and we can use it to rewrite the expression in the following way:

if (OperationId.IsEmpty() != OperationAlias.operator bool())

The mutually exclusive OR operator returns true only when operands are unequal. And this is exactly the case where we need the inequality (!=) operator. As a result, the code becomes more concise.

Here is... another example.

Fragment N5

int TComparator::CompareKeyBounds(const TKeyBound& lhs,
                                  const TKeyBound& rhs, 
                                  int lowerVsUpper) const
{
  ....
  // Prefixes coincide. Check if key bounds are indeed at the same point.
  {
    auto lhsInclusivenessAsUpper = (lhs.IsUpper && lhs.IsInclusive) ||
                                   (!lhs.IsUpper && !lhs.IsInclusive); 
    auto rhsInclusivenessAsUpper = (rhs.IsUpper && rhs.IsInclusive) ||
                                   (!rhs.IsUpper && !rhs.IsInclusive); 
    if (lhsInclusivenessAsUpper != rhsInclusivenessAsUpper)
    {
      return lhsInclusivenessAsUpper - rhsInclusivenessAsUpper;
    }
  }
  ....
}

The analyzer warning: V728 – Message: An excessive check can be simplified. The '(A && B) || (!A && !B)' expression is equivalent to the 'bool(A) == bool(B)' expression. comparator.cpp:194:1

This code can be shortened to:

auto lhsInclusivenessAsUpper = lhs.IsUpper == lhs.IsInclusive;
auto rhsInclusivenessAsUpper = rhs.IsUpper == rhs.IsInclusive;

Such warnings may be frequent in any project (YTSaururs has eight of them). It's up to the developers to decide what to do with them. Some may think that the version before shortening is clearer, others say the shortened code is better.

That's mostly about personal preferences. However, the PVS-Studio analyzer can detect such code, which is certainly handy. If you don't need such warnings in your code, you can always disable them.

The full article is here

DEV Community

A deep look into YTsaurus. Availability, reliability, open source

In the beginning, there was nothing...

Analysis results

Top comments (0)

Read next

Exploring Graph Databases: Neo4j

Conditional Expressions

The story behind GitHub logo

Rust Cheatsheet