Michael Mirosnichenko

Posted on Dec 15, 2021

ReFS file system structure and data recovery algorithm

#beginners #testing #tutorial #security

ReFS, or Resilient File System, is a new file system based on NTFS code. Like any file system, it has both advantages and disadvantages, but the essential fact is that ReFS is meant to address the major issues that NTFS suffers from. It is more resilient to data damage, handles heavy workloads better and scales easily to very large volumes.

Introduction

The new file system, ReFS, is the product of further development of its predecessor, NTFS. It supports reparse points, a technology that was previously available only in NTFS. Reparse points are used to implement symbolic links and mount points in Windows.

Main functions:

Metadata integrity with checksums.
Integrity streams: the method of writing data to disk for additional protection of information in cases when a part of the disk gets damaged.
Allocate on write transactional model (also known as copy on write).
Higher size limits for partitions (volumes), files and directories.
Storage pooling and virtualization for easier creation of volumes and file system management.
Segmentation of serial data known as data striping for better performance, redundancy for fault tolerance.
Support of background disk cleaning known as disk scrubbing for protection against latent disk errors.
Data salvage around the damaged area of the disk.
Shared storage pools across machines for additional failure tolerance and load balancing.
Compatible with the widely used features of NTFS.
Data verification and auto-correction.
Maximal scalability.
The file system stays online when bad sectors appear, because the damaged blocks are isolated.
Flexible architecture using the Storage Spaces feature which was designed and implemented specifically for ReFS.

In addition, ReFS inherits many features from NTFS, including BitLocker encryption, access control lists – ACL, USN journal, changes notifications, symbolic links, junction points, mount points, reparse points, volume snapshots, file IDs and oplocks.

Of course, the data from ReFS will be available for clients through the same APIs currently used in all operating systems to access partitions formatted in NTFS.

Peculiarities

Peculiarities of ReFS file system:

The file system uses checksums for metadata, and it can also use checksums for file data. When reading or writing a file, the system examines the checksum to make sure it is correct. In this way, data corruption is detected in real time.
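
The idea of verifying a checksum on every read can be illustrated with a minimal sketch. It uses CRC-32 from Python's zlib purely as a stand-in, since the checksum algorithm ReFS actually uses is not described here, and `store_block` and `read_block` are hypothetical helper names.

```python
import zlib

def store_block(data: bytes) -> tuple:
    """Return the data together with the checksum that would be stored with it."""
    return data, zlib.crc32(data)

def read_block(data: bytes, stored_checksum: int) -> bytes:
    """Return the data only if it still matches the stored checksum."""
    if zlib.crc32(data) != stored_checksum:
        raise IOError("checksum mismatch: block is corrupt")
    return data

block, checksum = store_block(b"file contents")
print(read_block(block, checksum))  # the block is returned unchanged
```

A block whose bytes changed after the checksum was computed raises an error instead of being returned silently.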

If the file system detects damaged data which has no alternative copy for recovery, ReFS will remove such data from the disk immediately. In that case, you don’t need to restart the computer or disconnect the media, as you would with NTFS.

You no longer need to use the chkdsk utility as the file system is corrected automatically the moment an error appears. The new system is also resilient to other cases when data becomes corrupt.

Better reliability for data storage. ReFS uses B+ trees for all on-disk structures, including metadata and file data. The file size, number of files in a folder, total volume size and number of folders in a volume are limited by 64-bit numbers. Free disk space is tracked by a hierarchical allocator which includes three separate tables for large, medium, and small chunks. File name and path name length is limited to 32K (32,768) Unicode characters.

The new file system is also more resilient to damage that can be caused to your data in other ways. For example, when you update file metadata (such as a file name), NTFS edits the metadata in place. If your computer crashes or there is a power cut in the middle of the process, the data could get damaged. In contrast, when you update file metadata in ReFS, it creates a new copy of the metadata, and the updated copy is assigned to the file only after all new information has been written. This way, there is no danger of the file metadata becoming corrupt. This approach is known as copy-on-write.
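
The sequence described above can be shown with a toy model. This is only a sketch of the copy-on-write idea, not of ReFS internals; the `Volume` class and its `crash_before_commit` flag are hypothetical and exist only for illustration.

```python
class Volume:
    """Toy model: numbered metadata blocks plus one pointer to the live block."""

    def __init__(self, metadata: dict):
        self.blocks = {0: dict(metadata)}  # block number -> metadata copy
        self.live = 0                      # block currently referenced by the file
        self.next_block = 1

    def update(self, crash_before_commit: bool = False, **changes):
        # 1. Write a modified copy to a fresh block; the live block is untouched.
        new_block = self.next_block
        self.next_block += 1
        self.blocks[new_block] = {**self.blocks[self.live], **changes}
        if crash_before_commit:
            return  # simulated power cut: the pointer still names the old block
        # 2. Only after the copy is complete, switch the pointer to it.
        self.live = new_block

    def metadata(self) -> dict:
        return self.blocks[self.live]

vol = Volume({"name": "a.txt"})
vol.update(crash_before_commit=True, name="b.txt")
print(vol.metadata()["name"])  # a.txt (the interrupted update left no trace)
vol.update(name="b.txt")
print(vol.metadata()["name"])  # b.txt
```

Because the old block is never modified, an interrupted update leaves the file pointing at a complete, consistent copy.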

ReFS is integrated with the virtualization technology known as Storage Spaces, which enables mirroring and combining several physical storage devices within one computer or network.

However, this file system doesn’t support named streams, short names, compression and encryption at the file level, Encrypting File System, as well as NTFS transactions, hard links, extended attributes, and disk quotas.

How it differs from NTFS

ReFS is newer and supports larger volumes and longer file names than NTFS. In the long run, these are very important developments.

In NTFS, file names are limited to 255 characters. Meanwhile, ReFS supports up to 32,768 characters in a file name.

NTFS has a theoretical maximum capacity of 16 exabytes, while ReFS boasts an unbelievable 262,144 exabytes. Most of the time this makes little practical difference today, but it’s a good reserve for the future.
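
Both figures can be reproduced with simple arithmetic. The sketch below assumes that the ReFS limit comes from 2^64 addressable clusters of 16 KB each and that "exabyte" means 2^60 bytes; these are assumptions, as the text above only states the totals.

```python
EIB = 2 ** 60             # one binary exabyte in bytes (assumed unit)
CLUSTER_SIZE = 16 * 1024  # assumed cluster size: 16 KB
MAX_CLUSTERS = 2 ** 64    # assumed 64-bit cluster addressing

refs_max_bytes = MAX_CLUSTERS * CLUSTER_SIZE
print(refs_max_bytes // EIB)  # 262144
print(2 ** 64 // EIB)         # 16, the NTFS figure quoted above
```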

ReFS lacks some NTFS features, such as data compression, Encrypting File System (EFS), hard links, extended attributes, data deduplication and disk quotas. Nevertheless, ReFS is compatible with various other features. For example, although you can’t encrypt individual files at the file system level, ReFS still supports BitLocker encryption.

Windows 10 won’t let you format any partition into ReFS, and at the moment, ReFS can be used only for storage spaces where its features help protect your data from any damage. In Windows Server 2016, you can format volumes with ReFS instead of NTFS. However, you can’t use ReFS for a boot volume, as Windows can only boot from an NTFS disk.

These days, ReFS is only used on server versions of Windows and on Windows Enterprise (also known as LTSC).

File system architecture

Although ReFS and NTFS are often described as similar, what they actually share is compatibility of some metadata structures. The on-disk structure of ReFS is implemented in a completely different way from other Microsoft file systems.

The main structural elements of the new file system are B+ trees. All elements of the file system structure can be single-level (leaves) or multi-level (trees). This approach allows for greater scalability of almost any element of the file system. Together with true 64-bit addressing for all system elements, it rules out bottlenecks if the file system is scaled further.

Apart from the B+ tree root record, all records have the size of a metadata block, 16 KB. Entries in intermediate (address) nodes are small (about 60 bytes), so only a few tree levels are usually needed to describe even very large structures, which improves overall system performance.
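
A rough estimate shows why so few levels are needed. The sketch below assumes that each 16 KB node is filled entirely with 60-byte entries and ignores headers and partially filled nodes, so the numbers are an upper bound rather than a description of real volumes; `levels_needed` is a hypothetical helper.

```python
BLOCK_SIZE = 16 * 1024   # metadata block size from the text
NODE_ENTRY_SIZE = 60     # approximate size of one address entry from the text

def levels_needed(records: int) -> int:
    """Number of tree levels required to address the given number of records."""
    fanout = BLOCK_SIZE // NODE_ENTRY_SIZE  # entries that fit in one block
    levels, capacity = 1, fanout
    while capacity < records:
        capacity *= fanout
        levels += 1
    return levels

print(BLOCK_SIZE // NODE_ENTRY_SIZE)  # 273
print(levels_needed(1_000_000_000))   # 4
```

With a fan-out of about 273, four levels are already enough to address a billion records.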

The main structural element of the file system is the Directory, presented as a B+ tree keyed by the folder object number. Unlike in other similar file systems, a file in ReFS is not a separate key element of the Directory; it exists only as a record in the folder that contains it. This architectural feature may explain why ReFS doesn’t support hard links.

The leaves of the Directory are typed records. For a folder object, there are three main types of records: a directory descriptor, an index record and a nested object descriptor. All such records are packaged as a separate B+ tree with a folder identifier. The root of this tree is a leaf of the Directory B+ tree, which makes it possible to pack almost any number of records into a folder. At the lowest level, the leaves of this B+ tree start with a directory descriptor record containing basic information about the directory, such as its name, standard information, file name attribute, etc.

Next in the directory come the so-called index entries: short structures with data about the directory’s elements. Compared with NTFS, these records are considerably shorter, which means the volume has to store less metadata. The last elements are the records of directory items. For folders, these records contain the name of the folder, the folder identifier in the Directory and the standard information structure. For files, the identifier is missing; instead, the structure contains all the basic data about the file, including the root of the tree of file fragments. Hence, a file can consist of almost any number of fragments (chunks).

Files on disk are stored in 64 KB blocks, although they are addressed in exactly the same way as metadata blocks (in 16 KB clusters). Resident file data is not supported in ReFS, so a 1-byte file takes up a whole 64 KB block on disk, which wastes a significant amount of space on small files. On the other hand, this simplifies free space management and makes allocating a new file much faster.
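
The cost for small files follows directly from rounding every file up to whole blocks. A minimal sketch, assuming the 64 KB block size stated above (`allocated_size` is a hypothetical helper):

```python
BLOCK_SIZE = 64 * 1024  # file data block size from the text

def allocated_size(file_size: int) -> int:
    """Bytes occupied on disk when data is stored in whole 64 KB blocks."""
    blocks = -(-file_size // BLOCK_SIZE)  # ceiling division
    return blocks * BLOCK_SIZE

print(allocated_size(1))      # 65536
print(allocated_size(65537))  # 131072
```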

The metadata size of an empty file system is about 0.1% of the size of the file system itself (i.e., about 2 GB on a 2 TB volume). Some basic metadata is duplicated which improves failure resilience.

ReFS file system structure

You can identify a file system as ReFS by the following signature at the beginning of the partition:


        00 00 00 52 65 46 53 00 00 00 00 00 00 00 00 00 ...ReFS.........
        46 53 52 53 XX XX XX XX XX XX XX XX XX XX XX XX FSRS
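
A check for this signature might look like the sketch below. It only compares the two byte sequences shown in the dump; `looks_like_refs` is a hypothetical helper, and the sample buffer is built by hand rather than read from a real disk.

```python
def looks_like_refs(boot_sector: bytes) -> bool:
    """Check for "ReFS" at offset 3 and "FSRS" at offset 16."""
    return boot_sector[3:7] == b"ReFS" and boot_sector[16:20] == b"FSRS"

# First 32 bytes of a partition, with the unknown XX bytes set to zero.
sample = bytes.fromhex("00000052654653000000000000000000" "46535253") + bytes(12)
print(looks_like_refs(sample))  # True
```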

ReFS pages are 0x4000 bytes in length.

On all inspected systems, the first page number is 0x1e (0x78000 bytes after the start of the partition containing the file system). This is in line with Microsoft documentation which states that the first metadata directory is at a fixed offset on the disk.
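
The offset quoted above is simply the page number multiplied by the page size, which can be checked with a short sketch (`page_offset` is a hypothetical helper):

```python
PAGE_SIZE = 0x4000  # page length from the text

def page_offset(page_number: int) -> int:
    """Byte offset of a page from the start of the partition."""
    return page_number * PAGE_SIZE

print(hex(page_offset(0x1E)))  # 0x78000
```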

Other pages contain various system, directory, and volume structures and tables as well as journaled versions of each page.

Each page begins with its page number.

The first 0x30 bytes of every metadata page form the Page Header which looks as follows:


        byte0:  XX XX 00 00 00 00 00 00 YY 00 00 00 00 00 00 00
        byte16: 00 00 00 00 00 00 00 00 ZZ ZZ 00 00 00 00 00 00
        byte32: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

dword 0 (XX XX) is the page number, which is sequential and corresponds to the page’s offset in units of 0x4000 bytes;

dword 2 (YY) is the journal number or sequence number;

dword 6 (ZZ ZZ) is the Virtual Page Number, which is non-sequential.
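
These three fields can be read with Python's struct module. The sketch assumes that each field is a little-endian 32-bit value at the dword index given above; `parse_page_header` is a hypothetical helper, and the header used in the example is synthetic.

```python
import struct

def parse_page_header(page: bytes) -> dict:
    """Read the three described fields from a 0x30-byte page header."""
    page_number, = struct.unpack_from("<I", page, 0 * 4)   # dword 0
    sequence, = struct.unpack_from("<I", page, 2 * 4)      # dword 2
    virtual_page, = struct.unpack_from("<I", page, 6 * 4)  # dword 6
    return {"page_number": page_number, "sequence": sequence,
            "virtual_page": virtual_page}

# Synthetic header: page 0x1E, sequence number 3, virtual page 0x02.
header = bytearray(0x30)
struct.pack_into("<I", header, 0, 0x1E)
struct.pack_into("<I", header, 8, 3)
struct.pack_into("<I", header, 24, 0x02)
print(parse_page_header(bytes(header)))
```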

The Object Table, virtual page number 0x02, associates object identifiers with the pages on which they reside. Here we can see an AttributeList consisting of records of Key / Value pairs.

We can use them to look up the object ID of the root directory and retrieve the page where it resides:


        50 00 00 00 10 00 10 00 00 00 20 00 30 00 00 00 – total length / key and value borders
        00 00 00 00 00 00 00 00 00 06 00 00 00 00 00 00 – object identifier
        F4 0A 00 00 00 00 00 00 00 00 02 08 08 00 00 00 – page identifier / flags
        CE 0F 85 14 83 01 DC 39 00 00 00 00 00 00 00 00 – checksum
        08 00 00 00 08 00 00 00 04 00 00 00 00 00 00 00

The object table entry for the root directory, containing its page (0xAF4)
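
The entry above can be decoded with a short sketch. The field layout used here (a 32-bit total length followed by 16-bit key offset, key length, flags, value offset and value length) is an assumption inferred from the annotated dump, not a documented format.

```python
import struct

entry = bytes.fromhex(
    "50000000100010000000200030000000"
    "00000000000000000006000000000000"
    "f40a0000000000000000020808000000"
    "ce0f85148301dc390000000000000000"
    "08000000080000000400000000000000"
)

# Assumed header layout: total length, then key and value borders.
total_len, key_off, key_len, _flags, val_off, val_len = struct.unpack_from("<IHHHHH", entry, 0)
object_id, = struct.unpack_from("<Q", entry, key_off + 8)  # second half of the 16-byte key
page_id, = struct.unpack_from("<Q", entry, val_off)        # first qword of the value
print(hex(total_len), hex(object_id), hex(page_id))        # 0x50 0x600 0xaf4
```

Under this reading, the key and value borders add up to the total length of 0x50 bytes, and the value starts with the page identifier 0xAF4.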

When retrieving pages by ID or virtual page number, look for the ones with the highest sequence number as those are the latest copies of the shadow-write mechanism.
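
Selecting the latest copy can be sketched as follows. The page dictionaries and their field names are hypothetical and stand for headers that have already been parsed.

```python
def latest_copies(pages):
    """Keep, for each virtual page number, the copy with the highest sequence number."""
    latest = {}
    for page in pages:
        current = latest.get(page["virtual_page"])
        if current is None or page["sequence"] > current["sequence"]:
            latest[page["virtual_page"]] = page
    return latest

pages = [
    {"page_number": 0x1E, "virtual_page": 0x02, "sequence": 4},
    {"page_number": 0x20, "virtual_page": 0x02, "sequence": 7},
    {"page_number": 0x21, "virtual_page": 0x03, "sequence": 5},
]
print(hex(latest_copies(pages)[0x02]["page_number"]))  # 0x20
```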

Directories, from the root directory down, follow a consistent pattern. They consist of sequential lists of data structures whose length is determined by the first word value (attributes and attribute lists).

Lists are often prefixed with a header attribute defining the total length of the attributes that follow and make up the list.

In either case, attributes may be parsed by iterating over the bytes after the directory page header, reading and processing the first word to determine the next number of bytes to read.

Various attributes take on different semantics including references to subdirectories and files as well as branches to additional pages containing more directory contents.
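
The iteration described above can be sketched as a small generator. It assumes the length prefix is a little-endian 32-bit value that includes the prefix itself, which matches the 0xA8-byte example attribute but is otherwise an assumption; `iter_attributes` is a hypothetical helper, and for a real directory page `start` would be 0x30, the size of the page header.

```python
import struct

def iter_attributes(data: bytes, start: int = 0):
    """Yield (offset, attribute_bytes) for each length-prefixed attribute."""
    offset = start
    while offset + 4 <= len(data):
        length, = struct.unpack_from("<I", data, offset)
        if length < 4 or offset + length > len(data):
            break  # zero padding or a truncated attribute ends the walk
        yield offset, data[offset:offset + length]
        offset += length

# Two synthetic attributes of 8 and 12 bytes, followed by zero padding.
blob = struct.pack("<I4s", 8, b"AAAA") + struct.pack("<I8s", 12, b"BBBBBBBB") + bytes(8)
print([(off, len(attr)) for off, attr in iter_attributes(blob)])  # [(0, 8), (8, 12)]
```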

The structures in a directory listing have one of the following formats:

Base Attribute

The simplest basic attribute consisting of a block whose length is given at the very start.

Below, there is an example of a typical attribute:


        a8 00 00 00 28 00 01 00 00 00 00 00 10 01 00 00
        10 01 00 00 02 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 a9 d3 a4 c3 27 dd d2 01
        5f a0 58 f3 27 dd d2 01 5f a0 58 f3 27 dd d2 01
        a9 d3 a4 c3 27 dd d2 01 20 00 00 00 00 00 00 00
        00 06 00 00 00 00 00 00 03 00 00 00 00 00 00 00
        5c 9a 07 ac 01 00 00 00 19 00 00 00 00 00 00 00
        00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00

This is a section of length 0xA8 that contains the following four file timestamps:


        a9 d3 a4 c3 27 dd d2 01 - 2017-06-04 07:43:20
        5f a0 58 f3 27 dd d2 01 - 2017-06-04 07:44:40
        5f a0 58 f3 27 dd d2 01 - 2017-06-04 07:44:40
        a9 d3 a4 c3 27 dd d2 01 - 2017-06-04 07:43:20
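
These eight-byte values decode as Windows FILETIME timestamps, that is, little-endian counts of 100-nanosecond intervals since 1601-01-01 UTC. The sketch below decodes the first one; `filetime_to_datetime` is a hypothetical helper. The result is in UTC, and the times listed above are four hours earlier, which would be consistent with a local time zone display (an assumption).

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(raw: bytes) -> datetime:
    """Decode 8 little-endian bytes holding 100 ns ticks since 1601-01-01 UTC."""
    ticks = int.from_bytes(raw, "little")
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

created = filetime_to_datetime(bytes.fromhex("a9d3a4c327ddd201"))
print(created.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2017-06-04 11:43:20 UTC
```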

It is safe to assume that either:

one of the first fields in any given attribute contains an identifier detailing how the attribute should be parsed, or
the context is given by the attribute’s position in the list.

In both cases, attributes with a given meaning are referenced by that identifier or position.

Records

Key / Value pairs whose boundaries are given in the first 0x20 bytes of the attribute. They are used to associate metadata sections with files: the file name is recorded in the key and the contents in the value.