Here, I'm going to show you how you can keep track of file changes in a directory and store them in Reduct Storage by using its C++ client SDK. You can find the full working example here.
Running Reduct Storage
If you're a Linux user, the easiest way to run the storage engine is with Docker. Here is an example docker-compose.yml file:
services:
  reduct-storage:
    image: reductstorage/engine:v1.0.1
    volumes:
      - ./data:/data
    environment:
      RS_LOG_LEVEL: DEBUG
    ports:
      - 8383:8383
You can also download the binaries and run them:
RS_DATA_PATH=./data reduct-storage
If everything is OK, you should see the Web Console at http://127.0.0.1:8383.
Installing Reduct Storage SDK for C++
Currently, you can only build and install the library manually. Follow these instructions.
File Watcher in C++
The SDK provides a CMake find script, so you can easily integrate it into your CMake project. Here is an example CMakeLists.txt:
cmake_minimum_required(VERSION 3.23)
project(file_watcher_example)
set(CMAKE_CXX_STANDARD 20)
find_package(ReductCpp 1.0.1)
find_package(ZLIB)
find_package(OpenSSL)
add_executable(file_watcher main.cc)
target_link_libraries(file_watcher
    ${REDUCT_CPP_LIBRARIES} ${ZLIB_LIBRARIES}
    OpenSSL::SSL OpenSSL::Crypto)
Now we're ready to write the C++ code. Here is our main.cc file:
#include <reduct/client.h>

#include <chrono>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <map>
#include <regex>
#include <string>
#include <string_view>
#include <thread>

constexpr std::string_view kReductStorageUrl = "http://127.0.0.1:8383";
constexpr std::string_view kWatchedPath = "./";

namespace fs = std::filesystem;

int main() {
  using ReductClient = reduct::IClient;
  using ReductBucket = reduct::IBucket;

  auto client = ReductClient::Build(kReductStorageUrl);
  auto [bucket, err] = client->GetOrCreateBucket(
      "watched_files", ReductBucket::Settings{
                           .quota_type = ReductBucket::QuotaType::kFifo,
                           .quota_size = 100'000'000,  // 100 MB
                       });
  if (err) {
    std::cerr << "Failed to create bucket: " << err << std::endl;
    return -1;
  }
  std::cout << "Bucket is ready" << std::endl;

  // Last known modification time of each watched file
  std::map<std::string, fs::file_time_type> file_timestamp_map;
  for (;;) {
    for (auto& file : fs::directory_iterator(kWatchedPath)) {
      bool is_changed = false;
      // Check only regular files
      if (!fs::is_regular_file(file)) {
        continue;
      }
      const auto filename = file.path().filename().string();
      auto ts = fs::last_write_time(file);
      if (file_timestamp_map.contains(filename)) {
        auto& last_ts = file_timestamp_map[filename];
        if (ts != last_ts) {
          is_changed = true;
        }
        last_ts = ts;
      } else {
        file_timestamp_map[filename] = ts;
        is_changed = true;
      }
      if (!is_changed) {
        continue;
      }

      // The file name is used as the entry name, which can't contain dots.
      const std::string alias =
          std::regex_replace(filename, std::regex("\\."), "_");
      std::cout << "`" << filename << "` is changed. Storing as `" << alias
                << "` ";

      std::ifstream changed_file(file.path(), std::ios::binary);
      if (!changed_file) {
        std::cerr << "Failed to open file: " << file.path() << std::endl;
        return -1;
      }

      auto file_size = fs::file_size(file);
      auto write_err = bucket->Write(
          alias, std::chrono::file_clock::to_sys(ts),
          [file_size, &changed_file](ReductBucket::WritableRecord* rec) {
            rec->Write(file_size, [&](size_t offset, size_t size) {
              // Read the next chunk of the file
              std::string buffer;
              buffer.resize(size);
              changed_file.read(buffer.data(), size);
              std::cout << "." << std::flush;
              return std::pair{offset + size <= file_size, buffer};
            });
          });
      if (write_err) {
        std::cout << " Err:" << write_err << std::endl;
      } else {
        std::cout << " OK (" << file_size / 1024 << " kB)" << std::endl;
      }
    }
    // Pause between scans to avoid overloading the CPU
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  return 0;
}
Okay, that's quite a few lines, but don't worry: this is a simple program. Let's look at the code in detail.
Creating a Bucket
To start writing to the database, we must create a bucket:
auto client = ReductClient::Build(kReductStorageUrl);
auto [bucket, err] = client->GetOrCreateBucket(
    "watched_files", ReductBucket::Settings{
                         .quota_type = ReductBucket::QuotaType::kFifo,
                         .quota_size = 100'000'000,  // 100 MB
                     });
if (err) {
  std::cerr << "Failed to create bucket: " << err << std::endl;
  return -1;
}
Here we build a client that talks to the storage engine at the kReductStorageUrl URL. Then we create a bucket named watched_files, or get it if it already exists. Note that we also pass settings that limit its size to 100 MB, so the storage engine starts removing old data once the quota is reached.
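To make the idea of a FIFO quota concrete, here is a toy in-memory model. It is only a sketch of the concept: the FifoBucket class and its methods are hypothetical and say nothing about how the storage engine is actually implemented.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Toy model of a FIFO quota (hypothetical, for illustration only).
// It is NOT the storage engine's implementation.
class FifoBucket {
 public:
  explicit FifoBucket(std::size_t quota_size) : quota_size_(quota_size) {}

  // Appends a record, then drops the oldest records until the total
  // size fits into the quota again. The newest record is always kept.
  void Write(std::string record) {
    used_ += record.size();
    records_.push_back(std::move(record));
    while (used_ > quota_size_ && records_.size() > 1) {
      used_ -= records_.front().size();
      records_.pop_front();
    }
  }

  std::size_t used() const { return used_; }
  const std::deque<std::string>& records() const { return records_; }

 private:
  std::size_t quota_size_;
  std::size_t used_ = 0;
  std::deque<std::string> records_;
};
```

With a quota of 10 bytes, writing three 4-byte records leaves only the last two in this model, because the oldest one is dropped first.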
The SDK doesn't throw exceptions. Each method returns reduct::Error or reduct::Result<T>, so you can easily check the result in your code and print error messages.
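If this style of error handling is new to you, the following sketch shows the general pattern with standard C++ only. The Error, Result and OpenBucket names here are hypothetical stand-ins written for this example; they are not the SDK's real definitions, which may differ.

```cpp
#include <string>

// Hypothetical stand-ins for illustration only. These are NOT the
// SDK's definitions of reduct::Error and reduct::Result<T>.
struct Error {
  int code = 0;  // 0 means "no error"
  std::string message;

  // Allows `if (err) { ... }` checks.
  explicit operator bool() const { return code != 0; }
};

template <typename T>
struct Result {
  T result;
  Error error;
};

// Hypothetical operation that fails when the name is empty.
Result<std::string> OpenBucket(const std::string& name) {
  if (name.empty()) {
    return {{}, Error{422, "bucket name is empty"}};
  }
  return {name, Error{}};
}
```

Because Result is a plain struct with two public members, a caller can unpack it with a structured binding, for example auto [bucket, err] = OpenBucket("watched_files");, and then test err in an if statement, just as the tutorial code does.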
Watching Files
We implement the file watcher in a straightforward way:
std::map<std::string, fs::file_time_type> file_timestamp_map;
for (;;) {
  for (auto& file : fs::directory_iterator(kWatchedPath)) {
    bool is_changed = false;
    // Check only regular files
    if (!fs::is_regular_file(file)) {
      continue;
    }
    const auto filename = file.path().filename().string();
    auto ts = fs::last_write_time(file);
    if (file_timestamp_map.contains(filename)) {
      auto& last_ts = file_timestamp_map[filename];
      if (ts != last_ts) {
        is_changed = true;
      }
      last_ts = ts;
    } else {
      file_timestamp_map[filename] = ts;
      is_changed = true;
    }
    if (!is_changed) {
      continue;
    }
    // Storing a changed file...
  }
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
We iterate over the given directory with fs::directory_iterator(kWatchedPath) and keep the last modification time of each file in file_timestamp_map. If a file is new (not yet in the map) or has changed (its timestamp differs), we set the is_changed flag so that the file gets stored.
Don't forget to sleep for a while at the end of each cycle to avoid overloading the CPU.
Storing Files
The history of a file is represented as an entry in Reduct Storage. Because an entry name can't contain ".", we have to replace the dots in our file names:
// The file name is used as the entry name, which can't contain dots.
const std::string alias =
    std::regex_replace(filename, std::regex("\\."), "_");
std::cout << "`" << filename << "` is changed. Storing as `" << alias
          << "` ";
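Since only a single character is replaced, the same result can be had without a regular expression. The sketch below uses std::replace; MakeEntryName is a hypothetical helper name chosen for this example.

```cpp
#include <algorithm>
#include <string>

// Returns a copy of `filename` with every '.' replaced by '_'.
// (Hypothetical helper; the tutorial code uses std::regex_replace.)
std::string MakeEntryName(std::string filename) {
  std::replace(filename.begin(), filename.end(), '.', '_');
  return filename;
}
```

For example, MakeEntryName("main.cc") gives "main_cc". Keep in mind that with either approach two different files such as a.b and a_b end up with the same entry name.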
Then we open the changed file for reading:
std::ifstream changed_file(file.path(), std::ios::binary);
if (!changed_file) {
  std::cerr << "Failed to open file: " << file.path() << std::endl;
  return -1;
}
And write it chunkwise to the storage engine:
auto file_size = fs::file_size(file);
auto write_err = bucket->Write(
    alias, std::chrono::file_clock::to_sys(ts),
    [file_size, &changed_file](ReductBucket::WritableRecord* rec) {
      rec->Write(file_size, [&](size_t offset, size_t size) {
        // Read the next chunk of the file
        std::string buffer;
        buffer.resize(size);
        changed_file.read(buffer.data(), size);
        std::cout << "." << std::flush;
        return std::pair{offset + size <= file_size, buffer};
      });
    });
As you can see, it's quite verbose, but we send the file in small chunks, so only one chunk is held in memory at a time and we can send terabytes without worrying about memory! If you put a huge file into your watched directory, you can see how fast Reduct Storage is.
Getting Data
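You can get the data with the Bucket::Query method. You can also use the Python or JavaScript client SDKs, or even wget (here <File-Name> is the entry name, that is, the file name with dots replaced by underscores):
wget http://127.0.0.1:8383/api/v1/b/watched_files/<File-Name>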
I hope it was helpful! Thanks!