A summary of achievements made through the Google Summer of Code program with CNCF on the TiKV project
The project proposal I had submitted aimed to take TiKV closer to full cloud native support. A vague understanding of the project allowed me to put forward the idea of supporting cloud based data stores such as AWS S3 to the TiKV storage engine, i.e. rocksdb, by making use of the rocksdb-cloud library. The ideas put forward by the team at TiKV also helped to better formulate a plan that has been worked on through various PRs to the sister projects that associate with the storage component of the Database. The main issue being worked on was initiated well before the GSoC project started and can be found at tikv/tikv#6506.
during the community boding period, Yi Wu(@yiwu-arbug) ideated further on the project with Liu Wei(@little-wallace), who compiled a document with details about the desired results from the project, while I was making use of the time to go through the documentation and do compilations, tests, etc, getting familiarized with the code as well as resulting in a few PRs to fix write-ups at tikv#6740 and tikv/website#161.
I also started to maintain a GitHub project to organise the work plans and provide direction to my GSoC work, you can find a list of memos(weekly reports) that have been written by me to help document work done on a weekly basis as well as the sub-tasks and other achievements made over the stretch of the project timeline. In the meantime I had also updated details relating to the TiKV project on the CNCF landscape by making a PR, cncf/landscape#1677
This is a diagram that concisely depicts my understanding of how the task was to achieve cloud nativity, how different it really was from the task can be depicted in another diagram
Phase 1
While working on the refined proposal mentioned earlier I ended up contributing tikv#8066, which updated the rusoto crate and utilised a single interface for identity management, leading to clearing up a few lines of code. The merger of this PR provided me a boost, but an update from Liu alerted me to how off track I had been with respect to the proposal document. Considering Liu was busy and unable to provide the necessary guidance for me to get started in contributing to the project, it was decided that a better plan had to be put in place and thus Yi took back full charge of the project and drafted the issue tikv/rust-rocksdb#514 which we utilised most of June-July to work on.
There were a lot of firsts for me in this, including my first try at writing a CMakeLists.txt
and creating a CPP-rust FFI interface, which essentially contributed to tikv/rust-rocksdb#517. Through July the project started to catch pace as Yi helped me in figuring out how to solve issues and experiment with various methods of achieving the allotted tasks, there were a lot of new learnings from the project. The editions made to sister projects also include tikv/rocksdb#181 which provides a solution to the problems in compiling changes added by rust-rocksdb#517. The changes related to the #517 also includes the creation of the 6.4.cloud
branch in tikv/rocksdb. Most of these PRs were merged at the end of July, closing phase 1 of the project.
Phase 2
The second stage started with Yi Wu opening tikv#8367 to provide checkpoints for adding the newly availed rust-rocksdb cloud
feature to tikv. I have opened a PR tikv#8383 to solve the same and have achieved the task to a respectable point. This is where we found out that the code doesn't really build as we intended it to, issues largely with linking the cloud code with the AWS-CPP-SDK and to some extent the compilation of cloud features that merged with #517, solving which I have opened rust-rocksdb#529 and a few associated PRs in tikv/rocksdb, but this will take a lot more work.
Phase 3
The final stage of the project had been initiated on the back of building in S3 as a storage engine for TiKV. The test phase as I'd like to refer to it is focused on providing documentation of the observable performance benchmarks for S3, achieved using the fio tool, one can find the related findings as follows:
- random-read performance with block_size equal to 4k, 8k, 16k, up to 1M.
- sequential-write performance for block_size=1M , with number of concurrent jobs 1,2,4 and 8.
Pull Requests made to various TiKV repositories
Plans from here on
I plan to take the project to compilation as the hurdles we are currently facing don't seem to be insurmountable. But I also intend to get back to academics for now, I am currently a senior at college and as part of my final year project, I intend to work on the learning I have attained from this project and apply it in the multimedia field, hope to keep everyone updated on it in the coming days!
If you are interested in helping out with the project and taking it forward, please consider giving the GitHub Project a look.
Top comments (0)