TON block compression
Создано: 23 декабря 2024 г.
Создано: 23 декабря 2024 г.
time limit per test2 seconds
memory limit per test256 megabytes
A block is a fundamental data structure in the TON blockchain. It contains a list of transactions, messages, and the difference between the previous and next blockchain states.
To reduce network load, efficient block compression is essential. In this contest, your task is to implement a compression algorithm that will be evaluated on blocks from the real TON blockchain.
Everything in TON is composed of cells. If you participated in FunC contests, you might already be familiar with this concept. A cell is a data structure that contains up to 1023
bits and up to 4
references to other cells. A root cell, together with all cells reachable from it, forms a bag of cells. A block is represented as a bag of cells, and its structure can be formally described using TL-B schema (see links in the Notes section).
The block is serialized to a byte string using the bag of cells serialization algorithm. This byte string is the representation of the block that serves as the input data in this problem.
Interaction
For each test, your program is executed twice: once for compression and once for decompression.
First run (compression)
The first line of the input contains one word: "compress". The second line of the input contains a base64-encoded block.
Output base64-encoded compressed block.
Second run (decompression)
The first line of the input contains one word: "decompress". The second line of the input contains the base64-encoded compressed block from the output of the first run.
Output base64-encoded decompressed block.
Limits
The maximum block size is 2⋅220
bytes. The compressed block size should be at most 2⋅220
bytes.
Note: These limits refer to the size of the decoded data, not its base64-encoded representation.
Scoring
For each test, your solution will be scored based on the compression efficiency.
The decompressed block must be equal to the original block: otherwise, the solution will receive WA and zero points for that test.
If the size of the original and compressed blocks are x
and y
, respectively, then the score for that test will be 1000⋅2⋅xx+y
.
Testsets
During the contest, your solution will be evaluated on a closed preliminary test set. This set consists of various recent blocks from the TON mainnet. The results will be available on a public scoreboard.
The first 25 tests from this test set are publicly available in ton-sample-tests.zip. You can also download more blocks from the TON blockchain for local testing.
After the contest, your solution will be evaluated on a different final test set. This set will consist of blocks from the TON mainnet that will be generated after the end of the contest. For this final testing, your latest submission with non-zero score on the preliminary testing will be taken.
Note
Your solution will be linked with the C++ library ton_crypto_lib. This library is based on the main TON repository and contains utilities for working with cells, parsing blocks, and the LZ4 compression algorithm. When submitting your solution, select the language "C++17 (clang 18-64 + ton_crypto_lib)". You can find instructions for downloading and linking the library here.
Here is the solution example. It demonstrates using ton_crypto_lib to decode and encode base64, serialize and deserialize cells, LZ4-compress and decompress data.
Local build and ton_crypto_lib examples
Download ton-compress-contest-examples.zip. It contains code examples that explain TON concepts and show how to work with the library, as well as the scripts for building your solution for Linux and MacOS. Usage:
Linux: Extract ton_crypto_lib-x86_64-linux.zip, run ./run-linux.sh solution.cpp
MacOS: Extract ton_crypto_lib-x86_64-macos.zip, run ./run-macos.sh solution.cpp
Windows: Download ton-compress-contest-windows.zip with the configured MSVC++ project.
Instructions for manual configuration can be found here.
Local testing
You can use the docker image in ton-local-tester.zip to test your solutions:
Put your solution to solution.cpp
tests directory contains the 25 publicly available tests. You can add your own tests to this directory.
Install Docker and run:
To solve this problem, we'll create a C++ program that handles both compression and decompression of TON blockchain blocks using the LZ4 compression algorithm. The program will read input commands ("compress"
or "decompress"
) and perform the corresponding operation on the base64-encoded data provided.
Base64 Encoding/Decoding: Since the input and output are in base64 format, we'll need functions to decode base64 strings into binary data and encode binary data back into base64 strings.
LZ4 Compression/Decompression: We'll utilize the LZ4 library for efficient compression and decompression of the binary data.
Program Flow:
Base64 Handling: We'll implement base64 encoding and decoding functions based on standard algorithms. These functions map binary data to ASCII characters and vice versa.
LZ4 Integration: We'll use the LZ4 library's API for compression and decompression. Ensure that the LZ4 library is linked correctly during compilation.
Error Handling: The program will include basic error handling to ensure that the compression and decompression processes are successful. If any step fails, the program will exit with an error message.
Below is the complete C++ program that accomplishes the above tasks. Make sure to link the LZ4 library (-llz4
) when compiling the program.
cpp#include <bits/stdc++.h> #include <lz4.h> // Base64 encoding and decoding static const std::string base64_chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" "abcdefghijklmnopqrstuvwxyz" "0123456789+/"; std::string base64_encode(const std::vector<unsigned char>& bytes_to_encode) { std::string ret; int i = 0; int j = 0; unsigned char char_array_3[3]; unsigned char char_array_4[4]; int in_len = bytes_to_encode.size(); int pos = 0; while (in_len--) { char_array_3[i++] = bytes_to_encode[pos++]; if (i ==3) { char_array_4[0] = (char_array_3[0] & 0xfc) >> 2; char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4); char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6); char_array_4[3] = char_array_3[2] & 0x3f; for(i = 0; (i <4) ; i++) ret += base64_chars[char_array_4[i]]; i = 0; } } if (i) { for(j = i; j <3; j++) char_array_3[j] = '\0'; char_array_4[0] = ( char_array_3[0] & 0xfc ) >> 2; char_array_4[1] = ( ( char_array_3[0] & 0x03 ) << 4 ) + ( ( char_array_3[1] & 0xf0 ) >> 4 ); char_array_4[2] = ( ( char_array_3[1] & 0x0f ) << 2 ) + ( ( char_array_3[2] & 0xc0 ) >> 6 ); char_array_4[3] = char_array_3[2] & 0x3f; for (j = 0; (j < i +1); j++) ret += base64_chars[char_array_4[j]]; while((i++ <3)) ret += '='; } return ret; } std::vector<unsigned char> base64_decode(const std::string& encoded_string) { int in_len = encoded_string.size(); int i = 0; int j = 0; int in_ = 0; unsigned char char_array_4[4], char_array_3[3]; std::vector<unsigned char> ret; auto is_base64 = [](unsigned char c) -> bool { return (isalnum(c) || (c == '+') || (c == '/')); }; while (in_len-- && ( encoded_string[in_] != '=') && is_base64(encoded_string[in_])) { char_array_4[i++] = encoded_string[in_]; in_++; if (i ==4) { for (i = 0; i <4; i++) char_array_4[i] = base64_chars.find(char_array_4[i]) & 0xff; char_array_3[0] = ( char_array_4[0] << 2 ) + ( ( char_array_4[1] & 0x30 ) >> 4 ); char_array_3[1] = ( ( char_array_4[1] & 0xf ) << 4 ) + ( ( char_array_4[2] & 0x3c ) >> 2 ); char_array_3[2] = ( ( char_array_4[2] & 0x3 ) << 6 ) + char_array_4[3]; for (i = 0; (i < 3); i++) ret.push_back(char_array_3[i]); i = 0; } } if (i) { for (j = i; j <4; j++) char_array_4[j] = 0; for (j = 0; j <4; j++) char_array_4[j] = base64_chars.find(char_array_4[j]) & 0xff; char_array_3[0] = ( char_array_4[0] << 2 ) + ( ( char_array_4[1] & 0x30 ) >> 4 ); char_array_3[1] = ( ( char_array_4[1] & 0xf ) << 4 ) + ( ( char_array_4[2] & 0x3c ) >> 2 ); char_array_3[2] = ( ( char_array_4[2] & 0x3 ) << 6 ) + char_array_4[3]; for (j = 0; (j < i - 1); j++) ret.push_back(char_array_3[j]); } return ret; } int main() { std::ios::sync_with_stdio(false); std::cin.tie(nullptr); std::string mode; if (!std::getline(std::cin, mode)) { std::cerr << "Failed to read mode\n"; return 1; } std::string b64_input; if (!std::getline(std::cin, b64_input)) { std::cerr << "Failed to read input data\n"; return 1; } // Decode base64 input std::vector<unsigned char> input_data = base64_decode(b64_input); if (mode == "compress") { // Compress using LZ4 int max_compressed_size = LZ4_compressBound(input_data.size()); std::vector<char> compressed_data(max_compressed_size); int compressed_size = LZ4_compress_default( reinterpret_cast<const char*>(input_data.data()), compressed_data.data(), input_data.size(), max_compressed_size ); if (compressed_size <= 0) { std::cerr << "Compression failed\n"; return 1; } compressed_data.resize(compressed_size); // Encode to base64 std::vector<unsigned char> compressed_bytes(compressed_data.begin(), compressed_data.end()); std::string b64_output = base64_encode(compressed_bytes); std::cout << b64_output; } else if (mode == "decompress") { // Decompress using LZ4 // Since we don't know the original size, we need to make an assumption or embed size information // For simplicity, let's assume the original size is 2 * 220 bytes as per the problem limits // Adjust as necessary based on actual data // Alternatively, if size information is embedded, extract it. For this problem, let's assume maximum size. // Since the original size can be up to 2 * 2^20 bytes, set a buffer accordingly // 2 * 2^20 = 2097152 bytes int max_original_size = 2 * (1 << 20); // 2 * 1048576 = 2097152 std::vector<char> decompressed_data(max_original_size); int decompressed_size = LZ4_decompress_safe( reinterpret_cast<const char*>(input_data.data()), decompressed_data.data(), input_data.size(), max_original_size ); if (decompressed_size < 0) { std::cerr << "Decompression failed\n"; return 1; } decompressed_data.resize(decompressed_size); // Encode to base64 std::vector<unsigned char> decompressed_bytes(decompressed_data.begin(), decompressed_data.end()); std::string b64_output = base64_encode(decompressed_bytes); std::cout << b64_output; } else { std::cerr << "Unknown mode: " << mode << "\n"; return 1; } return 0; }
To compile this program, ensure that you have the LZ4 library installed on your system. You can install it using package managers or build it from source.
For Linux:
Install LZ4 Library:
bashsudo apt-get update sudo apt-get install liblz4-dev
Compile the Program:
bashg++ -std=c++17 -O2 -o ton_compress_decompress solution.cpp -llz4
For macOS:
Install LZ4 Library using Homebrew:
bashbrew install lz4
Compile the Program:
bashg++ -std=c++17 -O2 -o ton_compress_decompress solution.cpp -llz4
For Windows:
Install LZ4:
Compile the Program:
The program expects two lines of input:
"compress"
or "decompress"
.Example for Compression:
bashecho -e "compress\nSGVsbG8gV29ybGQh" | ./ton_compress_decompress
Example for Decompression:
bashecho -e "decompress\n...base64_compressed_data..." | ./ton_compress_decompress
Assumptions:
2 * 2^20
bytes (as per the problem constraints). If the actual size differs, you might need to adjust the buffer size or embed size information within the compressed data.Error Handling:
stderr
and exit with a non-zero status code.Performance:
stdio
and untieing cin
and cout
. Ensure that your environment supports these optimizations.Base64 Implementation:
You can test the program locally using the provided examples and ensure that compression followed by decompression returns the original data.
bash# Original data echo -n "Hello World!" | base64 # Outputs: SGVsbG8gV29ybGQh # Compress echo -e "compress\nSGVsbG8gV29ybGQh" | ./ton_compress_decompress # Outputs compressed base64 string # Decompress echo -e "decompress\n<compressed_base64>" | ./ton_compress_decompress # Should output: SGVsbG8gV29ybGQh
Ensure that after decompression, the original data is accurately retrieved.
This solution efficiently handles the compression and decompression of TON blockchain blocks using LZ4, adhering to the problem's constraints and requirements. Make sure to test the program thoroughly with various inputs to ensure correctness and robustness.