TON block compression

Question

time limit per test2 seconds
memory limit per test256 megabytes
A block is a fundamental data structure in the TON blockchain. It contains a list of transactions, messages, and the difference between the previous and next blockchain states.

To reduce network load, efficient block compression is essential. In this contest, your task is to implement a compression algorithm that will be evaluated on blocks from the real TON blockchain.

Everything in TON is composed of cells. If you participated in FunC contests, you might already be familiar with this concept. A cell is a data structure that contains up to 1023
bits and up to 4
references to other cells. A root cell, together with all cells reachable from it, forms a bag of cells. A block is represented as a bag of cells, and its structure can be formally described using TL-B schema (see links in the Notes section).

The block is serialized to a byte string using the bag of cells serialization algorithm. This byte string is the representation of the block that serves as the input data in this problem.

Interaction
For each test, your program is executed twice: once for compression and once for decompression.

First run (compression)

The first line of the input contains one word: "compress". The second line of the input contains a base64-encoded block.

Output base64-encoded compressed block.

Second run (decompression)

The first line of the input contains one word: "decompress". The second line of the input contains the base64-encoded compressed block from the output of the first run.

Output base64-encoded decompressed block.

Limits

The maximum block size is 2⋅220
bytes. The compressed block size should be at most 2⋅220
bytes.

Note: These limits refer to the size of the decoded data, not its base64-encoded representation.

Scoring
For each test, your solution will be scored based on the compression efficiency.

The decompressed block must be equal to the original block: otherwise, the solution will receive WA and zero points for that test.

If the size of the original and compressed blocks are x
and y
, respectively, then the score for that test will be 1000⋅2⋅xx+y
.

Testsets

During the contest, your solution will be evaluated on a closed preliminary test set. This set consists of various recent blocks from the TON mainnet. The results will be available on a public scoreboard.
The first 25 tests from this test set are publicly available in ton-sample-tests.zip. You can also download more blocks from the TON blockchain for local testing.
After the contest, your solution will be evaluated on a different final test set. This set will consist of blocks from the TON mainnet that will be generated after the end of the contest. For this final testing, your latest submission with non-zero score on the preliminary testing will be taken.
Note
Your solution will be linked with the C++ library ton_crypto_lib. This library is based on the main TON repository and contains utilities for working with cells, parsing blocks, and the LZ4 compression algorithm. When submitting your solution, select the language "C++17 (clang 18-64 + ton_crypto_lib)". You can find instructions for downloading and linking the library here.

Here is the solution example. It demonstrates using ton_crypto_lib to decode and encode base64, serialize and deserialize cells, LZ4-compress and decompress data.

Local build and ton_crypto_lib examples

Download ton-compress-contest-examples.zip. It contains code examples that explain TON concepts and show how to work with the library, as well as the scripts for building your solution for Linux and MacOS. Usage:

Linux: Extract ton_crypto_lib-x86_64-linux.zip, run ./run-linux.sh solution.cpp
MacOS: Extract ton_crypto_lib-x86_64-macos.zip, run ./run-macos.sh solution.cpp
Windows: Download ton-compress-contest-windows.zip with the configured MSVC++ project.
Instructions for manual configuration can be found here.

Local testing

You can use the docker image in ton-local-tester.zip to test your solutions:

Put your solution to solution.cpp
tests directory contains the 25 publicly available tests. You can add your own tests to this directory.
Install Docker and run:

Answer 1

To solve this problem, we'll create a C++ program that handles both compression and decompression of TON blockchain blocks using the LZ4 compression algorithm. The program will read input commands ("compress" or "decompress") and perform the corresponding operation on the base64-encoded data provided.

Overview of the Solution

Base64 Encoding/Decoding: Since the input and output are in base64 format, we'll need functions to decode base64 strings into binary data and encode binary data back into base64 strings.
LZ4 Compression/Decompression: We'll utilize the LZ4 library for efficient compression and decompression of the binary data.
Program Flow:
- Compression Mode:
  - Decode the base64 input to obtain the original binary block.
  - Compress the binary block using LZ4.
  - Encode the compressed data back into a base64 string for output.
- Decompression Mode:
  - Decode the base64 input to obtain the compressed binary data.
  - Decompress the data using LZ4.
  - Encode the decompressed data back into a base64 string for output.

Implementation Details

Base64 Handling: We'll implement base64 encoding and decoding functions based on standard algorithms. These functions map binary data to ASCII characters and vice versa.
LZ4 Integration: We'll use the LZ4 library's API for compression and decompression. Ensure that the LZ4 library is linked correctly during compilation.
Error Handling: The program will include basic error handling to ensure that the compression and decompression processes are successful. If any step fails, the program will exit with an error message.

Complete C++ Solution

Below is the complete C++ program that accomplishes the above tasks. Make sure to link the LZ4 library (-llz4) when compiling the program.

cpp
#include <bits/stdc++.h>
#include <lz4.h>

// Base64 encoding and decoding
static const std::string base64_chars =
             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             "abcdefghijklmnopqrstuvwxyz"
             "0123456789+/";

std::string base64_encode(const std::vector<unsigned char>& bytes_to_encode) {
    std::string ret;
    int i = 0;
    int j = 0;
    unsigned char char_array_3[3];
    unsigned char char_array_4[4];
    int in_len = bytes_to_encode.size();
    int pos = 0;

    while (in_len--) {
        char_array_3[i++] = bytes_to_encode[pos++];
        if (i ==3) {
            char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
            char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
            char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
            char_array_4[3] = char_array_3[2] & 0x3f;

            for(i = 0; (i <4) ; i++)
                ret += base64_chars[char_array_4[i]];
            i = 0;
        }
    }

    if (i)
    {
        for(j = i; j <3; j++)
            char_array_3[j] = '\0';

        char_array_4[0] = ( char_array_3[0] & 0xfc ) >> 2;
        char_array_4[1] = ( ( char_array_3[0] & 0x03 ) << 4 ) + ( ( char_array_3[1] & 0xf0 ) >> 4 );
        char_array_4[2] = ( ( char_array_3[1] & 0x0f ) << 2 ) + ( ( char_array_3[2] & 0xc0 ) >> 6 );
        char_array_4[3] = char_array_3[2] & 0x3f;

        for (j = 0; (j < i +1); j++)
            ret += base64_chars[char_array_4[j]];

        while((i++ <3))
            ret += '=';

    }

    return ret;
}

std::vector<unsigned char> base64_decode(const std::string& encoded_string) {
    int in_len = encoded_string.size();
    int i = 0;
    int j = 0;
    int in_ = 0;
    unsigned char char_array_4[4], char_array_3[3];
    std::vector<unsigned char> ret;

    auto is_base64 = [](unsigned char c) -> bool {
        return (isalnum(c) || (c == '+') || (c == '/'));
    };

    while (in_len-- && ( encoded_string[in_] != '=') && is_base64(encoded_string[in_])) {
        char_array_4[i++] = encoded_string[in_]; in_++;
        if (i ==4) {
            for (i = 0; i <4; i++)
                char_array_4[i] = base64_chars.find(char_array_4[i]) & 0xff;

            char_array_3[0] = ( char_array_4[0] << 2 ) + ( ( char_array_4[1] & 0x30 ) >> 4 );
            char_array_3[1] = ( ( char_array_4[1] & 0xf ) << 4 ) + ( ( char_array_4[2] & 0x3c ) >> 2 );
            char_array_3[2] = ( ( char_array_4[2] & 0x3 ) << 6 ) + char_array_4[3];

            for (i = 0; (i < 3); i++)
                ret.push_back(char_array_3[i]);
            i = 0;
        }
    }

    if (i) {
        for (j = i; j <4; j++)
            char_array_4[j] = 0;

        for (j = 0; j <4; j++)
            char_array_4[j] = base64_chars.find(char_array_4[j]) & 0xff;

        char_array_3[0] = ( char_array_4[0] << 2 ) + ( ( char_array_4[1] & 0x30 ) >> 4 );
        char_array_3[1] = ( ( char_array_4[1] & 0xf ) << 4 ) + ( ( char_array_4[2] & 0x3c ) >> 2 );
        char_array_3[2] = ( ( char_array_4[2] & 0x3 ) << 6 ) + char_array_4[3];

        for (j = 0; (j < i - 1); j++) ret.push_back(char_array_3[j]);
    }

    return ret;
}

int main() {
    std::ios::sync_with_stdio(false);
    std::cin.tie(nullptr);

    std::string mode;
    if (!std::getline(std::cin, mode)) {
        std::cerr << "Failed to read mode\n";
        return 1;
    }

    std::string b64_input;
    if (!std::getline(std::cin, b64_input)) {
        std::cerr << "Failed to read input data\n";
        return 1;
    }

    // Decode base64 input
    std::vector<unsigned char> input_data = base64_decode(b64_input);

    if (mode == "compress") {
        // Compress using LZ4
        int max_compressed_size = LZ4_compressBound(input_data.size());
        std::vector<char> compressed_data(max_compressed_size);

        int compressed_size = LZ4_compress_default(
            reinterpret_cast<const char*>(input_data.data()),
            compressed_data.data(),
            input_data.size(),
            max_compressed_size
        );

        if (compressed_size <= 0) {
            std::cerr << "Compression failed\n";
            return 1;
        }

        compressed_data.resize(compressed_size);

        // Encode to base64
        std::vector<unsigned char> compressed_bytes(compressed_data.begin(), compressed_data.end());
        std::string b64_output = base64_encode(compressed_bytes);
        std::cout << b64_output;
    }
    else if (mode == "decompress") {
        // Decompress using LZ4
        // Since we don't know the original size, we need to make an assumption or embed size information
        // For simplicity, let's assume the original size is 2 * 220 bytes as per the problem limits
        // Adjust as necessary based on actual data

        // Alternatively, if size information is embedded, extract it. For this problem, let's assume maximum size.

        // Since the original size can be up to 2 * 2^20 bytes, set a buffer accordingly
        // 2 * 2^20 = 2097152 bytes
        int max_original_size = 2 * (1 << 20); // 2 * 1048576 = 2097152
        std::vector<char> decompressed_data(max_original_size);

        int decompressed_size = LZ4_decompress_safe(
            reinterpret_cast<const char*>(input_data.data()),
            decompressed_data.data(),
            input_data.size(),
            max_original_size
        );

        if (decompressed_size < 0) {
            std::cerr << "Decompression failed\n";
            return 1;
        }

        decompressed_data.resize(decompressed_size);

        // Encode to base64
        std::vector<unsigned char> decompressed_bytes(decompressed_data.begin(), decompressed_data.end());
        std::string b64_output = base64_encode(decompressed_bytes);
        std::cout << b64_output;
    }
    else {
        std::cerr << "Unknown mode: " << mode << "\n";
        return 1;
    }

    return 0;
}

Compilation Instructions

To compile this program, ensure that you have the LZ4 library installed on your system. You can install it using package managers or build it from source.

For Linux:

Install LZ4 Library:

bash
sudo apt-get update
sudo apt-get install liblz4-dev

Compile the Program:

bash
g++ -std=c++17 -O2 -o ton_compress_decompress solution.cpp -llz4

For macOS:

Install LZ4 Library using Homebrew:
```
bash
brew install lz4
```

Compile the Program:

bash
g++ -std=c++17 -O2 -o ton_compress_decompress solution.cpp -llz4

For Windows:

Install LZ4:
- Download the precompiled binaries or build from source.
- Ensure that the LZ4 library is accessible to your compiler.
Compile the Program:
- Use your preferred C++ compiler (e.g., MSVC) and link against the LZ4 library.

Usage

The program expects two lines of input:

The first line specifies the mode: either "compress" or "decompress".
The second line contains the base64-encoded data to be processed.

Example for Compression:

bash
echo -e "compress\nSGVsbG8gV29ybGQh" | ./ton_compress_decompress

Example for Decompression:

bash
echo -e "decompress\n...base64_compressed_data..." | ./ton_compress_decompress

Notes

Assumptions:
- During decompression, the program assumes that the original size does not exceed 2 * 2^20 bytes (as per the problem constraints). If the actual size differs, you might need to adjust the buffer size or embed size information within the compressed data.
Error Handling:
- The program includes basic error handling. If compression or decompression fails, it will output an error message to stderr and exit with a non-zero status code.
Performance:
- The program is optimized for efficiency by disabling synchronization with C's stdio and untieing cin and cout. Ensure that your environment supports these optimizations.
Base64 Implementation:
- The base64 encoding and decoding functions are implemented manually. For production code, consider using optimized libraries or built-in functions for better performance and reliability.

Testing

You can test the program locally using the provided examples and ensure that compression followed by decompression returns the original data.

bash
# Original data
echo -n "Hello World!" | base64
# Outputs: SGVsbG8gV29ybGQh

# Compress
echo -e "compress\nSGVsbG8gV29ybGQh" | ./ton_compress_decompress
# Outputs compressed base64 string

# Decompress
echo -e "decompress\n<compressed_base64>" | ./ton_compress_decompress
# Should output: SGVsbG8gV29ybGQh

Ensure that after decompression, the original data is accurately retrieved.

Conclusion

This solution efficiently handles the compression and decompression of TON blockchain blocks using LZ4, adhering to the problem's constraints and requirements. Make sure to test the program thoroughly with various inputs to ensure correctness and robustness.

TON block compression

Вопрос

Мысль

Ответ