DP优化分组问题

Question

题目描述
众所周知，李哥在富人岛上养了很多神奇的东西，比如说可以预测赛果的只因。而这次来度假的陈哥就看上了李哥在岛上养的会唱歌的奶牛，想让它们组一个合唱团尝试一下愉悦的旋律。

现在李哥能给陈哥提供N头奶牛，共有M种音符。但奶牛们都十分刁钻，i号奶牛只会唱音符Ai(1≤Ai≤M)。

陈哥要把排好队的奶牛按顺序分成若干个声部。简单来说，就是分成若干个组（每组最少一头奶牛），对于分好的每个组，如果组内的奶牛们一共会唱K种音符，则组内的不和谐值为K*K。

注意：只能将连续的奶牛分为一组，具体可以参考样例解释。

请你帮忙计算一下，陈哥怎么样分组才能使合唱团内不和谐值的总和最小。请计算这个最小不和谐值。为了合唱团能尽量和谐而愉悦地演奏，也为了李哥的钱包：）

关于输入
第一行两个整数 N,M（1≤N≤50000, 1≤M≤N）。
接下来 N 行，每行一个整数Ai，表示第i头奶牛唱的音符种类。

关于输出
一个整数，表示最小不和谐值。

例子输入
13 4
1
2
1
3
2
2
3
4
3
4
3
1
4
例子输出
11
提示信息
样例解释：
如此划分：1\2\1\3\22\34343\1\4，1+1+1+1+1+2*2+1+1=11

善良的陈哥提醒你：
1.用你聪明的脑瓜想想，其实答案是不是有一个明显的上界？
2.那么可以转移的状态数量应该是有限的（受这个上界约束的），考虑如何维护这个集合
用C++设计一个算法完成这道题，要求时间复杂度严格优于O(N^2),O(NM),O(M^2)
你需要保证你算法的正确性！！

Answer 1

To solve this problem efficiently, we can use a Divide and Conquer Dynamic Programming (DP) approach combined with two pointers to maintain the number of unique elements in any given interval. Here's a step-by-step explanation of the approach and the corresponding C++ implementation.

Approach Overview

Dynamic Programming (DP) Setup:
- DP State: dp[i] represents the minimum total discordance for the first i cows.
- Recurrence Relation: $dp[i] = \min_{0 \leq j < i} (dp[j] + \text{cost}(j+1, i))$ where cost(j+1, i) is the discordance value of the group from cow j+1 to cow i.
Cost Calculation:
- The cost of a group [j+1, i] is the square of the number of unique notes in that group, i.e., K^2, where K is the number of distinct notes.
Optimization with Divide and Conquer:
- The straightforward DP approach has a time complexity of $O(N^2)$ , which is too slow for $N \leq 50,000$ .
- We can optimize this using Divide and Conquer DP, which reduces the time complexity to $O(N \log N)$ under certain conditions.
- The key idea is to divide the problem into smaller subproblems and solve them recursively, leveraging the fact that the optimal partition point for a midpoint lies within a specific range.
Maintaining Unique Elements with Two Pointers:
- To efficiently calculate the number of unique elements (K) in any interval [j+1, i], we use two pointers (current_l and current_r) and a frequency array.
- As we adjust the pointers, we update the frequency of each note and maintain a count of unique notes.

Implementation Details

Global Variables:
- freq[]: Frequency array to keep track of the count of each note in the current window.
- current_l and current_r: Pointers defining the current window [current_l, current_r].
- current_K: Number of unique elements in the current window.
Functions:
- add(int x): Adds a note to the current window and updates current_K if it's a new unique note.
- remove(int x): Removes a note from the current window and updates current_K if a unique note is completely removed.
- compute_cost(int j_plus_1, int m): Adjusts the window to [j_plus_1, m] and returns the current discordance value K^2.
Divide and Conquer Function:
- divide_and_conquer(int l, int r, int opt_l, int opt_r, vector<ll> &dp_new, const vector<ll> &dp_prev, const vector<int> &A): Recursively computes the optimal dp_new[m] for m in [l, r] by searching for the optimal partition point j within [opt_l, opt_r].

Complete C++ Implementation

cpp
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
const ll INF = 1e18;

// Global variables for frequency and unique count
int freq_val[50005];
int current_l = 1, current_r = 0;
int current_K = 0;
int A_global[50005];
ll current_cost = 0;

// Function to add a value to the current window
void add(int x){
    freq_val[x]++;
    if(freq_val[x] == 1){
        current_K++;
    }
}

// Function to remove a value from the current window
void remove_val(int x){
    freq_val[x]--;
    if(freq_val[x] == 0){
        current_K--;
    }
}

// Function to compute the cost for the current window [new_l, new_r]
ll compute_cost(int new_l, int new_r){
    // Expand or contract the window to [new_l, new_r]
    while(current_l > new_l){
        current_l--;
        add(A_global[current_l]);
    }
    while(current_r < new_r){
        current_r++;
        add(A_global[current_r]);
    }
    while(current_l < new_l){
        remove_val(A_global[current_l]);
        current_l++;
    }
    while(current_r > new_r){
        remove_val(A_global[current_r]);
        current_r--;
    }
    // The cost is K squared
    return (ll)current_K * (ll)current_K;
}

// Divide and Conquer DP function
void divide_and_conquer(int l, int r, int opt_l, int opt_r, vector<ll> &dp_new, const vector<ll> &dp_prev, const vector<int> &A){
    if(l > r) return;
    int m = (l + r) / 2;
    ll best = INF;
    int best_j = -1;
    // Iterate through possible j to find the optimal partition point
    for(int j = opt_l; j <= min(m-1, opt_r); j++){
        ll cost = compute_cost(j+1, m);
        ll current = dp_prev[j] + cost;
        if(current < best){
            best = current;
            best_j = j;
        }
    }
    dp_new[m] = best;
    // Recurse on the left half
    divide_and_conquer(l, m-1, opt_l, best_j, dp_new, dp_prev, A);
    // Recurse on the right half
    divide_and_conquer(m+1, r, best_j, opt_r, dp_new, dp_prev, A);
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(0);
    int N, M;
    cin >> N >> M;
    vector<int> A(N+1);
    for(int i=1; i<=N; i++) cin >> A[i];
    // Initialize global array
    for(int i=1; i<=N; i++) A_global[i] = A[i];
    // Initialize DP arrays
    vector<ll> dp_prev(N+1, INF);
    vector<ll> dp_new(N+1, INF);
    dp_prev[0] = 0;
    // Perform Divide and Conquer DP
    // Since it's a single stage (no multiple partitions stages), we perform one D&C
    divide_and_conquer(1, N, 0, N-1, dp_new, dp_prev, A);
    // The answer is dp_new[N]
    cout << dp_new[N] << "\n";
    return 0;
}

Explanation of the Implementation

Initialization:
- We read the number of cows N and the number of note types M.
- We store the notes each cow can sing in the array A.
Global Window Management:
- We use current_l and current_r to define the current window of cows being considered.
- The frequency array freq_val keeps track of how many times each note appears in the current window.
- current_K maintains the number of unique notes in the current window.
Cost Computation:
- The compute_cost function adjusts the window to the desired range [j+1, m] by moving the current_l and current_r pointers and updating the frequency and unique count accordingly.
- It then calculates the cost as K^2.
Divide and Conquer DP:
- The divide_and_conquer function recursively computes the optimal dp_new[m] for each midpoint m in the range [l, r].
- For each midpoint m, it searches for the optimal partition point j in the range [opt_l, opt_r] that minimizes dp_prev[j] + cost(j+1, m).
- It updates dp_new[m] with the minimum cost found and recursively processes the left and right halves.
Final Output:
- After processing, the minimum total discordance is stored in dp_new[N], which is then printed as the answer.

Time and Space Complexity

Time Complexity: The Divide and Conquer DP approach reduces the time complexity to approximately $O(N \log N)$ , which is efficient for the given constraints ( $N \leq 50,000$ ).
Space Complexity: The space complexity is $O(N)$ due to the DP arrays and frequency array.

Conclusion

This approach efficiently partitions the cows into groups to minimize the total discordance by leveraging Divide and Conquer DP and two-pointer techniques to manage unique elements within intervals. The implementation ensures that the solution scales well with the problem constraints.