Grouping Names Solution C++

Question

Here is the problem statement extracted from the images:

Problem: Grouping Names Based on Unique Alphabets

You are given an array of N names. Each name consists of uppercase English letters. You need to form groups of names based on the set of distinct letters each name contains.

A name A can be in the same group as name B if the sets of distinct letters in A and B differ by at most one letter, in one of the following ways:

- One letter of A is missing from B.
- One letter of B is missing from A.
- One letter of A is replaced by a different letter in B.

Your task is to determine:

- The maximum possible size of any group.
- The minimum number of groups required to cover all N names.

Input Format

The first line contains an integer T, the number of test cases.
For each test case:
- The first line contains an integer N, the number of names.
- The second line contains N space-separated strings, representing the names.

Output Format

For each test case, print two space-separated integers:

- The maximum possible size of any group.
- The minimum number of groups required to cover all N names.

Constraints

$1 \leq T \leq 10$
$1 \leq N \leq 10^4$
$1 \leq |S| \leq 20$, where $|S|$ denotes the length of each name.

Example

Input

```text
1
5
HARI HARRY LATA SARAI
```

Output

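3 2

In this sample, HARI, HARRY, and SARAI form one group (HARI turns into HARRY by replacing I with Y, and into SARAI by replacing H with S), while LATA forms a group on its own. The sample input lists only four names although N = 5; the expected output matches the four names shown.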

Notes

- 1-based indexing is followed.
- The order of the letters in a name does not matter.

Would you like a solution in a specific programming language? code in c++

Answer 1

We can solve the problem by “compressing” each name into a 26-bit mask (one bit per uppercase letter) that records which letters occur in the name. Two names are then “similar” (and hence can be in the same group) if one mask can be turned into the other by at most one allowed operation:

- Addition: one name’s letter set is exactly the other’s plus one extra letter (for example, $S_1 = \{A,B,C\}$ and $S_2 = \{A,B,C,D\}$).
  - In bitmask terms: $m_2 = m_1 \mid (1 \ll x)$ for some letter $x$ not in $m_1$.
- Removal: the reverse of addition (one letter is “missing”).
  - That is, $m_1$ is $m_2$ with one extra letter.
- Replacement: both names have the same number of distinct letters, but exactly one letter is swapped for a different one.
  - In bitmask terms, $m_1$ and $m_2$ have the same popcount and their symmetric difference $m_1 \oplus m_2$ has exactly two set bits (one letter removed, a different one added).

Notice that if the two masks are identical (i.e. symmetric difference is 0) they are automatically “similar” and all names with the same mask will belong together.

The idea is to build a graph where each vertex is a unique bitmask (with a frequency count equal to how many names produced that mask). We add an edge between two vertices if one of the allowed operations transforms one mask into the other. Then, grouping the names is the same as finding the connected components in this graph.

A few implementation details:

- Letters range over A to Z, so a mask needs only 26 bits and fits in a 32-bit int; since a name has at most 20 characters, a mask has at most 20 set bits.
- We “compress” names with identical masks into one vertex, keeping a frequency count.
- We then generate the “neighbors” of each mask by:
  - Adding: for each letter not in the mask, set its bit.
  - Removing: for each letter in the mask, clear its bit.
  - Replacing: for each letter in the mask, clear its bit, then set the bit of any other letter that is not in the original mask. (Replacing a letter by itself would just restore the original mask, so skip it.)
- A union–find (disjoint set union, DSU) structure merges connected vertices in near-constant amortized time.
- After processing, the maximum group size is the largest frequency sum over all components, and the number of groups is the number of connected components.

Below is the complete C++ solution:

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
using namespace std;

// Disjoint Set Union (Union–Find) with path compression and union by rank
struct DSU {
    vector<int> parent, rank;
    DSU(int n) : parent(n), rank(n, 0) {
        for (int i = 0; i < n; i++)
            parent[i] = i;
    }
    // Find the root of a, compressing the path along the way.
    int find(int a) {
        return parent[a] = (parent[a] == a ? a : find(parent[a]));
    }
    // Merge the sets of a and b, attaching the lower-rank root under the higher one.
    void unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (rank[a] < rank[b]) swap(a, b);
        parent[b] = a;
        if (rank[a] == rank[b])
            rank[a]++;
    }
};

// Convert a name (uppercase A-Z) into a bitmask representing its unique letters.
int nameToMask(const string &s) {
    int mask = 0;
    for (char c : s)
        mask |= 1 << (c - 'A');
    return mask;
}

int main(){
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    int T;
    cin >> T;
    while(T--){
        int N;
        cin >> N;

        // Compress names by their unique letter bitmask.
        unordered_map<int, int> freq; // mask -> frequency count
        vector<int> uniqueMasks;      // list of unique masks
        for (int i = 0; i < N; i++){
            string name;
            cin >> name;
            int mask = nameToMask(name);
            if(freq.find(mask) == freq.end()){
                freq[mask] = 0;
                uniqueMasks.push_back(mask);
            }
            freq[mask]++;
        }

        int sz = uniqueMasks.size();
        // Map each unique mask to an index (for DSU).
        unordered_map<int,int> maskToIndex;
        for (int i = 0; i < sz; i++){
            maskToIndex[ uniqueMasks[i] ] = i;
        }

        DSU dsu(sz);

        // For each unique mask, generate all neighbors by allowed operations.
        for (int i = 0; i < sz; i++){
            int m = uniqueMasks[i];
            // Operation 1: Add a letter (if not already present).
            for (int bit = 0; bit < 26; bit++){
                if (!(m & (1 << bit))){
                    int candidate = m | (1 << bit);
                    if(maskToIndex.find(candidate) != maskToIndex.end()){
                        dsu.unite(i, maskToIndex[candidate]);
                    }
                }
            }
            // Operation 2: Remove a letter (if present).
            for (int bit = 0; bit < 26; bit++){
                if (m & (1 << bit)){
                    int candidate = m & ~(1 << bit);
                    if(maskToIndex.find(candidate) != maskToIndex.end()){
                        dsu.unite(i, maskToIndex[candidate]);
                    }
                }
            }
            // Operation 3: Replace a letter.
            // For each letter present, remove it, then add a different letter.
            for (int bit = 0; bit < 26; bit++){
                if(m & (1 << bit)){
                    int removed = m & ~(1 << bit);
                    for (int bit2 = 0; bit2 < 26; bit2++){
                        // Skip the letter just removed (that would restore m)
                        // and letters already present in the reduced mask.
                        if(bit2 == bit || (removed & (1 << bit2)))
                            continue;
                        int candidate = removed | (1 << bit2);
                        if(maskToIndex.find(candidate) != maskToIndex.end()){
                            dsu.unite(i, maskToIndex[candidate]);
                        }
                    }
                }
            }
        }

        // Now compute the connected components.
        // For each component, add up the frequencies (i.e. number of names with those masks).
        vector<long long> compFreq(sz, 0);
        unordered_set<int> compRoots;
        for (int i = 0; i < sz; i++){
            int root = dsu.find(i);
            compFreq[root] += freq[ uniqueMasks[i] ];
            compRoots.insert(root);
        }

        int numGroups = compRoots.size();
        long long maxGroupSize = 0;
        for (int root : compRoots)
            maxGroupSize = max(maxGroupSize, compFreq[root]);

        cout << maxGroupSize << " " << numGroups << "\n";
    }
    return 0;
}
```

Explanation

Input and Mask Conversion:
We read each name and compute its bitmask using the nameToMask function. Names with the same mask are “compressed” (we record their frequency).
Graph Construction (via DSU):
For every unique mask we generate candidate masks using:
- Addition: Loop over all 26 bits and if the bit is not set, add it.
- Removal: Loop over bits that are set and remove each one.
- Replacement: For every set bit, remove it and then try adding every other letter (skip if it would just restore the original mask).
For every candidate that exists in our dictionary (i.e. was produced by some name), we unite the two nodes in our DSU.
Computing the Answer:
After union–find, each DSU “root” represents one group. We sum the frequencies (number of names) in each group. Finally, we output:
- The maximum group size (largest sum over a component).
- The total number of groups (number of connected components).

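Each distinct mask has at most $26 + k(26-k) \le 195$ candidate neighbors, where k is the number of distinct letters in the mask, and a test case has at most $N \le 10^4$ distinct masks. The program therefore performs on the order of $2 \cdot 10^6$ hash lookups plus near-constant-time DSU operations per test case, well within the constraints.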

Would you like further clarification on any part of the code?