Amazon is developing a string matching library. Yo...

Question

Amazon is developing a string matching library. You are to develop a service that finds the longest substring that matches a given regex. More formally, you are given two strings: A text string: sourceString A regex expression: pattern The string pattern contains exactly one wildcard character (). A wildcard character () matches any sequence of zero or more lowercase English characters. A regex matches some string if it is possible to replace the wildcard character with some sequence of characters such that the regex becomes equal to the string. No other character can be changed.

Answer 1

Below is a practical, linear-time approach you can drop into a micro-service (or use directly in-process) when the regex contains exactly one asterisk *.

1 Algorithm (high level)

Split the pattern at the *

text
pattern =  P★S
           └┬┘ └┬┘
          prefix suffix

prefix = pattern[:starIndex]
suffix = pattern[starIndex+1:]

Scan once left→right
Record every index i where source[i:].startswith(prefix).
Scan once right→left
Record every index j where source[j:].startswith(suffix), but at the same time build an array
maxSuffixStartFrom[pos] = max { j | j ≥ pos and suffix matches at j }
(-1 if none).
Choose the longest pair
For each prefix start position i, look up j = maxSuffixStartFrom[i].
If j is valid (≥ i) the substring [i, j+len(suffix)) is a candidate of length
L = (j + |suffix|) – i.
Keep the maximal-length substring; when ties matter, pick the first one (or whatever your spec decides).
Edge cases

Case Result
pattern == "*" entire source
prefix == "" “largest tail” that ends with suffix
suffix == "" “largest head” that starts with prefix
No match empty string (or 404/422, per API design)

This is O(n) time and O(n) memory, where n = len(source).

2 Reference implementation (Python 3)

python
def longest_wildcard_match(source: str, pattern: str) -> str:
    star = pattern.find('*')
    if star == -1:
        raise ValueError("pattern must contain exactly one '*'")
    pre, suf = pattern[:star], pattern[star + 1:]

    # Fast paths when one side is empty ----------------------------
    if pattern == '*':
        return source
    if not pre:            # pattern "*suffix"
        idx = source.rfind(suf)
        return source[idx:] if idx != -1 else ''
    if not suf:            # pattern "prefix*"
        idx = source.find(pre)
        return source[idx:] if idx != -1 else ''

    n, lp, ls = len(source), len(pre), len(suf)

    # 1) prefix starts --------------------------------------------
    pre_starts = []
    for i in range(n - lp + 1):
        if source[i:i + lp] == pre:
            pre_starts.append(i)

    if not pre_starts:
        return ''

    # 2) suffix starts & right-to-left max table -------------------
    max_suf_from = [-1] * (n + 1)          # one past the end sentinel
    last = -1
    for i in range(n - ls, -1, -1):        # inclusive, right→left
        if source[i:i + ls] == suf:
            last = i
        max_suf_from[i] = last
    # positions after n-ls default to -1

    # 3) sweep for best substring ---------------------------------
    best_i = best_len = -1
    for i in pre_starts:
        j = max_suf_from[i]                # farthest suffix that begins ≥ i
        if j == -1:                        # no suffix after this prefix
            continue
        cur_len = (j + ls) - i
        if cur_len > best_len:
            best_len, best_i = cur_len, i

    return '' if best_len == -1 else source[best_i:best_i + best_len]

Complexities

Aspect	Cost
Time	`O(n)`
Extra memory	`O(n)` (can be `O(1)` with two passes and no table)

3 Service skeleton (REST/HTTP)

Method	Route	Body (JSON)	2xx Response
`POST`	`/match/longest`	`{ "source": "...", "pattern": "..." }`	`{ "match": "..." }`

Notes

Validate that pattern contains exactly one *.
4xx on malformed input or “no match”.
Memoise or pool a compiled helper if throughput is critical.
For very large strings consider streaming chunks and KMP-style overlap handling to keep memory under control.

4 Possible follow-ups

Add unit tests (edge-case coverage, fuzzing).
Port to Java/Kotlin using the same logic—no JVM regex needed, so no exponential pitfalls.
Benchmark with Amazon’s internal perf harness (string lengths 10 B+).
Wrap in AWS Lambda / API Gateway for serverless deployment.

Let me know if you’d like any of those pieces fleshed out!

Answer 2

Here’s a clean & linear-time C++ solution that preserves your I/O scaffold but fixes the logic and makes the algorithm O(n) instead of O(n²).

cpp
#include <bits/stdc++.h>
using namespace std;

/*  Return the length of the longest substring of `source`
    that can be matched by the single-wildcard pattern `pattern`.
    If nothing matches, return -1.                                    */
int findMaxRegexMatch(const string& source, const string& pattern) {
    size_t starPos = pattern.find('*');
    if (starPos == string::npos)              // safety: should not happen
        return -1;

    const string prefix = pattern.substr(0, starPos);
    const string suffix = pattern.substr(starPos + 1);

    const int n  = static_cast<int>(source.size());
    const int lp = static_cast<int>(prefix.size());
    const int ls = static_cast<int>(suffix.size());

    /* ---------- fast paths when one side is empty ---------------- */
    if (pattern == "*")                // pattern is only the wildcard
        return n;

    if (prefix.empty()) {              // "*suffix"
        size_t pos = source.rfind(suffix);
        return (pos == string::npos) ? -1 : n - static_cast<int>(pos);
    }

    if (suffix.empty()) {              // "prefix*"
        size_t pos = source.find(prefix);
        return (pos == string::npos) ? -1 : n - static_cast<int>(pos);
    }

    /* ---------- collect all prefix-match start positions --------- */
    vector<int> preStarts;
    for (int i = 0; i + lp <= n; ++i)
        if (source.compare(i, lp, prefix) == 0)
            preStarts.push_back(i);

    if (preStarts.empty()) return -1;          // no prefix → no match

    /* ---------- build right-to-left table of farthest suffix ----- */
    vector<int> farSuffixFrom(n + 1, -1);      // sentinel at n
    int last = -1;
    for (int i = n - ls; i >= 0; --i) {
        if (source.compare(i, ls, suffix) == 0)
            last = i;
        farSuffixFrom[i] = last;               // farthest ≥ i
    }

    /* ---------- sweep for the best span -------------------------- */
    int bestLen = -1;
    for (int i : preStarts) {
        int j = farSuffixFrom[i + lp];         // suffix must start ≥ i+lp
        if (j == -1) continue;                 // none after this prefix
        int len = (j + ls) - i;                // total substring length
        if (len > bestLen) bestLen = len;
    }
    return bestLen;
}

/* ------------------------ driver code --------------------------- */
int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string sourceString;
    getline(cin, sourceString);

    string pattern;
    getline(cin, pattern);

    int result = findMaxRegexMatch(sourceString, pattern);
    cout << result << '\n';
    return 0;
}

Why this version is correct & fast

Step	Purpose	Complexity
Collect prefix starts	Find every index where `prefix` matches	O(n)
Build `farSuffixFrom`	For every position `i`, store the farthest suffix-start ≥ `i`	O(n) (single right-to-left pass)
Sweep for answer	Combine each prefix with farthest valid suffix	O(n) (worst-case every char is a prefix start)

Memory is O(n) for the helper table, which keeps the main pass simple and branch-free. If you ever need O(1) memory, you can replace the table with a second two-pointer pass, but for typical limits (1 e6–1 e7 chars) this layout is both fast and cache-friendly.

Answer 3

Below is a tighter O(n) solution that keeps every inner operation constant-time and needs only one auxiliary array.
It comfortably fits a 2 s limit even with ≈ 10 MB input.

cpp
#include <bits/stdc++.h>
using namespace std;

/*  Return the length of the longest substring of `src`
    that can match a single-wildcard pattern `pat`
    (exactly one ‘*’).  –1 if no match exists.            */
int longestWildcardMatch(const string& src, const string& pat) {
    const int n = static_cast<int>(src.size());
    const size_t star = pat.find('*');

    const string pre = pat.substr(0, star);          // part before *
    const string suf = pat.substr(star + 1);         // part after  *
    const int lp = static_cast<int>(pre.size());
    const int ls = static_cast<int>(suf.size());

    /* ---------- trivial fast paths --------------------------- */
    if (pat == "*")            return n;                         // * matches all
    if (lp + ls > n)           return -1;                        // impossible
    if (lp == 0) {                                             // "*suffix"
        size_t pos = src.rfind(suf);
        return pos == string::npos ? -1 : n - static_cast<int>(pos);
    }
    if (ls == 0) {                                             // "prefix*"
        size_t pos = src.find(pre);
        return pos == string::npos ? -1 : n - static_cast<int>(pos);
    }

    /* ---------- build “next suffix start ≥ i” table ---------- */
    vector<int> next(n + 1, -1);          // next[i] valid for all i∈[0..n]
    int nxt = -1;
    for (int i = n - ls; i >= 0; --i) {
        if (src.compare(i, ls, suf) == 0) nxt = i;
        next[i] = nxt;                    // earliest suffix that starts ≥ i
    }

    /* ---------- sweep for best span -------------------------- */
    int best = -1;
    for (int i = 0; i + lp <= n; ++i) {
        if (src.compare(i, lp, pre) != 0) continue;             // no prefix
        int j = next[i + lp];                                   // suffix ≥ prefix-end
        if (j == -1)            continue;                       // none available
        best = max(best, j + ls - i);                           // update answer
    }
    return best;
}

/* ---------------------------- driver ------------------------- */
int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string source, pattern;
    getline(cin, source);
    getline(cin, pattern);

    cout << longestWildcardMatch(source, pattern) << '\n';
    return 0;
}

Why this version beats the 2 s limit

Single forward/backward pass over the input string → strictly O(n) time.
Only one auxiliary array (next) — 4 bytes × (n + 1) ≈ 4 MB for n = 1 000 000, well within typical memory caps.
All substring tests use string::compare, which compiles down to memcmp, i.e. they execute a few CPU instructions on a cache-hot block.
Fast-path exits handle the common patterns *suffix / prefix* / bare * in O(1).

Drop-in replacement: keep the same stdin/stdout contract your judge expects — no ofstream indirection needed — and it should pass comfortably.

Answer 4

Here’s a drop-in replacement that keeps the required ofstream fout(getenv("OUTPUT_PATH")) I/O pattern and uses a strictly O(n) algorithm, so it meets a 2 s time-limit even for very large inputs.

cpp
#include <bits/stdc++.h>
using namespace std;

/*  Return the length of the longest substring of `src`
    that matches a single-wildcard pattern `pat` (exactly one '*').
    If no substring matches, return –1.                           */
int findMaxRegexMatch(const string& src, const string& pat) {
    const int n      = static_cast<int>(src.size());
    const size_t pos = pat.find('*');

    const string pre = pat.substr(0, pos);          // part before *
    const string suf = pat.substr(pos + 1);         // part after  *
    const int lp = static_cast<int>(pre.size());
    const int ls = static_cast<int>(suf.size());

    /* ---------- trivial fast paths --------------------------- */
    if (pat == "*")            return n;                     // lone *
    if (lp + ls > n)           return -1;                    // impossible
    if (lp == 0) {                                           // "*suffix"
        size_t p = src.rfind(suf);
        return p == string::npos ? -1 : n - static_cast<int>(p);
    }
    if (ls == 0) {                                           // "prefix*"
        size_t p = src.find(pre);
        return p == string::npos ? -1 : n - static_cast<int>(p);
    }

    /* ---------- build table: next suffix-start ≥ i ----------- */
    vector<int> next(n + 1, -1);     // next[i] valid for all i ∈ [0..n]
    int nxt = -1;
    for (int i = n - ls; i >= 0; --i) {
        if (src.compare(i, ls, suf) == 0) nxt = i;
        next[i] = nxt;
    }

    /* ---------- sweep for best span -------------------------- */
    int best = -1;
    for (int i = 0; i + lp <= n; ++i) {
        if (src.compare(i, lp, pre) != 0) continue;      // no prefix here
        int j = next[i + lp];                            // suffix after prefix
        if (j == -1) continue;                           // none available
        best = max(best, j + ls - i);
    }
    return best;
}

int main() {
    ios::sync_with_stdio(false);
    cin.tie(nullptr);

    string sourceString;
    getline(cin, sourceString);

    string pattern;
    getline(cin, pattern);

    int result = findMaxRegexMatch(sourceString, pattern);

    /* required judge-specific output file */
    ofstream fout(getenv("OUTPUT_PATH"));
    fout << result << '\n';
    fout.close();

    return 0;
}

Why it passes

Single linear pass (plus one reverse pass) → O(n) time.
One auxiliary array of n + 1 integers → ≈ 4 MB for n = 1 000 000, within typical limits.
Fast-path branches handle *suffix, prefix*, and lone * in O(1).

Answer 5

Below is a drop-in replacement for your findMaxRegexMatch that runs in strict O(n) time and avoids the nested substring calls that were causing the TLE.

cpp
int findMaxRegexMatch(const string& src, const string& pat) {
    const size_t starPos = pat.find('*');
    const string pre = pat.substr(0, starPos);        // prefix before '*'
    const string suf = pat.substr(starPos + 1);       // suffix after  '*'
    const int n  = static_cast<int>(src.size());
    const int lp = static_cast<int>(pre.size());
    const int ls = static_cast<int>(suf.size());

    /* ---------- fast paths for special shapes ------------------ */
    if (pat == "*")            return n;                       // “*”
    if (lp + ls > n)           return -1;                      // impossible
    if (lp == 0) {                                             // "*suffix"
        size_t pos = src.rfind(suf);
        return pos == string::npos ? -1 : n - static_cast<int>(pos);
    }
    if (ls == 0) {                                             // "prefix*"
        size_t pos = src.find(pre);
        return pos == string::npos ? -1 : n - static_cast<int>(pos);
    }

    /* ---------- build next-suffix table (right-to-left) -------- */
    vector<int> next(n + 1, -1);      // next[i] = earliest suf-start ≥ i
    int nxt = -1;
    for (int i = n - ls; i >= 0; --i) {
        if (src.compare(i, ls, suf) == 0) nxt = i;
        next[i] = nxt;
    }

    /* ---------- sweep all prefix occurrences ------------------- */
    int best = -1;
    for (int i = 0; i + lp <= n; ++i) {
        if (src.compare(i, lp, pre) != 0) continue;   // no prefix here
        int j = next[i + lp];                         // suffix must start ≥ end-of-prefix
        if (j == -1) continue;                        // none available
        best = max(best, j + ls - i);                 // candidate length
    }
    return best;
}

Why this passes the 2 s limit

Step	Work done	Complexity
Build `next`	one backward pass over `src`	O(n)
Scan prefixes	one forward pass over `src`	O(n)
Memory	`next` array (`n+1` ints)	O(n) (≈ 4 MB per 1 M chars)

All substring checks use string::compare, which compiles to a memcmp against cache-hot memory, making each check constant-time.

Just replace your old findMaxRegexMatch with this one—no other changes needed.

Answer 6

cpp
int findMaxRegexMatch(string sourceString, string pattern) {
    int starPos = pattern.find('*');                 // position of the single '*'
    string prefix = pattern.substr(0, starPos);      // part before '*'
    string suffix = pattern.substr(starPos + 1);     // part after  '*'

    int n  = static_cast<int>(sourceString.size());
    int lp = static_cast<int>(prefix.size());
    int ls = static_cast<int>(suffix.size());

    /* ---------- trivial fast paths ---------- */
    if (pattern == "*")            return n;                 // lone '*'
    if (lp + ls > n)               return -1;                // impossible
    if (lp == 0) {                                           // "*suffix"
        size_t pos = sourceString.rfind(suffix);
        return pos == string::npos ? -1 : n - static_cast<int>(pos);
    }
    if (ls == 0) {                                           // "prefix*"
        size_t pos = sourceString.find(prefix);
        return pos == string::npos ? -1 : n - static_cast<int>(pos);
    }

    /* ---------- table: next suffix-start ≥ i ---------- */
    vector<int> next(n + 1, -1);          // next[i] = earliest suffix start ≥ i
    int nxt = -1;
    for (int i = n - ls; i >= 0; --i) {
        if (sourceString.compare(i, ls, suffix) == 0) nxt = i;
        next[i] = nxt;
    }

    /* ---------- sweep all prefix occurrences ---------- */
    int best = -1;
    for (int i = 0; i + lp <= n; ++i) {
        if (sourceString.compare(i, lp, prefix) != 0) continue; // no prefix here
        int j = next[i + lp];                                   // suffix after prefix
        if (j == -1) continue;                                  // none available
        best = max(best, j + ls - i);                           // update best length
    }
    return best;
}

Amazon is developing a string matching library. Yo...

Question

Thought

Considering possible solution approach

Clarifying substring search strategy

Answer

1 Algorithm (high level)

2 Reference implementation (Python 3)

3 Service skeleton (REST/HTTP)

4 Possible follow-ups

Question

Thought

Clarifying user request

Improving code efficiency

Answer

Why this version is correct & fast

Question

Thought

Optimizing regex code

Optimizing code memory and runtime

Optimizing for TLE with improved approach

Designing a more efficient algorithm

Answer

Why this version beats the 2 s limit

Question

Thought

Adjusting code for file output

Providing full code with output file path

Answer

Question

Thought

Answer

Why this passes the 2 s limit

Question

Thought

Providing optimized function

Answer

Share this Q&A

Case	Result
`pattern == "*"`	entire `source`
`prefix == ""`	“largest tail” that ends with `suffix`
`suffix == ""`	“largest head” that starts with `prefix`
No match	empty string (or 404/422, per API design)