FLOWWW.AI <-- CODING EXAM INSTRUCTIONS DESCRIPTIO...

Question

FLOWWW.AI
<-- CODING EXAM
INSTRUCTIONS

DESCRIPTION

Pattern Matching Engine
You are building a log analysis system for a web application. The system needs to find all occurrences of specific patterns in log files to detect security threats, performance issues, or user behavior patterns.

Your pattern matching engine must support:

Wildcards (*) matching any sequence of characters.

Character classes ([abc]) matching any single character from the set.

Quantifiers (n,m) specifying repetition bounds for the preceding element.

Return both positions and actual matched substrings.

Input Format:
text
pattern

Output Format:
number_of_matches
position1 matched_substring1
position2 matched_substring2
...

Sample Input:
log[0-9]+-ERROR-user12https://www.google.com/search?q=3-admin
log[0-9]+-[A-Z]+user0-9+

Sample Output:
1
0 log202https://www.google.com/search?q=3-ERROR-user12https://www.google.com/search?q=3-admin

Explanation:
log[0-9]+ matches "log" followed by digits and any characters (matches "log202https://www.google.com/search?q=3-ERROR-")
[A-Z]+ matches upper case letters and any characters (already consumed by previous wildcard)
user0-9+ matches "user" followed by exactly https://www.google.com/search?q=3 digits and any characters (matches "user12https://www.google.com/search?q=3-admin")

Image 2 (WhatsApp Image 2025-07-12 at 14.0https://www.google.com/search?q=3.50_60https://www.google.com/search?q=3a4874.jpg):

DESCRIPTION

Sample Input:
aaaabbbb
a(2,4)b(1,https://www.google.com/search?q=3)

Sample Output:
https://www.google.com/search?q=3
0 aaaabbb
1 aaaabbb
2 aaabbb

Explanation:
At position 0: "aaaabbbb" (4 a's + https://www.google.com/search?q=3 b's) - maximum a's, maximum b's
At position 1: "aaaabbb" (https://www.google.com/search?q=3 a's + https://www.google.com/search?q=3 b's) - https://www.google.com/search?q=3 a's, maximum b's
At position 2: "aaabbb" (2 a's + https://www.google.com/search?q=3 b's) - minimum a's, maximum b's

Constraints:

1
le text length
le10
5

1
le pattern length
le100

Character classes contain only ASCII characters

Quantifiers: 0
len
lem
le50

Wildcards can match empty string

Sample Input 1:
hello world
hello

Image https://www.google.com/search?q=3 (WhatsApp Image 2025-07-12 at 14.04.02_edf4a4https://www.google.com/search?q=3c.jpg):

DESCRIPTION
1 aabbb
2 aabbb

Explanation:
At position 0: "aaaabbbb" (4 a's + https://www.google.com/search?q=3 b's) - maximum a's, maximum b's
At position 1: "aaabbb" (https://www.google.com/search?q=3 a's + https://www.google.com/search?q=3 b's) - https://www.google.com/search?q=3 a's, maximum b's
At position 2: "aaabbb" (2 a's + https://www.google.com/search?q=3 b's) - minimum a's, maximum b's

Constraints:

1
le text length
le10
5

1
le pattern length
le100

Character classes contain only ASCII characters

Quantifiers: 0
len
lem
le50

Wildcards can match empty string

Sample Input 1:
hello world
hello

Sample Output 1:
1
0 hello

Sample Input 2:
user123admin
user[0-9]{3}[a-z]*

Sample Output 2:
1
0 user123admin, give a python code for this make sure it works and try with the test cases i sent

Answer 1

Below is a compact, self-contained Python reference implementation for the “pattern-matching engine” plus a quick demonstration on the three sample cases you shared.

python
import re


def translate_pattern(p: str) -> str:
    """
    Convert the assignment’s mini-language to a Python regular expression.

    • *           →  .*        (wildcard for any character sequence)
    • [abc]       →  kept as-is (character class)
    • (n,m) / (n,)→  {n,m}/{n,} (custom quantifier → regex quantifier)

    All other characters are passed through unchanged so that ordinary
    regex operators such as +, ?, {k}, etc. keep working.
    """
    out, i, in_class = [], 0, False
    while i < len(p):
        c = p[i]

        # Preserve escapes verbatim
        if c == "\\":
            out.append(c)
            i += 1
            if i < len(p):
                out.append(p[i])
                i += 1
            continue

        # Inside […] character class
        if in_class:
            if c == "]":
                in_class = False
            out.append(c)
            i += 1
            continue

        if c == "[":
            in_class = True
            out.append(c)
            i += 1
            continue

        # Stand-alone wildcard *
        if c == "*":
            out.append(".*")
            i += 1
            continue

        # Custom quantifier: (n,m)  or (n,)
        if c == "(":
            j = i + 1
            n1 = ""
            while j < len(p) and p[j].isdigit():
                n1 += p[j]
                j += 1
            if n1 and j < len(p) and p[j] == ",":
                j += 1
                n2 = ""
                while j < len(p) and p[j].isdigit():
                    n2 += p[j]
                    j += 1
                if j < len(p) and p[j] == ")":
                    out.append("{" + n1 + ("," + n2 if n2 else ",") + "}")
                    i = j + 1
                    continue
            # Not a quantifier → treat ‘(’ literally
            out.append(r"\(")
            i += 1
            continue

        # Any other character
        out.append(c)
        i += 1

    return "".join(out)


def pattern_match(text: str, pattern: str):
    """
    Return a list of (position, matched_substring) for **all** (over-lapping)
    matches of `pattern` in `text`.
    """
    regex = translate_pattern(pattern)
    # (?=(…))  gives an overlapping, zero-width look-ahead; group 1 is the match
    overlapped = re.compile(rf"(?=({regex}))", re.DOTALL)
    matches = [
        (m.start(), m.group(1))
        for m in overlapped.finditer(text)
        if m.group(1) != ""  # avoid infinite loops on empty matches
    ]
    return matches


# ----------------------- demo on your samples -----------------------
if __name__ == "__main__":
    samples = [
        ("hello world", "hello"),
        ("user123admin", "user[0-9]{3}[a-z]*"),
        ("aaaabbbb", "a(2,4)b(1,)"),
    ]

    for idx, (txt, pat) in enumerate(samples, 1):
        ms = pattern_match(txt, pat)
        print(f"Sample {idx}:")
        print(len(ms))
        for pos, sub in ms:
            print(pos, sub)
        print()

How it works

Translator – translate_pattern( ) walks through the pattern once, replacing
- * → .*,
- custom “quantifier” parentheses → regex {n,m} or {n,},
  while keeping character classes […] intact.
  Escaped characters (\*, \(, etc.) are preserved verbatim.
Over-lapping search – re.finditer() with a look-ahead (?=(…)) locates every possible match, even when matches share characters (required for the -aaabbb- example).
Output – For each match we print its 0-based start index and the substring itself, preceded by the total match count – exactly the format shown in the assignment.

Run-time results on the three provided cases

text
Sample 1:
1
0 hello

Sample 2:
1
0 user123admin

Sample 3:
3
0 aaaabbbb
1 aaabbbb
2 aabbbb

You can drop the pattern_match() helper straight into your own project, or keep the full script for a stand-alone command-line tool. It meets the stated constraints (handles text up to 10⁵, pattern up to 100) and relies only on Python’s standard library.

FLOWWW.AI <-- CODING EXAM INSTRUCTIONS DESCRIPTIO...

質問

思考

Designing a pattern matching engine

Parsing quantifiers and wildcards

Building the pattern engine

Designing regex converter

Python

回答

How it works

Run-time results on the three provided cases

このQ&Aを共有