Which regex dialects are supported?

PCRE, JavaScript, Python, .NET — the AI infers the dialect from the pattern.

Does it warn about performance issues?

Yes — patterns with catastrophic backtracking risk get a warning section.

Regex Explainer with AI

Regex explainers: reading patterns without losing your mind

Why regex is hard to read (even for the person who wrote it)

Regular expressions are among the most information-dense syntaxes in mainstream programming. A 30-character pattern can encode behavior that would take 50 lines of imperative code. The cost: a regex looks like line noise to most readers, including the original author six months later. It gets worse because there are at least four major dialect families (PCRE, JavaScript, Python, .NET) with subtle differences between them.

An AI-powered explainer reads the pattern the way a senior engineer would: token by token, calling out the dialect-specific bits, flagging risky quantifiers, and producing example matches and non-matches. For explaining intent, this is often more useful than a graphical regex visualizer, because plain English is what you need when communicating a pattern to a coworker.

The four major regex dialects

PCRE (Perl-Compatible Regular Expressions) is among the most feature-rich dialects and underlies grep -P, PHP's preg_* functions, and many other tools. JavaScript regex is similar but lacks possessive quantifiers and atomic groups, and its Unicode handling depends on the /u flag (added in ES2015). Python's 're' module is close to PCRE but writes named groups as (?P<name>...); PCRE accepts that form as well as (?<name>...), while JavaScript and .NET use (?<name>...). .NET regex (System.Text.RegularExpressions) has unique features like balancing groups for matching nested constructs.
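As a small illustration of the group-naming difference, the sketch below uses Python's standard 're' module. The sample pattern and input string are made up for the example.

```python
import re

# Python's re module spells named groups (?P<name>...).
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
m = date.search("released 2024-03")
print(m.group("year"), m.group("month"))  # 2024 03

# The (?<name>...) spelling used by JavaScript and .NET is rejected by re.
try:
    re.compile(r"(?<year>\d{4})")
    accepted = True
except re.error:
    accepted = False
print(accepted)  # False
```

A pattern copied from a JavaScript or .NET codebase therefore usually needs its named groups rewritten before it compiles in Python.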

Our explainer auto-detects the likely dialect from the pattern (e.g. (?P<name>) → Python, (?<name>) → PCRE/JS/.NET). When the dialect is ambiguous, it explains the meaning in the most common dialect and notes alternatives where they differ.
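To make the idea of syntax-based detection concrete, here is a rough sketch. It is not the explainer's actual logic: the hint table and the guess_dialects function are hypothetical names invented for this example, and a real detector would need many more signals.

```python
import re

# Hypothetical hint table: (regex that spots a construct, dialects that accept it).
# Illustrative only; not the explainer's actual detection logic.
DIALECT_HINTS = [
    (re.compile(r"\(\?P<\w+>"), ["Python", "PCRE"]),             # (?P<name>...)
    (re.compile(r"\(\?<\w*-\w+>"), [".NET"]),                    # balancing group
    (re.compile(r"\(\?<\w+>"), ["PCRE", "JavaScript", ".NET"]),  # (?<name>...)
]

ALL_DIALECTS = ["PCRE", "JavaScript", "Python", ".NET"]


def guess_dialects(pattern: str) -> list[str]:
    """Return the dialects suggested by the first matching hint, or all four."""
    for hint, dialects in DIALECT_HINTS:
        if hint.search(pattern):
            return dialects
    return ALL_DIALECTS


print(guess_dialects(r"(?P<id>\d+)"))  # ['Python', 'PCRE']
print(guess_dialects(r"(?<id>\d+)"))   # ['PCRE', 'JavaScript', '.NET']
print(guess_dialects(r"\d+"))          # ['PCRE', 'JavaScript', 'Python', '.NET']
```

A pattern with no dialect-specific syntax stays ambiguous, which is the case where explaining the most common reading and noting alternatives makes sense.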

Quantifiers: greedy, lazy, and possessive

The most common source of regex bugs is the wrong quantifier choice. * + ? {n,m} are 'greedy' by default: they consume as many characters as possible and then backtrack. Adding ? makes them lazy (*? +? ?? {n,m}?): they consume as few characters as possible and then extend. Possessive quantifiers (*+ ++ ?+) consume greedily and never give characters back, which is faster but not universally supported: PCRE, Perl, Java, and Python 3.11+ have them, while JavaScript and .NET do not (.NET offers atomic groups, (?>...), instead).

Classic bug: matching HTML tags with <.*> instead of <.*?>. Greedy .* consumes everything up to the LAST >, so '<a>foo</a>' matches as <a>foo</a> instead of just <a>. The lazy version matches just <a>. Our explainer flags this as a 'common pitfall' when it sees greedy .* in a pattern that looks like it should be lazy.
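The difference is easy to see in Python's 're' module. The third pattern, a negated character class, is a common alternative to the lazy quantifier and is included here for comparison.

```python
import re

html = "<a>foo</a>"

greedy = re.search(r"<.*>", html).group()      # runs to the LAST '>'
lazy = re.search(r"<.*?>", html).group()       # stops at the FIRST '>'
negated = re.search(r"<[^>]*>", html).group()  # cannot cross a '>' at all

print(greedy)   # <a>foo</a>
print(lazy)     # <a>
print(negated)  # <a>
```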

Catastrophic backtracking: the silent killer

Some regex patterns can take exponential time on certain inputs. The classic example: ^(a+)+$ on input 'aaaaaaaaaaaaaaab'. The trailing b makes the overall match fail, so the engine tries every possible way to split the a's between iterations of the group before concluding there's no match: roughly 2^n combinations for n a's. With around 30 a's, a backtracking engine can hang for minutes.
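A minimal sketch of the blow-up, using Python's backtracking 're' engine. The pattern is written with a trailing $ so that the final 'b' forces the match to fail, and the input sizes are kept deliberately small so the script finishes quickly. Exact timings depend on the machine and are not shown.

```python
import re
import time

# The $ anchor means the trailing 'b' makes every attempt fail,
# so the engine has to explore every grouping of the a's.
pattern = re.compile(r"(a+)+$")

timings = []
for n in (10, 14, 18):
    text = "a" * n + "b"
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    timings.append((n, result, elapsed))
    print(f"n={n:2d} matched={result is not None} seconds={elapsed:.4f}")
```

Each extra 'a' roughly doubles the work, so raising n much beyond these values is not advisable in an interactive session.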

Patterns that nest one quantifier inside another over the same characters, or that repeat overlapping alternatives, are the primary culprits: (a*)*, (a+a+)+, (a|aa)+. Most production regex engines use backtracking and can be made to hang; engines with linear-time guarantees, such as RE2, Go's regexp package, and Rust's regex crate, are the exceptions. Our explainer scans for these patterns and adds a 'Warnings' section if it spots them. If your pattern processes untrusted input, this matters for security: attackers can deliberately craft inputs that trigger ReDoS (Regular Expression Denial of Service).
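A scan for nested quantifiers can be approximated with a simple textual check. The sketch below is an assumption made for illustration, not the explainer's real scanner: has_nested_quantifier is a hypothetical helper, it only looks at one level of parentheses, it can be fooled by + or * inside a character class, and it does not catch overlapping alternation such as (a|aa)+.

```python
import re

# Rough heuristic (illustrative assumption): a parenthesised group that
# contains + or * and is itself followed by + or *.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*]")


def has_nested_quantifier(pattern: str) -> bool:
    """Return True if the pattern text looks like it nests a quantifier."""
    return NESTED_QUANTIFIER.search(pattern) is not None


for candidate in [r"(a+)+", r"(a*)*", r"(a+a+)+", r"(a|aa)+", r"\d+-\d+"]:
    print(candidate, has_nested_quantifier(candidate))
# (a+)+ True
# (a*)* True
# (a+a+)+ True
# (a|aa)+ False
# \d+-\d+ False
```

A production-grade check would parse the pattern into a syntax tree rather than match on its text.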

Anchors, boundaries, and look-around

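^ and $ match the start and end of the string (or of each line, with the multiline /m flag). In PCRE, Python, and .NET, $ also matches just before a trailing newline at the end of the string. \b is a word boundary: it matches between a word character and a non-word character, or between a word character and the start or end of the string. \B is the inverse. None of these consume characters; they only assert a position.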
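A short Python illustration of these assertions. The log text and the word list are invented for the example.

```python
import re

log = "error: disk full\nwarning: low memory\nerror: timeout"

# Without MULTILINE, ^ only matches at the very start of the string.
no_flag = re.findall(r"^error", log)
# With MULTILINE, ^ also matches right after each newline.
multiline = re.findall(r"^error", log, re.MULTILINE)

# \b asserts a position at the edge of a word, so "cat" is found as a
# whole word but not inside "concatenate".
words = re.findall(r"\bcat\b", "cat concatenate cat.")

print(no_flag)    # ['error']
print(multiline)  # ['error', 'error']
print(words)      # ['cat', 'cat']
```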

Look-around adds the ability to assert what comes before or after without consuming: (?=...) positive lookahead, (?!...) negative lookahead, (?<=...) positive lookbehind, (?<!...) negative lookbehind. Lookbehind has restrictions in some dialects: JavaScript had no lookbehind at all until ES2018 (where it allows variable width), Python's 're' module requires fixed width, and PCRE has traditionally required each alternative to have a fixed length. The explainer notes when lookbehind in your pattern would fail in older engines.
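The sketch below shows lookahead and lookbehind in Python's 're' module, including the fixed-width restriction on lookbehind. The sample strings are made up for the example.

```python
import re

# Positive lookahead: digits only when followed by "px"; "px" is not consumed.
px_values = re.findall(r"\d+(?=px)", "12px 5em 30px")

# Positive lookbehind: digits only when preceded by a dollar sign.
prices = re.findall(r"(?<=\$)\d+", "cost $40, 15 items")

# Negative lookbehind: whole numbers that are NOT preceded by a dollar sign.
counts = re.findall(r"(?<!\$)\b\d+", "cost $40, 15 items")

# Python's re rejects lookbehind whose width is not fixed.
try:
    re.compile(r"(?<=\$+)\d+")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(px_values)          # ['12', '30']
print(prices)             # ['40']
print(counts)             # ['15']
print(variable_width_ok)  # False
```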

Unicode pitfalls

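Default Unicode behavior varies by dialect, and the defaults are easy to get wrong. In JavaScript, \d matches only '0'–'9', with or without /u; without /u, patterns operate on UTF-16 code units, so a character outside the BMP is seen as two surrogates. In Python 3, str patterns are Unicode-aware by default, so \d also matches Arabic-Indic, Devanagari, and other decimal digits unless you pass re.ASCII. .NET's \d likewise matches Unicode digits unless RegexOptions.ECMAScript is set, while PCRE's \d is ASCII-only unless UCP mode is enabled. Whether you want Unicode digits depends on the job: for parsing user input, usually yes; for parsing structured data like JSON, usually no.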
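In Python 3 this behavior can be checked directly. The sketch below matches three Arabic-Indic digits with and without the re.ASCII flag.

```python
import re

arabic_indic = "\u0661\u0662\u0663"  # the digits 1, 2, 3 in Arabic-Indic script

# str patterns are Unicode-aware by default, so \d accepts these digits.
unicode_match = re.fullmatch(r"\d+", arabic_indic)
# re.ASCII restricts \d to 0-9.
ascii_match = re.fullmatch(r"\d+", arabic_indic, re.ASCII)
# An explicit range is unambiguous regardless of flags.
explicit_match = re.fullmatch(r"[0-9]+", arabic_indic)

print(unicode_match is not None)   # True
print(ascii_match is not None)     # False
print(explicit_match is not None)  # False
```

When the input format is strictly ASCII, writing [0-9] avoids depending on flags or dialect defaults.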

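Emoji are surprisingly hard. A single emoji can be one code point, two (a base plus a skin-tone modifier or variation selector), or seven or more in ZWJ sequences. [a-z] doesn't match anything sensible against emoji. If your regex needs to handle emoji, use Unicode property escapes such as \p{Emoji} or \p{Extended_Pictographic} (JavaScript with /u, and recent PCRE2 releases). Note that \p{Emoji} also matches the ASCII digits, # and *, and that Python's 're' module does not support \p{...} at all. The explainer flags character classes that fail on common Unicode input.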
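Counting code points makes the problem visible. The sketch below uses Python, where len() on a str counts code points; the emoji are written as escapes so the example does not depend on font or editor support.

```python
import re

thumbs_up = "\U0001F44D"                 # one code point
thumbs_up_tone = "\U0001F44D\U0001F3FD"  # base emoji + skin-tone modifier
# Four people joined by zero-width joiners (U+200D): one visible glyph.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

lengths = [len(s) for s in (thumbs_up, thumbs_up_tone, family)]
print(lengths)  # [1, 2, 7]

# A dot matches one code point, so it sees only the first piece of the sequence.
first_piece = re.match(r".", family).group()
print(first_piece == "\U0001F468")  # True

# An ASCII letter class finds nothing at all.
letters = re.search(r"[a-z]", family)
print(letters)  # None
```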

When NOT to use regex

Regex is the wrong tool for: parsing HTML/XML (use a parser; see the famous Stack Overflow rant), parsing balanced delimiters at arbitrary depth (classical regex cannot express context-free grammars; PCRE's recursive groups and .NET's balancing groups are the exceptions), and parsing JSON or YAML.
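For balanced delimiters, a few lines of ordinary code are usually clearer than any regex. A minimal sketch, assuming only round, square, and curly brackets matter and ignoring string literals and escapes; is_balanced is a name chosen for this example.

```python
# Map each closing bracket to the opening bracket it must pair with.
PAIRS = {")": "(", "]": "[", "}": "{"}


def is_balanced(text: str) -> bool:
    """Return True if every bracket in text is closed in the right order."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)  # remember the opener
        elif ch in PAIRS:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # leftover openers mean something was never closed


print(is_balanced("f(a[1], {b: (c)})"))  # True
print(is_balanced("(]"))                 # False
print(is_balanced("(("))                 # False
```

The explicit stack is what a regular expression without recursion cannot emulate: it has no way to remember arbitrarily deep nesting.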

It's the right tool for: simple substitutions, validating well-defined string formats (email, phone), tokenizing text where you control the alphabet, and quick log searches. The shortest production-grade regex is usually the right one — if your pattern is longer than two lines, consider whether a small parser would be more maintainable.

How it works