[pmwiki-users] Re: Modified (:markup:)

Joachim Durchholz jo at durchholz.org
Sun Mar 20 16:38:41 CST 2005


chr at home.se wrote:
> Could this be made to nest properly?

Nesting is a *very* hairy issue if parsing is regex-based.

With standard regexes, it's impossible. (The CS slogan is "regexes can't 
count". To do proper nesting, they'd have to count opening and closing 
parentheses and suspend matching until the count reaches zero.)
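
To make the counting point concrete, here is a minimal sketch in Python (chosen only for illustration; plain parentheses stand in for whatever markup delimiters are involved, and match_balanced is a hypothetical helper, not anything in PmWiki). A lazy pattern stops at the first closing parenthesis, a greedy one runs to the last, and only an explicit depth counter finds the right partner:

```python
import re

def match_balanced(text, start):
    """Return the index just past the ')' that closes the '(' at text[start]."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1          # one more level to close
        elif text[i] == ")":
            depth -= 1
            if depth == 0:      # back at the outermost level: this is the partner
                return i + 1
    raise ValueError("unbalanced parentheses")

nested = "(a (b) c) tail"

# Lazy quantifier: stops at the first ')', cutting the outer group short.
lazy = re.match(r"\(.*?\)", nested).group(0)

# Greedy quantifier: runs to the last ')', gluing two sibling groups together.
greedy = re.match(r"\(.*\)", "(a) x (b)").group(0)

# Explicit counting: finds the ')' that actually closes the first '('.
balanced = nested[:match_balanced(nested, 0)]
```

Here lazy comes out as "(a (b)", greedy as "(a) x (b)", and balanced as "(a (b) c)".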

Perl-style regexes aren't standard regexes. However, even these can't 
count well enough to do nesting.

There's an experimental "recursive pattern" feature, but the 
descriptions on 
http://www.php.net/manual/de/reference.pcre.pattern.syntax.php sounded 
quite unattractive to me:
* needs PHP >= 5.0 (Debian woody currently is at 4.1.3)
* seems to require merging all delimiter pairs into a single regex
* inner structures need to be reparsed
   (substructures aren't capturable in $1, $2, ... variables)
* needs once-only subpattern hackery to avoid inefficiencies

Alternatively, we could roll our own parsing machinery. The ruleset 
machinery would remain largely unchanged; only the nestable rules would 
need their replacement deferred until the parser has had a chance to 
establish the nesting structure. ("Parser" is one of those scare words, 
but parsing parenthesis-style constructs isn't really difficult.)
Nothing in this is exactly rocket science, but getting the details right 
requires some careful design, and actually implementing it would also 
take some serious effort.
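
As a rough sketch of what "establish the nesting first, substitute later" could look like (Python for illustration only; the delimiter pairs, the replacement functions, and the names PAIRS, RENDER, parse and render are all made up for this example and are not PmWiki's rules or API):

```python
import re

# Hypothetical delimiter pairs and replacements, for illustration only.
PAIRS = {"[": "]", "{": "}"}
RENDER = {
    "[": lambda inner: "<b>" + inner + "</b>",
    "{": lambda inner: "<i>" + inner + "</i>",
}
TOKEN = re.compile(r"[\[\]{}]")  # a regex finds delimiters; it does not nest them

def parse(text):
    """Build a tree of nested groups; no substitution happens here."""
    root = {"open": None, "children": []}
    stack = [root]
    pos = 0
    for m in TOKEN.finditer(text):
        if m.start() > pos:                      # plain text before the delimiter
            stack[-1]["children"].append(text[pos:m.start()])
        tok = m.group(0)
        if tok in PAIRS:                         # opening delimiter: go one level down
            node = {"open": tok, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:                                    # closing delimiter: must match the top
            top = stack[-1]
            if top["open"] is None or PAIRS[top["open"]] != tok:
                raise ValueError(f"unexpected {tok!r} at offset {m.start()}")
            stack.pop()
        pos = m.end()
    if len(stack) > 1:
        raise ValueError(f"unclosed {stack[-1]['open']!r}")
    if pos < len(text):                          # trailing plain text
        root["children"].append(text[pos:])
    return root

def render(node):
    """Deferred substitution: render children first, then wrap the result."""
    inner = "".join(c if isinstance(c, str) else render(c)
                    for c in node["children"])
    return inner if node["open"] is None else RENDER[node["open"]](inner)

html = render(parse("a [b {c} d] e"))
```

With these made-up rules, html is "a <b>b <i>c</i> d</b> e". The regex is only used to locate delimiters; the stack does the counting, and each replacement runs once on already-rendered inner text, so nothing has to be reparsed.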

Personally, I'd go for it despite the workload.
But that's just my preference; others might find other things more 
relevant.

Just my 5c.

Regards,
Jo
