[pmwiki-devel] regex question

Peter Bowers pbowers at pobox.com
Thu Aug 27 10:04:43 CDT 2009


On Thu, Aug 27, 2009 at 5:42 AM, Hans<design5 at softflow.co.uk> wrote:
> I'd like to ask another question, which is a bit like the reversal of
> the former:
>
> I like to fix orphaned @] and =] strings in a text row, i.e.
> if no previous [= or [@ is found to a =] or @]
> then I like to add the corresponding [= or [@ to the beginning of the
> row.
>
> I found one solution involving two preg_match expressions,
> the first checks if the row has no [@.....@] or [=....=],
> the second then checks for a @] or =],
> but I hope there is a simpler way than this:
>
> if(!preg_match("/\\[([=@])(.*?)\\1\\]/", $row)
>   && preg_match("/([=@])\\]/", $row, $m))
>        $row = "[".$m[1].$row;

(The above will take a line like "abc @] def [@ ghi @]" and see it as
properly matched, because the complete pair later on the row hides the
orphaned @]. Not sure if that is an issue or not.)
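
To make that caveat concrete, here is a small sketch that wraps the
quoted two-step check in a function and runs it on two rows. The
function name fixOrphanTwoStep is hypothetical (it is not from the
original message); the two patterns are the quoted ones, written in
single quotes so the backslashes need no doubling.

```php
<?php
// Hypothetical wrapper around the two-step check quoted above.
function fixOrphanTwoStep(string $row): string
{
    // Step 1: is there any complete [@...@] or [=...=] pair on the row?
    $hasPair = preg_match('/\[([=@])(.*?)\1\]/', $row) === 1;

    // Step 2: if not, prepend the opener matching the first closer found.
    if (!$hasPair && preg_match('/([=@])\]/', $row, $m) === 1) {
        $row = '[' . $m[1] . $row;
    }
    return $row;
}

$rows = array('abc @] def', 'abc @] def [@ ghi @]');
foreach ($rows as $row) {
    echo $row, ' => ', fixOrphanTwoStep($row), "\n";
}
```

The first row gets "[@" prepended. The second row comes back unchanged,
even though its first @] has no opener, because step 1 finds the
complete pair further along the row.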

There is the complicating factor that there can be multiple [@...@]
pairs on a single line and it may not be the first one that is
orphaned.  There is also the complicating factor that these pairs can
(in fact, most often do) span multiple lines so the last one will
"look like" an orphan but will, in fact, be OK.  But I'm assuming you
have functional specs that make exceptions like that irrelevant.  So
if I can restate the problem as follows, then the solution below might
help: "Find lines containing @] which are not preceded by [@ on that
line."

It would be nice to just use a negative lookbehind, but in most
flavors of regex those have to be fixed length (or, at most, an
alternation of fixed-length alternatives), so a lookbehind cannot scan
back an arbitrary distance for the [@.

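// Test rows: the first two are fine, the last three contain an orphaned @].
$foo = array("[@ abc @]", "asdf [@ asdf @]", "asdf @] asdf [@ asdf", "asdf @]",
    "asdf [= asdf @] asdf");
foreach ($foo as $line) {
    echo "Testing: |$line| ";
    // Matches when a @] is reached without passing a [@ first.
    if (preg_match('/^(?:\[[^@]|[^[])*[@]\]/', $line)) echo "ORPHAN";
    else echo "GOOD";
    echo "<br>\n";
}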

That only allows for the [@...@] pair and I'm not immediately clear on
how to generalize it to be either [@...@] or [=...=].  (Any
generalized attempts I've come up with using this solution, such as
/^(?:\[[^@=]|[^[])*[@=]\]/, end up seeing [@...=] or [=...@] as a
properly matched pair.)

Explanation: Anchor at the beginning of the line; then match a group
repeated zero or more times, where each repetition is either an
open-square-bracket followed by a non-@ character OR a single
character that is not an open-square-bracket; the repeated group must
then be followed by @].
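
The same pattern can be spelled out piece by piece with the x
(extended) modifier, which ignores whitespace in the pattern and allows
# comments. This is only a sketch: the helper name fixOrphanAt and the
decision to prepend "[@" (taken from the original question) are
assumptions, and [@] is written as a plain @, which is equivalent.

```php
<?php
// Hypothetical helper: prepend "[@" when a row has a "@]" with no "[@" before it.
function fixOrphanAt(string $row): string
{
    $orphan = '/
        ^                # anchor at the start of the row
        (?:
            \[ [^@]      # an open bracket followed by anything but @
          |              # or
            [^[]         # any character that is not an open bracket
        )*               # the group repeats zero or more times
        @ \]             # then the closing @]
    /x';

    return preg_match($orphan, $row) === 1 ? '[@' . $row : $row;
}

echo fixOrphanAt('asdf @] asdf [@ asdf'), "\n";
echo fixOrphanAt('asdf [@ asdf @]'), "\n";
```

The first call returns the row with "[@" prepended; the second returns
the row unchanged, since the only @] on it comes after a [@.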

But if you can't figure out how to generalize it, you're going to end
up with two preg_match calls anyway with this solution...
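
One possible way to cover both kinds in a single call, offered as an
untested-in-production sketch rather than a recommendation: give each
closer its own alternative, and use a negative lookahead at every
position so that the scan cannot step past the matching opener. The
branch-reset group (?|...) makes both alternatives store the closer
character in group 1. The function name fixOrphanEither is
hypothetical. Like the pattern above, it only flags a closer with no
matching opener anywhere before it on the row.

```php
<?php
// Hypothetical helper: handles orphaned @] and =] with one preg_match call.
function fixOrphanEither(string $row): string
{
    // Alternative 1: reach a @] without ever standing on a "[@".
    // Alternative 2: reach a =] without ever standing on a "[=".
    // (?| ... ) resets group numbering, so the closer is always $m[1].
    $orphan = '/^(?|(?:(?!\[@).)*(@)\]|(?:(?!\[=).)*(=)\])/';

    if (preg_match($orphan, $row, $m) === 1) {
        return '[' . $m[1] . $row;   // prepend the corresponding opener
    }
    return $row;
}

echo fixOrphanEither('[@ a =]'), "\n";          // mismatched pair
echo fixOrphanEither('[@ a @] [= b =]'), "\n";  // two proper pairs
```

The mismatched row is treated as having an orphaned =] and gets "[="
prepended, while the row with two proper pairs is returned unchanged.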

The offset argument to preg_match() would allow you to handle multiple
pairs and identify only the mis-matched ones (for instance, going
through multi-line $page['text']), but chances are the automated "fix"
in that case would not be what the user originally intended...
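
A rough sketch of that offset-based walk, under stated assumptions:
the function name findOrphanClosers is hypothetical, and the rule that
only the matching closer ends an open block (everything else inside it
is plain text) is my reading of how these markers behave, not
something stated in this thread. It reports byte offsets instead of
attempting a fix.

```php
<?php
// Hypothetical scanner: returns the byte offsets of @] or =] closers
// that are met while no block is open. Works across multiple lines.
function findOrphanClosers(string $text): array
{
    $orphans = array();
    $open = null;   // '@' or '=' while inside a block, null otherwise
    $offset = 0;

    // Find the next opener or closer token at or after $offset.
    while (preg_match('/\[[@=]|[@=]\]/', $text, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        list($token, $pos) = $m[0];

        if ($open === null) {
            if ($token[0] === '[') {
                $open = $token[1];      // entering a block of this kind
            } else {
                $orphans[] = $pos;      // closer with nothing open
            }
            $offset = $pos + 2;
        } elseif ($token === $open . ']') {
            $open = null;               // matching closer ends the block
            $offset = $pos + 2;
        } else {
            $offset = $pos + 1;         // literal text inside the block
        }
    }
    return $orphans;
}

print_r(findOrphanClosers("a @] b\n[@ c\nd @] e =]"));
```

For the sample text, the @] on the first line and the trailing =] are
reported, while the [@ ... @] pair spanning the second and third lines
is accepted.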

-Peter


