[Pmwiki-users] Help with PHP regexp
John Feezell
johnfeezell
Tue Jan 6 04:32:02 CST 2004
Thanks, Pm, for taking the time to walk through these. It helps a WHOLE BUNCH.
-JF
On Mon, 5 Jan 2004 20:04:56 -0700, Patrick R. Michaud <pmichaud at pobox.com>
wrote:
> On Mon, Jan 05, 2004 at 11:39:01AM -0600, John Feezell wrote:
>> I recently began studying PHP regular expressions so that I could use
>> them with PmWiki and FTS. I have material from the PHP manual but would
>> like to know how others on the list have gained knowledge of these -
>> websites, books, etc..
>
> Practice, and just playing with them.
>
>> It would be helpful to see an analysis of one or two of them as they
>> relate to PmWiki.
>
> Gladly! PmWiki is largely based on regular expression matching. In fact,
> I've often thought that I could potentially write PmWiki's text processing
> engine as a sequence of regular expression match/replacement actions, but
> decided that was a bad idea (feels too much like Sendmail's
> configuration...)
>
> I'll explain each of the patterns below as best I can...
>
>> For example I'm studying the following from PmWiki.php
>> $GroupNamePattern="[A-Z][A-Za-z0-9]+";
>
> A wiki group name starts with an uppercase letter and is followed by one
> or more letters or digits.
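
A minimal sketch of trying the group name pattern out, assuming it is wrapped in /.../ delimiters and anchored so the whole string has to match. The helper isGroupName() is hypothetical and not part of PmWiki.

```php
<?php
// Pattern copied from the message above.
$GroupNamePattern = "[A-Z][A-Za-z0-9]+";

// Hypothetical helper: true if $name as a whole is a valid group name.
function isGroupName(string $name, string $pattern): bool {
    // ^ and $ anchor the match; the D modifier stops $ from matching
    // just before a trailing newline.
    return preg_match("/^(?:$pattern)\$/D", $name) === 1;
}

var_dump(isGroupName("Main", $GroupNamePattern)); // bool(true)
var_dump(isGroupName("main", $GroupNamePattern)); // bool(false): lowercase start
var_dump(isGroupName("M", $GroupNamePattern));    // bool(false): needs a second character
```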
>
>> $WikiWordPattern="[A-Z][A-Za-z0-9]*(?:[A-Z][a-z0-9]|[a-z0-9][A-Z])[A-Za-z0-9]*";
>
> A bit more complex. Essentially this pattern says that a WikiWord has to
> begin with an uppercase letter, and must have at least one more uppercase
> letter and one lowercase letter or digit (in any order). The ?: after the
> opening parenthesis says that the parens are for grouping only and are not
> a capturing subpattern. The part within the parens matches an uppercase
> letter followed by a lowercase letter or digit, or vice-versa.
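
A small sketch of the WikiWord rule in action. The sample sentence and the \b word-boundary anchors are assumptions added for illustration; this is not necessarily how PmWiki itself applies the pattern.

```php
<?php
// Pattern copied from the message above, written on one line.
$WikiWordPattern = "[A-Z][A-Za-z0-9]*(?:[A-Z][a-z0-9]|[a-z0-9][A-Z])[A-Za-z0-9]*";

// Illustrative text only.
$text = "See WikiWord and PmWiki, but not Wiki, NASA, or word.";

// \b on both sides keeps a match from starting or ending inside a longer word.
preg_match_all("/\\b$WikiWordPattern\\b/", $text, $m);

print_r($m[0]); // WikiWord, PmWiki
```

"Wiki" and "See" fail because they have no second uppercase letter, and "NASA" fails because it has no lowercase letter or digit.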
>
>> $FreeLinkPattern="{{(?>([A-Za-z][A-Za-z0-9]*(?:(?:[\\s_]*|-)[A-Za-z0-9]+)*)(?:\\|((?:(?:[\\s_]*|-)[A-Za-z0-9])*))?)}}((?:-?[A-Za-z0-9]+)*)";
>
> This is probably the most difficult pattern in PmWiki--it took me a while
> to build this one. I'll take out some of the optimizing paren constructs
> to explain it. A freelink consists of
>
>    two curly braces,                          {{
>    followed by a word,                        [A-Za-z][A-Za-z0-9]*
>    followed by zero or more words delimited
>      by whitespace, underscores, or single
>      hyphens,                                 (([\\s_]*|-)[A-Za-z0-9]+)*
>    optionally followed by a vertical bar
>      and zero or more words delimited by
>      whitespace, underscores, or single
>      hyphens,                                 (\\|(([\\s_]*|-)[A-Za-z0-9])*)?
>    followed by two curly braces,              }}
>    followed by any sequence of letters or
>      digits, optionally separated by single
>      hyphens.                                 (-?[A-Za-z0-9]+)*
>
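> Again, the ?: after a paren indicates a non-capturing subpattern, and
> the ?> after the first parenthesis makes that group atomic ("once-only"):
> once it has matched, the regex engine won't backtrack into it, which
> helps to optimize the regex match.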
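
A sketch of what the three capturing groups pick up, assuming the pattern is written on a single line with no whitespace inside it. The sample text is made up, and what PmWiki does with each capture afterwards is not shown here.

```php
<?php
// Full pattern from the message above, joined onto one line.
$FreeLinkPattern = "{{(?>([A-Za-z][A-Za-z0-9]*(?:(?:[\\s_]*|-)[A-Za-z0-9]+)*)(?:\\|((?:(?:[\\s_]*|-)[A-Za-z0-9])*))?)}}((?:-?[A-Za-z0-9]+)*)";

// Illustrative text only.
preg_match("/$FreeLinkPattern/", "See {{free link|text}}s here", $m);

echo $m[1], "\n"; // free link  (words before the vertical bar)
echo $m[2], "\n"; // text       (words after the vertical bar)
echo $m[3], "\n"; // s          (letters directly after the closing braces)
```

When the optional "|..." part is absent, as in "{{free link}}s", the second capture comes back as an empty string.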
>
>> $FragmentPattern="#[A-Za-z][-.:\\w]*";
>
> A simple one--a link fragment consists of a '#', followed by a letter,
> followed by any sequence of hyphens, dots, colons, or word characters
> (\w matches letters, digits, and underscores).
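
A short sketch of pulling a fragment out of a link target. The target strings are invented for illustration.

```php
<?php
// Pattern copied from the message above.
$FragmentPattern = "#[A-Za-z][-.:\\w]*";

// A fragment must start with a letter after the '#'.
$found = preg_match("/$FragmentPattern/", "Main.HomePage#section-2.1", $m);
echo $m[0], "\n"; // #section-2.1

// No match here, because '#' is followed by a digit.
$none = preg_match("/$FragmentPattern/", "Main.HomePage#1abc");
var_dump($none); // int(0)
```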
>
>> $PageTitlePattern="[A-Z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*";
>
> A page title starts with an uppercase letter and is any sequence of
> words made of letters and digits (can be separated by single hyphens).
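
A sketch that filters a list of candidate titles, assuming the pattern is anchored so the whole string has to match. The candidate names are made up.

```php
<?php
// Pattern copied from the message above.
$PageTitlePattern = "[A-Z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*";

// Illustrative candidates only.
$candidates = ["HomePage", "Release-Notes-2", "Release--Notes", "Trailing-", "lowercase"];

// preg_grep keeps the entries that match; array_values renumbers the keys.
$valid = array_values(preg_grep("/^$PageTitlePattern\$/D", $candidates));

print_r($valid); // HomePage, Release-Notes-2
```

The double hyphen and the trailing hyphen are rejected because every hyphen must be followed by at least one letter or digit.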
>
>> $UrlPathPattern="[^\\s<>[\\]\"\'()]*[^\\s<>[\\]\"\'(),.?]";
>
> The path component of a URL contains any character EXCEPT whitespace,
> angle brackets <>, square brackets [], quotation marks "', or
> parentheses. In addition, a URL doesn't end in a comma, period, or
> question mark.
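
A sketch showing how the final character class keeps trailing punctuation out of a matched URL. The "http://" prefix, the example.com addresses, and the "~" delimiters are assumptions for illustration; the message only gives the path part of the pattern.

```php
<?php
// Pattern copied from the message above.
$UrlPathPattern = "[^\\s<>[\\]\"\'()]*[^\\s<>[\\]\"\'(),.?]";

// Illustrative text only.
$text = 'See http://example.com/docs/page.html, or ask (http://example.com/faq?).';

// "~" is used as the delimiter so the slashes in "http://" need no escaping.
preg_match_all("~http://$UrlPathPattern~", $text, $m);

print_r($m[0]);
// http://example.com/docs/page.html
// http://example.com/faq
```

The comma after "page.html" and the "?" after "faq" are left out because the last character of the match may not be a comma, period, or question mark.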
>
> Questions and comments welcomed.
>
> Pm
>
>
>