[pmwiki-users] Adding markup for Wikiwords for technical documentation

Joachim Durchholz jo at durchholz.org
Thu Jun 9 11:42:17 CDT 2005


Jan Jacobs wrote:
> 1) a word consisting entirely out of uppercase characters (with or 
> without numerics).

[A-Z0-9]+

i.e. something that consists of a sequence of uppercase letters and digits.

You probably don't want something that consists just of digits. To rule 
that out, require at least one uppercase letter, with (possibly empty) 
sequences of uppercase letters/digits before and after:
   [A-Z0-9]*[A-Z][A-Z0-9]*

If the first character cannot be a digit, you get
   [A-Z][A-Z0-9]*
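
To see how the three patterns differ, here is a small sketch in Python. 
Python's re module is an assumption made for illustration (PmWiki 
itself runs on PHP's PCRE, which accepts the same syntax for these 
patterns), and the names classify, ANY_UPPER_OR_DIGIT, NEEDS_ONE_LETTER 
and LETTER_FIRST are made up for this example.

```python
import re

# The three candidate patterns from the text above.
ANY_UPPER_OR_DIGIT = re.compile(r"[A-Z0-9]+")             # digits alone are accepted
NEEDS_ONE_LETTER = re.compile(r"[A-Z0-9]*[A-Z][A-Z0-9]*")  # at least one uppercase letter
LETTER_FIRST = re.compile(r"[A-Z][A-Z0-9]*")               # must start with a letter


def classify(word):
    """Return, for each pattern in order, whether it matches the whole word."""
    return (
        bool(ANY_UPPER_OR_DIGIT.fullmatch(word)),
        bool(NEEDS_ONE_LETTER.fullmatch(word)),
        bool(LETTER_FIRST.fullmatch(word)),
    )


if __name__ == "__main__":
    for word in ["HTTP", "RS232", "2005", "3COM", "Http"]:
        print(word, classify(word))
```

Only the first pattern accepts "2005", and only the first two accept 
"3COM"; a mixed-case word like "Http" is rejected by all three.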

> 2) a sequence of characters, not separated by a whitespace character, 
> not starting with an uppercase character, but containing 1 or more 
> uppercase characters.  The entire sequence of characters should be 
> turned into a wikiword.

Um... you probably don't want to wikify character sequences that contain 
punctuation. Otherwise, in the previous sentence, you'd end up with 
"don't" wikified, and "m..." (that's the first word without the capital 
U; a "character sequence" may be preceded by an uppercase character).

>           o tTable
>           o tTable_With_Underscores
>           o un_Filehandling

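Seems like you're after sequences of letters and underscores (and, since 
this looks like programming language variables, digits as well). To 
avoid wikifying "lah" in "Blah", the pattern must not start in the 
middle of a word, so we need a "lookbehind assertion" (that's a 
condition that says "apply this pattern only after such-and-so"). A 
first attempt is "after anything but an uppercase letter", which is 
[^A-Z]. Putting this together we have
   (?<=[^A-Z])[a-zA-Z0-9_]+
IOW this will match all strings consisting of letters, digits, and 
underscores that follow anything but an uppercase letter. It is only a 
starting point, though: it still matches "ah" in "Blah" (the "l" before 
it is not uppercase), it cannot match at the very start of the text, 
and it does not yet require an uppercase letter inside the word. A 
tighter variant that starts with a lowercase letter or underscore at 
the beginning of a word and requires an uppercase letter later on is
   (?<![A-Za-z0-9_])[a-z_][a-z0-9_]*[A-Z][A-Za-z0-9_]*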
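
The lookbehind pattern is easy to try on sample text. The sketch below 
uses Python's re module, which is an assumption made for illustration 
(PmWiki runs on PHP's PCRE; both accept this lookbehind syntax). It 
also runs a stricter variant, an illustrative assumption rather than 
anything PmWiki prescribes, that starts with a lowercase letter or 
underscore at the beginning of a word and requires an uppercase letter 
later on. The names LOOKBEHIND, STRICT, find_candidates and sample are 
made up for this example.

```python
import re

# Pattern from the text: word characters that follow a non-uppercase character.
LOOKBEHIND = re.compile(r"(?<=[^A-Z])[a-zA-Z0-9_]+")

# Hypothetical stricter variant: not preceded by a word character, starts with
# a lowercase letter or underscore, and contains at least one uppercase letter.
STRICT = re.compile(r"(?<![A-Za-z0-9_])[a-z_][a-z0-9_]*[A-Z][A-Za-z0-9_]*")

sample = "Blah uses tTable, tTable_With_Underscores and un_Filehandling; don't."


def find_candidates(pattern, text):
    """Return every non-overlapping match of pattern in text, left to right."""
    return pattern.findall(text)


if __name__ == "__main__":
    print(find_candidates(LOOKBEHIND, sample))
    print(find_candidates(STRICT, sample))
```

The first pattern picks up ordinary words such as "uses" and "and", and 
also the tail "ah" of "Blah", because only the character right before 
the match is checked. The stricter variant keeps just the three 
identifiers from the example list.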

Hope this gets you started.

Regards,
Jo



