[Pmwiki-users] Re: null characters or pattern breaking characters

Patrick R. Michaud pmichaud
Mon Jan 12 10:55:40 CST 2004


On Mon, Jan 12, 2004 at 12:26:31PM +1300, John Rankin wrote:
> 
> I agree with Christian that this is slightly different from what he 
> wants to accomplish, which is roughly: 'stop here'. As he says, this is
> in effect a zero width space or an invisible comma.
> 
> So I suggest `, (backtick comma) as the markup.

I've done a bit more research and several comments come to mind:

1.  It'd be really handy if the "null character" sequence began
with a character that's already not considered to be part of a valid
URI.  In PmWiki that set is currently
        space  <  >  [  ]  "  '  (  )

If the null character sequence were defined as something like "<,"
(angle bracket+comma, probably not a good choice but I'm using it for
sake of example), then there wouldn't need to be any changes made to 
the $UrlPathPattern, and one could make this a null character by simply
adding
        $InlineReplacements['<,'] = "";
to local.php (or pmwiki.php if we decide to adopt this).
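
To see why starting the sequence with an existing delimiter helps, here is a
minimal sketch in Python. It is not PmWiki's code: URL_RE is only a
hypothetical stand-in for $UrlPathPattern, the {link:...} output is an invented
placeholder for real link markup, and I'm assuming URLs are marked up before
the inline replacements run.

```python
import re

# Characters the post lists as ending a URI in PmWiki: space < > [ ] " ' ( )
URI_DELIMITERS = " <>[]\"'()"

# Hypothetical stand-in for $UrlPathPattern: a run of non-delimiter characters.
URL_RE = re.compile(r"https?://[^" + re.escape(URI_DELIMITERS) + r"]+")

# Mirrors $InlineReplacements['<,'] = ""; from the post.
INLINE_REPLACEMENTS = {"<,": ""}

def render(text):
    # Step 1: mark up URLs; matching stops at the first delimiter character.
    out = URL_RE.sub(lambda m: "{link:" + m.group(0) + "}", text)
    # Step 2: apply the inline replacements, which deletes the null sequence.
    for seq, repl in INLINE_REPLACEMENTS.items():
        out = out.replace(seq, repl)
    return out

print(render("see http://example.com/page<,s now"))
print(render("see http://example.com/page`,s now"))
```

In the first call the link ends before "<," and the trailing "s" stays outside
it. In the second call the backtick is not a delimiter, so the whole of
"page`,s" is swallowed into the link unless the URL pattern is changed as well.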

2.  After re-reading RFC 2396 and RFC 2732, it's apparent that there are
characters that are not allowed in URIs but that PmWiki currently
allows.  In particular, the following characters are not allowed 
in the path component of URIs:
        space  <  >  "  {  }  |  \  ^  `
Of course, this doesn't mean that there aren't people and systems that
build URIs using these characters (e.g., the vertical bar)--it just
means that those URIs aren't technically valid.  So, there's a reasonable
argument to be made that PmWiki should add each of the above to the
URI delimiter, which would likely eliminate much of the need for the
null character sequence in the first place (unless I'm missing a case).
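
A rough sketch of the effect of widening the delimiter set, again in Python
with a hypothetical pattern rather than PmWiki's real $UrlPathPattern. The two
character sets are copied from the lists above; url_pattern and first_url are
names made up for this example.

```python
import re

# Delimiters PmWiki currently uses (point 1): space < > [ ] " ' ( )
CURRENT_DELIMS = " <>[]\"'()"
# Characters listed above as not allowed in a URI path: space < > " { } | \ ^ `
RFC_EXCLUDED = " <>\"{}|\\^`"
# Union of both sets, keeping first-seen order and dropping duplicates.
EXTENDED_DELIMS = "".join(dict.fromkeys(CURRENT_DELIMS + RFC_EXCLUDED))

def url_pattern(delims):
    # Hypothetical URL matcher: scheme followed by a run of non-delimiters.
    return re.compile(r"https?://[^" + re.escape(delims) + r"]+")

def first_url(text, delims):
    m = url_pattern(delims).search(text)
    return m.group(0) if m else None

sample = "cell|http://example.com/x|next cell"
print(first_url(sample, CURRENT_DELIMS))   # the "|next" part is pulled in
print(first_url(sample, EXTENDED_DELIMS))  # the URL stops at the vertical bar
```

The trade-off is the one in point 3: any existing page that relies on a URL
containing one of the added characters would stop linking the way it does now.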

3.  On the other hand, PmWiki sometimes departs from rigorously following
a standard in order to be consistent with common practice or meet other 
goals.  For example, parentheses and single quotes *are* valid characters in
a URI, but PmWiki excludes them from the URI sequence because they're
more commonly used in PmWiki as delimiters than as components of other
URIs.  So, just as PmWiki disallows some characters that the URI spec
allows, there may be practical reasons that PmWiki should continue to
allow characters that the URI spec disallows.

4.  Finally, after writing #3 above it occurs to me that we already have
a null character sequence that would work:  ''''  (four single quotes).
PmWiki already excludes single quotes from the URI pattern, and four
single quotes becomes an empty italics sequence.  In fact, this is the
common "null character" sequence in many existing wikis, which use it
for pluralization and alternate endings of WikiWord''''s.  
http://www.pmichaud.com/wiki/Test/InterLinkPattern demonstrates that
this works as desired.
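
For illustration, the pluralization case can be sketched like this in Python.
The WikiWord pattern, the {link:...} placeholder, and the <i> output are
simplified assumptions for the example, not PmWiki's actual rules.

```python
import re

# Hypothetical WikiWord matcher: two or more capitalized runs in a row.
WIKIWORD_RE = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}")
# Simplified italics rule: text between two pairs of single quotes.
ITALIC_RE = re.compile(r"''(.*?)''")

def render(text):
    # Step 1: link WikiWords; a single quote cannot be part of the match.
    out = WIKIWORD_RE.sub(lambda m: "{link:" + m.group(0) + "}", text)
    # Step 2: four single quotes become an empty italics pair.
    return ITALIC_RE.sub(r"<i>\1</i>", out)

print(render("WikiWords"))      # links to a page named WikiWords
print(render("WikiWord''''s"))  # links to WikiWord, then an invisible <i></i>
```

The empty <i></i> renders as nothing, so the reader sees "WikiWords" while the
link target stays WikiWord.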

Pm



