[Pmwiki-users] $UrlPathPattern defined/commented

Crisses crisses
Sun May 23 04:36:04 CDT 2004


On May 21, 2004, at 6:27 PM, Patrick R. Michaud wrote:

> On Fri, May 21, 2004 at 11:09:11PM +0200, Christian Ridderström wrote:
>> What exactly do we want to allow in a URI? In the source I find
>> 	$UrlPathPattern="[^\\s<>[\\]\"\'()`|^]*[^\\s<>[\\]\"\'()`|^,.?]";
>> I think some comments next to this definition would be nice, or perhaps a
>> reference to a wiki page where it's discussed.
>
> I'll write it here if someone can cut-n-paste to an appropriate place
> on pmwiki.org:
>
> RFC2396 and RFC2732 (on uri syntax) basically say that a proper uri
> must not contain control characters, spaces, or any of the characters
>    <   >   "   {   }   |   \   ^   `
> All other characters can appear in a uri, although many have special
> meanings depending on where they are used in the uri.
>
> PmWiki's $UrlPathPattern syntax largely follows the RFCs, but also
> takes into consideration the contexts in which uris are likely to
> appear in markup.  The pattern breaks into two parts, the first
> part matches everything before the last character of the uri, and
> the second part matches the last character of the uri:
>
>       [^\\s<>[\\]\"\'()`|^]*         [^\\s<>[\\]\"\'()`|^,.?]
>
> In both parts, space, "<", ">", <">, "`", "|", and "^" are
> disallowed because of the RFC definition.  PmWiki incorrectly
> allows "{", "}", and "\", but this hasn't been an issue in
> practice and can be easily fixed if we want.
>
> Both parts also disallow things that the RFCs allow, such as
> parens, square brackets, and single quotes, under the theory
> that these are more likely to be markup than part of a uri.
>
> Finally, the second part of the pattern is used to prevent
> a trailing period, comma, or question mark from being included
> in the uri, since these will usually be the end of a sentence
> or phrase rather than the last character of a uri.
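
A minimal Python sketch of how the two-part pattern behaves. It assumes a direct translation of the PHP string (Python's `re` should treat these character classes the same way PCRE does) and adds a hypothetical `https?://` prefix only so the demo has something to anchor on; PmWiki's real link patterns are built elsewhere.

```python
import re

# Direct translation of $UrlPathPattern. PART1 matches everything before
# the last character; PART2 matches the last character, which may not be
# ",", "." or "?". Inside a class, "^" (when not first), "|", "`", "("
# and ")" are literal characters.
PART1 = r"""[^\s<>\[\]"'()`|^]*"""
PART2 = r"""[^\s<>\[\]"'()`|^,.?]"""
URL_PATH_PATTERN = PART1 + PART2

# Hypothetical scheme prefix, used only to make this demo self-contained.
URL_RE = re.compile(r"https?://" + URL_PATH_PATTERN)


def find_urls(text):
    """Return every substring the prefixed pattern matches."""
    return URL_RE.findall(text)


if __name__ == "__main__":
    print(find_urls("See http://example.com/page."))
    print(find_urls("Try http://example.com/a,b?c=1, then stop"))
    print(find_urls("(http://example.com/x)"))
```

Because part 1 is greedy and part 2 forces it to back off by one character, a comma, period, or question mark inside the uri (as in `a,b?c=1`) is kept; only trailing ones are dropped. The first two calls return `http://example.com/page` and `http://example.com/a,b?c=1`.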

Because people might link to extremely long URLs from sites like MapQuest,
which can include some of these characters (the period, comma, question
mark, and other characters *strictly* allowed by the URI syntax),
shouldn't they be allowed within [[URL]] markup, so that the wiki doesn't
break links people want to "force"? Alternatively, PmWiki could check for
",", ".", or "?" strictly followed by a whitespace character, in which
case they are not part of the link but part of the surrounding text
(i.e., those characters followed by '\s+').
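
One way to sketch the whitespace check, purely as an illustration: the `PROPOSED` pattern below is a hypothetical lookahead variant, not anything in PmWiki, and both patterns carry an assumed `https?://` prefix for the demo. A run of ",", "." or "?" stays in the link only when the next character is neither whitespace nor more of that punctuation.

```python
import re

# Current behaviour: any run of allowed characters, but the final
# character may not be ",", "." or "?".
CURRENT = re.compile(r"""https?://[^\s<>\[\]"'()`|^]*[^\s<>\[\]"'()`|^,.?]""")

# Hypothetical variant of the proposal: a run of ",", "." or "?" is kept
# only if it is followed by a character that is not whitespace and not
# more such punctuation; a run at the end of the text is also dropped.
PROPOSED = re.compile(
    r"""https?://(?:[^\s<>\[\]"'()`|^,.?]|[,.?]+(?=[^\s,.?]))+"""
)


def compare(text):
    """Return (current_matches, proposed_matches) for the same text."""
    return CURRENT.findall(text), PROPOSED.findall(text)


if __name__ == "__main__":
    for sample in [
        "Try http://example.com/a,b?c=1, then stop",
        "(see http://example.com/a.)",
    ]:
        print(compare(sample))
```

In this sketch both patterns give `http://example.com/a,b?c=1` for the first sample, since the current pattern already keeps inner punctuation. They differ on the second: the current pattern yields `http://example.com/a`, while the variant keeps the trailing period because it is followed by ")" rather than whitespace.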

Are there languages in which a period, comma, question mark, etc. is not
followed by a space character?

Crisses