[Pmwiki-users] $UrlPathPattern defined/commented
Crisses
crisses
Sun May 23 04:36:04 CDT 2004
On May 21, 2004, at 6:27 PM, Patrick R. Michaud wrote:
> On Fri, May 21, 2004 at 11:09:11PM +0200, Christian Ridderström wrote:
>> What exactly do we want to allow in a URI? In the source I find
>> $UrlPathPattern="[^\\s<>[\\]\"\'()`|^]*[^\\s<>[\\]\"\'()`|^,.?]";
>> I think some comments next to this definition would be nice, or
>> perhaps a reference to a wiki page where it's discussed.
>
> I'll write it here if someone can cut-n-paste to an appropriate place
> on pmwiki.org:
>
> RFC2396 and RFC2732 (on uri syntax) basically say that a proper uri
> must not contain control characters, spaces, or any of the characters
> < > " { } | \ ^ `
> All other characters can appear in a uri, although many have special
> meanings depending on where they are used in the uri.
>
> PmWiki's $UrlPathPattern syntax largely follows the RFCs, but also
> takes into consideration the contexts in which uris are likely to
> appear in markup. The pattern breaks into two parts: the first
> part matches everything before the last character of the uri, and
> the second part matches the last character of the uri:
>
> [^\\s<>[\\]\"\'()`|^]* [^\\s<>[\\]\"\'()`|^,.?]
>
> In both parts, space, "<", ">", <">, "`", "|", and "^" are
> disallowed because of the RFC definition. PmWiki incorrectly
> allows "{", "}", and "\", but this hasn't been an issue in
> practice and can be easily fixed if we want.
>
> Both parts also disallow things that the RFCs allow, such as
> parens, square brackets, and single quotes, under the theory
> that these are more likely to be markup than part of a uri.
>
> Finally, the second part of the pattern is used to prevent
> a trailing period, comma, or question mark from being included
> in the uri, since these will usually be the end of a sentence
> or phrase rather than the last character of a uri.
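
For illustration, here is a minimal Python sketch of the two-part pattern, hand-translated from the PHP string above. The `https?://` prefix and the helper `find_urls` are hypothetical additions for this example only; this is not code taken from PmWiki.

```python
import re

# Hand translation of the PHP $UrlPathPattern string:
#   part 1: any run of characters other than whitespace and < > [ ] " ' ( ) ` | ^
#   part 2: one final character from the same set, additionally excluding , . ?
URL_PATH = r"""[^\s<>\[\]"'()`|^]*[^\s<>\[\]"'()`|^,.?]"""

# Hypothetical scheme prefix, added only so the sketch matches whole URLs.
URL = re.compile(r"https?://" + URL_PATH)


def find_urls(text):
    """Return every URL-like substring matched by the two-part pattern."""
    return URL.findall(text)


print(find_urls("See http://www.pmwiki.org/wiki/Main/HomePage."))
# -> ['http://www.pmwiki.org/wiki/Main/HomePage']   (trailing period dropped)
print(find_urls("Try http://example.com/a?b=1, then stop."))
# -> ['http://example.com/a?b=1']                   (trailing comma dropped)
print(find_urls("(see http://example.com/x)"))
# -> ['http://example.com/x']                       (")" treated as markup)
```

The regex engine backtracks out of the greedy first part until the second part finds a final character that is not ",", ".", or "?", which is how sentence punctuation ends up outside the link while interior "?" and "." stay in.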

Because people might link to incredibly long URLs from sites like
MapQuest, which may include some of these characters (the period,
comma, question mark, and other characters *strictly* allowed by
convention), shouldn't they be allowed within [[URL]] markup, so that
the wiki doesn't break links people want to "force"? Alternatively,
PmWiki could treat ",", ".", or "?" as outside the link only when it is
immediately followed by a whitespace character (i.e. one of these
characters followed by '\s+'); in that case it is part of the
surrounding text rather than part of the link.
Are there languages in which the period, comma, question mark, etc.
are not followed by a space character?
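
One way to express the "followed by whitespace" idea as a regular expression is a negative lookahead. The Python sketch below compares the current two-part pattern with such a variant on a forced [[...]] link and on ordinary prose. It is an illustration under assumptions, not PmWiki code: the `https?://` prefix and the names `BASE`, `TRAIL`, `PROPOSED`, and `first_url` are hypothetical, and treating the end of the text like whitespace is an added assumption.

```python
import re

# Current two-part pattern, hand-translated from the PHP $UrlPathPattern string.
CURRENT = re.compile(r"""https?://[^\s<>\[\]"'()`|^]*[^\s<>\[\]"'()`|^,.?]""")

# Hypothetical variant: "," "." "?" stay in the URL unless the run of such
# characters is followed by whitespace (or, as an assumption, end of text).
BASE = r"""[^\s<>\[\]"'()`|^,.?]"""      # URL characters other than , . ?
TRAIL = r"""[,.?](?![,.?]*(?:\s|$))"""   # , . ? not ending a sentence
PROPOSED = re.compile(r"https?://(?:" + BASE + "|" + TRAIL + ")+")


def first_url(pattern, text):
    """Return the first URL the pattern finds in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


forced = "[[http://example.com/find?]]"
prose = "Go to http://example.com/find? Now."

print(first_url(CURRENT, forced))    # http://example.com/find
print(first_url(PROPOSED, forced))   # http://example.com/find?
print(first_url(CURRENT, prose))     # http://example.com/find
print(first_url(PROPOSED, prose))    # http://example.com/find
```

The lookahead keeps a "?" that is followed by "]]", the case of a link someone wants to force, while still dropping punctuation that ends a sentence. Like the proposal itself, it relies on whitespace after sentence punctuation, so it would not help in text where punctuation is not followed by a space.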
Crisses