[pmwiki-users] special characters revealing anchors

Patrick R. Michaud pmichaud at pobox.com
Tue Oct 28 10:03:42 CDT 2008


On Tue, Oct 28, 2008 at 09:42:38AM +0100, Jean-Fabrice [gmail] wrote:
> 2008/10/28 adam overton <a at plus1plus1plus.org>:
> > i recently discovered some broken/visible anchors on a user's page.
> > his use of a special character at the beginning of the anchor seems
> > to be the culprit, as it causes the anchor to become visible (special
> > characters within a word don't seem to cause problems). here is an
> > example;
> >
> >     [[#àdroite]]
>
> afaik, pmwiki respects w3c standards and recommendations while this
> syntax ([[#àdroite]]) does not.
> Take a look at http://www.w3.org/TR/REC-html40/types.html#type-name :
> ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
> followed by any number of letters, digits ([0-9]), hyphens ("-"),
> underscores ("_"), colons (":"), and periods (".").

This is correct -- PmWiki follows the W3C standards here, and only
recognizes A-Za-z, digits, hyphens, underscores, colons, and periods
in anchors.  Anything else causes PmWiki not to recognize the [[#...]]
as an anchor.
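
As a rough illustration of the rule quoted above (this is a Python
sketch, not PmWiki's own code; the names NAME_TOKEN and
is_valid_anchor_name are made up for the example), the HTML 4 NAME
token check can be written as a single pattern:

```python
import re

# HTML 4 ID/NAME token: a letter first, then any number of
# letters, digits, hyphens, underscores, colons, and periods.
NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")

def is_valid_anchor_name(name: str) -> bool:
    """Return True if `name` is a valid HTML 4 ID/NAME token."""
    return NAME_TOKEN.fullmatch(name) is not None

print(is_valid_anchor_name("adroite"))  # True
print(is_valid_anchor_name("àdroite"))  # False: starts with a non-ASCII letter
```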

There has been some discussion of getting PmWiki to automatically fold
non-ASCII characters into the ASCII set, so that [[#àdroite]] would
generate "adroite" in the anchor tag, and thus be valid HTML.  But
this would undoubtedly confuse people, because a URL ending with
...#àdroite would not find the anchor.
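
To make the folding idea concrete, here is one possible way to do it,
sketched in Python rather than in PmWiki itself.  The function name
fold_to_ascii is hypothetical, and Unicode decomposition is only one
of several ways such folding could be done; nothing here reflects an
actual PmWiki implementation.

```python
import unicodedata

def fold_to_ascii(name: str) -> str:
    """Strip accents by decomposing characters and dropping non-ASCII parts."""
    # NFKD splits "à" into "a" plus a combining grave accent.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping everything outside ASCII removes the combining accent.
    return decomposed.encode("ascii", "ignore").decode("ascii")

folded = fold_to_ascii("àdroite")
print(folded)               # adroite
print(folded == "àdroite")  # False
```

The last line shows the source of the confusion: the name in the
generated anchor tag no longer equals the fragment the author typed.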

It's also possible to redefine the anchor rule so that it
recognizes non-ASCII characters in anchors and uses them in the
output; this of course results in invalid HTML if used (at least
according to the current spec).
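
For comparison, a relaxed rule of this kind might look like the
following Python sketch.  It is not PmWiki's Markup() definition; the
pattern, the function name render_anchors, and the exact form of the
output tag are assumptions made for illustration.

```python
import re

# Relaxed anchor rule: first character is any Unicode letter,
# followed by word characters, periods, colons, or hyphens.
RELAXED_ANCHOR = re.compile(r"\[\[#([^\W\d_][\w.:-]*)\]\]")

def render_anchors(text: str) -> str:
    """Replace [[#name]] markup with an anchor tag, keeping non-ASCII names."""
    def to_tag(match: re.Match) -> str:
        name = match.group(1)
        return f"<a name='{name}' id='{name}'></a>"
    return RELAXED_ANCHOR.sub(to_tag, text)

print(render_anchors("[[#àdroite]]"))
# <a name='àdroite' id='àdroite'></a>
```

The anchor is no longer left visible on the page, but the name in the
output falls outside the A-Za-z token set quoted above.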

Pm