[Pmwiki-users] [Q] ?pagename=Main/WikiSandbox vs. /Main/WikiSandbox

Patrick R. Michaud pmichaud
Sat Apr 17 08:13:54 CDT 2004


On Sat, Apr 17, 2004 at 12:50:11AM -0500, John Feezell wrote:
> I'm trying to understand the difference between these two ways of 
> addressing pages in PmWiki.
> In particular, what http (GET), php, or PmWiki variable, or combinations 
> of them, needs to be set to cause /Main/WikiSandbox to be understood by 
> PmWiki as if ?pagename=Main/WikiSandbox had been entered in the URI?

Since this question continues to arise from time to time, I'm going to
take some time here to try to provide a complete answer.  The short answer
to the question above is that a URL such as

   http://www.pmichaud.com/wiki?pagename=Main.WikiSandbox       

works in almost every PHP environment, while URLs in the form

   http://www.pmichaud.com/wiki/Main/WikiSandbox               

depend almost entirely on the configuration of the web server software
(e.g., Apache, IIS) and how PHP has been installed.  Basically, PmWiki
is entirely at the mercy of the web server administrator as far as making
URLs of the second form function--i.e., it has to be supported by the
webserver software itself and there aren't any PHP or PmWiki variables 
that can be used to get it to work if the webserver software isn't
configured to support it.

First, a bit of background about URLs and web server scripts in general.
Programs that run on a webserver and generate documents in response to
queries generally use what is called the "Common Gateway Interface", or
CGI [1].  CGI is the standard for interfacing applications, such as PmWiki,
with web servers such as Apache and IIS, and it specifies how information
from a URL is made available from the webserver to an application program.

CGI makes use of two environment variables to send parameters to an
application [2].  One is called QUERY_STRING, and this is the portion 
of a URL that follows the first '?'.  Thus, in the URL 

   http://www.pmichaud.com/wiki?pagename=Main.WikiSandbox

the QUERY_STRING portion would be "pagename=Main.WikiSandbox".  The other
environment variable for passing parameters to an application is 
PATH_INFO, and this is any "extra" information placed in the URL after 
the path to the application itself.  Thus, in a URL such as

   http://www.pmichaud.com/pmwiki/pmwiki.php/Main/WikiSandbox   

the webserver determines that /pmwiki/pmwiki.php is the application
to be executed in response to this request, and the remaining 
"/Main/WikiSandbox" portion of the path is placed in the PATH_INFO
environment variable for the application to use.

Just for completeness, note that a URL can contain both a PATH_INFO
and a QUERY_STRING, as in

   http://www.pmichaud.com/pmwiki/pmwiki.php/Main/WikiSandbox?action=edit

where PATH_INFO becomes "/Main/WikiSandbox" and QUERY_STRING becomes
"action=edit".
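
As a rough illustration of the split described above, here is a minimal
PHP sketch.  It only models what the webserver is supposed to do; the
function name split_cgi_url() and its $scriptPath argument are made up
for this example and are not part of PHP, CGI, or PmWiki.

```php
<?php
// Sketch only: models how a CGI-style server is supposed to divide a URL.
// split_cgi_url() and $scriptPath are hypothetical names for illustration.
function split_cgi_url($url, $scriptPath)
{
    $parts = parse_url($url);
    $path  = isset($parts['path'])  ? $parts['path']  : '';
    $query = isset($parts['query']) ? $parts['query'] : '';

    // Whatever follows the script's own path is the "extra" path information.
    $pathInfo = '';
    $rest = (string) substr($path, strlen($scriptPath));
    if (strncmp($path, $scriptPath, strlen($scriptPath)) === 0
        && ($rest === '' || $rest[0] === '/')) {
        $pathInfo = $rest;
    }

    return array(
        'SCRIPT_NAME'  => $scriptPath,
        'PATH_INFO'    => $pathInfo,
        'QUERY_STRING' => $query,
    );
}

$env = split_cgi_url(
    'http://www.pmichaud.com/pmwiki/pmwiki.php/Main/WikiSandbox?action=edit',
    '/pmwiki/pmwiki.php'
);
echo $env['PATH_INFO'], "\n";     // /Main/WikiSandbox
echo $env['QUERY_STRING'], "\n";  // action=edit
```

Note that the sketch has to be told which part of the path is the script;
a real webserver works that out by finding the file on disk.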

The above is how things are *supposed* to work according to the CGI
specification.  Unfortunately, while almost every webserver with PHP
can correctly pass the QUERY_STRING information to PmWiki, many
cannot understand or process the PATH_INFO parameter.  Some webservers look
at a URL like http://www.pmichaud.com/pmwiki/pmwiki.php/Main/WikiSandbox and 
(incorrectly) treat it as a request for a file named "WikiSandbox" within
a directory called "pmwiki/pmwiki.php/Main".  Since no file exists
by this name (as this is not how PmWiki stores/serves pages), the
webserver returns a "404 Not Found" error response to this request.
Note that in such a configuration the webserver returns the error 
without ever calling PHP or pmwiki.php, thus there's no chance for PmWiki
to intercept or "fix" the request, or even to return a message to explain
what is going on.

IIS is generally shipped with PATH_INFO handling disabled by default,
while Apache 2 uses an "AcceptPathInfo" directive to control PATH_INFO
handling (and defaults to "Off" for PHP scripts in many installations,
notably including Red Hat 9).  Apache 1.3 correctly processes PATH_INFO,
but many webhosting providers run customized versions of Apache 1.3 that
end up breaking PATH_INFO.

And, even if the webserver software is configured to handle PATH_INFO
correctly, some PHP installations still cannot understand PATH_INFO--
especially if the PHP interpreter itself is running as an external 
CGI handler (as opposed to having the PHP interpreter run as part of 
the webserver).  In such cases it's often possible to customize PmWiki
to correctly deduce PATH_INFO information even though PHP isn't providing
it according to the spec, but there are so many variations in the ways
I've seen PHP configured that it's nearly impossible to provide a generic
solution.  In addition, the PHP documentation describing how to execute 
PHP as a CGI script is particularly poor and often outright wrong [3,4].

So, confronted by all of this, what can one do to achieve URLs that
look like ".../Main/WikiSandbox" instead of "...?pagename=Main.WikiSandbox"?
The first thing to do is to see if your webserver will even support
URLs with PATH_INFO information.  One can do this test with a simple PHP
script--just create a script called 'phpinfo.php' containing the line

   <?php phpinfo(); ?>

Then, execute the phpinfo.php script twice--once just normally, and once
with a PATH_INFO request:

   http://myserver.com/path/to/phpinfo.php
   http://myserver.com/path/to/phpinfo.php/with/path/info

The first URL should always work.  If the second URL returns a
"page not found" (404) error, then your webserver either has PATH_INFO
disabled or otherwise doesn't support it, and the only way to get it
to work will be to change the webserver configuration (on Apache 2,
for example, via the AcceptPathInfo directive).

If the second URL returns a phpinfo page and not an error, then it 
should be possible to get '.../Main/WikiSandbox' URLs to work.  
It just becomes a matter of figuring out which environment variable 
holds the correct page name and setting PmWiki's $pagename variable 
to that value in config.php.
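
To make that last step a little more concrete, here is a sketch of the
kind of thing one might put in config.php.  Treat it as a starting point
under stated assumptions, not as PmWiki's own logic: guess_pagename() is
a made-up helper, and the order of variables it tries (PATH_INFO, then
ORIG_PATH_INFO, which some PHP CGI setups populate instead, then
REQUEST_URI) is only a guess at what a given server provides.  Check the
phpinfo output for your own server to see which one actually holds
"/Main/WikiSandbox".

```php
<?php
// Sketch only: guess_pagename() is a hypothetical helper, not part of PmWiki.
// Which variable carries the path varies by server, so candidates are
// tried in order and the first non-empty one wins.
function guess_pagename($server)
{
    // 1. PATH_INFO is where the CGI specification says the extra path goes.
    // 2. ORIG_PATH_INFO is an assumption: some PHP CGI setups fill it instead.
    foreach (array('PATH_INFO', 'ORIG_PATH_INFO') as $key) {
        if (!empty($server[$key])) {
            return trim($server[$key], '/');
        }
    }

    // 3. Fall back to REQUEST_URI minus the script name and any query string.
    if (isset($server['REQUEST_URI'], $server['SCRIPT_NAME'])) {
        $path   = (string) parse_url($server['REQUEST_URI'], PHP_URL_PATH);
        $script = $server['SCRIPT_NAME'];
        if (strncmp($path, $script . '/', strlen($script) + 1) === 0) {
            return trim(substr($path, strlen($script)), '/');
        }
    }

    return null;  // nothing usable; leave PmWiki's default alone
}

// In config.php one might then write:
$guess = guess_pagename($_SERVER);
if ($guess !== null && $guess !== '') {
    $pagename = $guess;
}
```

The helper returns the name in "Main/WikiSandbox" form, the same form
used in the ?pagename=Main/WikiSandbox URL from the original question.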

Even if the webserver isn't configured to handle PATH_INFO, one can
often find workarounds.  For example, Apache has a mod_rewrite
module that allows URLs to be manipulated before they are converted
to resources.  In particular, mod_rewrite can be used to rewrite
a URL from the ".../Main/WikiSandbox" into the 
"...?pagename=Main.WikiSandbox" form before other processing is done.
However, any time URLs are being rewritten or aliased in the webserver,
one typically has to also set values for $ScriptUrl, $PubDirUrl, and
$UploadUrlFmt in config.php, since PmWiki can no longer deduce the
values of these variables from the URL.
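
As a sketch of what this involves, the PHP below first models the
mapping that a rewrite performs and then shows the corresponding
config.php settings.  It is not a mod_rewrite rule; the function
rewrite_to_query(), the "/wiki/" prefix, the "/pmwiki/" install path,
and every www.example.com URL are assumptions invented for the example.

```php
<?php
// Sketch only: models in PHP the transformation a webserver rewrite would do.
// rewrite_to_query(), '/wiki/' and '/pmwiki/pmwiki.php' are hypothetical.
function rewrite_to_query($path, $prefix = '/wiki/', $script = '/pmwiki/pmwiki.php')
{
    // Paths outside the wiki prefix are left untouched.
    if (strncmp($path, $prefix, strlen($prefix)) !== 0) {
        return $path;
    }
    // "Main/WikiSandbox" becomes "Main.WikiSandbox" in the query string.
    $page = str_replace('/', '.', trim((string) substr($path, strlen($prefix)), '/'));
    return $script . '?pagename=' . $page;
}

echo rewrite_to_query('/wiki/Main/WikiSandbox'), "\n";
// /pmwiki/pmwiki.php?pagename=Main.WikiSandbox

// Matching config.php values (all URLs are made-up examples).  They are
// set by hand because PmWiki can no longer deduce them from the URL.
$ScriptUrl    = 'http://www.example.com/wiki';            // base for page links
$PubDirUrl    = 'http://www.example.com/pmwiki/pub';      // PmWiki's pub/ directory
$UploadUrlFmt = 'http://www.example.com/pmwiki/uploads';  // uploaded attachments
```

The point of the explicit settings is that the URL visitors see (/wiki/...)
no longer matches where the script and its files actually live (/pmwiki/...).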

Also, many webserver/PHP environments that cannot process PATH_INFO 
information for a PHP script can still do so for bash or Perl CGI scripts. 
In these cases it's often possible to write a short wrapper script
in bash or Perl that correctly sets the environment variables and calls
the PHP interpreter directly.  This approach often has some other
advantages, such as getting around safe_mode restrictions and allowing
PmWiki's files to be owned by the account holder instead of the webserver
(a discussion for another lengthy message).

Anyway, that's the story behind page addressing in PmWiki; I hope it
makes some sort of sense.  Of course, I'll be very happy to expand further
on any of the points/ideas raised above, or to provide help/advice in 
configuring specific environments.

Pm

References

1.  http://hoohoo.ncsa.uiuc.edu/cgi/ - Common Gateway Interface specification.
    Note that this specification is from 1996, and is arguably out of
    date with respect to many existing web practices and implementations.

2.  http://hoohoo.ncsa.uiuc.edu/cgi/primer.html

3.  http://www.php.net/manual/en/install.commandline.php

4.  http://www.php.net/security.cgi-bin    


