[pmwiki-users] lots of problems when redirecting or rewriting URLs
DaveG
pmwiki at solidgone.com
Thu Jan 19 19:15:50 CST 2006
I quote a number of things from the apache docs below. Here is what I'm
referencing: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html
Joachim Durchholz wrote:
> DaveG schrieb:
>> *Addendum:* As I was writing this, a *lot* became clear. The main thing
>> I realized is that the intent of the .htaccess script is to change URLs
>> FROM simple format INTO complex format (refer to definitions below).
>> Pmwiki handles the conversion FROM complex format INTO simple format
>> that is shown on the browser address bar.
>
> Yes.
>
>> --- *Questions*
>> 1] What does $EnablePathInfo = 1; actually do? I think it:
>> a) rewrites URL's on wiki pages to the simple format;
>
> Yes.
>
>> b) rewrites the browser address bar URL to the simple format;
>
> No.
The .htaccess rules have converted the URL to complex format, which includes a
call to pmwiki.php. Pmwiki must be changing HTTP headers to get the simple URL
format in the browser address bar.
> The full picture is:
>
> The browser sends the simple format;
> the rewrite rule transforms to the complex format.
>
> PmWiki doesn't care much what kind of URL led to it (it can't reliably
> infer that anyway). It simply looks first at complex-format info, and if
> that comes up empty, it tries the simple-format (pathinfo) one. (Or
> maybe it's the other way round. I don't really know or care.)
> After it got group and page name, PmWiki doesn't even take another look
> at the URL string (except for action=... and such).
>
> When PmWiki writes links, it doesn't care how it got the link. It simply
> looks at $EnablePathInfo, and if it's TRUE, it emits a simple-format
> URL, and if it's FALSE, it emits a complex-format one.
>
>> Thus, the rewrites only need to convert from incoming simple format
>> to the complex format so that PHP and thus Pmwiki can handle the request.
>
> Yes.
>
>> --- *Definitions*
>> In the text below, I'll use:
>> - "complex URL": URL's in the format "/pmwiki.php?n=Main.HomePage".
>>
>> - "simple URL": URL's in the format "/Main/HomePage".
>>
>>
>> --- *Background*
>> PHP needs incoming URL's to be of the complex format in order to process
>> them correctly.
>
> Yes.
>
>> PHP (and thus pmwiki) cannot handle or process simple
>> URLs, as there is no way to know which parts of the URL are parameters.
>
> No.
> Well, sort of. It turns out that there is no reliable way to extract the
> path info from the URL information that it has.
>
> With Apache, you can (usually!) take the information directly off
> $_ENV['PATHINFO']. You can even do without rewrite rules by inferring
> group and page name from that in your config.php.
> (Might be a good addition to the CleanUrls recipe.)
>
> With IIS, things get more complicated.
>
>> Thus, the rules in .htaccess convert from simple format to complex format.
>
> Yes.
>
>> --- *Analysis*
>> 2] Here's the line by line script analysis, assuming the user enters a
>> simple URL:
>> http://dom.com/~nepherim/pmwiki/Main/HomePage
>>
>> and here's the complex URL we need to convert into so PHP can process:
>> http://dom.com/~nepherim/pmwiki/pmwiki.php?n=Main.HomePage
>
> Yes.
>
>> Pmwiki will handle the conversion of the "complex URL" back into the
>> simple format we will see in the browser address bar.
>
> Well, sort of - PmWiki just writes simple URLs when it emits its pages.
> After that, no further conversion is needed - the browser picks the
> simple URLs off the HTML pages and displays them.
What I meant here is that pmwiki is rewriting the HTTP header with the
simple URL format, since .htaccess has changed it to complex format.
>
> The result of the RewriteRule directive is never seen by the browser!
Because pmwiki rewrites the headers, I suspect.
>> #
>> Options +FollowSymLinks
>> Follow existing symbolic links. (Need more detail here.)
>
> The rewrite engine will complain if FollowSymLinks isn't set.
> The rationale is that URL rewriting is strictly more powerful than
> following symbolic links, so if symbolic links are disallowed, rewriting
> should be even less allowed.
>
>> #
>> RewriteEngine on
>> Turn on the rewrite engine.
>
> Yes.
> Without that line, RewriteBase and RewriteRule will be ignored.
>
>> #
>> RewriteBase /~nepherim/pmwiki/
>> Strip out this part of the url, and leave us with whatever follows.
>> Thus, from "http://dom.com/~nepherim/pmwiki/Main/HomePage" we now have
>> "Main/HomePage".
>
> No.
Here I disagree. The Apache docs pretty clearly state that:
"i.e., the local directory prefix is stripped at this stage of
processing and your rewriting rules act only on the remainder. At the
end it is automatically added back to the path."
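To make that reading concrete, here is a rough Python model of the sequence the docs describe: strip the prefix, run the rule on the remainder, add the prefix back. The function name rewrite_in_directory is made up for illustration, Python's re module stands in for Apache's regex engine, and this is a simplification of the documented behaviour, not Apache's actual algorithm.

```python
import re

def rewrite_in_directory(url_path, base, pattern, substitution):
    """Simplified model of a per-directory rewrite (not Apache's real code)."""
    if not url_path.startswith(base):
        return url_path                      # outside this directory: untouched
    remainder = url_path[len(base):]         # prefix stripped before matching
    match = re.search(pattern, remainder)
    if match is None:
        return url_path                      # rule does not apply
    rewritten = match.expand(substitution)   # rule acts only on the remainder
    return base + rewritten                  # prefix added back at the end

result = rewrite_in_directory(
    "/~nepherim/pmwiki/Main/HomePage",
    "/~nepherim/pmwiki/",
    r"^([^/a-z].*)",
    r"pmwiki.php?n=\1",
)
print(result)  # /~nepherim/pmwiki/pmwiki.php?n=Main/HomePage
```

In this model the slash between group and page survives; the rule by itself does not turn it into a dot.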
> I don't properly recall the details, so I have to refer everybody to the
> documentation on http://httpd.apache.org.
> Basically, it's that the rewrite engine is an ugly hack, and this is the
> place where this hackishness surfaces. IIRC it provides the rewrite
> engine with some prefix information that Apache stripped before.
>
>> #
>> RewriteCond %{QUERY_STRING} ^$
>> Something to enable searching. What I'm not sure of is what condition
>> needs to be satisfied to execute the following rewrite.
>
> Nonono.
>
> QUERY_STRING is a part of the URL - anything that goes after ?.
> I.e. ?action=edit would be a valid query string of a URL.
I understand. My point, and question, was that RewriteCond is a
conditional directive. So what I'm not sure of here is *what* condition
it imposes.
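For what it's worth, the pattern ^$ only matches an empty string, so the condition would read as "the request has no query string". A tiny Python check of that regex follows; the function name is hypothetical, and Python's re module stands in for Apache's regex engine here.

```python
import re

def query_string_is_empty(query_string):
    # Models: RewriteCond %{QUERY_STRING} ^$
    # The pattern ^$ matches only when nothing sits between start and end.
    return re.search(r"^$", query_string) is not None

print(query_string_is_empty(""))             # True
print(query_string_is_empty("action=edit"))  # False
```

Under that reading, the RewriteRule that follows would only fire for requests without any ?... part.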
>
>> (RewriteCond is basically an IF statement -- if the condition evaluates
>> to true (AND RewriteCond's immediately following evaluate to true) then
>> execute the next RewriteRule directive.)
>>
>> #
>> RewriteRule ^/?$ ~nepherim/pmwiki/Main/HomePage/ [R=permanent,QSA,L]
>> Alter URLs with a trailing "/" or with no trailing "/" to the HomePage.
>> Thus, user entered URLs of "~nepherim/pmwiki" or "~nepherim/pmwiki/"
>> translate to "~nepherim/pmwiki/Main/HomePage/".
>>
>> R=permanent: tell the browser that this redirect is permanent. Default
>> is Temporary.
>
> No. Default is "internal", i.e. Apache immediately takes the newly
> generated URL and serves whatever is behind *that*.
It is internal, but again, this is what I got from the Apache docs:
"If no code is given a HTTP response of 302 (MOVED TEMPORARILY) is used."
>> QSA: Query String Append. If we have queries on the incoming URL, like
>> "?action=edit" then append them to our new URL.
>>
>> L: Stop processing. In this case the next RewriteRule doesn't get
>> processed, and we're done.
>> *** Question: I must be misunderstanding this parameter. Why do we stop?
>> If we stop here then we haven't converted to complex format.
>
> Because a permanent redirect directly returns to the browser, without
> giving it any HTML (but it does return the redirected-to URL).
Seems to be a little different to:
"Stop the rewriting process here and don't apply any more rewriting
rules ... Use this flag to prevent the currently rewritten URL from
being rewritten further by following rules. For example, use it to
rewrite the root-path URL ('/') to a real one, e.g., '/e/www/'."
*However,* if L does stop processing, I don't see how the simple url
gets converted to complex, as that comes in the next statement.
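One way to reconcile the two descriptions is that L ends rule processing for the current request only, and the R redirect makes the browser send a second request that runs through the rules again. The toy Python model below sketches that two-request flow. The function handle_request and its return labels are made up for illustration, the redirect target is taken from the description in this thread with a leading slash assumed, and none of this is Apache's real implementation.

```python
import re

BASE = "/~nepherim/pmwiki/"

def handle_request(path, query=""):
    """Toy model of the two rules; returns (kind, target)."""
    if not path.startswith(BASE):
        return ("untouched", path)
    rest = path[len(BASE):]
    # Rule 1: RewriteCond %{QUERY_STRING} ^$
    #         RewriteRule ^/?$ ... [R=permanent,QSA,L]
    if query == "" and re.search(r"^/?$", rest):
        return ("redirect 301", BASE + "Main/HomePage/")
    # Rule 2: RewriteRule ^([^/a-z].*) pmwiki.php?n=$1 [QSA,L]
    m = re.search(r"^([^/a-z].*)", rest)
    if m:
        target = BASE + "pmwiki.php?n=" + m.group(1)
        if query:
            target += "&" + query            # QSA: keep the original query
        return ("internal", target)
    return ("untouched", path)

first = handle_request("/~nepherim/pmwiki/")
print(first)   # ('redirect 301', '/~nepherim/pmwiki/Main/HomePage/')

# The browser follows the redirect with a brand-new request.
second = handle_request(first[1])
print(second)  # ('internal', '/~nepherim/pmwiki/pmwiki.php?n=Main/HomePage/')
```

In this model the first request never reaches the second rule, but the redirected request does, which is where the simple URL would get converted to the complex one.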
> The browser is then expected to display the new URL in the URL line and
> request it.
>
> IOW if somebody requests the directory, people will be automatically
> redirected to ~nepherim/pmwiki/Main/HomePage/ .
>
>> #
>> RewriteRule ^([^/a-z].*) pmwiki.php?n=$1 [QSA,L]
>> Here we're matching anything which is NOT a lowercase letter (a-z)
>> followed by anything else. What this means in practice is that we're
>> finding the pmwiki group and page name, since pmwiki groups always start
>> with an uppercase character.
>
> Actually with a non-lowercase letter.
Same as uppercase...?
> Such as: special character like $&=, or umlauts, or whatever.
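A quick Python check of that character class bears this out: [^/a-z] accepts anything that is neither a slash nor an ASCII lowercase letter, which covers uppercase letters but also digits, punctuation and umlauts. The helper name and the sample names are made up for illustration, and Python's re module stands in for Apache's regex engine.

```python
import re

FIRST_CHAR = re.compile(r"^[^/a-z]")

def starts_non_lowercase(name):
    # True when the first character is anything except "/" or a-z.
    return FIRST_CHAR.search(name) is not None

samples = ["Main/HomePage", "2006/News", "$Group/Page",
           "Ärger/Seite", "pub/skins", "/Main"]
for name in samples:
    print(name, starts_non_lowercase(name))
```

Only "pub/skins" and "/Main" are rejected here; the other four all pass, so "non-lowercase" is a wider set than "uppercase".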
>
>> (Differentiating between upper and lower case also prevents the
>> processing of any internal pmwiki paths that are being used to create
>> the page, like pub, upload, etc.)
>
> Yes.
>
>> The first non-lowercase character we find in "Main/HomePage" is the
>> first "M", so we take that and everything after: "Main/HomePage".
>
> Yes.
>
>> This string is referenced by "$1". So, the Rewrite Rule replaces our
>> "Main/HomePage" with "pmwiki.php?n=Main.HomePage".
>
> Yes.
>
>> As we are done with the script, the Base part of the URL
>> ("/~nepherim/pmwiki/") is now added back to what we created,
>
> That's the step that RewriteBase corrects.
>
>> so we end up with the complex URL PHP and pmwiki needs:
>> /~nepherim/pmwiki/pmwiki.php?n=Main.HomePage
>
> Yes.
>
> HTH :-)
>
> Regards,
> Jo
>