[pmwiki-users] Alphabetize by Title Without Leading Articles or Special Characters

Petko Yotov 5ko at 5ko.fr
Mon Mar 21 00:45:47 CDT 2016


The previous patterns didn't capture a quote that precedes words other 
than The/A/An. Here is another pattern:

   '/^ *((The|An?) +|")+/i'
or
   '/^ *(The +|An? +|")+/i'

It looks like yours, but adds a plus after the parentheses, meaning that 
the things in the parentheses can appear one or more times. That way a 
quote followed by an article is removed in a single pass.
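
To see what the combined pattern does outside of PmWiki, here is a small 
standalone sketch. The function name strip_leading_article() and the sample 
titles are made up for the illustration, they are not part of PmWiki or of 
the Cookbook recipe:

```php
<?php
// Hypothetical helper, for illustration only; not part of PmWiki.
function strip_leading_article(string $title): string {
    // Optional leading spaces, then one or more of:
    //   The/A/An followed by at least one space, or a double quote.
    return preg_replace('/^ *((The|An?) +|")+/i', '', $title, 1);
}

// Made-up sample titles.
$samples = array('"The Raven"', '"Hamlet"', 'An Ideal Husband', 'Another Country');
foreach ($samples as $title) {
    echo $title, ' => ', strip_leading_article($title), "\n";
}
```

Only the leading quote is removed, the closing one stays. A title like 
"Another Country" is left alone, because the article must be followed by 
a space.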

Alternatively, if the pattern becomes or feels too complex, you can 
split it into separate patterns that will be applied one after 
another. In this case, place the two patterns in array(...):

   $FmtPV['$TitleNoArticle'] =
     'preg_replace(array("/^ *\"/", "/^ *(?:The|An?) /i"), "",
       (@$page["title"] ? $page["title"] : $AsSpacedFunction($name)), 1)';

Wow that actually looks more complex than the combined pattern! :-)
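
For testing the two-step variant on its own, a minimal sketch could look 
like this. Again, strip_quote_then_article() and the sample titles are 
hypothetical names for the example, not something PmWiki provides:

```php
<?php
// Hypothetical helper, for illustration only; not part of PmWiki.
function strip_quote_then_article(string $title): string {
    $patterns = array(
        '/^ *"/',              // step 1: leading spaces and one double quote
        '/^ *(?:The|An?) /i',  // step 2: leading article plus one space
    );
    // preg_replace() applies the patterns in order; the limit of 1
    // counts per pattern, so each step can remove one match.
    return preg_replace($patterns, '', $title, 1);
}

// Made-up sample titles.
foreach (array('"The Raven"', 'A Tale of Two Cities', 'The "Raven"') as $title) {
    echo $title, ' => ', strip_quote_then_article($title), "\n";
}
```

One difference from the combined pattern: the steps run in a fixed order, 
so for a title like The "Raven" the quote after the article is kept, while 
the combined pattern would remove both.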

Petko

On 2016-03-20 23:00, Petko Yotov wrote:
> You can use a search pattern like this:
> 
>   '/^ *"?(The|An?) +/i'
> 
> This assumes that the quote always comes before the rest of the title.
> 
> The question mark after the quote means that there can be zero or one
> quote, just as the question mark after the An means that there can be a
> single "n" or none (both articles A and An will be found).
> 
> In some languages (French) typographical rules may require a
> space between the quote and the text, in that case you could have
> 
>   '/^ *"? *(The|An?) +/i'
> 
> the asterisk after the space means that there can be zero or more 
> spaces.
> 
> The plus after the last space means that there can be one or more
> spaces after A, An, The.
> 
> The pattern on the cookbook has a tiny optimization but is harder to
> understand for a beginner. In most cases, several regular expressions
> can be written to match the exact same strings, so you can have fun. :-)
> 
> Petko
> 
> On 2016-03-20 21:21, Jake Wartenberg wrote:
>> I suspect this is a pretty basic regex question.  I have quotation marks
>> at the beginning of some of my titles, so I modified the TitleNoArticle
>> function from the CustomPagelistSortOrderFunctions Cookbook page:
>> 
>> 'preg_replace("/^ *((?:The|An?) |\")/i", "", (@$page["title"] ?
>> $page["title"] : $AsSpacedFunction($name)), 1)';
>> 
>> This works great for discarding the leading quotation marks, but I run
>> into a problem when I have *both* a leading quotation mark and an
>> article (the/a/an) at the beginning of my title, in which case the page
>> gets alphabetized under the article.  I would be really grateful to
>> anyone who could show me how to modify the function to account for this.


