[pmwiki-users] Alphabetize by Title Without Leading Articles or Special Characters

Jake Wartenberg jake at jakewartenberg.com
Tue Mar 22 01:01:08 CDT 2016


This works great.  Thanks, Petko.

On Mon, Mar 21, 2016 at 1:45 AM, Petko Yotov <5ko at 5ko.fr> wrote:

> The previous patterns didn't capture a quote that precedes words other
> than The/A/An. Here is another pattern:
>
>   '/^ *((The|An?) +|")+/i'
> or
>   '/^ *(The +|An? +|")+/i'
>
> It looks like yours, but adds a plus after the parentheses, meaning that
> the things in the parentheses can appear one or more times.
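To make the effect of the trailing plus concrete, here is a minimal sketch that runs the first combined pattern against a few titles. It uses Python's `re` module as a stand-in for PHP's `preg_replace` (an assumption made so the example is easy to run; the pattern uses only syntax both engines share). The helper name `strip_leading` and the sample titles are hypothetical.

```python
import re

# Combined pattern from the message above: after optional leading spaces,
# the group (article + spaces, or a double quote) may repeat one or more times.
COMBINED = re.compile(r'^ *((The|An?) +|")+', re.IGNORECASE)

def strip_leading(title):
    """Hypothetical helper: remove the leading quote/article run once."""
    return COMBINED.sub("", title, count=1)

for t in ['"The Raven"', '"Hello" World', 'An Apple', 'Theory of Games']:
    print(repr(t), "->", repr(strip_leading(t)))
```

Because the group repeats, a quote followed by an article is consumed in a single match, while a word that merely starts with "The" (such as "Theory") is left alone since no space follows the article.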
>
> Alternatively, if the pattern becomes or feels too complex, you can
> split it into separate patterns that will be applied one after another.
> In this case, place the two patterns in array(...):
>
>   $FmtPV['$TitleNoArticle'] =
>     'preg_replace(array("/^ *\"/", "/^ *(?:The|An?) /i"), "",
> (@$page["title"] ? $page["title"] : $AsSpacedFunction($name)), 1)';
>
> Wow, that actually looks more complex than the combined pattern! :-)
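The "one pattern after another" idea can be sketched as a simple loop. This is a Python approximation of passing an array of patterns to `preg_replace`, where each pattern is applied in turn to the result of the previous one; the helper name `strip_sequential` is hypothetical, and Python is used only as an easily testable stand-in.

```python
import re

# The two patterns from the array(...) above, in the same order:
# first a leading quote, then a leading article followed by one space.
PATTERNS = [
    re.compile(r'^ *"'),
    re.compile(r'^ *(?:The|An?) ', re.IGNORECASE),
]

def strip_sequential(title):
    """Hypothetical helper: apply each pattern once, in order."""
    for pattern in PATTERNS:
        title = pattern.sub("", title, count=1)
    return title

for t in ['"The Raven"', 'A Tale', 'The "Best" Of']:
    print(repr(t), "->", repr(strip_sequential(t)))
```

Order matters in this form: a quote that follows the article, as in the last sample title, stays in place because the quote pattern has already run by the time the article is removed.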
>
> Petko
>
>
> On 2016-03-20 23:00, Petko Yotov wrote:
>
>> You can use a search pattern like this:
>>
>>   '/^ *"?(The|An?) +/i'
>>
>> This assumes that the quote always comes before the rest of the title.
>>
>> The question mark after the quote means that there can be zero or one
>> quote, just as the question mark after the "An" means that there can be
>> a single "n" or none (both articles, A and An, will be found).
>>
>> In some languages (French, for example), typographical rules may require
>> a space between the quote and the text; in that case you could have
>>
>>   '/^ *"? *(The|An?) +/i'
>>
>> The asterisk after the space means that there can be zero or more spaces.
>>
>> The plus after the last space means that there can be one or more
>> spaces after A, An, The.
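The three quantifiers described above (`?`, `*` and `+`) can be checked against sample titles with a short sketch. It uses Python's `re` module as a stand-in for PHP's `preg_replace` (an assumption made for easy testing), and the helper name `strip_quote_article` and the titles are hypothetical.

```python
import re

# Second pattern from the message above:
#   ' *'  zero or more leading spaces
#   '"?'  zero or one quote
#   ' *'  zero or more spaces between the quote and the article
#   'An?' "A" with an optional "n"
#   ' +'  one or more spaces after the article
PATTERN = re.compile(r'^ *"? *(The|An?) +', re.IGNORECASE)

def strip_quote_article(title):
    """Hypothetical helper: remove the matched prefix once."""
    return PATTERN.sub("", title, count=1)

for t in ['"The Raven"', '" An  Owl', 'a cat', '  The   End', '"Hello"']:
    print(repr(t), "->", repr(strip_quote_article(t)))
```

Note that this pattern requires an article to match at all, so a title such as `"Hello"` keeps its leading quote.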
>>
>> The pattern in the Cookbook has a tiny optimization but is harder to
>> understand for a beginner. In most cases, several regular expressions
>> can be written to match the exact same strings, so you can have fun. :-)
>>
>> Petko
>>
>> On 2016-03-20 21:21, Jake Wartenberg wrote:
>>
>>> I suspect this is a pretty basic regex question.  I have quotation
>>> marks at the beginning of some of my titles, so I modified the
>>> TitleNoArticle function from the CustomPagelistSortOrderFunctions
>>> Cookbook page:
>>>
>>> 'preg_replace("/^ *((?:The|An?) |\")/i", "", (@$page["title"] ?
>>> $page["title"] : $AsSpacedFunction($name)), 1)';
>>>
>>> This works great for discarding the leading quotation marks, but I
>>> run into a problem when I have *both* a leading quotation mark and an
>>> article (the/a/an) at the beginning of my title, in which case the
>>> page gets alphabetized under the article.  I would be really grateful
>>> to anyone who could show me how to modify the function to account for
>>> this.
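The behaviour described above can be reproduced with a small sketch. It runs the modified pattern through Python's `re` module as a stand-in for PHP's `preg_replace` (an assumption made for easy testing; the pattern uses only syntax the two engines share). The helper name `title_no_article` and the sample titles are hypothetical.

```python
import re

# The modified pattern: optional leading spaces, then EITHER an article
# followed by a space OR a double quote. The alternation matches only one
# of the two, so a quote followed by an article is not fully removed.
PATTERN = re.compile(r'^ *((?:The|An?) |")', re.IGNORECASE)

def title_no_article(title):
    """Hypothetical helper mirroring preg_replace(pattern, "", title, 1)."""
    return PATTERN.sub("", title, count=1)

for t in ['The Raven', '"Raven"', '"The Raven"']:
    print(repr(t), "->", repr(title_no_article(t)))
```

For the last title only the quote is stripped, so the result still begins with "The" and would sort under T.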
>>>
>>