[pmwiki-devel] lower casing diacritics

Dominique Faure dominique.faure at gmail.com
Sat Oct 22 02:51:11 PDT 2022


You should perhaps pass 'UTF-8' explicitly as the encoding argument to the mb_strtolower() call.
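
Something along these lines might work; it is only a sketch, and the two function names (sc_lower_term, sc_to_entities) are made up for illustration and are not part of the SearchCloud recipe. It assumes the incoming string is UTF-8. Note also that mb_encode_numericentity() expects its conversion map in groups of four integers (start, end, offset, mask), so a two-element map will not convert anything.

```php
<?php
// Hypothetical helpers for illustration only; assumes UTF-8 input.

// Decode HTML entities and lower-case the result, naming the encoding
// explicitly instead of relying on the mbstring internal encoding.
function sc_lower_term(string $term): string {
    $decoded = html_entity_decode($term, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return mb_strtolower($decoded, 'UTF-8');
}

// Convert non-ASCII characters to numeric entities.
// The map is one group of four: start, end, offset, mask.
function sc_to_entities(string $text): string {
    $convmap = array(0x80, 0xffff, 0, 0xffff);
    return mb_encode_numericentity($text, $convmap, 'UTF-8');
}

$lower = sc_lower_term('SĀÉÎÖŬ-àęiøűd');
echo $lower, "\n";                 // sāéîöŭ-àęiøűd
echo sc_to_entities($lower), "\n"; // the same term with numeric entities
```

If the debug log then still shows garbled characters, it may be worth checking how the log file itself is being viewed, since UTF-8 text read as a single-byte encoding looks much like the tkey1 line.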

On Wed, Oct 19, 2022 at 10:35 AM Simon <nzskiwi at gmail.com> wrote:
>
> Some background.
> I am trying to update the SearchCloud recipe.
>
> The recipe grabs the q parameter of a search action.
>
> I want it to
> * make the search terms case-insensitive
> * handle characters with diacritics.
>
> Here is some debug output
>
> 2022-10-19 21:26:01
> q="SĀÉÎÖŬ-àęiøűd"
> $SCrq="SĀÉÎÖŬ-àęiøűd"
> tkey1="SÄ€ÉÎÖŬ-àÄ™iøűd"
> tkey2="sÄ ???Å­-?Ä™i?űd"
> tkey3="sÄ ???Å­-?Ä™i?űd"
>
> Generated from debug code
>       $convmap = array (0x80, 0xffff);
>       $q     = strval($_REQUEST['q']); # get search term
>       $SCrq  = trim (\stripmagic($q));
>       $tkey1 = html_entity_decode($SCrq); # remove html entities to allow lower case conversion
>       $tkey2 = mb_strtolower($tkey1); # convert to lower case
>       $tkey3 = mb_encode_numericentity ($tkey2, $convmap); # convert non-ascii to htmlentities
>       $fwritestatus = fwrite($logfilehandle, $logfiletime
>       . 'q="' . $q
>       . '" $SCrq="' . $SCrq
>       . '" tkey1="' . $tkey1
>       . '" tkey2="' . $tkey2
>       . '" tkey3="' . $tkey3 . '"'
>
> As you can see in the debug output it seems to fall apart at tkey2.
> I'd welcome more suggestions
>
>
>
> On Tue, 18 Oct 2022 at 23:39, Petko Yotov <5ko at 5ko.fr> wrote:
>>
>> You may be able to use:
>>
>>    $entity = mb_convert_encoding($decoded, 'HTML');
>>
>>
>> You may or may not need to specify a $from_encoding argument. From the
>> documentation it seems before PHP 8.0 $from_encoding was required.
>> Documentation:
>>
>>    https://php.net/mb_convert_encoding
>>
>> Petko
>>
>> --
>> If you upgrade :  https://www.pmwiki.org/Upgrades
>>
>>
>> On 18/10/2022 12:17, Simon wrote:
>> > Again, thanks heaps for answering these newbie questions, that works.
>> > What I think I have found is that while html_entity_decode() turns the
>> > entity for Ē into "Ē", htmlentities("Ē") doesn't convert Ē back to the
>> > entity.
>> >
>> > Simon
>> >
>> > On Tue, 18 Oct 2022 at 19:18, Petko Yotov <5ko at 5ko.fr> wrote:
>> >
>> >> You can use mb_strtolower():
>> >>
>> >> https://php.net/mb_strtolower
>> >>
>> >> Here is an example from the PHP interactive shell:
>> >>
>> >> php > $str = "e.g. Ā to ā, Ê to ê, Į to į, etc";
>> >> php > print_r(mb_strtolower($str));
>> >> e.g. ā to ā, ê to ê, į to į, etc
>> >> php > print_r(mb_strtoupper($str));
>> >> E.G. Ā TO Ā, Ê TO Ê, Į TO Į, ETC
>> >>
>> >> Petko
>> >>
>> >>
>> >> On 18/10/2022 06:46, Simon wrote:
>> >>> Can anyone suggest a means of converting diacritic [1] characters to
>> >>> lower case,
>> >>> e.g. Ā to ā, Ê to ê, Į to į, etc
>> >>> other than creating a translation table?
>> >>>
>> >>> thanks
>> >>>
>> >>> Simon
>> >>>
>> >>>
>> >>>
>> >>> Links:
>> >>> ------
>> >>> [1] https://en.wikipedia.org/wiki/Diacritic
>> >>> _______________________________________________
>> >>> pmwiki-devel mailing list
>> >>> pmwiki-devel at pmichaud.com
>> >>> http://www.pmichaud.com/mailman/listinfo/pmwiki-devel
>


