[pmwiki-devel] lower casing diacritics

Simon nzskiwi at gmail.com
Wed Oct 19 01:33:34 PDT 2022


Some background.
I am trying to update the SearchCloud recipe.

The recipe grabs the q parameter of a search action.

I want it to
* make the search terms case-insensitive
* handle characters with diacritics.
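
One way to read these two requirements, as a minimal sketch: two terms count as the same when they differ only in case, with accented letters lowered along with plain ASCII. The helper name sc_same_term is hypothetical (it is not part of SearchCloud or PmWiki), and the sketch assumes UTF-8 input and the mbstring extension.

```php
<?php
// Hypothetical helper, not part of SearchCloud or PmWiki.
// Assumes both strings are UTF-8 and the mbstring extension is loaded.
function sc_same_term(string $a, string $b): bool
{
    // Naming the encoding avoids depending on the configured default.
    return mb_strtolower($a, 'UTF-8') === mb_strtolower($b, 'UTF-8');
}

var_dump(sc_same_term('SĀÉÎÖ', 'sāéîö'));   // same letters, different case
var_dump(sc_same_term('SĀÉÎÖ', 'saeio'));   // diacritics are kept, so these differ
```

Note that this keeps the diacritics: "ā" and "a" remain different terms.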

Here is some debug output:

2022-10-19 21:26:01
*q*="SĀÉÎÖŬ-àęiøűd"
*$SCrq*="SĀÉÎÖŬ-àęiøűd"
*tkey1*="SÄ€ÉÎÖŬ-àÄ™iøűd"
*tkey2*="sÄ ???Å­-?Ä™i?űd"
*tkey3*="sÄ ???Å­-?Ä™i?űd"

Generated from this debug code:
      $convmap = array (0x80, 0xffff);
      $q     = strval($_REQUEST['q']); # get search term
      $SCrq  = trim (\stripmagic($q));
      # remove html entities to allow lower case conversion
      $tkey1 = html_entity_decode($SCrq);
      $tkey2 = mb_strtolower($tkey1); # convert to lower case
      # convert non-ascii to htmlentities
      $tkey3 = mb_encode_numericentity ($tkey2, $convmap);
      $fwritestatus = fwrite($logfilehandle, $logfiletime
      . 'q="' . $q
      . '" $SCrq="' . $SCrq
      . '" tkey1="' . $tkey1
      . '" tkey2="' . $tkey2
      . '" tkey3="' . $tkey3 . '"');

As you can see in the debug output, it seems to fall apart at tkey2.
I'd welcome more suggestions.
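
For comparison, here is a minimal sketch of the same three steps with the encoding named at every call, so nothing depends on PHP's configured defaults. It assumes the search term is UTF-8 and that mbstring is loaded. The function name sc_normalise_term and the sample input are hypothetical. The convmap is written with four values per range (start, end, offset, mask), which is the form the PHP manual documents for mb_encode_numericentity(). Whether this fixes the output above is untested.

```php
<?php
// Hypothetical helper, not part of SearchCloud or PmWiki.
// Assumes $raw is UTF-8 (possibly containing HTML entities).
function sc_normalise_term(string $raw): string
{
    // One range covering every code point from U+0080 upward:
    // start, end, offset, mask.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    // Decode entities into UTF-8 explicitly.
    $decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Lower-case as UTF-8, so multi-byte letters are handled as characters.
    $lower = mb_strtolower($decoded, 'UTF-8');
    // Turn the non-ASCII characters back into numeric entities.
    return mb_encode_numericentity($lower, $convmap, 'UTF-8');
}

echo sc_normalise_term('S&#256;&Eacute;-&agrave;d'), "\n";
```

If the log file is then viewed as UTF-8, the intermediate values should be readable too; a viewer using a single-byte encoding will show multi-byte characters as pairs of odd symbols even when the string itself is fine.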



On Tue, 18 Oct 2022 at 23:39, Petko Yotov <5ko at 5ko.fr> wrote:

> You may be able to use:
>
>    $entity = mb_convert_encoding($decoded, 'HTML');
>
>
> You may or may not need to specify a $from_encoding argument. From the
> documentation it seems before PHP 8.0 $from_encoding was required.
> Documentation:
>
>    https://php.net/mb_convert_encoding
>
> Petko
>
> --
> If you upgrade :  https://www.pmwiki.org/Upgrades
>
>
> On 18/10/2022 12:17, Simon wrote:
> > Again, thanks heaps for answering these newbie questions, that works.
> > What I think I have found is that while html_entity_decode() of the
> > entity for Ē gives "Ē",
> > htmlentities("Ē") doesn't convert Ē back to its entity.
> >
> > Simon
> >
> > On Tue, 18 Oct 2022 at 19:18, Petko Yotov <5ko at 5ko.fr> wrote:
> >
> >> You can use mb_strtolower():
> >>
> >> https://php.net/mb_strtolower
> >>
> >> Here is an example from the PHP interactive shell:
> >>
> >> php > $str = "e.g. Ā to ā, Ê to ê, Į to į, etc";
> >> php > print_r(mb_strtolower($str));
> >> e.g. ā to ā, ê to ê, į to į, etc
> >> php > print_r(mb_strtoupper($str));
> >> E.G. Ā TO Ā, Ê TO Ê, Į TO Į, ETC
> >>
> >> Petko
> >>
> >> On 18/10/2022 06:46, Simon wrote:
> >>> Can anyone suggest a means of converting diacritic [1] characters to
> >>> lower case,
> >>> e.g. Ā to ā, Ê to ê, Į to į, etc
> >>> other than creating a translation table?
> >>>
> >>> thanks
> >>>
> >>> Simon
> >>>
> >>>
> >>>
> >>> Links:
> >>> ------
> >>> [1] https://en.wikipedia.org/wiki/Diacritic