[pmwiki-devel] lower casing diacritics
Simon
nzskiwi at gmail.com
Wed Oct 19 01:33:34 PDT 2022
Some background.
I am trying to update the SearchCloud recipe.
The recipe grabs the q parameter of a search action.
I want it to
* make the search terms case-insensitive
* handle characters with diacritics.
Here is some debug output:
2022-10-19 21:26:01
*q*="SĀÉÎÖŬ-àęiøűd"
*$SCrq*="SĀÉÎÖŬ-àęiøűd"
*tkey1*="SÄ€ÉÎÖŬ-àÄ™iøűd"
*tkey2*="sÄ ???Å-?Ä™i?űd"
*tkey3*="sÄ ???Å-?Ä™i?űd"
It was generated by this debug code:
$convmap = array (0x80, 0xffff);
$q = strval($_REQUEST['q']); # get search term
$SCrq = trim (\stripmagic($q));
$tkey1 = html_entity_decode($SCrq); # remove html entities to allow lower case conversion
$tkey2 = mb_strtolower($tkey1); # convert to lower case
$tkey3 = mb_encode_numericentity ($tkey2, $convmap); # convert non-ascii to htmlentities
$fwritestatus = fwrite($logfilehandle, $logfiletime
. 'q="' . $q
. '" $SCrq="' . $SCrq
. '" tkey1="' . $tkey1
. '" tkey2="' . $tkey2
. '" tkey3="' . $tkey3 . '"');
As you can see in the debug output, it seems to fall apart at tkey2.
I'd welcome more suggestions.
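
One possible reading of the output above is that each step falls back on a default encoding, and that the two-element $convmap gives mb_encode_numericentity() nothing to act on, since the PHP manual describes the map as groups of four integers (start, end, offset, mask). A minimal sketch under those assumptions, naming UTF-8 at every step; sc_normalise_term() is a hypothetical helper, not part of the SearchCloud recipe, and the input is assumed to be UTF-8:

```php
<?php
// Hypothetical helper, not part of the SearchCloud recipe.
// Assumes UTF-8 input and that the mbstring extension is loaded.
function sc_normalise_term(string $raw): string
{
    // Decode named and numeric entities directly to UTF-8.
    $decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Lower-case with an explicit encoding rather than mb_internal_encoding().
    $lower = mb_strtolower($decoded, 'UTF-8');

    // Four-element map: start, end, offset, mask.
    // Everything above U+007F becomes a decimal numeric entity.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($lower, $convmap, 'UTF-8');
}

// Mixed input: raw UTF-8 characters plus one named entity.
echo sc_normalise_term('SĀ&Eacute;ÎÖŬ-àęiøűd'), "\n";
```

If the wiki runs in a different charset, the 'UTF-8' arguments would need to change to match it.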
On Tue, 18 Oct 2022 at 23:39, Petko Yotov <5ko at 5ko.fr> wrote:
> You may be able to use:
>
> $entity = mb_convert_encoding($decoded, 'HTML');
>
>
> You may or may not need to specify a $from_encoding argument. From the
> documentation it seems before PHP 8.0 $from_encoding was required.
> Documentation:
>
> https://php.net/mb_convert_encoding
>
> Petko
>
> --
> If you upgrade : https://www.pmwiki.org/Upgrades
>
>
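
The mb_convert_encoding() suggestion above could be tried with the source encoding spelled out, as in the sketch below. It assumes $decoded holds UTF-8 text; 'HTML-ENTITIES' is the long name of the 'HTML' target. As far as I know, PHP 8.2 and later report this target as deprecated, so treat this as an illustration rather than a long-term solution:

```php
<?php
// Assumes UTF-8 text and that the mbstring extension is loaded.
$decoded = 'sāéîöŭ';

// Convert non-ASCII characters to HTML entities, stating the source encoding.
$entity = mb_convert_encoding($decoded, 'HTML-ENTITIES', 'UTF-8');

// Decoding the result should give back the original string.
$roundTrip = html_entity_decode($entity, ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($roundTrip === $decoded);
```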
> On 18/10/2022 12:17, Simon wrote:
> > Again, thanks heaps for answering these newbie questions, that works.
> > What I think I have found is that while html_entity_decode('Ē')
> > gives "Ē"
> > htmlentities ("Ē") doesn't convert Ē back to Ē
> >
> > Simon
> >
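
The htmlentities() observation above fits the way that function works: it only emits named entities from its lookup table, and the HTML 4.01 table has no name for Ē (U+0112). A short sketch of the difference, assuming UTF-8 text and the mbstring extension; the variable names are illustrative only:

```php
<?php
// Assumes UTF-8 source text and that the mbstring extension is loaded.
$char = 'Ē'; // U+0112, LATIN CAPITAL LETTER E WITH MACRON

// No named entity exists in the HTML 4.01 table, so the character passes through.
$named = htmlentities($char, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// A numeric entity can be produced for any code point covered by the map
// (start, end, offset, mask).
$numeric = mb_encode_numericentity($char, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

// Decoding the numeric entity restores the original character.
$back = html_entity_decode($numeric, ENT_QUOTES | ENT_HTML401, 'UTF-8');

var_dump($named === $char, $numeric, $back === $char);
```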
> > On Tue, 18 Oct 2022 at 19:18, Petko Yotov <5ko at 5ko.fr> wrote:
> >
> >> You can use mb_strtolower():
> >>
> >> https://php.net/mb_strtolower
> >>
> >> Here is an example from the PHP interactive shell:
> >>
> >> php > $str = "e.g. Ā to ā, Ê to ê, Į to į, etc";
> >> php > print_r(mb_strtolower($str));
> >> e.g. ā to ā, ê to ê, į to į, etc
> >> php > print_r(mb_strtoupper($str));
> >> E.G. Ā TO Ā, Ê TO Ê, Į TO Į, ETC
> >>
> >> Petko
> >>
> >> --
> >> If you upgrade : https://www.pmwiki.org/Upgrades
> >>
> >> On 18/10/2022 06:46, Simon wrote:
> >>> Can anyone suggest a means of converting diacritic [1]characters
> >> to
> >>> lower case,
> >>> e.g. Ā to ā, Ê to ê, Į to į, etc
> >>> other than creating a translation table?
> >>>
> >>> thanks
> >>>
> >>> Simon
> >>>
> >>>
> >>>
> >>> Links:
> >>> ------
> >>> [1] https://en.wikipedia.org/wiki/Diacritic
> >>> _______________________________________________
> >>> pmwiki-devel mailing list
> >>> pmwiki-devel at pmichaud.com
> >>> http://www.pmichaud.com/mailman/listinfo/pmwiki-devel
>