Re: UTF-8 support?: msg#00129 (lang.perl.macosx)
On Apr 30, 2005, at 6:06pm, Sherm Pendley wrote:

OK. So does this mean that substr() just doesn't/can't handle wide characters as characters but only as bytes?

OK, here's the code without the bug written into the example (which is inside a foreach loop that is looping through a long list of keywords),

    ...
    while ($articleWorkText =~ m/\b$kWord\b/igs) {
        $position = pos($articleWorkText) - length($kWord);
        $matchedText = substr($articleWorkText, $position, length($kWord));
        $matchedText =~ s/ /_/g;
        substr($patternSpace, $position, length($matchedText)) = $matchedText;
    }
    ...

Which works fine in most cases but, if there is a wide character in $articleWorkText before the matched text, then $position, as used by substr(), ends up being in front of the $position as calculated from pos(). If I open the file in TextEdit, the pos()-derived position seems to be correct, while the position that substr() seems to use is one character earlier, and this only happens when there is a wide character preceding the match.

Now maybe this would be better written using the $1, $2, ... variables, but I still don't understand the discrepancy between the pos() position and the substr() position.

John Blumel
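For comparison, here is a minimal sketch of the same loop that takes the match offsets from Perl's built-in @- and @+ arrays instead of computing pos() minus length($kWord). It rests on assumptions not stated in the post: that the article text arrives as UTF-8 bytes, that it is decoded to characters with Encode before matching, and that $patternSpace holds the same decoded text. The sample string and keyword list are hypothetical, and this is not claimed to be the cause of, or the fix for, the discrepancy described above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample data: UTF-8 bytes as they might be read from a file.
# "\xc3\xa9" is the two-byte UTF-8 encoding of e-acute.
my $bytes = "caf\xc3\xa9 au lait and Iced Tea";

# Assumption: decode first, so that pos(), length() and substr()
# all count characters rather than bytes.
my $articleWorkText = decode('UTF-8', $bytes);
my $patternSpace    = $articleWorkText;   # same length, in characters
my @keywords        = ('au lait', 'iced tea');

for my $kWord (@keywords) {
    # \Q...\E keeps any regex metacharacters in the keyword literal.
    while ($articleWorkText =~ m/\b\Q$kWord\E\b/ig) {
        # Save the offsets before running another regex, which resets @- and @+.
        my $position = $-[0];            # start offset of the whole match
        my $len      = $+[0] - $-[0];    # length of the text actually matched

        my $matchedText = substr($articleWorkText, $position, $len);
        $matchedText =~ s/ /_/g;         # same length, so offsets stay valid
        substr($patternSpace, $position, $len) = $matchedText;
    }
}

binmode(STDOUT, ':encoding(UTF-8)');
print "$patternSpace\n";
```

Because $-[0] and $+[0] describe the text that actually matched, the offsets stay correct even when the matched text and $kWord differ in length.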