

[LUG] Locales and ££££'s and Perl

 

Does anyone know of a good resource on this?

I'm having fun with Perl and £ signs, and working through the Perl
documentation initially (it seems quite good so far).

After much pondering I set the locale manually for the current user
(export LANG=en_GB.UTF-8), and the relevant settings have changed.

locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

vi test.pl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

binmode(STDOUT,":utf8");

print "£\n";

In "vim" that looks like a £ sign ('cat' and 'less' show the
hexagonal ? symbol instead).

./test.pl
Malformed UTF-8 character (unexpected continuation byte 0xa3, with no
preceding start byte) at ./test.pl line 8.

My first question is: what is going wrong here? Is this the wrong way to
write a Unicode string literal (pressing shift + "3")? I'm not so concerned
with the "right way" or a "working way"; I want to understand what
is going wrong (perl/vim/my brain (likely)/Debian).
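
A note on the byte in that error message: 0xa3 is how a £ is stored as a single byte in Latin-1, whereas UTF-8 stores the same character as the two bytes 0xc2 0xa3. One possible explanation, not confirmed anywhere in this post, is that the editor saved test.pl as Latin-1 while "use utf8" tells Perl to expect UTF-8 source. A minimal sketch for looking at the bytes, assuming a POSIX shell with od and tr (hex_bytes is a made-up helper name):

```shell
# hex_bytes: hypothetical helper, prints stdin as one run of hex digits.
hex_bytes() {
    # od dumps one hex byte per column; tr strips the padding and newlines.
    od -An -tx1 | tr -d ' \n'
}

# printf takes octal escapes: \243 is 0xa3, \302 is 0xc2.
printf '\243' | hex_bytes; echo       # Latin-1 pound sign: a3
printf '\302\243' | hex_bytes; echo   # UTF-8 pound sign: c2a3
```

Running `hex_bytes < test.pl` and looking for `a3` with or without a preceding `c2` would show which of the two forms is actually in the file.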

With Perl 5.8 (on Debian Sarge) I understood that "use utf8" is
still needed to allow Unicode literals in the source.

It all started with a Java applet, and I'm just working my way through
trying to establish consistent handling. I suspect there is more
than one bug (in fact I know there is), but I'll try
fixing them one at a time.

Usually I muddle along in POSIX or Latin-1, and rarely encounter any
issues with locale, but here we have web pages that have to be in UTF-8
and data coming from them in UTF-8 (we hope). So this locale is
generated, but not used much on this box.
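
For the "we hope" part, one way to check whether incoming data really is UTF-8 is to round-trip it through iconv. A rough sketch, assuming an iconv that exits non-zero on invalid input (GNU iconv behaves this way); is_utf8 is a hypothetical helper name:

```shell
# is_utf8: hypothetical helper; succeeds only if stdin is valid UTF-8.
# Assumption: iconv returns a non-zero status on an illegal byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Two-byte UTF-8 pound sign (0xc2 0xa3).
if printf '\302\243' | is_utf8; then echo "valid"; else echo "not valid"; fi

# Bare 0xa3, as a Latin-1 pound sign would arrive.
if printf '\243' | is_utf8; then echo "valid"; else echo "not valid"; fi
```

This only says whether the bytes are well-formed UTF-8, not whether they were mangled somewhere earlier in the chain.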

We don't actually have Unicode literals in the Perl code, but it is
going to have to handle Unicode characters. I was writing a new test case
for one of the Perl Template Toolkit filters, and I couldn't get it to run
at all because it needed some UTF-8 character (so the filter could be taught
not to mangle them as badly as it does currently -- I suspect the real
mangling is done in JavaScript).


-- 
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/linux_adm/list-faq.html