setlocale to allow UTF-8 characters on command line

Discussion related to AES Crypt, the file encryption software for Windows, Linux, Mac, and Java.
gc1
Posts: 4
Joined: Thu May 30, 2013 9:04 am

Post by gc1 »

Can I suggest adding:

setlocale(LC_ALL, "");

at the beginning of main() in aescrypt.c etc. to set the locale from the environment.

Without this, nl_langinfo(CODESET) always returns ANSI_X3.4-1968 (on both Linux and Cygwin), which then disallows non-ASCII characters on the command line in the password argument.
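To illustrate why the reported codeset matters, here is a minimal sketch of a password conversion through iconv. It is not taken from the AES Crypt source: the helper name password_to_utf16le, its return convention, and the UTF-16LE target encoding are assumptions made for the example. The point is only that a conversion which names ANSI_X3.4-1968 as the source codeset has to reject any byte above 0x7F.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the AES Crypt source): convert a
 * NUL-terminated password from `codeset` to UTF-16LE.
 * Returns the number of bytes written to `out`, or -1 on failure.
 */
long password_to_utf16le(const char *codeset, const char *pass,
                         char *out, size_t out_size)
{
    assert(codeset != NULL && pass != NULL && out != NULL);

    iconv_t cd = iconv_open("UTF-16LE", codeset);
    if (cd == (iconv_t)-1)
        return -1;                      /* conversion pair not supported */

    char *in_ptr = (char *)pass;        /* iconv() takes a non-const char ** */
    size_t in_left = strlen(pass);
    char *out_ptr = out;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;                      /* invalid input or buffer too small */

    return (long)(out_size - out_left);
}
```

Called with the codeset that nl_langinfo(CODESET) reports, this would accept a UTF-8 password only once setlocale() has switched the codeset to UTF-8. On Cygwin the program also needs to link against -liconv.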

Also, in the Makefile, moving $(LIBS) to the end of the line helps it link correctly on some systems (such as Cygwin), e.g.:

$(CC) $(CFLAGS) -o $@ $^ $(LIBS)
paulej
Posts: 593
Joined: Sun Aug 23, 2009 7:32 pm
Location: Research Triangle Park, NC, USA

Re: setlocale to allow UTF-8 characters on command line

Post by paulej »

I'll definitely add this locale change to the to-do list. This problem was reported to me a long time ago and I did not know what the solution was. I could never reproduce the issue, because my system uses UTF-8 and my input is UTF-8, so everything just worked.

Have you made the change and verified that it works? If so, I'll definitely include that in the next update.

For the LIBS item, we're presently only using LIBS when we compile on a Mac. Why is this an issue?
gc1
Posts: 4
Joined: Thu May 30, 2013 9:04 am

Re: setlocale to allow UTF-8 characters on command line

Post by gc1 »

Yes, I added the setlocale() call and it works on both Linux and Cygwin/Windows for UTF-8 passwords. I can encrypt a file on one O/S and decrypt it on the other O/S using the same UTF-8 password string.

LIBS is also needed on Cygwin, but you get undefined references if it comes before the object files:

$ gcc -Wall -D_FILE_OFFSET_BITS=64 -liconv -o aescrypt aescrypt.o aes.o sha256.o password.o keyfile.o
password.o:password.c:(.text+0x463): undefined reference to `_libiconv_open'
password.o:password.c:(.text+0x4a9): undefined reference to `_libiconv'
password.o:password.c:(.text+0x4f1): undefined reference to `_libiconv_close'
password.o:password.c:(.text+0x52b): undefined reference to `_libiconv_close'
password.o:password.c:(.text+0x53d): undefined reference to `_libiconv_close'
collect2: ld returned 1 exit status
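
Moving -liconv to the end fixes this, because the linker resolves symbols from left to right and only pulls from a library what the objects listed before it have already referenced.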
paulej
Posts: 593
Joined: Sun Aug 23, 2009 7:32 pm
Location: Research Triangle Park, NC, USA

Re: setlocale to allow UTF-8 characters on command line

Post by paulej »

OK, I'll put this on my to-do list.
gc1
Posts: 4
Joined: Thu May 30, 2013 9:04 am

Re: setlocale to allow UTF-8 characters on command line

Post by gc1 »

I see that setlocale() can also query the current locale when passed a NULL argument. So it might be worth querying it first and only setting it if it was the default "C" locale: if no LANG environment variable is set, setlocale(LC_ALL, "") would set it to "C" when there might have been a better default to start with.
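The query-first idea could look roughly like the sketch below. The helper name set_locale_if_default and its return codes are made up for illustration. One caveat: the C standard starts every program in the "C" locale, so at the top of main() the check would be expected to pass every time.

```c
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper (name and return codes are illustrative only):
 *    0  the locale was already something other than "C"; left untouched
 *    1  the locale was "C" and has now been set from the environment
 *   -1  the locale was "C" and the environment's locale could not be applied
 */
int set_locale_if_default(void)
{
    /* A NULL second argument only queries; it changes nothing. */
    const char *current = setlocale(LC_ALL, NULL);

    if (current != NULL && strcmp(current, "C") != 0)
        return 0;

    /* The empty string asks for the locale named by the environment. */
    return setlocale(LC_ALL, "") != NULL ? 1 : -1;
}
```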

The Linux variant I am using is Fedora 17; Cygwin appears to behave the same, but other systems might differ.

A test program:

#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main() {
	printf("original locale is %s\n", setlocale(LC_ALL, NULL));
	printf("code set is %s\n", nl_langinfo(CODESET));
	setlocale(LC_ALL, "");
	printf("locale is now %s\n", setlocale(LC_ALL, NULL));
	printf("code set is %s\n", nl_langinfo(CODESET));
	return 0;
}
produces:

original locale is C
code set is ANSI_X3.4-1968
locale is now en_GB.UTF-8
code set is UTF-8
or:

original locale is C
code set is ANSI_X3.4-1968
locale is now C
code set is ANSI_X3.4-1968

when LANG is unset.
paulej
Posts: 593
Joined: Sun Aug 23, 2009 7:32 pm
Location: Research Triangle Park, NC, USA

Re: setlocale to allow UTF-8 characters on command line

Post by paulej »

If LANG is not set, then what would be the default, if not "C"?
gc1
Posts: 4
Joined: Thu May 30, 2013 9:04 am

Re: setlocale to allow UTF-8 characters on command line

Post by gc1 »

Probably what I said is an unnecessary complication.

I was just wondering whether some systems get the locale information from elsewhere, such as MS Windows getting it from the registry (it uses locale strings like English_USA.1252 without needing anything in the environment).

So yes, probably just the simple

setlocale(LC_ALL, "");

ought to be sufficient for all systems, as the empty string appears to mean "get the information from wherever the system gets it", and that should always be at least as useful as the state when the program starts.
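As a rough sketch of that simple form, a startup helper might look like the following. The name init_locale_codeset and the warning text are invented for the example. The NULL check is an addition not discussed above: setlocale() returns NULL when the requested locale cannot be applied, in which case the program keeps the locale it already had.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical startup helper: adopt the locale from the environment and
 * report the codeset that is in effect afterwards.
 */
const char *init_locale_codeset(void)
{
    if (setlocale(LC_ALL, "") == NULL) {
        /* The environment names a locale this system cannot provide. */
        fprintf(stderr, "warning: could not set locale from environment\n");
    }

    /* e.g. "UTF-8", or "ANSI_X3.4-1968" if still in the "C" locale */
    return nl_langinfo(CODESET);
}
```

It would be called once at the beginning of main(), before the password argument is examined.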