I've written a small Java program, which I intend to GPL, that divides
files into groups of identical content based on what boils down to
byte-for-byte comparisons, though it tries to do so quickly.
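(For context, the "rapidly" part is just the usual trick of cheap tests before expensive ones. A minimal sketch of the first pass, with names of my own choosing since the real program isn't shown here: bucket files by size, because files of different sizes can never be byte-for-byte identical, and only compare bytes within a bucket.)

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupBySize {
    // First pass of a fast duplicate finder: bucket files by size.
    // Files of different sizes can never be byte-for-byte identical,
    // so later passes only compare bytes within each bucket.
    public static Map<Long, List<Path>> bySize(List<Path> files) throws IOException {
        Map<Long, List<Path>> groups = new HashMap<>();
        for (Path p : files) {
            groups.computeIfAbsent(Files.size(p), k -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("grp");
        Path a = Files.write(dir.resolve("a"), new byte[]{1, 2, 3});
        Path b = Files.write(dir.resolve("b"), new byte[]{4, 5, 6});
        Path c = Files.write(dir.resolve("c"), new byte[]{7});
        Map<Long, List<Path>> g = bySize(Arrays.asList(a, b, c));
        System.out.println(g.get(3L).size()); // a and b share size 3
        System.out.println(g.get(1L).size()); // c is alone
    }
}
```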

The program works like a champ if I build it with gcj, but fails if I run
it with OpenJDK. In both cases, I'm running it on Ubuntu Linux.

More specifically, the program works under both gcj and OpenJDK if I only
feed it filenames that use characters with ordinal values <= 127, but the
OpenJDK version fails on filenames containing characters with umlauts,
i.e. ordinal values >= 128. The gcj version is fine with characters >= 128.

In the gcj version, the offending character comes up as 246 (ö, o with
umlaut), like I kind of expected. In the OpenJDK version, the character
is replaced with 65533, which a Google search tells me is the Unicode
replacement character (U+FFFD), substituted for bytes that can't be
decoded.
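Those two numbers are consistent with the filename bytes being ISO-8859-1 on disk (that's my assumption about your filesystem, not something shown above): the single byte 0xF6 is ö in ISO-8859-1, but is not a valid UTF-8 sequence on its own, so a UTF-8 decoder replaces it with U+FFFD. A small demonstration:

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    public static void main(String[] args) {
        // The raw byte 0xF6 is 'ö' in ISO-8859-1, but it is not a valid
        // UTF-8 sequence on its own, so a UTF-8 decoder replaces it with
        // U+FFFD (65533), the Unicode replacement character.
        byte[] raw = { (byte) 0xF6 };
        String latin1 = new String(raw, StandardCharsets.ISO_8859_1);
        String utf8 = new String(raw, StandardCharsets.UTF_8);
        System.out.println((int) latin1.charAt(0)); // 246
        System.out.println((int) utf8.charAt(0));   // 65533
    }
}
```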

I changed the program to spit out what encoding stdin is using, and both
the gcj version and the OpenJDK version believe stdin is UTF-8.
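(In case it helps anyone reproduce this, what I mean by "spit out the encoding" is roughly the following; the class name is mine, and the output will of course depend on your locale:)

```java
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class EncodingCheck {
    public static void main(String[] args) {
        // What the JVM thinks the platform default charset is ...
        System.out.println(Charset.defaultCharset());
        // ... what a plain InputStreamReader on System.in actually uses ...
        System.out.println(new InputStreamReader(System.in).getEncoding());
        // ... and the file.encoding system property the JVM started with.
        System.out.println(System.getProperty("file.encoding"));
    }
}
```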

I've tried getting stdin with these, but no dice:

// isr = (new InputStreamReader(System.in, "ISO-8859-1"));
// isr = (new InputStreamReader(System.in, "UTF-8"));
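For completeness, here is a self-contained version of the pattern I tried, with a hard-coded byte stream standing in for stdin so the behavior is reproducible (in the real program it's System.in, and the helper name is just something I made up for this post). Reading the ISO-8859-1 bytes of "föö" through each charset shows the 246 vs. 65533 split:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;

public class ReadNames {
    // Read one filename from a stream, decoding with an explicit charset
    // rather than whatever the platform default happens to be.
    static String readFirstLine(InputStream in, String charset) throws Exception {
        BufferedReader r = new BufferedReader(new InputStreamReader(in, charset));
        return r.readLine();
    }

    public static void main(String[] args) throws Exception {
        // Simulate stdin carrying the ISO-8859-1 bytes of "föö" plus '\n'.
        byte[] piped = { 'f', (byte) 0xF6, (byte) 0xF6, '\n' };
        String asLatin1 = readFirstLine(new ByteArrayInputStream(piped), "ISO-8859-1");
        String asUtf8 = readFirstLine(new ByteArrayInputStream(piped), "UTF-8");
        System.out.println((int) asLatin1.charAt(1)); // 246
        System.out.println((int) asUtf8.charAt(1));   // 65533
    }
}
```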

Has anyone encountered this before? Has anyone worked around it with
OpenJDK or the Sun JDK?

Yes, I could just use the gcj version, but I'm doing this largely as a
learning experience, not just because I want to compare files.

TIA!