July 30, 1996

Japanese Language Encoding

There are four major encodings used to represent Japanese text: JIS7, JIS8, Shift-JIS (S-JIS), and EUC. All of them are based on ASCII (for alphabetic text) and Japanese Industrial Standard X0208 (JIS X0208), but they store the data in different ways. An "octet" is an 8-bit data quantity, often incorrectly called a "byte" or "character".
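The following is a minimal Python sketch of what "stored in different ways" means. It takes one JIS X0208 character code (two 7-bit octets, each in the range 0x21-0x7E) and produces its JIS7, EUC, and S-JIS forms. The function names are hypothetical. The JIS7 form assumes the `ESC $ B` / `ESC ( B` escape sequences, which are only one of several variants in use. The S-JIS arithmetic is the commonly published row-folding mapping, not something taken from this article.

```python
def jis_to_jis7(j1: int, j2: int) -> bytes:
    """JIS7: switch to JIS X0208 with ESC $ B, emit the raw 7-bit octets,
    then switch back to ASCII with ESC ( B (assumed escape variant)."""
    return b"\x1b$B" + bytes([j1, j2]) + b"\x1b(B"


def jis_to_euc(j1: int, j2: int) -> bytes:
    """EUC: the same two octets with the high bit set; no escapes needed."""
    return bytes([j1 | 0x80, j2 | 0x80])


def jis_to_sjis(j1: int, j2: int) -> bytes:
    """S-JIS: fold each pair of JIS rows into a single lead octet."""
    # Rows 0x21-0x5E give lead octets 0x81-0x9F; rows 0x5F-0x7E give 0xE0-0xEF.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:
        # Odd row: trail octet 0x40-0x9E, skipping 0x7F.
        s2 = j2 + (0x1F if j2 <= 0x5F else 0x20)
    else:
        # Even row: trail octet 0x9F-0xFC.
        s2 = j2 + 0x7E
    return bytes([s1, s2])


# 0x30 0x21 is the first level-1 kanji in JIS X0208.
for name, fn in (("JIS7", jis_to_jis7), ("EUC", jis_to_euc), ("S-JIS", jis_to_sjis)):
    print(name, fn(0x30, 0x21).hex(" "))
# JIS7 1b 24 42 30 21 1b 28 42
# EUC b0 a1
# S-JIS 88 9f
```

EUC is just JIS with the high bit set, while S-JIS needs real arithmetic. This is why one pair of encodings is easier to translate between than the others.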

It is very easy to translate between JIS7 and EUC. With a little more work, it is also possible to translate between S-JIS and either JIS7 or EUC. By examining the data, it is also possible to determine whether it is JIS7, S-JIS, or EUC. JIS8 is ambiguous, but fortunately it is rare enough that you needn't worry about it.
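JIS7 and EUC carry the same JIS X0208 octets. JIS7 marks two-octet text with escape sequences, while EUC marks it by setting the high bit. Translating between them is therefore a simple scan, as in this minimal Python sketch. The function names are hypothetical, and the sketch assumes the input contains only ASCII and JIS X0208 text, so EUC's single-shift codes (used for half-width katakana and JIS X0212) are not handled.

The `guess_encoding` function is a deliberately crude heuristic of our own, not the method this article has in mind. It works as follows:

- An `ESC $` sequence signals JIS7.
- Any octet in 0x81-0x9F signals S-JIS, because EUC's two-octet characters use only 0xA1-0xFE.
- Any other high octet is taken as EUC.

As a result, S-JIS text consisting only of half-width katakana, or only of characters whose lead octet is 0xE0-0xEF, would be misread as EUC.

```python
KANJI_IN = (b"\x1b$B", b"\x1b$@")   # designate JIS X0208 (1983 and 1978 forms)
KANJI_OUT = (b"\x1b(B", b"\x1b(J")  # designate ASCII or JIS-Roman


def jis7_to_euc(data: bytes) -> bytes:
    """Drop the escape sequences and set the high bit on kanji-mode octets."""
    out = bytearray()
    kanji = False
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in KANJI_IN:
            kanji = True
            i += 3
        elif seq in KANJI_OUT:
            kanji = False
            i += 3
        else:
            b = data[i]
            # Control octets (newline etc.) pass through unchanged.
            out.append(b | 0x80 if kanji and b > 0x20 else b)
            i += 1
    return bytes(out)


def euc_to_jis7(data: bytes) -> bytes:
    """Wrap runs of high-bit octets in escape sequences and clear the bit."""
    out = bytearray()
    kanji = False
    for b in data:
        high = (b & 0x80) != 0
        if high and not kanji:
            out += b"\x1b$B"   # enter JIS X0208 mode
            kanji = True
        elif not high and kanji:
            out += b"\x1b(B"   # return to ASCII mode
            kanji = False
        out.append(b & 0x7F)   # ASCII octets are unchanged by the mask
    if kanji:
        out += b"\x1b(B"       # always end in ASCII mode
    return bytes(out)


def guess_encoding(data: bytes) -> str:
    """Crude guess among JIS7, S-JIS and EUC ("ASCII" if there is no clue)."""
    if b"\x1b$" in data:
        return "JIS7"
    if any(0x81 <= b <= 0x9F for b in data):
        return "S-JIS"
    if any(b >= 0xA1 for b in data):
        return "EUC"
    return "ASCII"


for codec in ("iso2022_jp", "shift_jis", "euc_jp"):
    print(codec, guess_encoding("日本語".encode(codec)))
# iso2022_jp JIS7
# shift_jis S-JIS
# euc_jp EUC
```

A real detector would also check that octets pair up into valid lead/trail combinations rather than stopping at the first telltale octet.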