Converting UTF-8 to ISO-8859-1

Mary Zheng, May 17th, 2024. Last Updated: May 17th, 2024

1. Introduction

ISO 8859 is a family of eight-bit extensions to ASCII developed by the International Organization for Standardization (ISO). Each part of ISO 8859 includes the 128 ASCII characters and up to 128 additional characters. ISO-8859-1 (Latin-1) is the first part of ISO 8859, and it supports most Western-European languages including Afrikaans, Basque, Catalan, Danish, Dutch, English, Faeroese, Finnish, French, Galician, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.

UTF-8 (Unicode Transformation Format, 8-bit) is a variable-length character encoding standard in which each character is encoded as 1 to 4 bytes. The first 128 Unicode code points are encoded as 1 byte and they are the same as those in ASCII. Therefore, both ISO-8859-1 and UTF-8 are backward compatible with ASCII. ISO-8859-1 uses a single byte for every character, so it is more compact than UTF-8 for text with accented Latin characters, which take 2 bytes each in UTF-8; for pure ASCII text the two encodings have the same size. If an application supports only Western-European languages and doesn’t require characters from other languages or special symbols, then ISO-8859-1 may be the better choice. In this example, I will demonstrate UTF-8 to ISO-8859-1 conversion with Java applications.
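To make the size difference concrete, here is a minimal sketch that counts the bytes a string occupies in each encoding. The class name EncodingSizes and the helper byteLength are hypothetical names used only for this illustration; the non-ASCII text is written with Unicode escapes so that the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {

	// Hypothetical helper: number of bytes the text occupies in the given charset
	static int byteLength(String text, Charset charset) {
		return text.getBytes(charset).length;
	}

	public static void main(String[] args) {
		String ascii = "MaryZheng";
		String german = "\u00e4\u00f6\u00fc\u00df"; // the four characters äöüß

		System.out.println(byteLength(ascii, StandardCharsets.UTF_8));       // 9
		System.out.println(byteLength(ascii, StandardCharsets.ISO_8859_1));  // 9
		System.out.println(byteLength(german, StandardCharsets.UTF_8));      // 8
		System.out.println(byteLength(german, StandardCharsets.ISO_8859_1)); // 4
	}
}
```

The ASCII text has the same size in both encodings, while each of the four German characters needs two bytes in UTF-8 and one byte in ISO-8859-1.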

2. Set up Java Project

In this step, I will create a simple Java project in the Eclipse IDE. In order to display UTF-8 characters in the console window, please select “UTF-8” from the “Other:” options under the “Text file encoding” section, as shown in the screenshot here.
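If changing the IDE setting is not an option, one possible alternative is to wrap the output stream so that text is always written as UTF-8 bytes, regardless of the JVM default charset. This is only a sketch: the class name Utf8Console and the helper utf8Stream are hypothetical, and the console itself still has to interpret the bytes as UTF-8 for the characters to display correctly.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {

	// Hypothetical helper: wrap a stream so that text is always written as UTF-8 bytes
	static PrintStream utf8Stream(OutputStream target) {
		try {
			return new PrintStream(target, true, "UTF-8");
		} catch (UnsupportedEncodingException e) {
			// Every Java platform is required to support UTF-8
			throw new IllegalStateException(e);
		}
	}

	public static void main(String[] args) {
		PrintStream out = utf8Stream(System.out);
		// Escapes keep this example independent of the source file encoding
		out.println("\u00e4\u00f6\u00fc\u00df\u6d4b\u8bd5"); // äöüß测试
	}
}
```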

Figure 1 Eclipse IDE Text File Encoding Setting

3. UTF-8 to ISO-8859-1 Conversion via getBytes

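In this step, I will create a ConvertViaBytes class which encodes the original string into bytes using UTF-8 encoding, and then decodes those bytes into a new string using ISO-8859-1 encoding. Because the bytes are decoded with a different charset than the one that produced them, the class shows what happens when UTF-8 data is read as ISO-8859-1.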

ConvertViaBytes.java

package org.zheng.demo;

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConvertViaBytes {

	private static final String ISO_8859_1 = "ISO-8859-1";
	private static final String UTF_8 = "UTF-8";

	public static void main(String[] args) {
		System.out.println("Java default Charset: " + Charset.defaultCharset());

		Charset.availableCharsets().entrySet().stream()
				.filter(c -> c.getKey().startsWith(UTF_8) || c.getKey().startsWith(ISO_8859_1))
				.forEach(c -> System.out.println("Found Charset: " + c.getKey()));

		try {
			String utf8String = "UTF-8 Text: MaryZhengäöüß测试";

			// Encode the string into a byte array using UTF-8 encoding
			byte[] utf8Bytes = utf8String.getBytes(UTF_8);

			// Decode the UTF-8 bytes as ISO-8859-1 (multi-byte sequences are misread)
			String iso88591String = new String(utf8Bytes, ISO_8859_1);

			System.out.println("Original UTF-8 string: " + utf8String);
			System.out.println("Converted ISO-8859-1 string: " + iso88591String);
		} catch (UnsupportedEncodingException e) {
			System.out.println("Unsupported encoding: " + e.getMessage());
		}
	}

}

line 12: prints out the default charset. For this example, it should print “UTF-8”.
line 15, 16: prints out the supported charsets whose names start with “UTF-8” or “ISO-8859-1”. Note that ISO-8859-13, ISO-8859-15, and ISO-8859-16 are separate parts of ISO 8859, not versions of ISO-8859-1; they are listed only because their names share the same prefix.
line 19: defines a string which includes ASCII characters, four German characters, and two Chinese characters.
line 22: returns the UTF-8 encoded byte array of the string.
line 25: creates a new string by decoding the above byte array as ISO-8859-1.
line 27, 28: prints the original string and the converted string.

Execute the main program and capture the output.

ConvertViaBytes output

Java default Charset: UTF-8
Found Charset: ISO-8859-1
Found Charset: ISO-8859-13
Found Charset: ISO-8859-15
Found Charset: ISO-8859-16
Found Charset: UTF-8
Original UTF-8 string: UTF-8 Text: MaryZhengäöüß测试
Converted ISO-8859-1 string: UTF-8 Text: MaryZhengÃ¤Ã¶Ã¼Ãæµè¯

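Note: as you can see in the last line, none of the non-ASCII characters display correctly. Each German character takes 2 bytes in UTF-8 and each Chinese character takes 3 bytes, and every one of those bytes was decoded as a separate ISO-8859-1 character.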
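For comparison, here is a minimal sketch of re-encoding UTF-8 bytes as ISO-8859-1 bytes: the bytes are first decoded with the charset that produced them (UTF-8), and the resulting characters are then encoded as ISO-8859-1. The class name ReencodeUtf8Bytes and the helper utf8ToLatin1 are hypothetical names for this illustration. String.getBytes replaces characters that the target charset cannot represent with the charset's replacement byte, which is ? for ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class ReencodeUtf8Bytes {

	// Hypothetical helper: decode the bytes as UTF-8, then encode the characters as ISO-8859-1
	static byte[] utf8ToLatin1(byte[] utf8Bytes) {
		String text = new String(utf8Bytes, StandardCharsets.UTF_8);
		// Characters outside ISO-8859-1 are replaced with '?' (0x3F) by getBytes
		return text.getBytes(StandardCharsets.ISO_8859_1);
	}

	public static void main(String[] args) {
		// "äöüß测试" written as escapes so the source file encoding does not matter
		byte[] utf8Bytes = "\u00e4\u00f6\u00fc\u00df\u6d4b\u8bd5".getBytes(StandardCharsets.UTF_8);
		byte[] latin1Bytes = utf8ToLatin1(utf8Bytes);

		System.out.println("UTF-8 length: " + utf8Bytes.length);        // 14
		System.out.println("ISO-8859-1 length: " + latin1Bytes.length); // 6

		// Decoding with the matching charset recovers the German characters;
		// the two Chinese characters come back as '?'
		System.out.println(new String(latin1Bytes, StandardCharsets.ISO_8859_1));
	}
}
```

The key point is that the charset passed to new String(...) must match the charset the bytes were written in; the target charset only comes into play when encoding the characters again.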

4. UTF-8 to ISO-8859-1 Conversion via charArray

In this step, I will create a ConvertViaCharArray class which converts the original string to a char array, maps each character to a single ISO-8859-1 byte, and then creates a string from the byte[] with ISO-8859-1 encoding.

ConvertViaCharArray.java

package org.zheng.demo;

import java.nio.charset.Charset;

public class ConvertViaCharArray {

	private static final int LAST_CHAR = 0xFF;
	private static final String ISO_8859_1 = "ISO-8859-1";

	public static void main(String[] args) {

		String utf8String = "UTF-8 Text: MaryZhengäöüß测试";

		// Get the characters (UTF-16 code units) of the string
		char[] utf8Chars = utf8String.toCharArray();

		// Encode characters to ISO-8859-1 bytes
		byte[] iso88591Bytes = new byte[utf8Chars.length];
		for (int i = 0; i < utf8Chars.length; i++) {
			char c = utf8Chars[i];
			
			if (c <= LAST_CHAR) {
				iso88591Bytes[i] = (byte) c;
			} else {
				iso88591Bytes[i] = '?'; // Replace characters not representable in ISO-8859-1
			}
		}

		// Create ISO-8859-1 string from bytes
		String iso88591String = new String(iso88591Bytes, Charset.forName(ISO_8859_1));

		System.out.println("Original UTF-8 string: " + utf8String);
		System.out.println("Converted ISO-8859-1 string: " + iso88591String);
	}

}

line 12: defines a string with some Chinese characters.
line 15: returns a char array from the above string.
line 18: creates a new byte array with the same length as the char array.
line 22, 23: casts the character to a single byte if it is not greater than 0xFF, the last ISO-8859-1 code point. This works because the first 256 Unicode code points are identical to ISO-8859-1.
line 25: changes the character to ? for characters that are not representable in ISO-8859-1.
line 30: creates a new string from the bytes with ISO-8859-1 encoding.
line 32, 33: prints out the original and converted strings.

Execute the main program and capture the output:

ConvertViaCharArray output

Original UTF-8 string: UTF-8 Text: MaryZhengäöüß测试
Converted ISO-8859-1 string: UTF-8 Text: MaryZhengäöüß??

Note: as you can see from the output, the Chinese characters changed to the ? symbol, while the German characters are preserved.
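When silently replacing characters with ? is not acceptable, one option is to let a CharsetEncoder report unmappable characters instead. The following is a sketch under that assumption; the class name StrictLatin1Encoder and the helper encodeStrict are hypothetical, and wrapping the failure in an IllegalArgumentException is just one possible design.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictLatin1Encoder {

	// Hypothetical helper: encode to ISO-8859-1, failing instead of substituting '?'
	static byte[] encodeStrict(String text) {
		CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
				.onMalformedInput(CodingErrorAction.REPORT)
				.onUnmappableCharacter(CodingErrorAction.REPORT);
		try {
			ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
			byte[] bytes = new byte[buffer.remaining()];
			buffer.get(bytes);
			return bytes;
		} catch (CharacterCodingException e) {
			throw new IllegalArgumentException("Not representable in ISO-8859-1", e);
		}
	}

	public static void main(String[] args) {
		String[] samples = { "\u00e4\u00f6\u00fc\u00df", "\u6d4b\u8bd5" }; // äöüß, 测试
		for (String sample : samples) {
			try {
				System.out.println("Encoded " + encodeStrict(sample).length + " bytes");
			} catch (IllegalArgumentException e) {
				System.out.println(e.getMessage());
			}
		}
	}
}
```

With these samples, the German text encodes to 4 bytes and the Chinese text is rejected, so the caller can decide how to handle data that does not fit into ISO-8859-1.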

5. Conclusion

Different operating systems use different default character encodings. For example, Microsoft Windows uses UTF-16 internally and has traditionally used a legacy code page such as Windows-1252 as its default, while Linux and macOS typically use UTF-8. Sometimes, character encoding conversion is necessary to ensure that text data is properly interpreted and processed. In this example, I demonstrated UTF-8 to ISO-8859-1 conversion with two Java applications. The ConvertViaCharArray class converts a string to ISO-8859-1 and masks the unsupported characters with the question mark (?). The ConvertViaBytes class encodes a string into UTF-8 bytes with the getBytes method and decodes those bytes as ISO-8859-1, which garbles every non-ASCII character.

6. Download

This was a Java example of converting UTF-8 to ISO-8859-1.

Download
You can download the full source code of this example here: Converting UTF-8 to ISO-8859-1
