Java: Type-safe character sets

If you work with strings and need to encode them with a particular character set, you will come across the standard Java class Charset, whose documentation lists the character sets that every implementation of the Java platform must support. Instead of scattering magic strings like “UTF-8” all over your code, where a typo only shows up at runtime as an UnsupportedEncodingException, I came up with an enum that encapsulates all this.

Implementation

We will create an enum that contains all the standard character sets and returns the matching magic string – like “UTF-8” – from a method. Here’s the enum:

public enum Charsets {

  ISO_8859_1("ISO-8859-1"),
  US_ASCII("US-ASCII"),
  UTF16("UTF-16"),
  UTF16BE("UTF-16BE"),
  UTF16LE("UTF-16LE"),
  UTF8("UTF-8");

  private final String charset;

  private Charsets(final String charset) {
    this.charset = charset;
  }

  public String getCharset() {
    return this.charset;
  }
}
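A possible variation, sketched here under a few assumptions: the enum could hold a java.nio.charset.Charset instead of the plain name. String.getBytes(String) declares the checked UnsupportedEncodingException, while the getBytes(Charset) overload (available since Java 6) does not. The names TypedCharsets and toCharset() are made up for this sketch and are not part of the enum above.

```java
import java.nio.charset.Charset;

// Hypothetical variant of the enum above; the names TypedCharsets and
// toCharset() are assumptions made for this sketch.
enum TypedCharsets {

  ISO_8859_1("ISO-8859-1"),
  US_ASCII("US-ASCII"),
  UTF16("UTF-16"),
  UTF16BE("UTF-16BE"),
  UTF16LE("UTF-16LE"),
  UTF8("UTF-8");

  private final Charset charset;

  TypedCharsets(final String name) {
    // Charset.forName throws an unchecked exception for an unknown name,
    // so a typo surfaces as soon as the enum class is initialized.
    this.charset = Charset.forName(name);
  }

  public Charset toCharset() {
    return this.charset;
  }
}

class TypedCharsetsDemo {
  public static void main(final String[] args) {
    // getBytes(Charset) declares no checked exception, so no try/catch here.
    final byte[] bytes = "h\u00e9llo".getBytes(TypedCharsets.UTF8.toCharset());
    System.out.println(bytes.length);
  }
}
```

On Java 7 and later, java.nio.charset.StandardCharsets already provides these six character sets as Charset constants, so a variant like this is mostly of interest on older runtimes.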

For example, if we want to get the bytes of a string encoded as UTF-8, we could do the following:

final String text = …
// Get text as UTF-8
byte[] bArray = text.getBytes(Charsets.UTF8.getCharset());

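This way we don’t have to hard-code “UTF-8” when calling getBytes but can simply use the enum. Note that getBytes(String) still declares the checked UnsupportedEncodingException, so the call has to sit in a method that catches or declares it.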
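To make the snippet above concrete, here is a minimal, self-contained sketch that encodes a string with the enum and decodes it again. The class RoundTripDemo and the method roundTrip are hypothetical names used only for illustration, and the enum is a trimmed copy so that the example compiles on its own.

```java
import java.io.UnsupportedEncodingException;

// Trimmed copy of the enum above so that this sketch compiles on its own.
enum Charsets {

  UTF16("UTF-16"),
  UTF8("UTF-8");

  private final String charset;

  Charsets(final String charset) {
    this.charset = charset;
  }

  public String getCharset() {
    return this.charset;
  }
}

// Hypothetical helper class, not part of the original code.
class RoundTripDemo {

  // Encodes the text with the given charset and decodes it again.
  static String roundTrip(final String text, final Charsets charset) {
    try {
      final byte[] encoded = text.getBytes(charset.getCharset());
      return new String(encoded, charset.getCharset());
    } catch (final UnsupportedEncodingException ex) {
      // Not expected for the standard charsets listed in the enum.
      throw new IllegalStateException(ex);
    }
  }

  public static void main(final String[] args) {
    System.out.println(roundTrip("type safe", Charsets.UTF8));
  }
}
```

Wrapping the checked exception in one place like this is a design choice for the sketch: callers deal only with the enum, and the try/catch is written once.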

Testing

I created a simple unit test that makes sure that the magic strings in the enum are correct and do not contain any typos. Here goes:

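import java.io.UnsupportedEncodingException;

import org.junit.Assert;
import org.junit.Test;

// Every magic string in the enum must be accepted by String.getBytes(String).
@Test
public void test() {
  for (final Charsets charset : Charsets.values()) {
    try {
      Assert.assertNotNull("test".getBytes(charset.getCharset()));
    } catch (final UnsupportedEncodingException ex) {
      // A typo in one of the names ends up here.
      Assert.fail("Should not throw exception: " + ex.getMessage());
    }
  }
}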
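The same check can also be written without exception handling. The following is only a sketch: it uses Charset.isSupported from java.nio.charset, and the class CharsetNameCheck and the method unsupported are hypothetical names. It takes plain names rather than the enum so that it stays self-contained.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original code.
class CharsetNameCheck {

  // Returns the names that the running JVM does not recognize as charsets.
  static List<String> unsupported(final String... names) {
    final List<String> result = new ArrayList<String>();
    for (final String name : names) {
      if (!Charset.isSupported(name)) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(final String[] args) {
    // "UFT-8" is a deliberate typo and should be the only name reported.
    System.out.println(unsupported("UTF-8", "UTF-16LE", "UFT-8"));
  }
}
```

In a unit test, one could pass the result of getCharset() for every enum value and assert that the returned list is empty.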

Conclusion

As in my previous post, enums are a great way to encapsulate magic strings. Quite a few people have wondered why the JDK itself offers no constants for these names. Even if the reasoning behind that is sound, it is quite annoying, and the solution above is a nice and clean way around it.