CharacterCounter

This tiny tool counts the characters in a set of text files. It comes in handy if you’re into information theory and want to find out the letter frequencies of a given text corpus. It also helps if you’d like to fine-tune a frequency analysis.

You can download the Eclipse project as a tar or zip file, or have a look at the code here.

Notes

Put a bunch of text files into a directory and let the tool process all of them. It reads every file and iterates over every character in it. You can choose to convert the content to lower case first. It’s also possible to restrict the characters: you can set an alphabet, and every character that’s not contained in it will be discarded.
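The processing described above can be sketched roughly as follows. This is only an illustration, not the code of the actual project: the class and method names are made up, the files are assumed to be UTF-8 encoded, and an alphabet of null is taken to mean "no restriction". Characters are counted as Java char values, so a character outside the Basic Multilingual Plane would show up as two entries.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical names for illustration; not the classes of the actual project.
final class CharacterCountSketch {

    /** Adds the characters of one text to the running counts. */
    static void count(String text, boolean toLowerCase, String alphabet,
                      Map<Character, Long> counts) {
        // Optionally normalise the content to lower case first.
        String content = toLowerCase ? text.toLowerCase(Locale.ROOT) : text;
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            // Discard every character that is not part of the alphabet (if one is set).
            if (alphabet != null && alphabet.indexOf(c) < 0) {
                continue;
            }
            counts.merge(c, 1L, Long::sum);
        }
    }

    /** Reads every regular file in the directory (assumed UTF-8) and counts its characters. */
    static Map<Character, Long> countDirectory(Path dir, boolean toLowerCase,
                                               String alphabet) throws IOException {
        Map<Character, Long> counts = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                    count(text, toLowerCase, alphabet, counts);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Directory from the command line, or the current directory as a fallback.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Map<Character, Long> counts = countDirectory(dir, true, "abcdefghijklmnopqrstuvwxyz");
        for (Map.Entry<Character, Long> e : counts.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Run with a directory as the only argument, this prints one line per letter that occurs in the files, followed by its count.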

Once the tool has read all text files, it can output some statistics: the overall number of occurrences of each character and the probability of encountering that character in a text. If the text corpus is big enough, you’ll be able to compare how often certain characters occur in different languages.
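One way such statistics could be derived from the raw counts is shown below. It assumes that the "probability" of a character is simply its share of all counted characters (its relative frequency); the tool itself may compute it differently. The class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; assumes "probability" means relative frequency.
final class CharacterStatisticsSketch {

    /** Maps every character to count / total; returns an empty map if nothing was counted. */
    static Map<Character, Double> relativeFrequencies(Map<Character, Long> counts) {
        long total = 0;
        for (long n : counts.values()) {
            total += n;
        }
        Map<Character, Double> result = new TreeMap<>();
        if (total == 0) {
            return result;
        }
        for (Map.Entry<Character, Long> e : counts.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) total);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample counts, only to show the output format.
        Map<Character, Long> counts = new TreeMap<>();
        counts.put('a', 3L);
        counts.put('b', 1L);
        Map<Character, Double> frequencies = relativeFrequencies(counts);
        for (Map.Entry<Character, Long> e : counts.entrySet()) {
            System.out.printf(Locale.ROOT, "%c %d %.4f%n",
                    e.getKey(), e.getValue(), frequencies.get(e.getKey()));
        }
    }
}
```

With these sample counts, 'a' accounts for three of four characters and 'b' for one of four.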

Iterating over the characters in a string

I was wondering what the fastest way to iterate over the characters of a string in Java would be. A small test compares the following variants:

  • using an Iterable
  • toCharArray()
  • charAt()
  • calling charAt() on a CharSequence
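To make the four variants concrete, here is a minimal sketch of what each loop might look like. It is not the StringIteratorTest class from the project: CharIterable is a hypothetical wrapper written for this example, and each method just sums the char values so that the loop does some work.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative only; the names are assumptions, not taken from the project.
final class StringIterationSketch {

    /** Hypothetical wrapper that makes a String usable in a for-each loop. */
    static final class CharIterable implements Iterable<Character> {
        private final String s;

        CharIterable(String s) {
            this.s = s;
        }

        @Override
        public Iterator<Character> iterator() {
            return new Iterator<Character>() {
                private int index = 0;

                @Override
                public boolean hasNext() {
                    return index < s.length();
                }

                @Override
                public Character next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return s.charAt(index++); // boxes each char into a Character
                }

                @Override
                public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }
    }

    /** Variant 1: for-each over an Iterable (boxing plus iterator calls). */
    static long sumWithIterable(String s) {
        long sum = 0;
        for (char c : new CharIterable(s)) {
            sum += c;
        }
        return sum;
    }

    /** Variant 2: copy the string into a char[] once, then loop over the array. */
    static long sumWithToCharArray(String s) {
        long sum = 0;
        for (char c : s.toCharArray()) {
            sum += c;
        }
        return sum;
    }

    /** Variant 3: index-based loop with String.charAt(). */
    static long sumWithCharAt(String s) {
        long sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    /** Variant 4: the same loop, but called through the CharSequence interface. */
    static long sumWithCharSequence(CharSequence s) {
        long sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }
}
```

All four methods return the same value for the same input; they differ only in how they walk over the characters.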

You can download the Eclipse project as a tar or zip or browse the code online here; have a look at the StringIteratorTest class first.

Results

Without further ado, here are the results:

Variant                  Time in ms
Iterator                 5.5
toCharArray()            1.1
charAt()                 1.6
CharSequence.charAt()    2.2

Bar chart of the test results

Conclusion

If speed is what you want, you shouldn’t use Iterators; pick one of the other solutions instead. In this test toCharArray() was the fastest at 1.1 ms, while the Iterator took 5.5 ms. On the other hand, if you just like to play with Iterators and classes that implement Iterable, you may want to choose the slower solution.