CharacterCounter | Chris Schenk

This tiny tool counts the characters in a set of text files. It comes in handy if you’re into information theory and want to find out the letter frequencies of a given text corpus. It also helps if you’d like to fine-tune a frequency analysis.

You can download the Eclipse project as a tar or zip file, or have a look at the code here.

Notes

You may put a bunch of text files into a directory and let the tool process all of them. It reads every file and iterates over every character in it. You can choose to convert the content to lower case first. It’s also possible to restrict the characters: you can set an alphabet, and every character that’s not contained in it will be discarded.
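
The processing described above can be sketched roughly as follows. This is not the tool's actual source: the class name CharCountSketch, the method names, the UTF-8 encoding, and the convention that an empty alphabet means "no restriction" are all assumptions made for illustration. Characters are counted as Java char values (UTF-16 code units), which is a simplification for text outside the Basic Multilingual Plane.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; names and conventions are assumptions, not the tool's API.
class CharCountSketch {

    // Adds the characters of `text` to `counts`.
    // Assumption: an empty alphabet means "accept every character".
    static void countInto(Map<Character, Long> counts, String text,
                          boolean toLowerCase, Set<Character> alphabet) {
        String content = toLowerCase ? text.toLowerCase(Locale.ROOT) : text;
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            // Discard characters that are not part of the chosen alphabet.
            if (alphabet.isEmpty() || alphabet.contains(c)) {
                counts.merge(c, 1L, Long::sum);
            }
        }
    }

    // Reads every regular file directly inside `dir` (assumed to be UTF-8 text)
    // and returns the accumulated counts, sorted by character.
    static Map<Character, Long> countDirectory(Path dir, boolean toLowerCase,
                                               Set<Character> alphabet) throws IOException {
        Map<Character, Long> counts = new TreeMap<>();
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            countInto(counts, text, toLowerCase, alphabet);
        }
        return counts;
    }
}
```

Calling countDirectory with a directory path, toLowerCase set to true, and an alphabet containing the letters a to z would yield case-insensitive letter counts for the whole directory.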

Once the tool has read all text files, it can output some statistics: the overall number of occurrences of each character and the probability that an arbitrary string contains that character. If the text corpus is big enough, you’ll be able to study the occurrence of certain characters in different languages.
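
One simple way to turn raw counts into such statistics is to divide each count by the total number of counted characters. The sketch below assumes this relative-frequency reading of "probability"; the tool itself may compute its figure differently. The class name CharStatsSketch and its method names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; assumes "probability" means relative frequency in the corpus.
class CharStatsSketch {

    // Returns count / total for every character; empty if nothing was counted.
    static Map<Character, Double> relativeFrequencies(Map<Character, Long> counts) {
        long total = 0;
        for (long n : counts.values()) {
            total += n;
        }
        Map<Character, Double> frequencies = new TreeMap<>();
        if (total == 0) {
            return frequencies; // avoid dividing by zero for an empty corpus
        }
        for (Map.Entry<Character, Long> entry : counts.entrySet()) {
            frequencies.put(entry.getKey(), (double) entry.getValue() / total);
        }
        return frequencies;
    }

    // Formats one line per character: character, absolute count, relative frequency.
    static String report(Map<Character, Long> counts) {
        Map<Character, Double> frequencies = relativeFrequencies(counts);
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Character, Double> entry : frequencies.entrySet()) {
            out.append(String.format(Locale.ROOT, "%s\t%d\t%.4f%n",
                    entry.getKey(), counts.get(entry.getKey()), entry.getValue()));
        }
        return out.toString();
    }
}
```

Running this over corpora in different languages and comparing the resulting frequency tables is one way to do the cross-language comparison mentioned above.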