Using character size to represent the relevancy score [proportionally]
is something I used this spring at the 16th Balkan and South Slavic
Conference in Banff, Canada. I used log-likelihood to identify the
interesting terms, and Z-scores to discriminate term usage between two
corpora. The corpora, by the way, are writings of an Albanian writer
(Ismail Kadare) before and after the fall of communism in Albania. My
paper demonstrated how language changed in such a short
period, reflecting very quickly the big social upheaval of the time.
The slide shows the differences between
"Koncert [në fund të dimrit]" 'The Concert at the End of Winter'
(1988) and "Lulet [e ftohta të marsit]" 'Spring Flowers, Spring Frost
[lit. The Cold Flowers of March]' (2000) - one in red and the other in blue.
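
For anyone who wants to try the two measures, here is a minimal sketch in Python. It assumes the form of log-likelihood (G2) commonly used for corpus comparison and a two-proportion z-score. The post does not say which exact formulas were used, so treat both as assumptions. The function names and the 3.84 cutoff (roughly p < 0.05 at one degree of freedom) are illustrative only, and both corpora are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 for a term seen a times in corpus 1 (c tokens) and b times in corpus 2 (d tokens)."""
    # Expected counts if the term were equally frequent in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def z_score(a, b, c, d):
    """Two-proportion z-score; positive means the term is relatively more frequent in corpus 1."""
    p1, p2 = a / c, b / d
    pooled = (a + b) / (c + d)
    se = math.sqrt(pooled * (1 - pooled) * (1 / c + 1 / d))
    return (p1 - p2) / se if se > 0 else 0.0


def compare(tokens1, tokens2, min_g2=3.84):
    """Return (term, G2, z) for terms whose G2 reaches min_g2, highest G2 first."""
    f1, f2 = Counter(tokens1), Counter(tokens2)
    c, d = len(tokens1), len(tokens2)
    rows = []
    for term in set(f1) | set(f2):
        a, b = f1[term], f2[term]
        g2 = log_likelihood(a, b, c, d)
        if g2 >= min_g2:
            rows.append((term, g2, z_score(a, b, c, d)))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

In a display like the one described, G2 could drive the character size and the sign of z could pick the colour (one corpus in red, the other in blue).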

It is both humbling and rewarding to find that similar things have been
done by others before...



