What’s it all about, Alfie

I’ve just launched a new tool at Hatmandu.net, a text content and keyword analyser – in theory useful for search engine optimisation, but also for getting the general gist of a text. From the notes:

This text content and keyword analyser is intended to give a more precise indication of a text’s most important words than other tools available. Most keyword analysers use simple word frequency (which is also shown here anyway), but that doesn’t relate the specific text to the language in general – common terms such as ‘people’ and ‘time’, for example, appear in many documents, but do not necessarily indicate the essence of the particular text being analysed. This analyser uses the TF-IDF statistical method to relate the frequencies of words in the specific text to their general frequencies in the British National Corpus. I am indebted to Adam Kilgarriff’s version of the BNC, which I have adapted considerably for this tool. This analyser mainly uses the nouns in the BNC, on the basis that these are the parts of speech that best indicate the subject matter of a text. (At some point I hope to produce a version using an American English corpus, though I’d be surprised if the results were very different.)
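For the curious, here is a rough Python sketch of the general idea, not the code that actually runs the tool. It assumes a small reference table of words with their frequency per million words in general usage; the table, its numbers and the function names are all invented for illustration and are not taken from the BNC. Each word's share of the text is multiplied by the log of its rarity in the reference table, so a word that is common everywhere scores lower than an equally frequent word that is rare in general.

```python
import math
import re
from collections import Counter

# Hypothetical reference data: occurrences per million words in general
# usage. These figures are made up for illustration, not real BNC counts.
REFERENCE_PER_MILLION = {
    "people": 1200.0,
    "time": 1500.0,
    "corpus": 5.0,
    "keyword": 2.0,
}


def keyword_scores(text, reference=REFERENCE_PER_MILLION):
    """Weight each word's frequency in the text by its rarity in general usage."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    scores = {}
    for word, count in counts.items():
        # Assumption: only words found in the reference table are scored.
        if word not in reference:
            continue
        term_frequency = count / total
        general_frequency = reference[word] / 1_000_000
        # Words that are rare in general get a large weight, common ones a small one.
        scores[word] = term_frequency * math.log(1 / general_frequency)
    return scores


def top_keywords(text, n=5):
    """Return the n highest-scoring words, best first."""
    scores = keyword_scores(text)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

With this toy table, a text that uses ‘people’ and ‘corpus’ equally often will rank ‘corpus’ higher, because ‘people’ turns up in almost any document.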

It works with Twitter accounts (though it only reads the last 200 tweets, which may not form a usefully large body of text) and with URLs where my humble scraping tool is able to extract the text successfully. Most useful is the ‘paste text’ field, which will accept up to 1MB of text (about 200,000 words), so it will analyse entire books if desired. LiveJournal users can enter their URL (http://username.livejournal.com), assuming their account is public.

It’s a bit experimental at the moment, but hopefully might migrate from ‘possibly fun’ to ‘possibly useful’ in due course!

Given the red light

I was honoured and delighted, on studying the search terms through which people have encountered my website (see left, but a second site is in development, which will relocate most of the weirdness below), to see they included:

prostitutes in Gosport

Alas, I’m not offering a Portsmouth pimpery, and the document which contained these terms was about a bit of 18th century history. One poor lonely Hampshire man must be very disappointed.

Other terms so far include:

death of Napoleon
how the sandwich was born
Rousseau noble savage horses
cyanide in almonds
steps to finding a job
vonnegut tralfamadorean novels

And that’s all from the last week!