File /Humanist.vol22.txt, message 149

Date:         Mon, 28 Jul 2008 06:59:58 +0100
From: Humanist Discussion Group <willard.mccarty-AT-MCCARTY.ORG.UK>
Subject: 22.147 indexing
To: humanist-AT-Princeton.EDU

               Humanist Discussion Group, Vol. 22, No. 147.
       Centre for Computing in the Humanities, King's College London
                     Submit to:

         Date: Mon, 28 Jul 2008 06:55:46 +0100
         Subject: Re: 22.143 indexing

I am not sure more data will improve the situation. The fundamental
problem is that when computers present "relevant" articles they are
basing the selection on computational heuristics which are vastly
impoverished compared to human judgements. The common methodology is to
look for the same words as those used in the primary article or to look
for similar sets of citations to those of the original article.
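
The two heuristics just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
Jaccard overlap measure, the function names and the toy strings are
mine, chosen for brevity, and are not a description of how any
particular index actually computes its "relevant" articles.

```python
import re

def words(text):
    """Lowercased word set; a crude stand-in for morphological matching."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Overlap of two sets: shared items divided by all items (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def word_similarity(text_a, text_b):
    """Heuristic 1: how many words do the two texts share?"""
    return jaccard(words(text_a), words(text_b))

def citation_similarity(refs_a, refs_b):
    """Heuristic 2: how many cited works do the two articles share?"""
    return jaccard(set(refs_a), set(refs_b))

# Hypothetical toy data, invented for this sketch.
recent = "hypertext links between documents"
same_vocabulary = "hypertext documents and their links"
older_vocabulary = "associative trails joining records"

print(word_similarity(recent, same_vocabulary))    # shares three of six words
print(word_similarity(recent, older_vocabulary))   # shares no words at all
print(citation_similarity(["A", "B", "C"], ["B", "C", "D"]))
print(citation_similarity(["A", "B", "C"], []))    # nothing indexed to compare
```

Neither score knows anything about what the articles mean. An article
in a different vocabulary, or one whose references were never indexed,
scores zero however closely related it is.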

Word matches by computer are typically morphologically-related matches.
As articles age, language changes and words shift in meaning. Searching
for "relevant" articles to a contemporary article using terminology
invented recently will not find articles using different terminology,
and could appear to have exhaustively found all relevant articles when
in fact the concept was discovered and explored far earlier using
different terminology.

Citations go back as far as citation indexes do, but that isn't back to
the beginning of the literature. I do not believe citation indexes are
extending their coverage backwards into earlier and earlier years of
publication. They may well be satisfied that they now have sufficient
depth of coverage, in that earlier citations wouldn't improve their
retrievals. Once again, the related articles could simply dry up as one
goes backwards and reaches the digitization horizon. A good question is
whether a resource such as JSTOR, dedicated to the past, could benefit
from citation indexing or, minimally, act as a set of milestones for
conventional citation indexing to reach in extending its coverage
backwards in time.

What can be done? First, I believe some new studies of paper-only
research should be undertaken. The computational basis for "relevant"
articles should be more formally studied with reference to whether the
computational processes in use are equivalent to human judgement or are
merely doing what is easy to compute, ignoring what can't be computed.

For example, when researching I often identify more than related
terminology. I look for authors, institutions, journals and library
call numbers with relevant works and then research those authors,
institutions, journals and library call numbers themselves to see
what's there. One can often discover a pivotal organization or
individual who mentored generations of students following a theoretical
approach that transcends the terminology. Or a journal that has
published the bulk of the articles about a theory (and scanning journal
tables of contents can restore the missing connectivity to
non-terminologically related works). Library call numbers are an
excellent way of discovering related works, and most electronic
catalogs allow you to scan the shelves electronically if you can't do
it in person.
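
As a rough sketch of what such pivoting might look like if automated,
the following Python starts from works already judged relevant and
collects every other work that shares an author, institution, journal
or call number with them. Everything here is an assumption made for
illustration: the record layout, the facet names, the function names
and the catalogue entries are hypothetical, and exact matching on call
numbers is a much cruder operation than browsing neighbouring shelves.

```python
from collections import defaultdict

# Hypothetical facet names; a real catalogue would have its own fields.
FACETS = ("author", "institution", "journal", "call_number")

def build_facet_index(records):
    """Map each (facet, value) pair to the ids of the records carrying it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for facet in FACETS:
            value = rec.get(facet)
            if value:
                index[(facet, value)].add(rec_id)
    return index

def pivot(records, seed_ids):
    """Return {record id: facets shared with a seed}, ignoring the text itself."""
    index = build_facet_index(records)
    found = defaultdict(set)
    for seed in seed_ids:
        for facet in FACETS:
            value = records[seed].get(facet)
            if not value:
                continue
            for other in index[(facet, value)]:
                if other not in seed_ids:
                    found[other].add(facet)
    return dict(found)

# Invented toy catalogue; no real works, journals or call numbers.
records = {
    "w1": {"author": "Author A", "journal": "Journal X", "call_number": "CN-100"},
    "w2": {"author": "Author B", "journal": "Journal X", "call_number": "CN-100"},
    "w3": {"author": "Author A", "journal": "Journal Y", "call_number": "CN-200"},
    "w4": {"author": "Author C", "journal": "Journal Z", "call_number": "CN-300"},
}

related = pivot(records, {"w1"})
for rec_id in sorted(related):
    print(rec_id, sorted(related[rec_id]))
```

Starting from w1, this surfaces w2 (same journal and shelf) and w3
(same author) without comparing a single word of their text, which is
the point: the connection runs through people and places rather than
terminology.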

Terminology is often invented to separate one's research from that of
others. Matching terminology therefore isn't proof of exhaustive
coverage, as others looking at the same task may have likewise invented
their own terminology. That different terms essentially come from
different schools of thought about a common problem is hard to
determine through computer indexes alone.
