Humanist Discussion Group, Vol. 39, No. 49.
Department of Digital Humanities, University of Cologne
Hosted by DH-Cologne
www.dhhumanist.org
Submit to: humanist@dhhumanist.org

[1] From: Gabriel Egan <mail@gabrielegan.com>
    Subject: Re: [Humanist] 39.47: repetition vs intelligence (357)

[2] From: Tim Smithers <tim.smithers@cantab.net>
    Subject: Re: [Humanist] 39.27: repetition vs intelligence (163)


--[1]------------------------------------------------------------------------
    Date: 2025-06-11 14:41:21+00:00
    From: Gabriel Egan <mail@gabrielegan.com>
    Subject: Re: [Humanist] 39.47: repetition vs intelligence

Tim Smithers writes:

<< Text tokens are what the LLMs in things like ChatGPT start with but immediately turn into [numerical] vector encodings, so called embeddings. LLMs do not use words. >>

I believe you are making a distinction without a difference regarding text tokens and words. Linguists are not agreed on what we mean by 'a word'. They don't need to be for their work to proceed. For instance, is 'con-' in the words 'construct' and 'context' a word? Is 'un-' a word? Is 'a-' (when used in words like 'asymmetrical' and 'apathy') a word? These are the kinds of tokens that LLM tokenization separates out, and it is right to do so. These may not be what we want to call 'words', but that definition does not matter. What matters is that these tokens bear meaning -- they are semantic -- and that is why I think you are wrong to say that LLMs don't deal in semantic units. Nobody had put 'un-' before 'sex' and 'king' (as in 'unsex' and 'unking') before Shakespeare did, but everyone knew at the time what Shakespeare meant by doing so, because they knew the text token 'un-' and what it meant. LLMs find these meaningful text tokens from our actual usages across large bodies of writing -- that is, from what Saussure called our 'parole' -- rather than from the abstract principles of language (our 'langue').

You write that:

<< And Computer Scientists, at least the ones I know, do not call any list of numbers a vector. >>

The Wikipedia page for 'vector' lists, amongst its senses, the one in "Computer Science" as "Vector, a one-dimensional array data structure". That is, a list of numbers identified by a subscript (n1, n2, and so on). The Oxford English Dictionary gives, amongst the senses for 'vector', this definition: "Computing. A sequence of consecutive locations in memory; a series of items occupying such a sequence and identified within it by means of one subscript". That is, a list of numbers identified by a subscript (n1, n2, and so on). I can multiply these examples many times, showing that in computing the term 'vector' is often used to mean just a list of numbers. One more: the W3Schools online course for the programming language R gives this definition: "A vector is simply a list of items that are of the same type". Are they all wrong?

I agree, incidentally, that a lot of the ambiguity arises because people forget to state that their starting point for a vector is the origin, which is (0,0,0) in 3D space. That is, (1,1,1) are the coordinates that identify a point in space (where x=1, y=1, and z=1) but the same notation is often carelessly used for what is properly called a 'position vector', meaning the vector from point (0,0,0) to point (1,1,1).

I'm baffled by your claim that I don't understand vector addition when I write that as a vector (1,1,1) is "a displacement from wherever you are now". You go on to add this vector (1,1,1) to the vector (2,5,3) to get the vector (3,6,4).
Yep, that is exactly what I said. You then assert that "vectors are not . . . displacements". I say that this is exactly what they are, although 'displacement' has other senses too. A ship's displacement is a one-dimensional value (a scalar), and so is the displacement of a piston engine, sure. But the displacement of the volcano in the title of the 1968 film 'Krakatoa, East of Java' has both a size and a direction. Krakatoa is in fact 20 miles WEST of Java. You can also see in the Wikipedia entry for "Displacement (geometry)" this definition:

<< In geometry and mechanics, a displacement is a vector whose length is the shortest distance from the initial to the final position of a point P undergoing motion. It quantifies both the distance and direction of the net or total motion along a straight line . . . >>

Note the bit about "both the distance and direction". That is, a displacement is a vector not a scalar. You could, I suppose, argue that Wikipedia is wrong in all these things. Let me know if that is your position and I'll substitute definitions from textbooks instead. (I use Wikipedia for these matters because it tends to be reliable and because everyone has access to it to check which of us is mistaken.)

Your next couple of points depend on your distinction between 'word' and 'text token' that I have given my response to above. You then write:

<< if, as you claim here, each dimension of this vector space somehow encodes meaning, then, to be a dimension of a vector space, each dimension must encode a unique meaning >>

No, that does not follow. Meanings don't have to be "unique"; they can easily be overlapping. For instance, I may plot on a 2D scatterplot the positions of various animals along two dimensions: 'fluffiness' and 'cuteness'. In fact I do this as an exercise when teaching word embeddings to arts and humanities students. Each point reflects the students' agreed scores along each dimension for a range of animals including cats and dogs, insects, and various primates. Although the students know that 'fluffiness' and 'cuteness' are not the same thing, there emerges a clear correlation between these dimensions: cute animals tend to be fluffy. That the x axis and y axis of the vector space are orthogonal does not entail that each dimension must represent "a unique meaning, and a meaning that is orthogonal to all other meanings on all the other dimensions". Thus I do not need to show what the "12,888 unique and orthogonal meanings" are for an LLM vector space that uses 12,888 dimensions. This task you ask me to complete "if this orthogonality of meanings is true" can be ignored because the orthogonality of meanings is not true. I cannot fathom why you would suppose that meanings must be orthogonal simply because we record them along dimensions that are orthogonal.

You ask:

<< does an LLM know that the vector addition of the vector for "gender" and the vector for "lemonade" doesn't result in an interesting vector "... because gender doesn't apply to lemonade"? >>

You're misquoting me. I wrote:

<< add the 'male-to-female' displacement to 'lemonade' and you don't land anywhere interesting, because gender doesn't apply to lemonade. >>

In your misquotation of me you twice have me referring to vectors where I referred to points in space. By this misquotation you are attributing to me your confusion of points in multidimensional space (given by lists of numbers called coordinates) with displacements in multidimensional space (given by lists of numbers called vectors).
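
For any reader who would like a concrete toy to play with, here is a minimal sketch in Python (using the NumPy library, with numbers I have simply invented for illustration; nothing below comes from any real language model) of the three points at issue: a vector as a plain list of numbers acting as a displacement on a point, the 'male-to-female' displacement applied to a word's position, and scores recorded on orthogonal axes that are nonetheless correlated.

    import numpy as np

    # A vector as "just a list of numbers", used here as a displacement in 3D space.
    displacement = np.array([1, 1, 1])

    # Applying that displacement to the point (2,5,3) gives the point (3,6,4).
    point = np.array([2, 5, 3])
    print(point + displacement)                       # [3 6 4]

    # Invented four-dimensional 'embeddings' (toy numbers, not real model weights).
    man   = np.array([0.8, 0.1, 0.2, 0.3])
    woman = np.array([0.8, 0.9, 0.2, 0.3])
    king  = np.array([0.9, 0.1, 0.7, 0.3])
    queen = np.array([0.9, 0.9, 0.7, 0.3])

    # The 'male-to-female' displacement, added to the point for 'king',
    # lands on the point for 'queen' (exactly so here, by construction).
    male_to_female = woman - man
    print(np.allclose(king + male_to_female, queen))  # True

    # Orthogonal axes need not carry orthogonal meanings: 'fluffiness' (x) and
    # 'cuteness' (y) sit on perpendicular axes, yet the scores are strongly correlated.
    fluffiness = np.array([9, 8, 7, 2, 1])            # cat, dog, rabbit, beetle, wasp
    cuteness   = np.array([8, 9, 8, 3, 1])
    print(np.corrcoef(fluffiness, cuteness)[0, 1])    # close to 1

In a real model the 'woman minus man' displacement lands only near, not exactly on, the point for 'queen', and the dimensions number in the thousands rather than four; the toy preserves the geometry of the argument, not its scale.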
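
And to return to the earlier point about text tokens such as 'un-': the chopping-up itself is easy to inspect. Here is a sketch using the openly published tiktoken library (I am assuming the reader has it installed; the exact splits differ from tokenizer to tokenizer, so treat the output as illustrative rather than definitive).

    import tiktoken

    # One of OpenAI's published tokenizers; other models use other vocabularies.
    enc = tiktoken.get_encoding("cl100k_base")

    for word in ["unsex", "unking", "asymmetrical", "construct"]:
        token_ids = enc.encode(word)
        pieces = [enc.decode([t]) for t in token_ids]
        print(word, "->", pieces)

    # Whether a coinage like 'unsex' splits exactly as ['un', 'sex'] depends on
    # the tokenizer's learned vocabulary, but it comes out as a short sequence
    # of recurring sub-word pieces, not as arbitrary fragments -- which is the
    # point at issue above.
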
In asking whether an LLM "knows" any of this, you are begging the question. That is, you and I have a foundational disagreement about what it means to know something. I assert that ChatGPT knows that Paris is the capital of France, whereas your position is that it doesn't because it cannot know anything.

The next part of your post is addressed to other Humanists and asks whether they agree with your view that "Languaging needs a capacity to form intentions to say something . . .". My view is that if you start out with the definition of 'intention' as something that only people can have and make intention the defining characteristic of language, then necessarily you will conclude that machines cannot create language. Meanwhile, millions of people spend many hours talking with the machines.

Then you ask me a question:

<< how, may I ask, do you account for the many (easy to make) examples of automatically generated text which, when read by us, displays plenty of signs that no real understanding was involved in the generation of this text? >>

I account for that the same way I account for it in people: there was, as you say, no real understanding involved in the generation of this text. This is the student essay-marking season for me. I know that the generators of these texts are people. But some of them produce texts that show no real understanding, even of what they have written. (Post-structuralist accounts of Shakespeare's works -- my specialist area -- are particularly prone to this problem.)

You say that I "need to account for why they fail so often". This is like saying that a machine cannot play chess because it sometimes loses, or cannot drive a car because it sometimes crashes. For me, the amazing thing is that they ever win at chess or steer across the city without bumping into things.

In G. K. Chesterton's extraordinary novel 'The Man Who Was Thursday', an anarchist and a poet discuss whether things going right is more or less poetical than things going wrong. The anarchist finds the London Underground a tedious bit of technology, with no magic in it, and explains that this is why everyone on it and running it looks so bored:

<< . . . after they have passed Sloane Square they know that the next station must be Victoria, and nothing but Victoria. Oh, their wild rapture! oh, their eyes like stars and their souls again in Eden, if the next station were unaccountably Baker Street!' >>

The poet replies, no:

<< . . . in chaos the train might indeed go anywhere, to Baker Street, or to Bagdad. But man is a magician, and his whole magic is in this, that he does say Victoria, and lo! it is Victoria. ... every time a train comes in I feel that it has broken past batteries of besiegers, and that man has won a battle against chaos. You say contemptuously that when one has left Sloane Square one must come to Victoria. I say that one might do a thousand things instead, and that whenever I really come there I have the sense of hair-breadth escape. And when I hear the guard shout out the word "Victoria", it is not an unmeaning word. It is to me the cry of a herald announcing conquest. It is to me indeed "Victoria"; it is the victory of Adam.' >>

I think I feel something of the poet's glee every time an LLM -- a thing humans have made that might go wrong in any number of ways -- instead says something clever.
Regards

Gabriel Egan


--[2]------------------------------------------------------------------------
    Date: 2025-06-11 08:18:04+00:00
    From: Tim Smithers <tim.smithers@cantab.net>
    Subject: Re: [Humanist] 39.27: repetition vs intelligence

Dear Maurizio,

Thank you for your kind words about my 'text is marks left by writing' remarks. I'm happy these worked for you. Which goes to show how words can work. On re-reading my text, it now looks rather awkward and clumsy.

You pointed to the Brian Porter and Edouard Machery paper on AI-generated poetry. Jim Rovira is far better qualified to comment on this than me, but I wonder if Generative AI systems are keeping up with trends? I saw this fun piece in The Economist recently.

   Rhyme, once in its prime, is in decline
   Readers like it. So why do poets eschew rhyme?
   The Economist, May 28, 2025
   <https://www.economist.com/culture/2025/05/28/rhyme-once-in-its-prime-is-in-decline>

(This may be behind a pay wall, but try the link in case. If it doesn't work, beg someone who has a subscription to show you this.)

And here is Ernest Davis replying to this Porter and Machery piece.

   Ernest Davis, 2024. ChatGPT's Poetry is Incompetent and Banal: A Discussion of (Porter and Machery, 2024)
   <https://cs.nyu.edu/~davise/papers/GPT-Poetry.pdf>

I like your question, "what do we do/what can we do/what should we do to help people to understand and appreciate good food for the mind?" Showing more people this other recent piece from The Economist could be part of what we do, I think.

   Why the president must not be lexicographer-in-chief
   Who decides what legal terms mean? If it is Donald Trump, God help America
   The Economist, May 30, 2025
   <https://www.economist.com/united-states/2025/05/30/why-the-president-must-not-be-lexicographer-in-chief>

A lot of (so called) political argument seems to me to involve pushing the meanings of words to extreme and inappropriate places on the shove-ha'penny board. "Insurrection," just to pick one example from news reporting I saw over the weekend, has now been pushed to a place I would say it clearly does not belong.

Here's another piece I think is interesting, but for a different reason. When we write our words, thus leaving the marks of text, these marks need to be readable. All this -- how do we make our text readable by those we want to read it? -- is, as far as I can see, completely neglected by everything done to build the automatic text generators we have today, as if it has no importance. But it does have an importance, a big one, and computers have had an important role in how we prepare our texts for good reading. Words and meanings cannot be built by readers from text that is not comfortably readable, not reliably, at least. Here's a piece I came across recently which digs back into some early history of how some maths was typeset for good reading.

   David F. Brailsford, W. Kernighan, and A. Ritchie, 2022. How did Dennis Ritchie Produce his PhD Thesis? A Typographical Mystery. DocEng '22: Proceedings of the 22nd ACM Symposium on Document Engineering, Article No. 2, Pages 1-10
   <https://doi.org/10.1145/3558100.3563839>

Also recently posted on Fermat's Library here
   <https://fermatslibrary.com/s/how-did-dennis-ritchie-produce-his-phd-thesis-a-typographical-mystery#email-newsletter>

Thanks again, Maurizio, for your post, with my apologies for taking so long to say so.

-- Tim


> On 25 May 2025, at 10:53, Humanist <humanist@dhhumanist.org> wrote:
>
>
> Humanist Discussion Group, Vol. 39, No. 27.
> Department of Digital Humanities, University of Cologne
> Hosted by DH-Cologne
> www.dhhumanist.org
> Submit to: humanist@dhhumanist.org
>
> <snip>
> [2] From: maurizio lana <maurizio.lana@uniupo.it>
>     Subject: Re: [Humanist] 39.21: repetition vs intelligence (333)
> <snip>
> --[2]------------------------------------------------------------------------
>     Date: 2025-05-21 19:47:04+00:00
>     From: maurizio lana <maurizio.lana@uniupo.it>
>     Subject: Re: [Humanist] 39.21: repetition vs intelligence
>
> thank you for these lines Tim.
> their peak is here, for me:
>
>> Text is the marks left by some human writing, and, now-a-days, often printed or screen rendered using suitable well designed font(s) and typographical designs.
>> Text is not the same as words. The words involved were formed in the head of the author and remain there.
>> Writing words to say something involves encoding the chosen words in some shared alphabet and shared spelling and grammar. This results in the marks we call text.
>> Text is thus a sequence of signs, and it must be read, by, of course, something that can read these signs, to re-form the words of the author.
>> These again formed words are formed in the reader's head, they are not found and somehow picked out of the text; the signs are not the words, they are signs for words.
>> This notion of "picking up the words" is not what reading is, though this is how it might seem to us, and how we often talk about it being.
>> This confusion -- the text is the words -- was harmless when we [just about] only had text from human writing, but now we have, thanks to things like ChatGPT, automated text generation systems, and lots of text which is not the result of any kind of writing.
>> Just because we can read this automatically generated text, and form words in our heads from this reading, words which mean something to us, and thus give us the impression that the text is about something, does not mean, nor necessarily make, the generator of this text a writer.
>> To be a writer requires the author to be a reader of the written text, and, of course, lots of other text. And it requires the writer to have a mind in which they form words to say something with.
>> ChatGPT, and other Generative AI systems like it, do not read anything. ChatGPT does no reading of your [so called] prompt.
>> The text you make by writing your prompt is simply chopped into a sequence of text tokens which are, in turn, used to build a sequence of vector encodings, together with quite a lot of other stuff added to your prompt text by the always hidden prompt processing ChatGPT has to do.
>> (ChatGPT is not just an LLM, it has plenty of other machinery needed to make it do what it does.)
>
> and just this evening saw this article:
> Porter, Brian, and Edouard Machery. "AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably". /Scientific Reports/ 14, no. 1 (14 November 2024): 26133.
> https://doi.org/10.1038/s41598-024-76900-1.
>
> allow me to say that literature is like cuisine: if you are not educated you are not able to distinguish the flavors, and to appreciate them and their combination; or you are educated only in some flavors (e.g. the Italians who, when abroad, first of all search for an Italian restaurant).
> if you are not educated in literature you are not able to distinguish and fully appreciate it.
> you could end up tasting and appreciating junk food without even knowing what junk food (junk information) is.
>
> so my question, as a professor and as a citizen, is: what do we do/what can we do/what should we do to help people to understand and appreciate good food for the mind?
>
> Maurizio
>

_______________________________________________
Unsubscribe at: http://dhhumanist.org/Restricted
List posts to: humanist@dhhumanist.org
List info and archives at: http://dhhumanist.org
Listmember interface at: http://dhhumanist.org/Restricted/
Subscribe at: http://dhhumanist.org/membership_form.php