Humanist Discussion Group, Vol. 36, No. 484.
Department of Digital Humanities, University of Cologne
Hosted by DH-Cologne
www.dhhumanist.org
Submit to: humanist@dhhumanist.org
Date: 2023-03-26 14:25:06+00:00
From: Henry Schaffer <hes@ncsu.edu>
Subject: Using numbers for words?
I was at a workshop on large-scale computer processing with neural
networks/AI, and Natural Language Processing (NLP) came up briefly. The
presenter mentioned that numbers are typically substituted for words, but
didn't discuss why. She referred us to
https://www.tensorflow.org/tutorials/text/word2vec as one method, and there's
some more explanation at https://en.wikipedia.org/wiki/Word2vec
I can see an advantage in storage and processing speed when a word is
represented by perhaps 2 bytes rather than by perhaps 10-20+ bytes of text,
but I don't see any additional advantage. Do you?
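
To make the storage point concrete, here is a minimal sketch in Python of replacing words with small integer IDs. The sample sentence, the function names (build_vocab, encode) and the choice of a 2-byte unsigned integer per word are all assumptions made for illustration; they are not taken from the workshop or from the word2vec tutorial.

```python
from array import array

def build_vocab(tokens):
    """Assign each distinct word an integer ID in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Replace each word by its ID, stored as an unsigned short ('H')."""
    return array("H", (vocab[tok] for tok in tokens))

# Made-up sample text, split naively on spaces (an assumption for illustration).
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.split()

vocab = build_vocab(tokens)
ids = encode(tokens, vocab)

# Reverse lookup table so the IDs can be turned back into words.
id_to_word = {i: w for w, i in vocab.items()}
decoded = [id_to_word[i] for i in ids]

text_bytes = len(text.encode("utf-8"))   # size of the raw text
id_bytes = ids.itemsize * len(ids)       # size of the ID sequence

print(list(ids))
print(text_bytes, "bytes as text vs.", id_bytes, "bytes as IDs")
```

Note that the IDs themselves are arbitrary: nothing about the number assigned to "cat" says it is closer in meaning to "dog" than to "mat". The saving is purely in size and in the speed of comparing integers rather than strings.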
Representing a word as a vector (as in word2vec) allows more information to
be kept about it, and so that could give other advantages.
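
As a rough illustration of what "more information" might mean, the sketch below compares word vectors by cosine similarity. The three-dimensional vectors are invented by hand for this example; they were not learned by word2vec, which typically produces vectors with many more dimensions from a large corpus. The function name cosine is likewise just a label chosen here.

```python
import math

# Hypothetical, hand-written vectors (NOT learned by word2vec); chosen so that
# the two animal words point in a similar direction and "mat" does not.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "mat": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_mat = cosine(vectors["cat"], vectors["mat"])

print("cat vs dog:", round(sim_cat_dog, 3))
print("cat vs mat:", round(sim_cat_mat, 3))
```

With a single integer per word there is no comparable operation: the difference between two IDs carries no meaning. With vectors, distance or angle between words becomes something a program can compute with.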
Can anyone add more explanation/reasons?
--henry