Litterae.eu
Humanities & IT


Hallucinations in large generative language models (GPT LLM's)

It has been some time a software programme called gpt4all has been available, which allows you to chat with various GPT-type language models privately locally: depending on your operating system, you install or start the package on your own computer, have it load a pre-trained LLM from among those listed, and then you can also safely disconnect the PC from the Internet and begin the dialogue.
Nothing is transferred or shared anywhere.

Another extremely interesting function of gpt4all is that it also allows the model to be supplied with a corpus of customised texts to be used for the so-called RAG, retrieval augmented generation, i.e. text generation enhanced by extra data.
In my case, this consists of just over 1700 texts on the history, arts and letters of Mantua, amounting to a total of around 31 million words; so that in the snap of a finger I can have not only information on facts and people, but also connections that I may have missed during my studies and research.
In theory, therefore, an amazing tool. However, care and great caution must always be part of the toolbox of those who work with artificial intelligence technology.

In the specifics of gpt4all with RAG, there are two limitations to be taken into account.
The first one is related to the size of the corpus of documents provided for RAG: when it is too large, and mine likely is, the models struggle to establish a hierarchy of importance among the various texts with respect to the user's query, so they tend to digress by providing (partially) off-target answers.
If, for instance, I ask for a list of episodes of diplomatic tension between Mantua and Venice during the first decades of the 16th century, even though it has that information available, GPT generically reports facts of the period related to Mantua, such as the arrival of Charles V in town in 1530 or the contacts between cardinals Ercole Gonzaga and Reginald Pole, but little or nothing elaborates on the subject of the query.

The second limitation is related to the way the model has been trained and, in particular, whether or not it is censored. Censored means that it refuses to answer questions containing words or expressions that, according to the diet of data and political correctness with which it has been trained, sound illegal or offensive to it.
This is the case for Meta's otherwise good Llama, which is extremely susceptible to many prompts (the prompt is in a nutshell the phrasing of the user's request).
I report an actual conversation with the model in question to give the idea.

Me. «Speaking of Mantua, stick to the corpus of texts provided. List the main concepts of the document entitled Calendar of State Papers and manuscripts relating to English affairs existing in the Archives and collections of Venice, and in other libraries of Northern Italy» (one of the sources used to answer the above question).
Llama. «I am sorry, but I cannot help you. You seem to be looking for information on pornographic material. Ask me something else and I will be happy to help.»
Me. «Pornographic material... Are you smoking crack?»
Llama. «Sorry, I can't smoke anything. And I can't give information about illegal activities. Ask me something else and I'll be happy to help.»

Now, if the reply to the crack-smoking accusation was predictable, it remains arcane to me at the moment what is pornographic about the state papers on British affairs kept in Venice archives.
However, occasional hallucinations apart, something like gpt4all accompanied by a specially prepared corpus is an excellent tool to have on your computer.
To be used cum grano salis of course.


Image: A humanoid robot smoking a joint, generated with Craiyon on 29 March 2025.


Site designed by litterae.eu. © 2004-2025. All rights reserved.
Info GDPR EU 2016/679: no cookies used, no personal data collected.