Info page Large Language Models: Concepts and Usage

This info page provides an overview of the underlying principles as well as the potential applications and limitations of large language models (LLMs). On the university's internal AI platform YoKI, you can interact with several open-source language models without compromising data protection.

What are large language models?

Language models are a foundational class of artificial intelligence systems that are capable of processing human language and generating or reading texts in natural language. LLMs such as GPT-4, LLaMA and Qwen use very large and complex neural network structures and massive amounts of text data to recognize statistical patterns in human languages and apply them to tasks such as text generation, translation, summarization and question answering.

What are the ideas underlying large language models?

The power of statistics

Large language models can perform tasks in various areas with little to no task-specific training by accessing general knowledge encoded during pre-training. 

  • Their capabilities are statistical and mathematical, not factual or semantic: Large language models do not “understand” text in the same way that humans do, as they do not possess consciousness. They can only predict probable sequences based on the data they received during training. This involves highly complex and resource-intensive mathematical calculations, which are used to model human language behavior as accurately as possible. As a result, LLMs do not work with the semantic meanings of words as we do, but with mathematical representations and vectorized units.
  • Our entire language is made up of these small units: Before a text is processed by an LLM, it is broken down into smaller units called tokens. Tokens are units of text and can be sequences of letters and characters, individual words, parts of words or sentences. To generate text, the model calculates the probability distribution of possible subsequent tokens and selects the one that is most probable and contextually appropriate.
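The two points above can be made concrete with a minimal Python sketch of next-token prediction. The vocabulary and the raw scores (logits) below are invented for illustration and do not come from any real model:

```python
import math

# Toy illustration of next-token prediction. After a prompt such as
# "The cat sat on the", a model assigns a raw score (logit) to every
# candidate next token in its vocabulary. These values are made up.
logits = {"mat": 3.2, "roof": 2.1, "moon": 0.4, "banana": -1.5}

# A softmax turns the raw scores into a probability distribution.
total = sum(math.exp(v) for v in logits.values())
probs = {token: math.exp(v) / total for token, v in logits.items()}

# The most probable, contextually fitting token is selected as the
# continuation; no semantic understanding is involved, only statistics.
next_token = max(probs, key=probs.get)
```

Note that the selection is purely mathematical: "mat" wins only because the training data made that continuation statistically likely, not because the model knows what a mat is.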

From individual sequences to the full context

Modern large language models use what is called "transformer architecture." This architecture enables input sequences to be processed simultaneously rather than sequentially, token by token. This means that the model considers all words or characters in a sentence or paragraph at the same time. This allows the model to take into account the context of a chat history over long sequences. The model responds not only to a single word or sentence, but also to the entire context of the conversation.
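The simultaneous, context-wide processing described above can be sketched with a bare-bones version of the attention step at the core of the transformer architecture. The token vectors below are invented toy embeddings, not real model weights:

```python
import math

# Toy "embeddings" for three tokens (values invented for illustration).
tokens = ["the", "cat", "sat"]
vectors = [[1.0, 0.0], [0.8, 0.6], [0.2, 1.0]]

def attend(query, keys, values):
    """One simplified attention step: mix ALL value vectors, weighted
    by how similar each key is to the query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    exps = [math.exp(s) for s in scores]
    weights = [e / sum(exps) for e in exps]  # softmax: weights sum to 1
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Every token is updated using the whole sequence at once, not token by
# token; this is what lets the model track context over long sequences.
contextualized = [attend(v, vectors, vectors) for v in vectors]
```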

A typical usage example

Imagine you are using a language model to write about a book. Due to its transformer architecture, it can

  • understand the context of your conversation across multiple sentences and process the content of previous inputs,
  • analyze how your statements relate to each other and, if necessary, build a small profile of you based on the chat.

This means, for example, that you don't have to use the name of the book again later in the conversation, but you can ask further questions about characters in the book, etc. Based on context analysis, the system understands whether your questions refer to the previous context or are establishing a new context.
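In practice, this context carryover is usually achieved by resending the full conversation with every turn. The following is a sketch under the assumption that the client, not the model, stores the chat history; the function is a placeholder, not a real API call:

```python
# Client-side chat state: the entire history is resent on every turn,
# so later questions can refer back to "the book" without naming it.
# send_to_model is a hypothetical placeholder, not a real API.
history = []

def ask(user_message, send_to_model=lambda msgs: "(model reply)"):
    history.append({"role": "user", "content": user_message})
    reply = send_to_model(history)  # the model sees the ENTIRE history
    history.append({"role": "assistant", "content": reply})
    return reply

ask("Summarize the plot of 'Moby-Dick'.")
ask("What drives the main character?")  # book resolved from context
```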

What are the usage limitations of large language models?

Language models can "hallucinate."

A large language model (and more generally any generative system) "hallucinates" when it produces content that appears plausible to laypersons but is factually incorrect or completely made up. Errors can be divided into the following categories:

  • Factual errors: Statements about events or facts that never took place or are partially inaccurate or false.
  • Fabricated content: Inventing quotes, titles of research papers and publications, or historical facts.
  • Contradictions: Conflicting answers in different contexts.

Because of the statistical structure of LLMs, there can be no guarantee that they will generate true statements. This is because the most probable answer is not necessarily the correct answer. They are not connected to a database or programmed with a truth system, so they rely solely on patterns from training data, which can be misleading in specific use cases. In research, applying general patterns to unsuitable, specific cases is called “overgeneralization.” 

Why do hallucinations occur?

In addition to the causes attributable to the statistical nature of LLMs, there are several other factors that contribute to the occurrence of hallucinations:

  • A model can only be as good as the data sets it receives during training. Current events in particular can be very difficult for language models to handle, as the training data has to be “cut off” at a certain point in time.
  • Ambiguity in a user query can lead to reduced output quality because the system is unable to determine exactly what is being requested.
  • Input that is too complex and broad can exceed a model's token window, causing information to be lost.
  • Transformer models generally prioritize fluent, coherent text flow over factual accuracy.
  • Certain decoding methods that have not been tailored to the user's application context can amplify errors.
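The last point can be illustrated with the temperature parameter, one common decoding setting. The next-token scores below are invented for illustration:

```python
import math

# Invented next-token scores for "What is the capital of France?"
logits = {"Paris": 4.0, "Lyon": 2.0, "Berlin": 1.0}

def distribution(logits, temperature):
    """Softmax over temperature-scaled logits."""
    scaled = {t: v / temperature for t, v in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    return {t: math.exp(v) / total for t, v in scaled.items()}

sharp = distribution(logits, 0.5)  # low temperature: safe token dominates
flat = distribution(logits, 2.0)   # high temperature: risky tokens gain

# With an unsuitably high temperature, improbable (and possibly wrong)
# continuations like "Berlin" are sampled far more often.
```

This is why decoding settings that fit one use case (e.g. creative writing) can amplify factual errors in another (e.g. question answering).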

Although modern large language models continue to reduce the occurrence of hallucinated responses, they will still occasionally hallucinate. That is why it is important to have a basic understanding of how large language models work and where their weaknesses lie: their primary task is to process and generate natural language texts, not to guarantee factual accuracy.

Further challenges

In addition to the technical challenges, there are other concerns, such as:

  • toxic or discriminatory language due to possible biases in the data sets
  • the high, environmentally harmful energy consumption
  • data protection and copyright (which texts were used for training and was this done with agreement from the authors? How is sensitive personal data contained in inputs handled?)
  • transparency and traceability of outputs, which matters for legal questions.

Why open-source models?

Using open-source LLMs in research-related settings offers a number of advantages, such as:

  • higher transparency for research
  • reproducibility of AI outputs
  • independence from large tech companies 

With increasing support for tools such as vLLM and Ollama, building and deploying applications is now more accessible to individual developers. Commercial models such as GPT-4, Claude and Gemini are very powerful, but open-source alternatives such as LLaMA 3.1 or Qwen are quickly catching up. In the link bar, you can find a more comprehensive overview of available commercial and open-source language models.

Commercial and open-source language models in comparison:

Category | Open-source models | Commercial models
Access | easy to download; self-hosting possible | cloud-based, accessible only via API
Cost | free (additional hardware required) | subscription or "pay-per-call" system
Customization | complete, autonomous fine-tuning and quantization | limited customization settings
Data protection | local processing | cloud service from third parties
Performance | for many tasks, nearly as good as GPT-3.5 | GPT-4 level (Claude 3 still highest performing)