14 July 2025 - Artificial Intelligence
An Interdisciplinary Approach to Open-source AI: A spotlight on YoKI
Since 2018, Marcus Buchwald has been working in the fields of medical informatics, data analysis and artificial intelligence (AI) in medicine, and he was instrumental in the development of the university's AI platform YoKI. He is a doctoral student and a member of the research group at the Engineering Mathematics and Computing Lab (EMCL) at Heidelberg University, and is also part of the Data Analysis and Modeling in Medicine research group at the Mannheim Institute for Intelligent Systems in Medicine (MIISM) at the Mannheim Faculty of Medicine. His PhD focuses on generative AI models in medicine.
In an interview, he gave an overview of YoKI's large language models (LLMs), explaining how they work and why the term “artificial intelligence” can be misleading.
Who was involved in YoKI's development and what are the advantages of the platform?
From the very beginning, the development of YoKI was a team project involving the EMCL group and the University Computing Centre (URZ). At the URZ, Holger Altenbach from the Core Technologies and Collaboration service areas is responsible for the technical aspects and server provision. Alexander Zeillmann, Jonas Roller, Pascal Memmesheimer and I selected the various models for YoKI, configured them and tested them extensively.
When YoKI was being developed, data sovereignty was a priority right from the outset. The biggest advantage of the platform is that it allows you to use a variety of models, all of which are operated on premises, i.e., on university servers. This means that all models can be used in full compliance with data protection regulations because the data never leaves the university—which is crucial for work in an academic context.
Which models were selected for YoKI and will others be added in the future?
When we were selecting the models, we made sure that they were all open source so they could be run on university servers and customized for our needs. Currently, there are four models to choose from, with the Meta-Llama language model set as the default. Llama and DeepSeek are powerful general-purpose tools for text generation and processing. Qwen and Aya are specialized models available as additional options. Aya from Cohere is multilingual and can chat in 101 languages, while Qwen can handle specialized questions about coding and assist with programming.
The numbers in the model names indicate how many parameters a model contains (Llama 70B = 70 billion parameters). Roughly analogous to the neurons and their connections in the brain, the parameter count reflects the size, depth and complexity of the model's network. With DeepSeek, we selected a language model that made a name for itself by being significantly less energy- and cost-intensive and much less complex. DeepSeek applies optimization techniques from machine learning such as model distillation and pruning. Distillation transfers the knowledge of a large model to a smaller one without a significant loss of accuracy, while pruning additionally removes unnecessary parameters or entire layers from the pretrained neural network without significantly impacting performance. This is also reflected in the model names (e.g., DeepSeek-R1-Distill-Qwen-32B).
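To illustrate the distillation idea in isolation, here is a minimal sketch in PyTorch. The "teacher" and "student" are stand-in toy networks invented for the example, not components of DeepSeek or Llama, and the snippet only shows the core trick: the student is trained to match the teacher's softened output distribution rather than being trained from raw data alone.

```python
# Minimal knowledge-distillation sketch with two toy networks (hypothetical,
# for illustration only); real distillation pipelines are far more involved.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(16, 8)   # stands in for a large pretrained model
student = nn.Linear(16, 8)   # smaller model that should mimic the teacher
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0                      # temperature softens the teacher's distribution

x = torch.randn(32, 16)      # a batch of dummy inputs
with torch.no_grad():
    teacher_logits = teacher(x)

optimizer.zero_grad()
student_logits = student(x)

# KL divergence between the softened teacher and student distributions
loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)
loss.backward()
optimizer.step()
```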
YoKI allows students and employees to explore the limitations of and differences between the various models for themselves, without any reservations. When the same prompt is given to different models, for example, clear differences become apparent.
As the newer versions of Llama (3.2 and later) are not approved under the EU AI Act, other models are also being considered for university use, such as Mistral, which was developed in the EU. We will therefore continue to develop and adapt YoKI and its language models.
In a nutshell: What are your key takeaways on LLMs?
Language models are powerful tools that can be very helpful, but they should by no means be seen as universal problem solvers for every type of task. They are still quite limited when it comes to tasks in mathematics and logic, although they are catching up in this area, because they are primarily trained to understand language and generate realistic text.
The term AI is misleading because the methodology of language models is based on calculating probabilities, for example of word sequences given their context. As the models become more and more powerful, I think the quality of the input, i.e., prompt engineering, will become less relevant in future, because the models will be better able to extract the core of a prompt statement or question, even if it is vaguely formulated. However, a responsible approach to generative AI will remain important due to bias, hallucinations, security, ethical issues and environmental impact.
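As a toy illustration of the "calculating probabilities" point: the short Python snippet below turns a handful of made-up model scores (logits) into a probability distribution over candidate next words using a softmax. The vocabulary and scores are invented for this example; a real language model does the same thing over a vocabulary of many thousands of tokens at every generation step.

```python
# Toy sketch of next-token prediction: invented logits are converted into a
# probability distribution; the model then samples or picks the most likely token.
import numpy as np

vocab = ["Paris", "Berlin", "bananas", "the"]
logits = np.array([4.1, 2.3, -1.0, 0.5])    # hypothetical model scores

probs = np.exp(logits - logits.max())
probs /= probs.sum()                        # softmax -> probabilities summing to 1

for token, p in zip(vocab, probs):
    print(f"P('{token}' | 'The capital of France is') = {p:.3f}")
```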
A chatbot for the IT Service is now also available as a test version. How does it work and what can it be used for?
The IT Service chatbot is currently based on the Llama model and is intended to answer practical questions about IT problems. For this purpose, the language model was supplemented with websites, how-tos and other information from the URZ. The system has been designed so that the chatbot responds like an IT Service employee but does not answer questions on other topics.
In technical terms, this was achieved using a Retrieval-Augmented Generation (RAG) approach. A RAG model combines targeted access to relevant data from clearly defined sources, such as the URZ website, with a generative, pretrained language model. The chatbot therefore has access to data relevant to the IT Service, such as how-tos or the Service Catalogue, and can provide support responses based on this information, which makes it specifically tailored to the needs of the IT Service. This type of solution allows data sources containing specific support information, such as documents, databases or tutorials, to be precisely specified, ensuring that the origin of the data remains transparent and that sensitive information can be processed in a controlled manner. The chatbot is still in its test phase and will be further refined so that the answers and links it provides become even more relevant and accurate.
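To make the retrieval step concrete, here is a minimal Python sketch. It uses TF-IDF similarity from scikit-learn and a few invented URZ-style support snippets rather than the dense embeddings and full document base a production RAG system would likely use, and it only indicates where the retrieved text would be handed to the language model.

```python
# Minimal sketch of the retrieval step in a RAG pipeline (illustrative only):
# find the most relevant support document and place it into the model's prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "How to connect to eduroam Wi-Fi on campus.",
    "Resetting your Uni-ID password via the self-service portal.",
    "Requesting additional storage on the university cloud service.",
]

question = "I forgot my Uni-ID password, what should I do?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([question])

# Retrieve the support document most similar to the question
scores = cosine_similarity(query_vector, doc_vectors)[0]
best_doc = documents[scores.argmax()]

# The retrieved text is then inserted into the prompt the language model
# answers from, so the response stays grounded in the specified sources.
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"
print(prompt)
```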
In the future, this should serve as a basis for developing and using chatbots for specific questions in other areas, such as administration, institutions and departments.
Last but not least: How do you use AI models apart from your research?
Personally, I use Claude, for example, for programming, particularly for creating classes and functions in Python.
Thank you very much for your insights!
Profile: Marcus Buchwald
Marcus Buchwald is a medical physicist and has been researching generative AI models for his doctorate under Prof. Dr. Vincent Heuveline and Prof. Dr. Jürgen Hesser since 2022. These models aim to predict changes in the image data acquired from patients during treatment, taking into account factors such as medication, risks, age, gender and previous illnesses. His work encompasses not only technical questions but, in particular, methodological ones, as any AI model can only produce good results if the collection and quality of the data used to train the system meet the requisite scientific standards.
“Clean” data, i.e., correct, complete, up-to-date and consistent data, is a major challenge in medical informatics.
In the future, research work of this kind should enable the development of predictive AI models that help identify the best possible, precisely tailored, individualized treatment for each patient and "predict" how it will progress.