Project Details

Projects should allow students to apply what they’ve learned throughout the course. Each project must implement an LLM-based system that includes at least two of the following features:

Retrieval Augmentation/RAG (i.e., the system should query documents or other content in an index for its answers and reference the sources of its generation)
Data Analysis (i.e., the system should “talk” to a dataset, decide which analysis steps to take, and then execute them)
Multiple Agents (i.e., at least two agents should work in tandem, for example in a generator-reviewer arrangement)
Fine-tuning on (Synthetic) Data (i.e., a small LM or SDM should be fine-tuned on (synthetic) data to adapt it to your needs. For example, you could train a model to answer only in nouns.)
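
As a rough illustration of the retrieval-augmentation feature listed above, the following minimal sketch retrieves passages from a tiny in-memory "index" and builds a prompt that carries the source names, so the answer can reference them. Everything here is an assumption made for illustration: the `DOCUMENTS` corpus, the file names, and the simple word-overlap ranking are hypothetical stand-ins, and a real project would more likely use an embedding-based index and pass the prompt to an actual LLM.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real project would index its own documents.
DOCUMENTS = {
    "harbour.txt": "The harbour offers boat tours every morning in summer.",
    "museum.txt": "The city museum shows paintings and is closed on Mondays.",
    "beach.txt": "The beach is popular for swimming and kite surfing.",
}


def tokenize(text: str) -> Counter:
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (source, text) pairs ranked by word overlap with the query."""
    query_tokens = tokenize(query)
    scored = []
    for source, text in DOCUMENTS.items():
        # Counter intersection keeps the minimum count of each shared word.
        overlap = sum((query_tokens & tokenize(text)).values())
        if overlap > 0:
            scored.append((overlap, source, text))
    # Highest overlap first; ties are broken alphabetically by source name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(source, text) for _, source, text in scored[:k]]


def build_prompt(query: str) -> tuple[str, list[str]]:
    """Build a prompt containing the retrieved passages and return the sources too."""
    hits = retrieve(query)
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text) in enumerate(hits, 1)
    )
    prompt = (
        "Answer using only the numbered passages and cite them like [1].\n"
        f"{context}\n\nQuestion: {query}"
    )
    return prompt, [source for source, _ in hits]
```

Calling `build_prompt("When is the museum closed?")` returns the prompt text together with the list of source names, which the system can show next to the generated answer. Word overlap is a deliberately naive ranking (common words such as "the" count as matches), so it only serves to show the shape of the pipeline.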

The project should also include a function-calling-based interface (“a tool”) to an AI image generator.
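
One possible shape for such a tool is sketched below: a JSON-Schema-style description that is handed to the LLM, a Python function that does the work, and a small dispatcher that executes the call the model requests. This is only a sketch under stated assumptions: `IMAGE_TOOL`, `generate_image`, and `dispatch_tool_call` are hypothetical names, the image backend is a placeholder that generates nothing, and the exact envelope a provider expects around the schema differs between APIs.

```python
import json

# Tool description in the JSON-Schema style used by many function-calling APIs.
# The exact wrapper around it differs between providers (assumption).
IMAGE_TOOL = {
    "name": "generate_image",
    "description": "Create an image from a text prompt and return where it was stored.",
    "parameters": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string", "description": "What the image should show."},
            "size": {"type": "string", "enum": ["256x256", "512x512", "1024x1024"]},
        },
        "required": ["prompt"],
    },
}


def generate_image(prompt: str, size: str = "512x512") -> dict:
    """Placeholder backend: a real project would call its image generator here."""
    slug = "_".join(prompt.lower().split())[:40]
    return {"prompt": prompt, "size": size, "path": f"images/{slug}.png"}


# Registry mapping tool names (as the model sees them) to Python callables.
TOOLS = {"generate_image": generate_image}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the tool the model asked for and return a JSON string to send back."""
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        arguments = json.loads(arguments_json)
        result = TOOLS[name](**arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or unexpected arguments are reported back to the model.
        return json.dumps({"error": str(exc)})
    return json.dumps(result)
```

In a full pipeline, the model would receive `IMAGE_TOOL` with the conversation, answer with a tool name and a JSON argument string, and the result of `dispatch_tool_call` would be appended to the conversation as the tool's reply. Returning errors as JSON instead of raising lets the model see what went wrong and retry.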

Students are free to choose their project topic, as long as it fits within the course scope and is approved by the instructor. All projects must be implemented in Python.

Active participation in the course will be taken into account before grading. This means that all tasks asking students to upload their results to Moodle should be completed. If more than one of the required tasks is missing, the student will not be graded.

The projects are to be presented in the last session of the course. All students of each group need to take part in this session. The presentation will become part of the overall grade. The presentation can, but does not have to, be prepared in PPT; any other mode of presentation (including a live demo based on a nice notebook) is fine.

The project will then be graded based on these contents in addition to the following criteria:

At least the minimum number of components mentioned above has to be used
The more components are used, the better the grade
The project solution has to work. (Since we are talking about LLMs, it does not have to generate perfect results, but the pipeline has to work in general.)
The students have to hand in code the instructors can run. The code has to be documented. This can be done in sensible docstrings, appropriately commented notebooks, or a report. The students can choose the mode. It is possible and recommended to create a GitHub repository with the code and the documentation.

Example Project Ideas:

LLM Tourist Guide: Uses TA.SH data to provide travel tips and enhances them with generated images.
Quarto Data Presentation Pipeline: Builds and illustrates a Quarto presentation based on a given open dataset.
Synthetic Author: Generates commit messages based on commit history/diff. It could also suggest GitHub issues illustrated with AI-generated images.
AI Storyteller: Creates illustrated short stories for children based on historical events.
AI Webdesigner: A tool that creates and illustrates a webpage based on an Amazon product page.

A note about using LLMs for your Project

Let’s start with a small psychological demonstration.

Look at these anagrams:

\[ \begin{array}{ccc} \text{edbbal} & \rightarrow & \text{dabble} \\ \text{eaeslg} & \rightarrow & \text{eagles} \\ \text{fcbair} & \rightarrow & \text{fabric} \\ \text{elsmod} & \rightarrow & \text{models} \\ \text{actysh} & \rightarrow & \text{yachts} \end{array} \]

How long would you take to solve such an anagram?

A: < 30 sec
B: > 30 sec, < 1 min
C: > 1 min, < 1:30 min
D: > 1:30 min, < 2 min
E: > 2 min

\[ \begin{array}{ccc} \text{piemls} & \rightarrow & \text{???} \end{array} \]

Please take your time to think about how long you will take before clicking here to unveil the anagram you are to solve.

I will not show you the solution, though I assure you that it is quite simple.

The main point of this demo is to illustrate the psychological bias often called overconfidence. This effect occurs when you underestimate the effort that reaching a solution takes because you are directly presented with the solution.

In terms of using Gen AI to solve tasks, findings in the same vein are reported by Stadler et al. (2024), who ran a study in which students were asked to research nanoparticles in sunscreen using either search engines or ChatGPT 3.5.

Their

“Results indicated that students using LLMs experienced significantly lower cognitive load. However, despite this reduction, these students demonstrated lower-quality reasoning and argumentation in their final recommendations compared to those who used traditional search engines.”

and they argue further that

“[…] while LLMs can decrease the cognitive burden associated with information gathering during a learning task, they may not promote deeper engagement with content necessary for high-quality learning per se.”

Giving a lecture about Gen AI and expecting students not to use it seems rather pointless. However, we will use the presentation at the end of the semester to test whether you do indeed understand your solution, and thus the depth of your engagement with the lecture’s contents.

Stadler, M., Bannert, M., & Sailer, M. (2024). Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry. Computers in Human Behavior, 160, 108386. https://doi.org/10.1016/j.chb.2024.108386