Project Details
Projects allow you to apply what you have learned throughout the course. The only hard constraint is that your project must implement an LLM-based system that does something meaningful. What exactly you build is your decision, as long as it is approved by the instructor and implemented in Python.
To give you some orientation, the following are the kinds of techniques covered in this course that a project could draw from:
- Retrieval Augmented Generation (RAG): the system queries a document index and references sources in its responses
- LLM-based Data Analysis: the system interprets a dataset and decides which analysis steps to execute
- Multi-step LLM Pipeline: multiple coordinated LLM calls work together, for example in a generator-reviewer or planner-executor arrangement, or using agent frameworks
- Fine-tuning: rather than implementing fine-tuning (compute and time constraints make this impractical for most projects), you design and justify a fine-tuning strategy for your use case: which model, which technique (e.g., LoRA, QLoRA), what training data and why, and what you would expect to gain over prompting alone. The strategy must be specific to your project; a generic description will not count.
Your project should integrate at least two of these (or comparable techniques). How you combine them, and for what purpose, is up to you.
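As a rough illustration of how two of these techniques might be combined, the following minimal sketch pairs a toy retrieval step with a generator-reviewer loop. Everything in it is an assumption made for illustration: the names Document, retrieve, build_prompt, cites_a_source, and answer_with_sources are hypothetical, the word-overlap ranking stands in for a real document index, the reviewer is a plain rule check standing in for a second LLM call, and fake_local_model stands in for a locally hosted model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Document:
    source: str
    text: str


def retrieve(question: str, documents: list[Document], top_k: int = 2) -> list[Document]:
    """Rank documents by word overlap with the question (stand-in for a real index)."""
    question_words = set(question.lower().split())

    def overlap(document: Document) -> int:
        return len(question_words & set(document.text.lower().split()))

    ranked = sorted(documents, key=overlap, reverse=True)
    return [document for document in ranked[:top_k] if overlap(document) > 0]


def build_prompt(question: str, passages: list[Document]) -> str:
    """Put the retrieved passages, labelled with their source, in front of the question."""
    context = "\n".join(f"[{passage.source}] {passage.text}" for passage in passages)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


def cites_a_source(answer: str, passages: list[Document]) -> bool:
    """Reviewer step: accept only answers that reference a retrieved source."""
    return any(f"[{passage.source}]" in answer for passage in passages)


def answer_with_sources(
    question: str,
    documents: list[Document],
    generate: Callable[[str], str],
    max_attempts: int = 2,
) -> str:
    """Generate an answer and retry with feedback until the reviewer accepts it."""
    passages = retrieve(question, documents)
    prompt = build_prompt(question, passages)
    for _ in range(max_attempts):
        draft = generate(prompt)
        if cites_a_source(draft, passages):
            return draft
        prompt += "\n\nYour previous answer cited no source. Cite one in [brackets]."
    return "No grounded answer could be produced."


def fake_local_model(prompt: str) -> str:
    """Hypothetical placeholder for a local LLM; it cites a source only when reminded."""
    if "cited no source" in prompt:
        return "Kiel lies on the Baltic Sea [guide.txt]."
    return "Kiel lies on the Baltic Sea."
```

Passing the model in as a callable keeps the pipeline independent of any particular model backend, so the same loop can be exercised with a stub during development and with a real local model later.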
Grading
Your project is graded on three criteria, each counting for one third of the project grade. Grades follow the scale defined in the Prüfungsverfahrensordnung (PVO).
1. Functionality
The system must run end-to-end. Setup instructions must work. What you build and how you combine techniques is your decision; the grade reflects whether the result does something non-trivial and whether the implementation is complete.
2. Code Quality
Your code is evaluated on structure, readability, and documentation. Follow the coding guidelines below.
3. Presentation
You present your project and answer technical questions. We are less interested in what you built than in whether you understand why you built it the way you did: the tradeoffs you made, what broke, and how you fixed it. Only features you present will be graded.
The presentation does not need to be slides. A live demo or a well-structured notebook walkthrough is equally valid.
Mandatory Requirements
Failure to meet any of these results in automatic failure, regardless of the grade earned on the three criteria above.
- The system runs with exactly the features presented on the day of presentation, no more, no less
- All required Moodle tasks completed (max 1 missing)
- Attendance and contribution to the presentation
Coding Guidelines
- No god-files. Split responsibilities across modules.
- Functions do one thing. If you need “and” to describe it, split it.
- Names are documentation: avoid data, result, tmp, x.
- No commented-out code in the final submission.
- Comments explain why, not what. If the name explains it, no comment is needed.
- No hardcoded model names or file paths; use constants or .env.
- No external APIs or services requiring registration or a paid account. All models must run locally or on infrastructure you control.
- If you copied it, you must be able to explain every line.
- If it works but you don’t know why, it doesn’t work.
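The guidelines on naming, single-purpose functions, and configuration can be illustrated with a short sketch. It is only one possible layout, under stated assumptions: the environment variable names LLM_MODEL_NAME and DOCUMENT_DIR, the default values, and the helper load_settings are hypothetical, and the values are read from the process environment, which a .env loader of your choice could populate beforehand.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path

# Hypothetical defaults, defined once instead of being scattered through the code.
DEFAULT_MODEL_NAME = "my-local-model"
DEFAULT_DOCUMENT_DIR = "documents"


@dataclass(frozen=True)
class Settings:
    model_name: str
    document_dir: Path


def load_settings(environment: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, falling back to the defaults."""
    return Settings(
        model_name=environment.get("LLM_MODEL_NAME", DEFAULT_MODEL_NAME),
        document_dir=Path(environment.get("DOCUMENT_DIR", DEFAULT_DOCUMENT_DIR)),
    )
```

Accepting the environment as a parameter means the function can be checked with a plain dictionary, for example load_settings({"LLM_MODEL_NAME": "test-model"}), without touching the real environment.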
Bonus
Additional technical depth, creative application, or production-ready features can earn bonus points. Complexity and justification matter more than quantity. The presentation is the feature proposal; you will only be graded on what you present and can explain.
Example Project Ideas:
- LLM Tourist Guide: Uses TA.SH data to provide travel tips and enhances them with generated images.
- Quarto Data Presentation Pipeline: Builds and illustrates a Quarto presentation based on a given open dataset.
- Synthetic Author: Generates commit messages based on the commit history or diff. It could also suggest GitHub issues illustrated with AI-generated images.
- AI Storyteller: Creates illustrated short stories for children based on historical events.
- AI Webdesigner: A tool that creates and illustrates a webpage based on an Amazon product page.
A note about using LLMs for your Project
Let’s start with a small psychological demonstration.
Look at these anagrams:
\[ \begin{array}{ccc} \text{edbbal} & \rightarrow & \text{dabble} \\ \text{eaeslg} & \rightarrow & \text{eagles} \\ \text{fcbair} & \rightarrow & \text{fabric} \\ \text{elsmod} & \rightarrow & \text{models} \\ \text{actysh} & \rightarrow & \text{yachts} \end{array} \]
How long would you take to solve such an anagram?
- A: < 30 sec
- B: > 30 sec, < 1 min
- C: > 1 min, < 1:30 min
- D: > 1:30 min, < 2 min
- E: > 2 min
\[ \begin{array}{ccc} \text{piemls} & \rightarrow & \text{???} \end{array} \]
I will not show you the solution, though I assure you that it is quite simple.
The main point of this demo is to illustrate a psychological bias often called overconfidence: you underestimate the effort it takes to reach a solution when the solution is presented to you directly.
In terms of using Gen AI to solve tasks, findings in the same vein are reported by Stadler et al. (2024), who ran a study in which students were asked to research nanoparticles in sunscreen using either search engines or ChatGPT 3.5.
They report:
"Results indicated that students using LLMs experienced significantly lower cognitive load. However, despite this reduction, these students demonstrated lower-quality reasoning and argumentation in their final recommendations compared to those who used traditional search engines."
and they argue further that
"[…] while LLMs can decrease the cognitive burden associated with information gathering during a learning task, they may not promote deeper engagement with content necessary for high-quality learning per se."
Giving a lecture about Gen AI and expecting students not to use it seems rather pointless. We will, however, use the presentation at the end of the semester to test the depth of your engagement with the lecture’s contents, that is, whether you do indeed understand your solution.
