Project Details

Projects should allow students to apply what they’ve learned throughout the course. They must implement an LLM-based system that includes at least two of the core components listed in the grading rubric below.

The project should also include a function-calling-based interface (“a tool”) to an AI image generator; a minimal sketch of such a tool follows below.
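One possible shape for this tool, sketched here with the OpenAI Python SDK as one of several viable backends; the model names and the `generate_image` helper are illustrative assumptions, not part of the assignment:

```python
# Sketch: a function-calling "tool" that lets the LLM request an image.
# Assumes OPENAI_API_KEY is set; model names are placeholders you may swap.
import json

from openai import OpenAI

client = OpenAI()

def generate_image(prompt: str) -> str:
    """Call an image model and return the URL of the generated image."""
    result = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
    return result.data[0].url

# Describe the tool so the chat model can decide when to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "generate_image",
        "description": "Generate an illustration for the given prompt.",
        "parameters": {
            "type": "object",
            "properties": {"prompt": {"type": "string"}},
            "required": ["prompt"],
        },
    },
}]

messages = [{"role": "user", "content": "Illustrate a lighthouse at dawn."}]
response = client.chat.completions.create(
    model="gpt-4o-mini", messages=messages, tools=tools
)

# If the model decided to call the tool, execute it and print the image URL.
for call in response.choices[0].message.tool_calls or []:
    if call.function.name == "generate_image":
        args = json.loads(call.function.arguments)
        print(generate_image(args["prompt"]))
```

Any provider with function/tool calling works the same way: you declare the tool’s JSON schema, let the model emit a call, execute it in Python, and feed the result back.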

Students are free to choose their project topic, as long as it fits within the course scope and is approved by the instructor. All projects must be implemented in Python.

Active participation in the course will be taken into account before grading: all tasks asking students to upload their results to Moodle should be completed. If more than one of the required tasks is missing, the student will not be graded.

The projects are presented in the last session of the course, and all students of each group need to take part in this session. The presentation is part of the overall grade. It can, but does not have to, be prepared in PowerPoint; any other mode of presentation (including a live demo based on a well-structured notebook) is fine.

Grading Rubric

Grading Scale: German Notenpunkte System
Passing Grade: 20 points (4 Notenpunkte)

Binary Requirements (MUST ALL BE MET TO PASS)

Failure to meet any of these requirements results in automatic failure (0 Notenpunkte), regardless of points earned:

  • The system implements at least two of the core components listed below.
  • The system includes a function-calling tool to an AI image generator.
  • The project is implemented in Python.
  • The project topic was approved by the instructor.
  • At most one of the required Moodle upload tasks is missing.
  • All group members take part in the final presentation session.

Once all binary requirements are met, the project receives 20 points (4- grade) as baseline.

Point-Based Evaluation

Maximum Additional Points: 40 (20 baseline + up to 40 additional = 60 points total before bonus)

Core Components Quality (up to 20 points)

Choose at least TWO (required by the binary requirements above):

  • Retrieval Augmentation/RAG with source citations (up to 10 points)
  • Data Analysis capabilities with executable analysis steps (up to 10 points)
  • Multi-step LLM pipeline with multiple coordinated LLM calls (e.g., generator-reviewer, planner-executor, or agent systems; see the sketch after this list) (up to 10 points)
  • Fine-tuning on (synthetic) data (up to 10 points)

Points awarded based on implementation quality, completeness, and sophistication.
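To make the multi-step pipeline component concrete, here is a minimal generator-reviewer sketch; the model name and prompts are illustrative assumptions, and the same pattern extends to planner-executor or agent setups:

```python
# Sketch: a two-role pipeline where a reviewer LLM critiques a generator LLM.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def ask(system: str, user: str) -> str:
    """One LLM call with a system role and a user message."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return response.choices[0].message.content

# Step 1: the generator drafts an answer.
draft = ask("You are a careful technical writer.",
            "Explain retrieval-augmented generation in three sentences.")

# Step 2: the reviewer critiques the draft.
critique = ask("You are a strict reviewer. List concrete flaws.", draft)

# Step 3: the generator revises the draft using the critique.
final = ask("Revise the draft so it addresses every point of the critique.",
            f"Draft:\n{draft}\n\nCritique:\n{critique}")
print(final)
```

The coordination between the calls (what each role sees, and how outputs feed the next step) is what distinguishes a pipeline from a single prompt.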

Implementation Quality (up to 5 points)

  • Code quality, structure, and documentation (up to 5 points)

Presentation (up to 15 points)

  • Time management (up to 3 points)
  • Question handling & understanding demonstration (up to 8 points)
  • Clarity of explanation and live demo (up to 4 points)

TOTAL: 20 baseline + up to 40 points = 60 points maximum before bonus

Bonus Features

Maximum Bonus Points: 40 points (total cannot exceed 100 points)

Additional Core Components

  • Implementation of 3rd core component (up to 10 points)
  • Implementation of all 4 core components (up to 10 points for 4th)

Advanced Features (examples, non-exhaustive):

  • Prompt optimization/engineering framework (~5-10 points)
  • Custom evaluation metrics for LLM outputs (~5-10 points)
  • Streaming responses with proper UI (~5 points)
  • Multi-modal input handling (text + images) (~10 points)
  • Advanced RAG (hybrid search, re-ranking, query rewriting) (~10-15 points)
  • Chain-of-thought or reasoning traces visualization (~5-10 points)
  • LLM caching/optimization strategies (see the sketch after this list) (~5 points)
  • Synthetic data generation pipeline for fine-tuning (~10 points)
  • Agent memory/conversation history management (~5-10 points)
  • MCP (Model Context Protocol) integration (~10-15 points)
  • LoRA/QLoRA implementation details and analysis (~10 points)
  • Production-ready deployment (containerization, API) (~10-15 points)
  • Novel combination or creative application of techniques (~5-25 points)

This list is not exhaustive. Be creative and propose your own features!
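As one example of how small such a feature can start, here is a sketch of the caching item above; `lru_cache` is a deliberate simplification, and a real project might want a persistent, size-bounded cache instead:

```python
# Sketch: memoize identical LLM requests so repeated prompts skip the API.
from functools import lru_cache

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

@lru_cache(maxsize=256)
def cached_completion(model: str, prompt: str) -> str:
    """Return the model's answer, reusing cached results for repeat prompts."""
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# The second, identical call is served from the cache: no API request, no cost.
print(cached_completion("gpt-4o-mini", "Name three failure modes of RAG."))
print(cached_completion("gpt-4o-mini", "Name three failure modes of RAG."))
```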

Example Project Ideas:

  1. LLM Tourist Guide: Uses TA.SH data to provide travel tips and enhances them with generated images.
  2. Quarto Data Presentation Pipeline: Builds and illustrates a Quarto presentation based on a given open dataset.
  3. Synthetic Author: Generates commit messages based on commit history/diffs. It could also suggest GitHub issues illustrated with AI-generated images.
  4. AI Storyteller: Creates illustrated short stories for children based on historical events.
  5. AI Webdesigner: A tool that creates and illustrates a webpage based on an Amazon product page.

A Note About Using LLMs for Your Project

Let’s start with a small psychological demonstration.

Look at these anagrams:

\[ \begin{array}{ccc} \text{edbbal} & \rightarrow & \text{dabble} \\ \text{eaeslg} & \rightarrow & \text{eagles} \\ \text{fcbair} & \rightarrow & \text{fabric} \\ \text{elsmod} & \rightarrow & \text{models} \\ \text{actysh} & \rightarrow & \text{yachts} \end{array} \]

How long would you take to solve such an anagram?

  • A: < 30 sec
  • B: > 30 sec, < 1 min
  • C: > 1 min, < 1:30 min
  • D: > 1:30 min, < 2 min
  • E: > 2 min

\[ \begin{array}{ccc} \text{piemls} & \rightarrow & \text{???} \end{array} \]

Please take a moment to estimate how long you will need before attempting to solve it.


I will not show you the solution, though I assure you that it is quite simple.

The main point of this demo is to illustrate the psychological bias often called overconfidence: when you are directly presented with a solution, you underestimate the effort that reaching it would have taken.

In terms of using Gen AI to solve tasks, findings in the same vein appear in Stadler et al. (2024), who ran a study in which students were asked to research nanoparticles in sunscreen using either search engines or ChatGPT 3.5.

Their results indicated that students using LLMs experienced significantly lower cognitive load. However, despite this reduction, these students demonstrated lower-quality reasoning and argumentation in their final recommendations compared to those who used traditional search engines.

and they argue further that

[…] while LLMs can decrease the cognitive burden associated with information gathering during a learning task, they may not promote deeper engagement with content necessary for high-quality learning per se.

Giving a lecture about Gen AI and expecting students not to use it seems rather pointless, but we will use the presentation at the end of the semester to test whether you truly understand your solution, and thus the depth of your engagement with the lecture’s contents.


Stadler, M., Bannert, M., & Sailer, M. (2024). Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry. Computers in Human Behavior, 160, 108386. https://doi.org/10.1016/j.chb.2024.108386