Using SetFit

SetFit Multilingual Inference Example

This tutorial demonstrates how to use a pretrained SetFit model for multilingual text classification inference.

The script works on Windows, macOS, and Linux, and can be run using Python (via conda or venv) or R with the reticulate package.


Example Python Code

The following code loads the pretrained model and makes multilingual predictions:

# Import the SetFitModel class from the setfit library
from setfit import SetFitModel

# Load the pretrained model for multilingual classification
model = SetFitModel.from_pretrained("automatedMotiveCoder/setfit")

# List of sample multilingual sentences
texts = [
    "Du schaffst das schon.",  # German
    "Tu vas y arriver.",       # French
    "Zvládnete.",              # Czech
    "You'll manage.",          # English
    "Te las arreglarás.",      # Spanish
    "Saate hakkama."           # Estonian
]

# Predict probabilities for each sentence
predictions = model.predict_proba(texts)
print(predictions)

The predict_proba() method returns one probability per class, and these values do not necessarily sum to 1. The loaded model uses a One-vs-Rest approach: it treats each class as a separate binary classification problem, so the per-class probabilities are computed independently and their sum may exceed or fall below 1.
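
To illustrate why independent per-class probabilities need not sum to 1, here is a minimal sketch using only the standard library. It assumes each class probability comes from its own logistic (sigmoid) function and compares that with a softmax over the same scores. The scores are made up for illustration and are not output from the model.

```python
import math


def sigmoid(score: float) -> float:
    """Map one score to a probability, independently of all other classes."""
    return 1.0 / (1.0 + math.exp(-score))


def softmax(scores):
    """Map all scores jointly to probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-class scores for a single sentence (not real model output)
scores = [-0.9, -4.6, -3.5, 0.6]

ovr_probs = [sigmoid(s) for s in scores]  # One-vs-Rest style: one binary decision per class
joint_probs = softmax(scores)             # Joint normalization over all classes

print("One-vs-Rest sum:", round(sum(ovr_probs), 3))
print("Softmax sum:    ", round(sum(joint_probs), 3))
```

Only the softmax values are guaranteed to sum to 1; the One-vs-Rest values are each valid probabilities on their own, but their total is unconstrained.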


Setup and Run (Python)

Follow one of the two methods below to set up your environment and run the script.



Option 1: Using venv

This method is lightweight and works on all platforms. It is recommended for users who want a minimal setup. On Windows, however, setting up venv can take more steps than using Anaconda.

Installing Python and pip

Before creating a virtual environment, make sure Python and pip are installed on your system. Follow these steps:

  1. Install Python

    • Windows: Download a Python version supported by SetFit from python.org. Make sure to check the box that says Add Python to PATH during installation.

    • macOS: Python may already be installed. If it is not, you can install it using Homebrew:

      brew install python

    • Linux: Install Python through your package manager. For example, on Ubuntu (the python3-venv package provides the venv module used below):

      sudo apt update
      sudo apt install python3 python3-pip python3-venv

  2. Install pip (Python’s package installer)

    If pip is not already installed, you can install it by running:

    python -m ensurepip --upgrade

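Installing virtualenv (optional)

The commands in the next section use Python's built-in venv module, which needs no extra installation. If you prefer the separate virtualenv tool, you can install it once Python and pip are available: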

pip install virtualenv

Setting Up the Virtual Environment

Now, you can create and activate your virtual environment:

# Create the virtual environment
python -m venv venv

# Activate it (macOS/Linux)
source venv/bin/activate

# Activate it (Windows; creating the environment may require a terminal with admin rights)
venv\Scripts\activate

Note for Windows Users:

venv is lightweight and needs no extra software such as Anaconda, but setup on Windows may involve additional steps, such as setting environment variables correctly or running the terminal as an administrator. If you encounter difficulties, consider switching to the Anaconda setup, which is more straightforward on Windows.
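
To confirm that activation worked, you can ask the interpreter itself. The sketch below relies on the fact that, inside an environment created with venv, sys.prefix points at the environment while sys.base_prefix points at the base installation. The helper name in_virtualenv is made up for this example, and the check does not detect conda environments.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

Save it as a small script and run it with python right after activating the environment; the interpreter path should point into the venv folder.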

Upgrade pip and Install Dependencies

Once the virtual environment is set up and activated, upgrade pip and install the necessary dependencies:

pip install --upgrade pip
pip install setfit
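
As an optional sanity check, the following sketch reports which versions of pip and setfit are visible to the active interpreter. It uses importlib.metadata from the standard library (Python 3.8 or newer); the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("pip", "setfit"):
    version = installed_version(name)
    print(name, "->", version if version else "not installed")
```

If setfit is reported as not installed, the environment is most likely not activated, or the install step ran against a different interpreter.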

Once the environment is ready, run your script:

python your_script.py

Option 2: Using Anaconda

Anaconda simplifies dependency management, making it especially useful for Windows users. If you don’t have Anaconda installed, follow these installation instructions first:

Installing Anaconda

You can download and install Anaconda from the official Anaconda website. After installation, set up your environment with the following steps:

# Create a new Anaconda environment
conda create -n setfit_env python=3.10 -y
conda activate setfit_env

# Upgrade pip and install the required dependencies
pip install --upgrade pip
pip install setfit

Run the script:

python your_script.py
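
If you are unsure whether the script runs inside the intended conda environment, a small check like the one below can help. It assumes that conda activate sets the CONDA_DEFAULT_ENV environment variable to the name of the active environment; the helper name active_conda_env is made up for this example.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV") or None


name = active_conda_env()
if name == "setfit_env":
    print("Running inside setfit_env")
else:
    print("Active conda environment:", name)
```

The optional environ argument only exists so that the function can be tried out with a plain dictionary instead of the real process environment.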

Running in R with Reticulate

You can use the reticulate package to run the Python model directly from R. This may be more convenient if you already work in R, though the additional overhead increases the runtime.

Note: You may need to install conda or virtualenv support explicitly using reticulate::install_miniconda() or by ensuring virtualenv is available on your system.

1. Install Required Packages

In R:

install.packages("reticulate")

2. Set Up Python Environment

You can use either of the following options. venv is generally recommended on all operating systems; conda might work better on Windows.

venv

library(reticulate)
virtualenv_create("setfit_env")
use_virtualenv("setfit_env", required = TRUE)
py_install("setfit", envname = "setfit_env", method = "auto")

Conda (Alternative)

library(reticulate)
install_miniconda()  # Only needed if conda is not yet installed
conda_create("setfit_env", packages = "python=3.10")
use_condaenv("setfit_env", required = TRUE)
# Do not use the "defaults" channel (see https://www.fz-juelich.de/en/rse/the_latest/the-anaconda-is-squeezing-us)
py_install("setfit", envname = "setfit_env", method = "auto", channel = "conda-forge")

3. Run Python Code in R

library(reticulate)

if (!virtualenv_exists("setfit_env")) {
  # Adjust the path to your system; the Python version must be supported by both reticulate and setfit
  virtualenv_create("setfit_env", python = "/usr/bin/python3.10")
  use_virtualenv("setfit_env", required = TRUE)
  py_install("setfit", envname = "setfit_env", method = "auto")
}

# Use the environment appropriate to your setup
# use_condaenv("setfit_env", required = TRUE) # conda
use_virtualenv("setfit_env", required = TRUE) # virtualenv


setfit <- import("setfit")
model <- setfit$SetFitModel$from_pretrained("automatedMotiveCoder/setfit")
texts <- c(
  "Du schaffst das schon.",
  "Tu vas y arriver.",
  "Zvládnete.",
  "You'll manage.",
  "Te las arreglarás.",
  "Saate hakkama."
)
probs <- py_to_r(model$predict_proba(texts)$numpy())
probs <- cbind(probs, rowSums(probs))
rownames(probs) <- texts
colnames(probs) <- c(model$labels, "Sum")
print(probs)
##                               ach         aff        pow      null       Sum
## Du schaffst das schon. 0.28847598 0.009921958 0.03042893 0.6491311 0.9779580
## Tu vas y arriver.      0.07186226 0.017097502 0.02335468 0.8272591 0.9395735
## Zvládnete.             0.13917435 0.012112386 0.02983335 0.7684807 0.9496008
## You'll manage.         0.15298036 0.010970814 0.03377106 0.7286764 0.9263987
## Te las arreglarás.     0.01812695 0.020106626 0.02863201 0.9044114 0.9712769
## Saate hakkama.         0.03867746 0.009102110 0.02799679 0.9016467 0.9774230

As the Sum column shows, the probabilities returned by predict_proba() do not necessarily sum to 1; here, every row falls slightly below 1. This is because the loaded model uses a One-vs-Rest approach that treats each class as a separate binary classification problem. As a result, the probabilities are independent, and their sum may exceed or fall below 1.
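
If you need a single label per sentence, or values that sum to 1 for display purposes, the probabilities can be post-processed. The Python sketch below reuses two rows from the printed output above, with the label order taken from its column header. The helpers top_label and normalize are made up for this example, and rescaling is only a display convenience, not a calibrated probability.

```python
labels = ["ach", "aff", "pow", "null"]

# Two rows copied from the printed output above
probs = {
    "Du schaffst das schon.": [0.28847598, 0.009921958, 0.03042893, 0.6491311],
    "Te las arreglarás.": [0.01812695, 0.020106626, 0.02863201, 0.9044114],
}


def top_label(row, names):
    """Return the label with the highest probability together with that probability."""
    best = max(range(len(row)), key=row.__getitem__)
    return names[best], row[best]


def normalize(row):
    """Rescale a row so that its values sum to 1."""
    total = sum(row)
    return [p / total for p in row]


for text, row in probs.items():
    label, prob = top_label(row, labels)
    rescaled = [round(p, 3) for p in normalize(row)]
    print(f"{text} -> {label} ({prob:.3f}), rescaled: {rescaled}")
```

For both example sentences the highest-scoring label is null.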

Known issues

There are a few known issues that users might encounter. They relate either to OS-specific virtual environment setup or to using R/RStudio. Below are some common issues and their respective fixes:

reticulate

If you encounter errors when converting Python results to R objects, the cause may be an incompatible Python version. To resolve this, make sure you are using a Python version supported by reticulate. You can find the supported range with ?virtualenv_create. Also make sure that you are using the most recent version of reticulate.
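
To see which Python version an environment actually runs, and to compare it against a supported range, a check along these lines can be run inside that environment. The bounds below are placeholders and not the real limits of reticulate or setfit; replace them with the range reported by ?virtualenv_create and the SetFit documentation. The helper name version_in_range is made up for this example.

```python
import sys

# Placeholder bounds (assumptions for illustration only, not official limits)
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 12)


def version_in_range(version, minimum, maximum):
    """Compare the (major, minor) part of a version against inclusive bounds."""
    return minimum <= tuple(version[:2]) <= maximum


current = sys.version_info
print(f"Running Python {current.major}.{current.minor}.{current.micro}")
print("Within the configured range:", version_in_range(current, MIN_SUPPORTED, MAX_SUPPORTED))
```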

Mac

On Macs with Apple silicon (M1, M2, and later chips, i.e., ARM64), you may experience architecture errors when using venv. To resolve this, install Python via Homebrew and explicitly use that version in your R script.

To do this, first check that Homebrew is installed and working:

brew doctor

If you do not get a positive response (for example, because the command is not found), install Homebrew first.

You can then install python@3.x, where x is the minor Python version you want to install; see the SetFit page on PyPI for supported versions.

brew install python@3.x

In the R script, you can then set the path to the installed Python version:

python_arm_path <- "/opt/homebrew/bin/python3.x"

# Create a virtual environment explicitly using ARM Python
virtualenv_create("setfit_env", python = python_arm_path)

The rest of the script stays the same.
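
To check which architecture a given Python build targets, you can run a short script with that interpreter. This sketch uses platform.machine() from the standard library, which typically reports arm64 for a native Apple silicon build and x86_64 for an Intel build (including one running under Rosetta). The helper name is_arm64 is made up for this example.

```python
import platform


def is_arm64(machine=None):
    """Return True if the given (or current) machine string denotes a 64-bit ARM build."""
    name = (machine or platform.machine()).lower()
    return name in {"arm64", "aarch64"}


print("Machine:", platform.machine())
print("ARM64 build:", is_arm64())
```

If the Homebrew Python reports an ARM64 build while the interpreter inside the virtual environment does not, the environment was probably created from a different Python installation.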

Max Brede
AI Project Manager, FuE FH Kiel GmbH | Research Associate, CAU zu Kiel