SetFit Multilingual Inference Example
This tutorial demonstrates how to use a pretrained SetFit model for multilingual text classification inference.
The script works on Windows, macOS, and Linux, and can be run from Python (in a conda or venv environment) or from R with the reticulate package.
Example Python Code
The following code loads the pretrained model and predicts class probabilities for sentences in several languages:
# Import SetFit Model from the setfit library
from setfit import SetFitModel
# Load the pretrained model for multilingual classification
model = SetFitModel.from_pretrained("automatedMotiveCoder/setfit")
# List of sample multilingual sentences
texts = [
    "Du schaffst das schon.",  # German
    "Tu vas y arriver.",       # French
    "Zvládnete.",              # Czech
    "You'll manage.",          # English
    "Te las arreglarás.",      # Spanish
    "Saate hakkama.",          # Estonian
]
# Predict probabilities for each sentence
predictions = model.predict_proba(texts)
print(predictions)
The probabilities returned by predict_proba() may not sum to 1 across classes. The loaded model uses a One-vs-Rest classification approach: each class is treated as a separate binary classification problem, so the per-class probabilities are independent and their sum can be above or below 1.
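To make this concrete, here is a minimal sketch in plain Python that does not depend on SetFit. The raw scores are made-up illustrative numbers, not outputs of the model, and `sigmoid` is a helper defined here for the example. It shows why independent per-class scores need not sum to 1, and one optional way to rescale them if a proper distribution is needed.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical raw scores from four independent binary classifiers
# (illustrative values only, not produced by the model above).
logits = {"ach": -0.9, "aff": -4.6, "pow": -3.5, "null": 0.6}

# One-vs-Rest: each class is scored on its own, so nothing forces a sum of 1.
ovr_probs = {label: sigmoid(score) for label, score in logits.items()}
total = sum(ovr_probs.values())

# Optional: rescale so the values sum to 1. Note that this changes their
# interpretation; they are no longer independent per-class probabilities.
normalized = {label: p / total for label, p in ovr_probs.items()}

print(f"sum of One-vs-Rest probabilities: {total:.4f}")
print(f"sum after rescaling: {sum(normalized.values()):.4f}")
```

Whether rescaling is appropriate depends on the use case. For a single best label per text, taking the class with the highest score works without any rescaling.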
Setup and Run (Python)
Follow one of the two methods below to set up your environment and run the script.
Option 1: Using venv
This method is lightweight and works on all platforms. It’s recommended for users who want a minimal setup, although setting up venv can be more involved on Windows than using Anaconda.
Installing Python and pip
Before using virtualenv, make sure you have Python and pip installed on your system. Follow these steps:
Install Python
Windows: Download the latest Python version from python.org. Make sure to check the box that says Add Python to PATH during installation.
macOS: Python may already be installed (check with python3 --version). If not, you can install it using Homebrew:
brew install python
Linux: Install Python through your package manager. For example, on Ubuntu:
sudo apt update
sudo apt install python3 python3-pip
Install pip (Python’s package installer)
If pip is not already installed, you can install it by running the following command:
python -m ensurepip --upgrade
Installing virtualenv
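Once Python and pip are installed, you can install virtualenv. This step is optional if you use the venv module that ships with Python 3, as the commands in the next step do: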
pip install virtualenv
Setting Up the Virtual Environment
Now, you can create and activate your virtual environment:
# Create and activate the virtual environment
python -m venv venv
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows (Note: Windows users may need to run `python -m venv venv` in a terminal with admin rights)
Note for Windows Users:
While venv is more lightweight and doesn’t require an extra distribution like Anaconda, the setup process on Windows may involve additional steps, such as ensuring the environment variables are set correctly or running the terminal as an administrator for certain permissions. If you encounter difficulties, consider switching to the Anaconda setup, which is more straightforward on Windows.
Upgrade pip and Install Dependencies
Once the virtual environment is set up and activated, upgrade pip and install the necessary dependencies:
pip install --upgrade pip
pip install setfit
Once the environment is ready, run your script:
python your_script.py
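If the script fails at import time, a quick sanity check of the environment can help narrow things down. The sketch below uses only the standard library; the helper names `in_virtual_env` and `is_installed` are made up for this example. It reports the Python version, whether a venv-style environment is active, and whether the setfit package can be found. Note that the virtual environment check is an assumption that holds for venv and virtualenv, not for conda environments.

```python
import importlib.util
import sys


def in_virtual_env() -> bool:
    """True when running inside a venv/virtualenv environment.

    Inside such an environment sys.prefix points at the environment,
    while sys.base_prefix still points at the base installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def is_installed(package: str) -> bool:
    """True when a top-level package can be located without importing it."""
    return importlib.util.find_spec(package) is not None


print("Python:", sys.version.split()[0])
print("Virtual environment active:", in_virtual_env())
print("setfit installed:", is_installed("setfit"))
```

Run it with the same interpreter you use for your script (after activating the environment), so that it inspects the environment the script will actually use.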
Option 2: Using Anaconda
Anaconda simplifies dependency management, making it especially useful for Windows users. If you don’t have Anaconda installed, follow these installation instructions first:
Installing Anaconda
Download and install Anaconda from the official Anaconda website. After installation, proceed with the following steps to set up your environment:
# Create a new Anaconda environment
conda create -n setfit_env python=3.10 -y
conda activate setfit_env
# Upgrade pip and install the required dependencies
pip install --upgrade pip
pip install setfit
Run the script:
python your_script.py
Running in R with Reticulate
You can use the reticulate package to run the Python model directly from R. This can be more convenient if you already work in R, though the extra layer between R and Python adds some runtime overhead.
Note: You may need to install conda or virtualenv support explicitly, either by running reticulate::install_miniconda() or by ensuring virtualenv is available on your system.
1. Install Required Packages
In R:
install.packages("reticulate")
2. Set Up Python Environment
You can use either of the following options. venv is generally recommended on all operating systems; conda might work better on Windows.
venv
library(reticulate)
virtualenv_create("setfit_env")
use_virtualenv("setfit_env", required = TRUE)
py_install("setfit", envname = "setfit_env", method = "auto")
Conda (Alternative)
library(reticulate)
install_miniconda() # Only needed if conda is not yet installed
conda_create("setfit_env", packages = "python=3.10")
use_condaenv("setfit_env", required = TRUE)
py_install("setfit", envname = "setfit_env", method = "auto", channel = "conda-forge") # make sure to not use "defaults"-channel (https://www.fz-juelich.de/en/rse/the_latest/the-anaconda-is-squeezing-us)
3. Run Python Code in R
library(reticulate)
if (!reticulate::virtualenv_exists("setfit_env")) {
  virtualenv_create("setfit_env", python = "/usr/bin/python3.10") # make sure your Python version is supported by both reticulate and setfit
  use_virtualenv("setfit_env", required = TRUE)
  py_install("setfit", envname = "setfit_env", method = "auto")
}
# Use the environment appropriate to your setup
# use_condaenv("setfit_env", required = TRUE) # conda
use_virtualenv("setfit_env", required = TRUE) # virtualenv
setfit <- import("setfit")
model <- setfit$SetFitModel$from_pretrained("automatedMotiveCoder/setfit")
texts <- c(
  "Du schaffst das schon.",
  "Tu vas y arriver.",
  "Zvládnete.",
  "You'll manage.",
  "Te las arreglarás.",
  "Saate hakkama."
)
probs <- py_to_r(model$predict_proba(texts)$numpy())
probs <- cbind(probs, rowSums(probs))
rownames(probs) <- texts
colnames(probs) <- c(model$labels, "Sum")
print(probs)
##                               ach         aff        pow      null       Sum
## Du schaffst das schon. 0.28847598 0.009921958 0.03042893 0.6491311 0.9779580
## Tu vas y arriver.      0.07186226 0.017097502 0.02335468 0.8272591 0.9395735
## Zvládnete.             0.13917435 0.012112386 0.02983335 0.7684807 0.9496008
## You'll manage.         0.15298036 0.010970814 0.03377106 0.7286764 0.9263987
## Te las arreglarás.     0.01812695 0.020106626 0.02863201 0.9044114 0.9712769
## Saate hakkama.         0.03867746 0.009102110 0.02799679 0.9016467 0.9774230
As the Sum column shows, the probabilities returned by predict_proba() do not necessarily sum to 1 across classes. The loaded model uses a One-vs-Rest classification approach: each class is treated as a separate binary classification problem, so the per-class probabilities are independent and their sum can be above or below 1.
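A common next step is to reduce each row to a single predicted label. The sketch below does this in plain Python, using the probabilities copied from the output above; the helper `top_label` is a name made up for this example, not part of SetFit. Picking the highest score per row is one reasonable choice for One-vs-Rest outputs, not the only one.

```python
# Class order as printed in the column header of the output above.
labels = ["ach", "aff", "pow", "null"]

# Probabilities copied from the output above (without the Sum column).
probs = {
    "Du schaffst das schon.": [0.28847598, 0.009921958, 0.03042893, 0.6491311],
    "Tu vas y arriver.": [0.07186226, 0.017097502, 0.02335468, 0.8272591],
    "Zvládnete.": [0.13917435, 0.012112386, 0.02983335, 0.7684807],
    "You'll manage.": [0.15298036, 0.010970814, 0.03377106, 0.7286764],
    "Te las arreglarás.": [0.01812695, 0.020106626, 0.02863201, 0.9044114],
    "Saate hakkama.": [0.03867746, 0.009102110, 0.02799679, 0.9016467],
}


def top_label(row, labels):
    """Return the (label, probability) pair with the highest probability."""
    best = max(range(len(row)), key=row.__getitem__)
    return labels[best], row[best]


for text, row in probs.items():
    label, p = top_label(row, labels)
    print(f"{text!r}: {label} ({p:.3f}), row sum {sum(row):.3f}")
```

Because the scores are independent, a per-class threshold (for example, flagging every class above 0.5) is an alternative when a text may belong to several classes or to none.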
Known issues
Users may run into a few known issues, related either to setting up the virtual environment (OS-specific) or to using R/RStudio. Common issues and their fixes are listed below:
reticulate
If you encounter errors when converting Python results to R objects, the cause may be an incompatible Python version. Make sure you are using a Python version supported by reticulate; you can find the supported range with ?virtualenv_create. Also make sure that you are using the most recent version of reticulate.
Mac
On Macs with Apple silicon (M1, M2, and later chips, i.e., ARM64), you may experience architecture errors when using venv. To resolve this, install Python via Homebrew and explicitly use that version in your R script.
To do this, first check that Homebrew is installed and working:
brew doctor
If the command is not found, install Homebrew; if it reports problems, fix those first.
You can then install python3.x, where x is the Python minor version you want to install; see the SetFit page on PyPI for supported versions.
brew install python@3.x
In the R script, you can then set the path to the installed Python version:
python_arm_path <- "/opt/homebrew/bin/python3.x"
# Create a virtual environment explicitly using ARM Python
virtualenv_create("setfit_env", python = python_arm_path)
The rest of the script stays the same.
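To confirm that a given interpreter really runs as native ARM64, a short check like the following may help. It is a sketch using only the standard library; `describe_interpreter` is a helper name made up for this example. It assumes that a native Apple silicon build reports "arm64" as its machine type, while an Intel build running under Rosetta reports "x86_64".

```python
import platform
import sys


def describe_interpreter() -> dict:
    """Collect the facts that matter when debugging architecture mismatches."""
    return {
        "executable": sys.executable,          # path of the running interpreter
        "version": platform.python_version(),  # Python version string
        "system": platform.system(),           # "Darwin" on macOS
        "machine": platform.machine(),         # architecture the process runs as
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")

# Assumption: native Apple silicon builds report "arm64" here, while
# Intel builds running under Rosetta report "x86_64".
if info["system"] == "Darwin" and info["machine"] != "arm64":
    print("This interpreter is not running as native ARM64.")
```

Run the check with the Homebrew-installed interpreter that you pass to virtualenv_create(), since a different Python on the same machine can report a different architecture.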