Tasks
Natural Language Processing Models
Table Question Answering
Table Question Answering using a pipeline
We will use the pipeline for Table Question Answering with simple synthetic data. The table lists the product names and the number of products in each category.
from transformers import pipeline
import pandas as pd
# prepare table
data = {"Products": ["jeans", "jackets", "shirts"], "Number of products": ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)
#prepare your question
question = "how many shirts are there?"
# pipeline model
tqa = pipeline(task="table-question-answering", model="google/tapas-large-finetuned-wtq")
# result
print(tqa(table=table, query=question))
{'answer': 'SUM > 69', 'coordinates': [(2, 1)], 'cells': ['69'], 'aggregator': 'SUM'}
If we change the data and add a second row for the shirts, the answer changes:
#new data
data = {"Products": ["jeans", "jackets", "shirts", "shirts"], "Number of products": ["87", "53", "69", "21"]}
print(tqa(table=table, query=question))
The answer:
{'answer': 'COUNT > 69, 21', 'coordinates': [(2, 1), (3, 1)], 'cells': ['69', '21'], 'aggregator': 'COUNT'}
We can get the total number of shirts:
z = tqa(table=table, query=question)["cells"]
x= []
for i in z:
    x.append(int(i))
print(sum(x))
The answer is 90.
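Note that for the second table the model predicted COUNT rather than SUM, which is why we summed the cells manually to get 90. More generally, the pipeline result reports which aggregator the model chose, and you can apply it programmatically. Below is a minimal sketch; the helper apply_aggregator is our own, not part of transformers, and only interprets the 'aggregator' and 'cells' fields shown above:
# Our own helper (not part of transformers): apply the predicted aggregator
# to the selected cells.
def apply_aggregator(result):
    values = [float(cell) for cell in result["cells"]]
    aggregator = result["aggregator"]
    if aggregator == "SUM":
        return sum(values)
    if aggregator == "AVERAGE":
        return sum(values) / len(values)
    if aggregator == "COUNT":
        return len(values)
    return result["answer"]  # NONE: return the selected cell(s) as-is

result = tqa(table=table, query=question)
print(apply_aggregator(result))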
*google/tapas-large-finetuned-wtq model from Hugging Face — licensed under the Apache 2.0 License.
Table Question Answering Model
You can load the Table Question Answering Model directly. We will use the same data.
from transformers import TapasTokenizer, TapasForQuestionAnswering
import pandas as pd
import torch
# Load model and tokenizer
model_name = "google/tapas-base-finetuned-wtq"
tokenizer = TapasTokenizer.from_pretrained(model_name)
model = TapasForQuestionAnswering.from_pretrained(model_name)
# Example table
data = {"Products": ["jeans", "jackets", "shirts"], "Number of products": ["87", "53", "69"]}
table = pd.DataFrame.from_dict(data)
# Question
question = "how many shirts are there?"
# Tokenize inputs
inputs = tokenizer(table=table, queries=[question], return_tensors="pt")
# Forward pass
with torch.no_grad():
    outputs = model(**inputs)
# Decode predicted answer
logits = outputs.logits
logits_agg = outputs.logits_aggregation
# Get the most probable cell answer
predicted_answer_coordinates, predicted_aggregation_indices = tokenizer.convert_logits_to_predictions(
    inputs,
    logits,
    logits_agg
)
# Extract the answer from the table
answers = []
for coordinates in predicted_answer_coordinates:
    if not coordinates:
        answers.append("No answer found.")
    else:
        cell_values = [table.iat[row, column] for row, column in coordinates]
        answers.append(", ".join(cell_values))
# Print the result
print("Answer:", answers[0])
Answer: 69
*google/tapas-base-finetuned-wtq model from Hugging Face — based on code from https://huggingface.co/google/tapas-large-finetuned-wtq (Apache 2.0).
The model's API can be a bit complex, so let's analyze it step by step. logits are the raw output scores for cell selection. logits_aggregation contains the scores for the numeric aggregation operations; this is how the model can perform basic operations such as SUM over the table data.
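As a quick illustration, the predicted aggregation index can be mapped to an operator name. The mapping below (0 = NONE, 1 = SUM, 2 = AVERAGE, 3 = COUNT) follows the convention used in the TAPAS documentation. Continuing from the code above:
# Map the predicted aggregation index to an operator name
# (index-to-operator mapping taken from the TAPAS documentation).
id2aggregation = {0: "NONE", 1: "SUM", 2: "AVERAGE", 3: "COUNT"}
predicted_aggregator = id2aggregation[predicted_aggregation_indices[0]]
print("Predicted aggregator:", predicted_aggregator)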
Zero-shot classification
Zero-shot classification using a pipeline
Zero-shot classification predicts the most likely label for a text even though the model was never trained on those specific labels. Zero-shot classification models require an input text and a set of candidate labels. Let's see an example of zero-shot classification using a pipeline:
from transformers import pipeline
classifier = pipeline("zero-shot-classification")
print(classifier(
"Is this a good time to buy gold?",
candidate_labels=["education", "politics", "business", "finance"]
))
{'sequence': 'Is this a good time to buy gold?', 'labels': ['finance', 'business', 'education', 'politics'], 'scores': [0.5152193307876587, 0.38664010167121887, 0.057615164667367935, 0.040525417774915695]}
The results are returned in descending order of score; the "finance" label has the highest score.
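By default the pipeline treats the candidate labels as mutually exclusive. If a text may belong to several categories at once, you can pass multi_label=True so that each label is scored independently:
# With multi_label=True each label gets its own score and the scores no longer sum to 1.
print(classifier(
    "Is this a good time to buy gold?",
    candidate_labels=["education", "politics", "business", "finance"],
    multi_label=True
))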
Zero-shot classification model
You can load the model directly:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
# Load model and tokenizer
model_name = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Input sentence
sequence = "The pi is the ratio of the circumference of any circle to the diameter of that circle"
# Candidate labels
labels = ["education", "psychology", "sports", "finance", "math"]
# Create NLI-style premise-hypothesis pairs
premise = sequence
hypotheses = [f"This text is about {label}." for label in labels]
# Tokenize and get model outputs for each hypothesis
inputs = tokenizer([premise]*len(hypotheses), hypotheses, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
# Convert logits to probabilities (softmax over entailment class)
entailment_logits = logits[:, 2]
probabilities = F.softmax(entailment_logits, dim=0)
print(probabilities)
# Print results
for label, score in zip(labels, probabilities):
    print(f"{label}: {score:.4f}")
tensor([0.0125, 0.0091, 0.0089, 0.0109, 0.9586])
education: 0.0125
psychology: 0.0091
sports: 0.0089
finance: 0.0109
math: 0.9586
*facebook/bart-large-mnli model from Hugging Face — licensed under the MIT License.
The BART MNLI approach takes a bit more code, so let's break it down. We need model outputs for each hypothesis. There are 5 labels, so the premise (sequence) is paired with each of the five hypotheses. The model returns logits for contradiction, neutral, and entailment. We are interested in entailment, whose index is 2, which is why we selected the logits at index 2.
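If you want to verify the label order instead of relying on index 2, you can inspect the model configuration:
# Shows the index-to-label mapping of the classification head,
# e.g. 0 = contradiction, 1 = neutral, 2 = entailment for this model.
print(model.config.id2label)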
What's softmax?
softmax in PyTorch is applied to all slices along dim, and will re-scale them so that the elements lie in the range [0, 1] and sum to 1.
The sum of the scores [0.0125 + 0.0091 + 0.0089 + 0.0109 + 0.9586] in the example above is 1 and the "math" label has the highest score.
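A tiny standalone example makes this behaviour clear:
import torch
import torch.nn.functional as F

scores = torch.tensor([1.0, 2.0, 3.0])
probabilities = F.softmax(scores, dim=0)
print(probabilities)        # each value lies between 0 and 1
print(probabilities.sum())  # tensor(1.)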
For more information about softmax, visit the PyTorch docs.
Fill-Mask
Fill-Mask task using a pipeline
Fill-mask models predict the most likely word (or words) to replace the masked token in a sentence.
from transformers import pipeline
unmasker = pipeline("fill-mask")
print(unmasker("The most popular sport in the world is <mask>.", top_k=2))
[{'score': 0.11612111330032349, 'token': 4191, 'token_str': ' soccer', 'sequence': 'The most popular sport in the world is soccer.'},
{'score': 0.10927936434745789, 'token': 5630, 'token_str': ' cricket', 'sequence': 'The most popular sport in the world is cricket.'}]
Fill-Mask model
You can also load the model directly:
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(
"google-bert/bert-base-uncased"
)
model = AutoModelForMaskedLM.from_pretrained(
"google-bert/bert-base-uncased",
torch_dtype=torch.float16,
device_map="auto",
attn_implementation="sdpa"
)
#See the device type explanation below
inputs = tokenizer("The most popular sport in the world is [MASK].", return_tensors="pt").to("mps")
with torch.no_grad():
    outputs = model(**inputs)
predictions = outputs.logits
masked_index = torch.where(inputs['input_ids'] == tokenizer.mask_token_id)[1]
predicted_token_id = predictions[0, masked_index].argmax(dim=-1)
prediction = tokenizer.decode(predicted_token_id)
print(f"The most popular sport in the world is {prediction}.")
The most popular sport in the world is football.
*google-bert/bert-base-uncased model from Hugging Face — licensed under the Apache 2.0 License.
You can use "mps" for macOS and "cuda" for devices compatible with CUDA. You can also remove it.
What's argmax?
The argmax returns the indices of the maximum value of all elements in the input tensor.
It returns the index of the maximum value to decode in the example above. For more information about argmax, visit the PyTorch docs.
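For example:
import torch

t = torch.tensor([0.1, 0.7, 0.2])
print(torch.argmax(t))  # tensor(1): the largest value is at index 1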
Question Answering
Question Answering pipeline
There are different types of Question Answering (QA) tasks. If you use a pipeline for QA without specifying a model, the distilbert/distilbert-base-cased-distilled-squad model is used. It is used for extractive QA tasks. In other words, the model extracts the answer from a given text. Let's see an example of an extractive QA task using a pipeline:
from transformers import pipeline
question_answerer = pipeline("question-answering")
print(question_answerer(
question="Where does Julia live?",
context="Julia is 40 years old. She lives in London and she works as a nurse."
))
{'score': 0.9954689741134644, 'start': 36, 'end': 42, 'answer': 'London'}
Question Answering model
You can load the QA model directly:
from transformers import AutoTokenizer, BertForQuestionAnswering
import torch
tokenizer = AutoTokenizer.from_pretrained("deepset/bert-base-cased-squad2")
model = BertForQuestionAnswering.from_pretrained("deepset/bert-base-cased-squad2")
#question, text
question, text = "Where does Julia live?", "Julia is 40 years old. She lives in London and she works as a nurse."
#tokenize question and text
inputs = tokenizer(question, text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
result = tokenizer.decode(predict_answer_tokens, skip_special_tokens=True)
print(result)
London
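The direct-load example returns only the answer text. If you also want a confidence value similar to the pipeline's 'score' field, one rough approximation (a sketch, not the exact computation the pipeline performs) is to multiply the softmax probabilities of the chosen start and end positions. Continuing from the code above:
import torch.nn.functional as F

# Rough confidence estimate: probability of the chosen start position
# times the probability of the chosen end position.
start_probs = F.softmax(outputs.start_logits, dim=-1)
end_probs = F.softmax(outputs.end_logits, dim=-1)
score = (start_probs[0, answer_start_index] * end_probs[0, answer_end_index]).item()
print(f"{result} (score: {score:.4f})")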
*deepset/bert-base-cased-squad2 model from Hugging Face — licensed under the CC BY 4.0 License.
Translation
Translation using a pipeline
The model below translates a sentence from French to English; Helsinki-NLP provides similar models for many other language pairs.
from transformers import pipeline
translator = pipeline("translation", "Helsinki-NLP/opus-mt-fr-en")
print(translator("C'est un beau roman."))
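The checkpoint name encodes the language pair (opus-mt-{source}-{target}), so switching languages is just a matter of choosing a different checkpoint. For example, German to English:
translator_de_en = pipeline("translation", "Helsinki-NLP/opus-mt-de-en")
print(translator_de_en("Das Essen ist sehr lecker."))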
Translation model
We will load a model from the same family directly; this time it translates the sentence from English to French:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
text = "The food is very delicious."
inputs = tokenizer(text, return_tensors="pt").input_ids
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
outputs = model.generate(inputs, max_new_tokens=40, do_sample=True, top_k=30, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
*"Helsinki-NLP/opus-mt-en-fr”model from Hugging Face — licensed under the Apache 2.0 License.
Summarization
Summarization using a pipeline
You can use summarization models to summarize a text:
from transformers import pipeline
from datasets import load_dataset
ds = load_dataset("dataset_name")  # replace with a real dataset name
text = ds["train"][0]["context"]   # pick a text field from that dataset
summarizer = pipeline("summarization", max_length=100)
print(summarizer(text))
Summarization model
We will summarize an article from the Hugging Face dataset abisee/cnn_dailymail. You can also use your own paragraph instead.
from transformers import AutoTokenizer, BartForConditionalGeneration
checkpoint = "facebook/bart-large-cnn"
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
from datasets import load_dataset
ds = load_dataset("abisee/cnn_dailymail", "1.0.0")
text = ds["train"][0]["article"]
inputs = tokenizer(text, max_length=100, truncation=True, return_tensors="pt")
# Generate Summary
summary_ids = model.generate(inputs["input_ids"], max_length=180,
min_length=40,
do_sample=False,
no_repeat_ngram_size=3)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
Harry Potter star Daniel Radcliffe turns 18 on Monday. He gains access to a reported $41.1 million fortune. Radcliffe says he has no plans to fritter his cash away on fast cars.
*"abisee/cnn_dailymail", "1.0.0" dataset from Hugging Face — licensed under the Apache 2.0 License.
*"facebook/bart-large-cnn" model from Hugging Face — licensed under the MIT License.
You can control how the model generates a summary. For example, you might set the minimum and maximum length of the output, as shown above.
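The same controls are available through the pipeline, because generation keyword arguments are forwarded to model.generate(). A minimal sketch (the input string is a placeholder for your own text):
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
print(summarizer(
    "Your long article text goes here ...",  # placeholder: use a real article
    min_length=40,
    max_length=180,
    no_repeat_ngram_size=3,
    do_sample=False
))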
Token Classification
Token Classification using a pipeline
Token classification models are used to identify entities in a text. What type of entities can a token classification model identify? It depends on the model. For example, dslim/bert-base-NER can identify four types of entities: location (LOC), organizations (ORG), person (PER), and miscellaneous (MISC).
from transformers import pipeline
classifier = pipeline("token-classification")
z = "I'm Alicia and I live in Milano."
d = classifier(z)
print(d)
for token in d:
    print(token["word"], token["entity"])
[{'entity': 'B-PER', 'score': np.float32(0.9941089), 'index': 4, 'word': 'Alicia', 'start': 4, 'end': 10},
{'entity': 'B-LOC', 'score': np.float32(0.9950382), 'index': 9, 'word': 'Milano', 'start': 25, 'end': 31}]
Alicia B-PER
Milano B-LOC
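If you prefer whole entities instead of individual tokens, the pipeline can group them for you via aggregation_strategy. Here we use the dslim/bert-base-NER model mentioned above:
ner = pipeline("token-classification", model="dslim/bert-base-NER", aggregation_strategy="simple")
for entity in ner("I'm Alicia and I live in Milano."):
    print(entity["word"], entity["entity_group"], round(float(entity["score"]), 3))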
Token Classification Model
We can load the token classification model directly. We will use the same text with a different model:
import torch
from transformers import BertTokenizerFast, BertForTokenClassification
# Load model and tokenizer
model_name = "dslim/bert-base-NER"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForTokenClassification.from_pretrained(model_name)
# Sample input
text = "I'm Alicia and I live in Milano."
# Tokenize
tokens = tokenizer(text, return_tensors="pt", truncation=True, is_split_into_words=False)
# Forward pass
with torch.no_grad():
    outputs = model(**tokens)
logits = outputs.logits # shape: (batch_size, seq_len, num_labels)
# Get predicted class indices
predictions = torch.argmax(logits, dim=2)
# Convert IDs to label names
id2label = model.config.id2label
# Token IDs
input_ids = tokens["input_ids"][0]
predicted_labels = [id2label[label_id.item()] for label_id in predictions[0]]
print(predicted_labels)
['O', 'O', 'O', 'O', 'B-PER', 'O', 'O', 'O', 'O', 'B-LOC', 'O', 'O']
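To see which token received which label, you can pair the decoded tokens with the predicted labels. Continuing from the code above, this should reproduce the Alicia/Milano labels from the pipeline example:
# Pair each token with its predicted label and skip the "O" (non-entity) tokens.
token_strings = tokenizer.convert_ids_to_tokens(input_ids.tolist())
for token, label in zip(token_strings, predicted_labels):
    if label != "O":
        print(token, label)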
*"dslim/bert-base-NER" model from Hugging Face — licensed under the MIT License.
The B prefix marks the beginning of an entity: B-PER is the beginning of a person's name right after another person's name, and B-LOC is the beginning of a location right after another location.
For more detailed information about the dslim/bert-base-NER model, please visit its model card on Hugging Face.
Text Classification
Text Classification using a pipeline
Text classification models are designed to categorize text into predefined labels. They are widely used in tasks like sentiment analysis, spam detection, and topic labeling. In the example below, the model will determine whether a given text expresses a positive or negative sentiment.
from transformers import pipeline
text = "Your dog is super cute."
pipe = pipeline("text-classification")
result = pipe(text)
print(result[0]["label"])
POSITIVE
Text Classification Model
We will load the same model directly:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
inputs = tokenizer("Your dog is super cute.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
print(model.config.id2label[predicted_class_id])
POSITIVE
*"distilbert-base-uncased-finetuned-sst-2-english" model from Hugging Face (Apache 2.0).
We used a simple text, but you can use the model for more complicated texts like reviews as well.
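If you also want to see how confident the model is, rather than only the winning label, you can ask the pipeline for all scores or turn the logits from the direct-load example into probabilities. A minimal sketch, reusing pipe, logits, and model from the two examples above:
import torch

# Option 1: let the pipeline return the score for every label.
print(pipe("Your dog is super cute.", top_k=None))

# Option 2: convert the logits from the direct-load example into probabilities.
probabilities = torch.softmax(logits, dim=-1)[0]
for idx, probability in enumerate(probabilities):
    print(model.config.id2label[idx], round(probability.item(), 4))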