Serve NLP ML Models using Accelerated Inference API

Hugging Face hosts thousands of state-of-the-art NLP models. With only a few lines of code, you can deploy an NLP model and use it by making simple HTTP requests to the Accelerated Inference API.

The requests accept different parameters depending on the task (also known as the pipeline) for which the model is configured. When making requests to run a model, a set of API options lets you control model caching and loading behavior, as well as GPU inference (Startup or Enterprise plan required).
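As a sketch of what such a request looks like: the endpoint URL pattern and the "use_cache" / "wait_for_model" options below follow the Inference API conventions, while the token and input text are placeholders you would supply yourself.

```python
import json
import os
import urllib.request

# Hosted inference endpoint for a given model id
# (here: BART fine-tuned on CNN / Daily Mail).
API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"

def build_payload(text):
    """Build the request body, including the API options discussed above."""
    return {
        "inputs": text,
        "options": {
            "use_cache": True,       # reuse cached results for identical inputs
            "wait_for_model": True,  # wait for the model to load instead of erroring
        },
    }

def query(text, token):
    """POST a summarization request and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example usage (requires a valid API token in the HF_API_TOKEN env variable):
# result = query("Long article text ...", os.environ["HF_API_TOKEN"])
# print(result[0]["summary_text"])
```

The call itself is kept inside a function so you can swap in your own token handling; the response for a summarization task is a list of objects with a "summary_text" field.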

An example of BART (large-sized model), fine-tuned on CNN Daily Mail

The CNN / Daily Mail dataset contains over 300k unique news articles written by journalists at CNN and the Daily Mail. The current version supports both extractive and abstractive summarization; the original version was created for machine reading comprehension and abstractive question answering.

BART is a transformer encoder-decoder (seq2seq) model with a bidirectional encoder (BERT-like) and an autoregressive decoder (GPT-like). BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.

BART performs particularly well when fine-tuned for text generation (e.g. summarization, translation), but it also works well for comprehension tasks (e.g. text classification, question answering). This particular checkpoint was fine-tuned on CNN / Daily Mail, a large collection of text-summary pairs.


Use in Transformers

You can use this model with the pipeline API for text summarization tasks:
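The post's original code snippet does not survive here; a minimal sketch of the pipeline call, following the facebook/bart-large-cnn model card, looks like this. The max_length/min_length values are illustrative, and the input article (about Liana Barrientos, judging by the output below) is not reproduced in the post, so substitute any long news article.

```python
from transformers import pipeline

MODEL_ID = "facebook/bart-large-cnn"

def summarize(article: str, max_length: int = 130, min_length: int = 30) -> str:
    """Summarize a long article with the BART CNN / Daily Mail checkpoint."""
    # Downloads the model weights on first use.
    summarizer = pipeline("summarization", model=MODEL_ID)
    result = summarizer(
        article,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,  # deterministic (greedy/beam) decoding
    )
    return result[0]["summary_text"]

# Example usage:
# print(summarize(open("article.txt").read()))
```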

Output:

Liana Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree" In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.

