Andrea Santilli
Featured
Language Models are Injective and Hence Invertible
Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs …
Giorgos Nikolaou, Tommaso Mencattini, Donato Crisostomi, Andrea Santilli, Yannis Panagakis, Emanuele Rodolà
arXiv
Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results
Uncertainty Quantification (UQ) in Language Models (LMs) is key to improving their safety and reliability. Evaluations often use …
Andrea Santilli, Adam Golinski, Michael Kirchhof, Federico Danieli, Arno Blaas, Miao Xiong, Luca Zappella, Sinead Williamson
DOI
arXiv
Mergenetic: a Simple Evolutionary Model Merging Library
Model merging allows combining the capabilities of existing models into a new one—post hoc, without additional training. This has made …
Adrian Robert Minut, Tommaso Mencattini, Andrea Santilli, Donato Crisostomi, Emanuele Rodolà
DOI
arXiv
MERGE3: Efficient Evolutionary Merging on Consumer-grade GPUs
Evolutionary model merging enables the creation of high-performing multi-task models but remains computationally prohibitive for …
Tommaso Mencattini, Adrian Robert Minut, Donato Crisostomi, Andrea Santilli, Emanuele Rodolà
arXiv
Camoscio: An Italian Instruction-tuned LLaMA
In recent years, Large Language Models have improved the state of the art on several natural language processing tasks. However, their …
Andrea Santilli, Emanuele Rodolà
arXiv
Accelerating Transformer Inference for Translation via Parallel Decoding
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT). The community proposed specific network …
Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, Riccardo Marin, Emanuele Rodolà
PDF
arXiv
GitHub
Multimodal Neural Databases
The rise in loosely-structured data available through text, images, and other modalities has called for new ways of querying them. …
Giovanni Trappolini, Andrea Santilli, Emanuele Rodolà, Alon Halevy, Fabrizio Silvestri
PDF
arXiv
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their …
BIG-bench contributors including Andrea Santilli, Antonio Norelli, Emanuele Rodolà, Giambattista Parascandolo, Giorgio Mariani, Luca Moschella, Simone Melzi
PDF
arXiv
Multitask Prompted Training Enables Zero-Shot Task Generalization
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., …
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, BIG-Science contributors including Andrea Santilli
PDF
ICLR 2022 (Oral)
KERMIT: Complementing Transformer Architectures with Encoders of Explicit Syntactic Interpretations
Syntactic parsers have dominated natural language understanding for decades. Yet, their syntactic interpretations are losing centrality …
Fabio Massimo Zanzotto, Andrea Santilli, Leonardo Ranaldi, Dario Onorati, Pierfrancesco Tommasino, Francesca Fallucchi
PDF
DOI
EMNLP 2020