Andrea Santilli
Featured
Language Models are Injective and Hence Invertible
Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs …
Giorgos Nikolaou, Tommaso Mencattini, Donato Crisostomi, Andrea Santilli, Yannis Panagakis, Emanuele Rodolà
arXiv
Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results
Uncertainty Quantification (UQ) in Language Models (LMs) is key to improving their safety and reliability. Evaluations often use …
Andrea Santilli, Adam Golinski, Michael Kirchhof, Federico Danieli, Arno Blaas, Miao Xiong, Luca Zappella, Sinead Williamson
DOI
arXiv
Mergenetic: a Simple Evolutionary Model Merging Library
Model merging allows combining the capabilities of existing models into a new one—post hoc, without additional training. This has made …
Adrian Robert Minut, Tommaso Mencattini, Andrea Santilli, Donato Crisostomi, Emanuele Rodolà
DOI
arXiv
MERGE3: Efficient Evolutionary Merging on Consumer-grade GPUs
Evolutionary model merging enables the creation of high-performing multi-task models but remains computationally prohibitive for …
Tommaso Mencattini, Adrian Robert Minut, Donato Crisostomi, Andrea Santilli, Emanuele Rodolà
arXiv
Camoscio: An Italian Instruction-tuned LLaMA
In recent years, Large Language Models have improved the state of the art on several natural language processing tasks. However, their …
Andrea Santilli, Emanuele Rodolà
arXiv
Accelerating Transformer Inference for Translation via Parallel Decoding
Autoregressive decoding limits the efficiency of transformers for Machine Translation (MT). The community proposed specific network …
Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, Riccardo Marin, Emanuele Rodolà
PDF
arXiv
GitHub
Multimodal Neural Databases
The rise in loosely-structured data available through text, images, and other modalities has called for new ways of querying them. …
Giovanni Trappolini, Andrea Santilli, Emanuele Rodolà, Alon Halevy, Fabrizio Silvestri
PDF
arXiv
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their …
BIG-bench contributors including Andrea Santilli, Antonio Norelli, Emanuele Rodolà, Giambattista Parascandolo, Giorgio Mariani, Luca Moschella, Simone Melzi
PDF
arXiv
Multitask Prompted Training Enables Zero-Shot Task Generalization
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., …
Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, BIG-Science contributors including Andrea Santilli
PDF
ICLR 2022 (Oral)
KERMIT: Complementing Transformer Architectures with Encoders of Explicit Syntactic Interpretations
Syntactic parsers have dominated natural language understanding for decades. Yet, their syntactic interpretations are losing centrality …
Fabio Massimo Zanzotto, Andrea Santilli, Leonardo Ranaldi, Dario Onorati, Pierfrancesco Tommasino, Francesca Fallucchi
PDF
DOI
EMNLP 2020