Towards Effective and Efficient Sparse Neural Information Retrieval

Apr 29, 2024

Thibault Formal

Carlos Lassance

Benjamin Piwowarski

Stéphane Clinchant

Abstract

Sparse representation learning based on Pre-trained Language Models (PLMs) has attracted growing interest in Information Retrieval (IR). Such approaches can take advantage of the proven efficiency of inverted indexes and inherit desirable IR priors such as explicit lexical matching and some degree of interpretability. In this work, we thoroughly develop the framework of sparse representation learning in IR, which unifies term weighting and expansion in a supervised setting. We then build on SPLADE, a sparse expansion-based retriever, and show to what extent it can benefit from the same training improvements as dense bi-encoders. To that end, we study the effect of distillation, hard negative mining, and the PLM's initialization on its effectiveness, which leads to state-of-the-art results in both in-domain and out-of-domain evaluation settings (SPLADE++). We furthermore propose efficiency improvements that allow us to reach latency on par with traditional keyword-based approaches (Efficient-SPLADE).
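
To make the link between sparse representations and inverted indexes concrete, here is a minimal sketch of how retrieval over term-weight vectors might look. It is not the SPLADE implementation: the function names (`build_inverted_index`, `search`), the toy documents, and all weights are hypothetical, and a real system would obtain the weights from a trained model rather than hard-code them. Note that the query carries a weight for "search", a term standing in for one added by expansion.

```python
from collections import defaultdict


def build_inverted_index(doc_vectors):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vector in doc_vectors.items():
        for term, weight in vector.items():
            if weight > 0:  # only non-zero weights are stored, hence "sparse"
                index[term].append((doc_id, weight))
    return index


def search(index, query_vector, top_k=10):
    """Score documents by the dot product of query and document weights."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        # Only documents sharing a term with the query are ever touched.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    # Sort by descending score, breaking ties on doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical toy weights; a learned model would produce these.
doc_vectors = {
    "d1": {"sparse": 1.2, "retrieval": 0.9, "index": 0.4},
    "d2": {"dense": 1.1, "retrieval": 0.7, "encoder": 0.5},
}
query_vector = {"sparse": 1.0, "retrieval": 0.5, "search": 0.3}

index = build_inverted_index(doc_vectors)
results = search(index, query_vector)
```

Because scoring only walks the posting lists of the query's terms, its cost depends on how many non-zero weights the representations contain, which is why sparsity matters for latency in this kind of setup.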

Type

Journal article

Publication

ACM Transactions on Information Systems

Last updated on Aug 26, 2024

Neural Information Retrieval
