ColdGANs: Taming Language GANs with Cautious Sampling Strategies

Training regimes based on Maximum Likelihood Estimation (MLE) suffer from known limitations, often leading to poorly generated text sequences. At the root of these limitations is the mismatch between training and inference, i.e. the so-called …

MLSUM: The Multilingual Summarization Corpus

We present MLSUM, the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages -- namely French, German, Spanish, Russian, and Turkish. Together with …

Efficient Document Indexing Using Pivot Tree

We present a novel method for efficiently searching for the top-k neighbors of documents represented in a high-dimensional term space, based on cosine similarity. Documents are mostly stored in a bag-of-words tf-idf representation. One of the most used …
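
To make the problem setting concrete, here is a minimal sketch of the brute-force baseline: exact top-k neighbor search by cosine similarity over sparse tf-idf vectors. It is not the pivot-tree method of the paper, whose details are not given in this excerpt. The function names (`tfidf_vectors`, `cosine`, `top_k`) and the raw-count tf with log idf weighting are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (term -> weight dicts) from tokenized documents.

    Assumes raw term counts for tf and log(N / df) for idf; other variants exist.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_index, vectors, k):
    """Return the indices of the k documents most similar to vectors[query_index].

    Exhaustive scan: every other document is compared with the query document.
    """
    q = vectors[query_index]
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors) if i != query_index]
    # Highest similarity first; ties broken by document index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


docs = [
    ["cat", "sat", "mat"],
    ["cat", "sat", "hat"],
    ["dog", "ran", "far"],
]
vectors = tfidf_vectors(docs)
nearest = top_k(0, vectors, 1)
```

The exhaustive scan compares the query with every document, which is the cost that an index structure for top-k search aims to avoid.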

Learning Term Weights for Ad-hoc Retrieval

Most Information Retrieval models compute the relevance score of a document for a given query by summing term weights specific to a document or a query. Heuristic approaches, like TF-IDF, or probabilistic models, like BM25, are used to specify how a …
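
As an illustration of scoring by summed term weights, the following sketch computes a BM25 score for each document as the sum of per-term weights over the query terms. It shows the hand-specified baseline the abstract mentions, not the learned weights proposed in the paper. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the non-negative idf variant `log(1 + (N - df + 0.5) / (df + 0.5))` are assumptions; the document list is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    The score of a document is the sum, over query terms, of a term weight
    that depends on term frequency, document frequency and document length.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue  # Terms absent from the document contribute nothing.
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / (tf[t] + length_norm)
        scores.append(score)
    return scores


docs = [
    ["cat", "sat", "mat"],
    ["cat", "sat", "hat"],
    ["dog", "ran", "far"],
]
scores = bm25_scores(["cat", "hat"], docs)
```

Here the second document matches both query terms, the first matches one, and the third matches none, so the scores are ordered accordingly.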

Parameterized Neural Network Language Models for Information Retrieval

Information Retrieval (IR) models need to deal with two difficult issues, vocabulary mismatch and term dependencies. Vocabulary mismatch corresponds to the difficulty of retrieving relevant documents that do not contain exact query terms but …

Diversification Based Static Index Pruning - Application to Temporal Collections

Nowadays, web archives preserve the history of large portions of the web. As media shift from printed to digital editions, accessing these huge information sources is drawing increasing attention from national and international …

Beyond Cumulated Gain and Average Precision: Including Willingness and Expectation in the User Model

In this paper, we define a new metric family based on two concepts: the definition of the stopping criterion and the notion of satisfaction, where the former depends on the willingness and expectation of a user exploring search results. Both concepts …

The Kernel Quantum Probabilities (KQP) Library

In this document, we show how the different quantities necessary to compute kernel quantum probabilities can be computed. This document forms the basis of the implementation of the Kernel Quantum Probability (KQP) open source project.

Exploring a Multidimensional Representation of Documents and Queries (extended version)

In Information Retrieval (IR), whether implicitly or explicitly, queries and documents are often represented as vectors. However, it may be more beneficial to consider documents and/or queries as multidimensional objects. Our belief is that this would …