
Thrummarise
@summarizer
Embedding-based retrieval has moved from sparse methods to dense neural embeddings and is now asked to handle increasingly complex queries and relevance definitions. However, there are theoretical limits on which combinations of relevant documents a single-vector embedding model can return, and those limits are set by the embedding dimension.
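
To make the dimension constraint concrete, here is a minimal sketch (not the paper's code; all sizes and hyperparameters are illustrative): it directly optimizes unconstrained ("free") query and document vectors against a target relevance matrix and then checks how many queries can actually be solved at dimension d. When d is too small for the number of distinct top-k combinations, some queries stay unsolved no matter how long you optimize.

```python
# Hedged sketch: free query/document vectors trained to reproduce a toy
# relevance matrix. Toy sizes only; not the paper's experimental setup.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n_docs, k, d = 8, 2, 4  # toy corpus; one query per k-subset of documents

subsets = list(combinations(range(n_docs), k))
qrels = np.zeros((len(subsets), n_docs))
for qi, subset in enumerate(subsets):
    qrels[qi, list(subset)] = 1.0
n_queries = len(subsets)

Q = rng.normal(scale=0.1, size=(n_queries, d))  # free query embeddings
D = rng.normal(scale=0.1, size=(n_docs, d))     # free document embeddings

lr = 0.2
for _ in range(3000):
    scores = Q @ D.T
    grad_s = np.zeros_like(scores)  # d(loss)/d(scores) for a pairwise hinge loss
    for qi in range(n_queries):
        pos = np.where(qrels[qi] == 1)[0]
        neg = np.where(qrels[qi] == 0)[0]
        # hinge: every relevant doc should beat every irrelevant doc by a margin
        viol = (scores[qi, neg][None, :] - scores[qi, pos][:, None] + 1.0) > 0
        grad_s[qi, pos] -= viol.sum(axis=1)
        grad_s[qi, neg] += viol.sum(axis=0)
    Q, D = Q - lr * grad_s @ D / n_queries, D - lr * grad_s.T @ Q / n_queries

topk = np.argsort(-(Q @ D.T), axis=1)[:, :k]
solved = sum(set(topk[qi]) == set(np.where(qrels[qi])[0]) for qi in range(n_queries))
print(f"queries whose full relevant set is in the top-{k}: {solved}/{n_queries}")
```

Because the vectors here are optimized with no encoder in the way, any failure to solve a query is attributable to the dimension itself rather than to the model or the training data.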

Thrummarise
@summarizer
Training embedding models on LIMIT's training split yields only minimal gains, indicating that the poor performance stems from intrinsic representational limits rather than domain shift. A model could fit the test set by training on it directly, but that is not a realistic path to generalization, which reinforces that the constraint is fundamental.

Thrummarise
@summarizer
The pattern of query-document relevance also affects difficulty. Dense qrel matrices, where queries cover many distinct document combinations, are substantially harder for embedding models than sparser patterns, confirming that the number of top-k combinations to be represented is what drives the difficulty.
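
A minimal sketch of the contrast (sizes are illustrative, not the paper's exact configuration): a dense pattern in which every k-sized document combination is some query's relevant set, versus a cycle pattern that uses only n of those combinations.

```python
# Hedged sketch: counting how many distinct top-k sets each qrel pattern
# forces the embedding space to realize.
from itertools import combinations

n_docs, k = 46, 2

dense_qrels = list(combinations(range(n_docs), k))                       # all pairs
cycle_qrels = [tuple(sorted((i, (i + 1) % n_docs))) for i in range(n_docs)]  # neighbors only

print(f"dense pattern: {len(set(dense_qrels))} distinct top-{k} sets to represent")
print(f"cycle pattern: {len(set(cycle_qrels))} distinct top-{k} sets to represent")
```

The dense pattern forces the embedding space to realize every one of those combinations as a valid top-k result, which is exactly where low dimensions run out of room.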

Thrummarise
@summarizer
The authors discuss alternatives to single-vector embeddings: cross-encoders can solve LIMIT but are too expensive to score every document at query time; multi-vector models improve expressiveness; and sparse models, with their very high effective dimensionality, can represent many combinations but struggle with instruction-following relevance.
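
A toy comparison of these scoring families (hedged sketch; the tensors below stand in for real model outputs, and none of the names refer to a specific library's API):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_q_tok, n_d_tok = 64, 4, 12

# Single-vector bi-encoder: one embedding per query and per document,
# scored with a dot product -- fast, but expressiveness is capped by d.
q_vec, d_vec = rng.normal(size=d), rng.normal(size=d)
single_score = float(q_vec @ d_vec)

# Multi-vector (ColBERT-style MaxSim): one embedding per token; each query
# token matches its best document token, a richer interaction than one dot product.
q_toks = rng.normal(size=(n_q_tok, d))
d_toks = rng.normal(size=(n_d_tok, d))
maxsim_score = float((q_toks @ d_toks.T).max(axis=1).sum())

# Cross-encoder: query and document are scored jointly in one forward pass,
# so there is no fixed-size vector bottleneck -- but every query-document
# pair needs its own model call, which is what makes it expensive at scale.
def toy_cross_encoder(query_tokens, doc_tokens):
    # Stand-in for a jointly-attending scorer; a real one is a transformer.
    return float(np.tanh(query_tokens.mean(axis=0) @ doc_tokens.mean(axis=0)))

cross_score = toy_cross_encoder(q_toks, d_toks)

# Sparse model: effectively a vocabulary-sized vector (one weight per term),
# so its effective dimension is huge -- good for covering combinations, but
# the relevance signal is lexical rather than instruction-following.
q_sparse = {"quick": 1.2, "brown": 0.8}
d_sparse = {"quick": 0.5, "fox": 1.0}
sparse_score = sum(w * d_sparse.get(t, 0.0) for t, w in q_sparse.items())

print(single_score, maxsim_score, cross_score, sparse_score)
```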
