-
Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators
A critical examination of the risks and challenges posed by private evaluators (for example, ScaleAI) in the LLM landscape, highlighting financial incentives, conflicts of interest, and the prevalence of evaluation biases even when evaluators act in good faith.
-
Reassessing EMNLP 2024’s Best Paper: Does Divergence-Based Calibration for Membership Inference Attacks Hold Up? | Anshuman Suri
TL;DR: No.
A critical analysis of the EMNLP Best Paper proposing a divergence-based calibration for Membership Inference Attacks (MIAs). We explore its experimental shortcomings, issues with temporally shifted benchmarks, and what this means for machine learning awards.
-
Phi-1.5 Model: A Case of Comparing Apples to Oranges?
-
Displaying External Posts on Your al-folio Blog