pratyushmaini

Peeking Behind Closed Doors: Risks of LLM Evaluation by Private Data Curators

A critical examination of the risks and challenges posed by private evaluators (e.g., ScaleAI) in the LLM landscape, highlighting financial incentives, conflicts of interest, and the prevalence of evaluation biases even when evaluators act in good faith.

17 min read · November 27, 2024

2024
Reassessing EMNLP 2024’s Best Paper: Does Divergence-Based Calibration for Membership Inference Attacks Hold Up? | Anshuman Suri

TL;DR: No.
A critical analysis of the EMNLP Best Paper proposing a divergence-based calibration for Membership Inference Attacks (MIAs). We explore its experimental shortcomings, issues with temporally shifted benchmarks, and what this means for machine learning awards.

10 min read · November 26, 2024

2024
Phi-1.5 Model: A Case of Comparing Apples to Oranges?

9 min read · September 14, 2023

2023 · blog
Displaying External Posts on Your al-folio Blog

1 min read · April 23, 2022 · anshumansuri.com

2022