Pratyush Maini
I like to observe. Look for patterns. Ponder over these generalizations. Try to refute them. Or otherwise prove their validity. And re-imagine their applications in alternate spheres.
I am a PhD candidate in Machine Learning at Carnegie Mellon University, advised by Zico Kolter and Zachary Lipton, and a founding member of DatologyAI.
My research studies how training data shapes the behavior, memorization, and reliability of foundation models, with the goal of making them safe to deploy beyond controlled research settings. This work is partially supported by the OpenAI Cybersecurity Award.
Research Highlights
Foundations of Synthetic Pretraining
How rewriting web data became an industry standard.
Redefining AI Safety
Embedding safety natively into pretraining.
Machine Unlearning
The most widely used benchmark for LLM unlearning.
Foundations of Synthetic Pretraining
I proposed that LLMs should be pretrained on synthetic rephrases of web data: the same content, rewritten into styles that we care about at deployment time. Rephrasing the Web (ACL 2024) showed a 3x training speedup with no quality loss. BeyondWeb scaled this to trillion-token regimes. This is now standard practice, publicly reported by NVIDIA (Nemotron-CC), Microsoft (Phi-4), Moonshot AI (Kimi K2), xAI (Grok), and Arcee AI (Trinity Large, the strongest American open-weight model, with data curated by DatologyAI).
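For readers who want the idea in code, here is a minimal sketch of the rephrasing pipeline, assuming an OpenAI-compatible chat endpoint. The prompts, style list, and mixing strategy below are illustrative placeholders, not the exact recipe from the papers.

```python
# Sketch: rewrite raw web documents into target styles with an
# instruction-following model, then mix the rephrases back in with the
# originals to form the pretraining corpus.
from openai import OpenAI  # assumes an OpenAI-compatible chat endpoint

client = OpenAI()

STYLES = {
    "qa": "Convert the passage into a series of question-answer pairs.",
    "wikipedia": "Rewrite the passage in a clear, encyclopedic style.",
}

def rephrase(document: str, style: str, model: str = "gpt-4o-mini") -> str:
    """Return a style-conditioned rephrase that preserves the content."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": STYLES[style]},
            {"role": "user", "content": document},
        ],
    )
    return resp.choices[0].message.content

def build_corpus(web_docs: list[str]) -> list[str]:
    # Keep the raw text and add one rephrase per style; real pipelines
    # tune this mix against downstream evaluations.
    corpus = list(web_docs)
    for doc in web_docs:
        for style in STYLES:
            corpus.append(rephrase(doc, style))
    return corpus
```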
Related work: visual data curation (ICLR 2024), scaling laws for data filtering (CVPR 2024), dataset inference (ICLR 2021 Spotlight), LLM dataset inference (NeurIPS 2024), watermarked rephrasings (ICML 2025).
Redefining AI Safety
Safety Pretraining (NeurIPS 2025) embeds safety directly into the pretraining process, not as a post-hoc patch. This reduces attack success rates from 38.8% to 8.3% with no performance cost, and the gains hold up after fine-tuning. OpenAI, Anthropic, the UK AI Safety Institute, and Cambridge/Oxford have since published related efforts.
This builds on Memorization Sinks (ICML 2025) and localizing memorization (ICML 2023). See also: when to introduce safety interventions during training.
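As a hedged sketch of the core idea: curate the pretraining corpus itself, keeping safe text verbatim and recontextualizing, rather than simply deleting, risky text, so the model still learns what harm looks like without learning to reproduce it. Both helpers below are toy placeholders, not the paper's actual classifier or rewriting prompts.

```python
# Sketch: score every pretraining document for harm; keep safe documents
# as-is, and rewrite (not drop) risky ones into a safer framing.

RISKY_TERMS = {"synthesize explosives", "bypass authentication"}  # toy list

def score_harm(document: str) -> float:
    """Toy stand-in for a learned safety classifier returning a [0, 1] score."""
    hits = sum(term in document.lower() for term in RISKY_TERMS)
    return min(1.0, hits / 2)

def recontextualize(document: str) -> str:
    """Toy stand-in for an LLM rewrite that reframes harmful content,
    e.g. as an educational warning or a refusal demonstration."""
    return ("[The passage below discusses a harmful topic; a safe model "
            "should refuse to operationalize it.]\n" + document)

def curate(web_docs: list[str], threshold: float = 0.5) -> list[str]:
    curated = []
    for doc in web_docs:
        if score_harm(doc) < threshold:
            curated.append(doc)                   # safe: keep verbatim
        else:
            curated.append(recontextualize(doc))  # risky: rewrite, don't drop
    return curated
```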
Machine Unlearning
TOFU (COLM 2024) is the most widely used benchmark for LLM unlearning. OpenUnlearning (NeurIPS 2025) packages 12+ methods and 10+ metrics into a single framework, tested across 450+ models. Adversarial Compression (NeurIPS 2024, Best Paper at ACL Workshop) is an information-theoretic measure that catches memorization other metrics miss; it has since been applied to copyright auditing and to auditing diffusion models (ICML Workshop 2025 Oral).
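The decision rule behind Adversarial Compression fits in a few lines: a string counts as memorized if some prompt shorter than the string reliably elicits it, i.e. the model acts as a compressor for that string. The sketch below shows only the ratio test; the hard part, searching for the shortest eliciting prompt, is done with gradient-based prompt optimization in the paper and is omitted here, and the token counts in the example are made up.

```python
# Sketch: Adversarial Compression Ratio (ACR) = |target| / |shortest
# eliciting prompt|, measured in tokens; memorized iff ACR > 1.

def compression_ratio(target_tokens: int, prompt_tokens: int) -> float:
    """ACR for a target string and the shortest prompt found to elicit it."""
    return target_tokens / prompt_tokens

def is_memorized(target_tokens: int, prompt_tokens: int) -> bool:
    # The model "compresses" the target only if the prompt is shorter.
    return compression_ratio(target_tokens, prompt_tokens) > 1.0

# Illustrative numbers: a 120-token passage elicited by a 9-token prompt.
print(compression_ratio(120, 9))  # ~13.3 -> strongly memorized
print(is_memorized(120, 9))       # True
```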