E-GEO is a benchmark that measures how well AI-powered rewriting systems improve a product's ranking on generative shopping engines. Given a product, a rewriter generates an optimized description; five independent LLM judges then each rank a pool of candidate products, and we measure where the target product ranks before and after the rewrite.
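The per-product measurement can be sketched as follows. This sketch makes two assumptions. First, each judge returns an ordered list of product IDs, with position 1 at the top. Second, improvement is rank before rewriting minus rank after rewriting, so a product that moves up the list gets a positive number. The function names and the toy product pool are hypothetical and are not taken from the E-GEO code.

```python
from typing import Sequence


def rank_of(target_id: str, ranking: Sequence[str]) -> int:
    """Return the 1-indexed position of target_id in one judge's ranking (1 = top)."""
    return list(ranking).index(target_id) + 1


def ranking_improvement(target_id: str,
                        ranking_before: Sequence[str],
                        ranking_after: Sequence[str]) -> int:
    """Positions gained by the target product after rewriting.

    Assumption: improvement = rank_before - rank_after, so moving up the
    list gives a positive value, consistent with "larger is better".
    """
    return rank_of(target_id, ranking_before) - rank_of(target_id, ranking_after)


# Hypothetical pool of five products ranked by a single judge.
before = ["p3", "p1", "p4", "target", "p2"]  # target at rank 4
after = ["p3", "target", "p1", "p4", "p2"]   # target at rank 2
print(ranking_improvement("target", before, after))  # 2
```

Under this definition, a rewrite that leaves the product where it was scores 0, and one that pushes it down scores negative.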
The table below compares rewriting systems across all five evaluator models. Each row is one rewriter (a model plus prompt strategy, a fine-tuned model, or an agent), and each column is one evaluator. Each cell shows the mean ranking improvement over 2,000 test products; larger is better. The Average column is the unweighted mean across the five evaluators. Use the weight inputs to re-rank by a custom evaluator mix, click any column header to sort, and see Rewriters for a description of each entry.
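The aggregation described above can be sketched as follows. It assumes the per-product improvements have already been computed for each evaluator. The evaluator keys, rewriter names, and numbers are hypothetical toy values, not leaderboard results. The sketch also normalizes custom weights by their sum, which is an assumption about how the weight inputs combine.

```python
from __future__ import annotations

from statistics import mean


def cell_means(improvements: dict[str, list[float]]) -> dict[str, float]:
    """Mean ranking improvement per evaluator (one leaderboard cell each)."""
    return {ev: mean(vals) for ev, vals in improvements.items()}


def unweighted_average(cells: dict[str, float]) -> float:
    """The Average column: plain mean across evaluators."""
    return mean(cells.values())


def weighted_score(cells: dict[str, float], weights: dict[str, float]) -> float:
    """Custom evaluator mix. Assumption: weights are divided by their sum."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(cells[ev] * w for ev, w in weights.items()) / total


def rerank(table: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Rewriter names sorted by weighted score, best first."""
    return sorted(table, key=lambda name: weighted_score(table[name], weights),
                  reverse=True)


# Toy cell values for two hypothetical rewriters (not real E-GEO numbers).
table = {
    "rewriter_x": {"judge_a": 3.0, "judge_b": 1.0, "judge_c": 2.0,
                   "judge_d": 2.0, "judge_e": 2.0},
    "rewriter_y": {"judge_a": 1.0, "judge_b": 3.0, "judge_c": 2.0,
                   "judge_d": 2.0, "judge_e": 2.0},
}

print(unweighted_average(table["rewriter_x"]))  # 2.0
print(rerank(table, {"judge_b": 1.0}))          # ['rewriter_y', 'rewriter_x']
```

With equal weights, both toy rewriters tie at an Average of 2.0. Putting all the weight on one evaluator separates them, and the custom evaluator mix is meant to expose exactly this kind of reordering.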
Leaderboard
Mean ranking improvement · 5 LLM evaluators · n=2,000
Paper & Code
@misc{bagga2025egeo,
title = {E-GEO: A Testbed for Generative Engine Optimization in E-Commerce},
author = {Puneet S. Bagga and Vivek F. Farias and Tamar Korkotashvili and Tianyi Peng and Yuhang Wu},
year = {2025},
eprint = {2511.20867},
archivePrefix = {arXiv},
primaryClass = {cs.IR},
url = {https://arxiv.org/abs/2511.20867}
}