E-GEO Benchmark

E-GEO is a benchmark that measures how well AI-powered rewriting systems improve a product's ranking on generative shopping engines. Given a product, a rewriter generates an optimized description; five independent LLM judges then rank a pool of products, and we measure where the target product ranks before and after rewriting.
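The before/after measurement can be illustrated with a minimal sketch. It assumes that ranking improvement means the number of positions the target product gains (rank before minus rank after, so positive is better) and that each judge returns an ordered list of product IDs. The function names and toy data are hypothetical and are not taken from the E-GEO code.

```python
def rank_of(target_id, ranking):
    """Return the 1-based position of target_id in a ranked list of product IDs."""
    return ranking.index(target_id) + 1


def ranking_improvement(target_id, ranking_before, ranking_after):
    """Positions gained by the target product; positive means it moved up.

    Assumption: improvement = rank before rewriting - rank after rewriting.
    """
    return rank_of(target_id, ranking_before) - rank_of(target_id, ranking_after)


def mean_ranking_improvement(cases):
    """Average the per-product improvement over (target, before, after) triples."""
    gains = [ranking_improvement(t, before, after) for t, before, after in cases]
    return sum(gains) / len(gains)


# Toy data (hypothetical): three target products ranked by one judge.
cases = [
    ("p3", ["p1", "p2", "p3", "p4"], ["p3", "p1", "p2", "p4"]),  # rank 3 -> 1, gain +2
    ("p2", ["p1", "p2", "p3"], ["p1", "p3", "p2"]),              # rank 2 -> 3, gain -1
    ("p1", ["p1", "p2"], ["p1", "p2"]),                          # unchanged, gain 0
]

score = mean_ranking_improvement(cases)
print(f"{score:+.2f}")
```

Under this convention a negative cell in the leaderboard would mean that, on average, the rewritten description ranked lower than the original.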

The table below compares rewriting systems across all five evaluator models. Each row is one rewriter (a model plus prompt strategy, a fine-tuned model, or an agent) and each column is one evaluator. Cells show the mean ranking improvement over 2,000 test products; larger is better. The Average column is the unweighted mean across all evaluators. The leaderboard can also be re-ranked under a custom evaluator mix by assigning a weight to each evaluator, and the Rewriters page describes what each entry is.
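As a rough sketch of how the Average column and a custom evaluator mix relate, the snippet below computes a weighted mean of the five evaluator scores and sorts rewriters by it. The three score rows are copied from the leaderboard in the order they are listed; which evaluator each position corresponds to is not stated here, so the weights are purely illustrative. The function names are hypothetical, and a normalized weighted mean is an assumption about how re-weighting works.

```python
# Per-evaluator scores for three rewriters, in leaderboard column order.
scores = {
    "Claude Sonnet 4.5 + Simple": [-0.20, 0.72, 0.45, 0.77, 1.12],
    "GPT-5 + Simple": [0.41, 0.65, 0.13, 0.87, 0.72],
    "Llama 4 Maverick + Simple": [0.35, 0.22, 0.15, 0.31, 0.75],
}


def weighted_average(values, weights):
    """Weighted mean of evaluator scores; equal weights give the unweighted mean."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def rank_rewriters(scores, weights):
    """Return rewriter names sorted from best to worst under the given weights."""
    return sorted(
        scores,
        key=lambda name: weighted_average(scores[name], weights),
        reverse=True,
    )


equal_mix = [1, 1, 1, 1, 1]          # reproduces the Average column
first_only = [1, 0, 0, 0, 0]         # hypothetical mix: trust only the first evaluator

for name in rank_rewriters(scores, equal_mix):
    print(f"{name}: {weighted_average(scores[name], equal_mix):+.2f}")

print(rank_rewriters(scores, first_only))
```

With equal weights the three rows reproduce their published averages (+0.57, +0.56, +0.36). Putting all weight on the first evaluator reverses the order of the top entry, which shows how sensitive the ranking can be to the evaluator mix.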

Leaderboard

Mean ranking improvement · 5 LLM evaluators · n=2,000

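| Rank | Rewriter | Cost | Evaluator 1 | Evaluator 2 | Evaluator 3 | Evaluator 4 | Evaluator 5 | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Claude Sonnet 4.5 + Simple Paper | $0.061 | -0.20 | +0.72 | +0.45 | +0.77 | +1.12 | +0.57 |
| 2 | GPT-5 + Simple Paper | $0.014 | +0.41 | +0.65 | +0.13 | +0.87 | +0.72 | +0.56 |
| 3 | Llama 4 Maverick + Simple Paper | $0.001 | +0.35 | +0.22 | +0.15 | +0.31 | +0.75 | +0.36 |
| 4 | GPT-4.1 + unique (optimized) Paper | $0.009 | +0.04 | +0.28 | +0.14 | +0.32 | +0.67 | +0.29 |
| 5 | Gemini 3 Flash Preview + Simple Paper † | $0.007 | -0.26 | +0.23 | +0.59 | +1.02 | | +0.26 |
| 6 | GPT-4.1 + competitive (optimized) Paper | $0.009 | +0.11 | +0.40 | +0.04 | +0.24 | +0.53 | +0.26 |
| 7 | GPT-4.1 + advertisement (optimized) Paper | $0.009 | +0.03 | -0.06 | +0.19 | +0.32 | +0.54 | +0.20 |
| 8 | GPT-4.1 + FAQ (optimized) Paper | $0.009 | -0.03 | +0.18 | +0.02 | +0.22 | +0.50 | +0.18 |
| 9 | DeepSeek V3.2 + Simple Paper | $0.002 | -0.05 | +0.14 | -0.09 | +0.30 | +0.49 | +0.16 |
| 10 | GPT-4.1 + Simple Paper | $0.009 | -0.07 | -0.22 | -0.27 | -0.05 | +0.44 | -0.03 |
| 11 | GPT-4o-mini + Simple Paper | $0.001 | -0.40 | -0.49 | -0.44 | -0.41 | -0.07 | -0.36 |

† Only four evaluator scores are listed for this row. They are shown in their listed order, and which evaluator column is the empty one is not specified.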

SE = standard error
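The page does not say how the standard error is computed. A common choice, assumed here, is the standard error of the mean over the per-product ranking improvements, using the sample standard deviation. The sketch below uses hypothetical toy values, not benchmark data.

```python
import math


def standard_error(values):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a standard error")
    mean = sum(values) / n
    # Sample variance with Bessel's correction (divide by n - 1).
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)


# Toy per-product ranking improvements (hypothetical).
improvements = [2, -1, 0, 1, 3]

mean_improvement = sum(improvements) / len(improvements)
se = standard_error(improvements)
print(f"{mean_improvement:+.2f} ± {se:.2f}")
```

With n = 2,000 test products, the standard error shrinks roughly with the square root of n, so small differences between neighboring rows are best read alongside their SE values.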

Paper & Code

Paper · arXiv:2511.20867
Code · GitHub
Dataset · Hugging Face

BibTeX

@misc{bagga2025egeo,
  title         = {E-GEO: A Testbed for Generative Engine Optimization in E-Commerce},
  author        = {Puneet S. Bagga and Vivek F. Farias and Tamar Korkotashvili and Tianyi Peng and Yuhang Wu},
  year          = {2025},
  eprint        = {2511.20867},
  archivePrefix = {arXiv},
  primaryClass  = {cs.IR},
  url           = {https://arxiv.org/abs/2511.20867}
}