ROUGE
ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is an evaluation metric for summarization tasks that counts how much the words and n-grams (consecutive words) overlap between a human-written reference summary and a generated summary. The more overlap, the higher the score.
Because it looks at surface-level word matching, paraphrasing with the same meaning tends to lower the score, and the metric that evaluates semantic closeness is BERTScore, so it is incorrect.