
Generative Engine Optimization: Paper Analysis

For those who caught my interview with Authoritas CEO Laurence O’Toole on SGE (recorded prior to the rollout of AI Overviews), you’ll have heard a reference to a paper written on Generative Engine Optimization, published by the folks from Princeton, Georgia Tech, IIT Delhi and The Allen Institute for AI in November of 2023 and updated in May of 2024. The changes mainly involved the inclusion of additional testing, as well as a look into the possible benefits of combining multiple GEO strategies.

In this post we’ll walk you through what’s included in the paper and a couple of flaws I believe exist in their interpretation, and hopefully you’ll walk away with some ideas about how to deal with what’s coming.

GEO: Generative Engine Optimization

Generative Engine Optimization (GEO) refers to the art and science of creating or adjusting content with the goal of having the information from within that content used and referenced by generative engines like GPT, Gemini and others.

The authors of the paper point out that the technology:

“… can generate accurate and personalized responses, rapidly replacing traditional search engines like Google and Bing. Generative Engines typically satisfy queries by synthesizing information from multiple sources and summarizing them using LLMs.”

Now, one thing I’d like to point out is that I feel the claim, “… rapidly replacing traditional search engines like Google and Bing.” is incredibly generous. I would argue that the statement would be better worded, “… is rapidly replacing traditional search engine results on engines like Google and Bing.”

This is basically called out as the authors write:

“The recent success of large language models (LLMs) however has paved the way for better systems like BingChat, Google’s SGE, and [others] that combine the strength of conventional search engines with the flexibility of generative models.”

That said, there is a lot in the paper that sheds light on how generative engines select the content to produce, and the sites to cite. While the field is rapidly changing, and the paper was not written using Google or even Bing, it sets a good baseline for some of the strategies to explore as our starting point as we all seek to rank in this very new environment.

The purpose of the article (according to the authors)

A declared purpose of performing the research and writing the paper, according to the authors, was that,

“While this shift significantly improves user utility and generative search engine traffic, it poses a huge challenge for the third stakeholder – website and content creators.”

They expand on this to discuss one of the major concerns of website owners, that generative search reduces the need to click to the publisher’s website to access information.

I think they’re mistaken in a very odd way. After defining this problem and its amplification via the black-box nature of generative engines, they suggest that a generative engine optimization framework empowers businesses and web publishers in a way unavailable today. I would argue that today it is traditional SEO and tomorrow it will be SGEO, but publishers and businesses will be in the same spot: those that can afford professionals will tend to do better than those that cannot.

The “democracy” will fade as fast as it did when search engines were born.

What is Generative Engine Optimization?

Generative Engine Optimization is the art and science of structuring websites and web content in such a way that the content is weighted favorably to be used and cited by generative search engines.

Below we’ll be exploring what the authors of the paper discovered, and what techniques are most successful in attaining visibility on a generative search engine.

To illustrate what success looks like before we dive in, here is an example and one of the figures from their paper:

Figure 1: Our proposed Generative Engine Optimization (GEO) method optimizes websites to boost their visibility in Generative Engine responses. GEO’s black-box optimization framework then enables the website owner of the pizza website, which lacked visibility originally, to optimize their website to increase visibility under Generative Engines. Further, GEO’s general framework allows content creators to define and optimize their custom visibility metrics, giving them greater control in this new emerging paradigm.

We see that they were trying to move the third citation into the first position, and did so successfully.

For anyone who has used ChatGPT or Gemini, you’ll likely have encountered hallucinations. This is where the LLM, not having an answer or information related to the topic requested, simply makes up an answer. Citations, then, are critical for users and user trust. And therein lies the opportunity the authors are exploring.

The authors note that,

“With generative engines rapidly emerging as the primary information delivery paradigm and SEO not directly applicable, new techniques are needed.”

Interestingly, this matches the conclusions that Authoritas found in their study, in which they discovered that 93.8% of generative URLs in SGE do not match any of the URLs from the first page of the search results.

93.8% of the generative URLs do not match any URL from the first page of the organic search results.
Source: Research Study – The Impact of Google’s Search Generative Experience on organic rankings

Measuring visibility on a generative search engine

One of the first difficulties with generative engine optimization involves defining how to measure success. The authors illustrate the difference between measuring visibility in traditional search results vs generative ones with the figure:

Figure 3: Ranking and Visibility Metrics are straightforward in traditional search engines, which list website sources in ranked order with verbatim content. However, Generative Engines generates rich and structured responses, and often embed citations in a single block interleaved with each other. This makes the notion of ranking and visibility highly nuanced and multi-faceted. Further, unlike search engines, where significant research has been conducted on improving website visibility, it remains unclear how to optimize visibility in generative engine responses. To address these challenges, our black-box optimization framework proposes a series of well-designed impression metrics that creators can use to gauge and optimize their website’s performance and also allows the creator to define their impression metrics.

The authors make claims on how to measure traditional search results that I believe are over-simplified, such as:

“In SEO, a website’s impression (or visibility) is determined by its average ranking over a range of queries.”

I would argue that visibility needs to take into consideration the more nuanced SERP structures such as ad and featured snippet presence, the size of the browser window (i.e. is the result above the fold?), image and video boxes, and more. Overall, though, their conclusions are directionally correct, and while they might have done well to pull in some SEOs to assist with their wording and understanding, I do not believe their misses impact their conclusions to a significant degree.

Impressions for generative engines

The authors assert that in a generated result:

“a higher word count correlates with the source playing a more important part in the answer, and thus the user gets higher exposure to that source. However, since “Word Count” is not impacted by the ranking of the citations (whether it appears first, for example), we propose a position-adjusted count that reduces the weight …”

Essentially, what the authors state is that the more words that are associated with your citation, the more visibility it will have. They point out, however, that word count does not necessarily correlate to the position in the generated text, and that an earlier position is also an indicator of visibility. The direction they propose is a system which first considers the word count associated with a citation, and then adjusts the visibility metric based on the position in which the citation appears.
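To make the mechanics concrete, here is a minimal Python sketch of a position-adjusted word count. The exponential position decay mirrors the shape of the paper’s metric, but the exact weighting, the sentence representation, and the example data below are my own illustrative assumptions, not the paper’s implementation:

```python
import math

def position_adjusted_word_count(response_sentences):
    """Score each citation by the word count of the sentences that cite it,
    down-weighted by how late those sentences appear in the response.

    `response_sentences` is a list of (sentence_text, citation_id) tuples.
    The exponential decay is an illustrative assumption."""
    total = len(response_sentences)
    scores = {}
    for pos, (sentence, citation) in enumerate(response_sentences):
        word_count = len(sentence.split())
        weight = math.exp(-pos / total)  # earlier sentences count more
        scores[citation] = scores.get(citation, 0.0) + word_count * weight
    return scores

# A toy generated response citing two hypothetical sources.
response = [
    ("Pizza dough needs a long, cold fermentation for flavor.", "site_a"),
    ("Most Neapolitan recipes call for 00 flour.", "site_b"),
    ("A very hot oven, ideally 450C, gives the best crust.", "site_a"),
]
print(position_adjusted_word_count(response))
```

The intuition: a citation backed by many words early in the response scores higher than one mentioned briefly at the end.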

Rightfully, they expand on this further in an attempt to address the more subjective aspects of generated search results. The subjective metric they use takes into account many factors of the result, including:

  1. relevance of the cited material to the user query,
  2. influence of the citation, which evaluates the degree to which the generated response depends on the citation,
  3. uniqueness of the material presented by a citation,
  4. subjective position, which measures how prominently the source is positioned from the user’s perspective,
  5. subjective count, which measures the amount of content presented from the citation as perceived by the user upon reading the citation,
  6. probability of clicking the citation, and
  7. diversity in the material presented.

They evaluate these sub-metrics using G-Eval.
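As a rough sketch of how such sub-metric scores might be rolled up into one number, assuming each sub-metric has already been scored by an LLM judge (as G-Eval does): the 1–5 scale, the equal weighting, and the example numbers are my assumptions here, not the paper’s exact setup.

```python
def subjective_impression(submetric_scores, weights=None):
    """Combine per-citation sub-metric scores into one impression score.

    `submetric_scores` maps sub-metric name -> score (assumed 1-5 here);
    by default every sub-metric is weighted equally."""
    if weights is None:
        weights = {name: 1.0 for name in submetric_scores}
    total_weight = sum(weights.values())
    return sum(score * weights[name]
               for name, score in submetric_scores.items()) / total_weight

# Hypothetical judge scores for one citation across the seven sub-metrics.
citation_scores = {
    "relevance": 4, "influence": 3, "uniqueness": 5, "subjective_position": 2,
    "subjective_count": 3, "click_probability": 4, "diversity": 3,
}
print(subjective_impression(citation_scores))  # simple average of 24/7, about 3.43
```

Passing a `weights` dict would let a content creator emphasize, say, click probability over diversity, which matches the paper’s point that creators can define their own impression metrics.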

Methods for generative engine optimization

The authors tested nine techniques that they believed could be used to influence the inclusion of a web page in the citations of a result produced by a generative search engine. These methods were:

  1. Authoritative Tone: Modify the text style of the source content to be more persuasive while making authoritative claims.
  2. Keyword Stuffing: Modify the content to include more keywords from the query, as would be expected in classical SEO optimization.
  3. Include Statistics: Modify the content to include quantitative statistics instead of qualitative discussion.
  4. Cite Sources: Add relevant citations from credible sources.
  5. Quotation Addition: Add quotations from credible sources.
  6. Easy-to-Understand: Simplify the language of the website.
  7. Fluency Optimization: Improve the fluency of the website text.
  8. Unique Words: Add unique words.
  9. Technical Terms: Add technical terms.

The experimental setup

I’m only going to briefly discuss how the experiment was set up. I invite you to read the paper for more details, as it is interesting, but I feel a basic overview is enough for the purposes of this post.

The Cliff Notes are:

  • While the techniques tested would ideally be manually adjusted by website owners, for the purpose of the experiment the authors used GPT-3.5 to convert the source text into the modified text based on the optimization technique being analyzed.
  • In creating their generative engine, the authors used a 2-step setup wherein they first fetch the top 5 Google results for the query, and then use an LLM to generate a response using gpt3.5-turbo.
  • They evaluated the same methods on a live generative engine as well (see “GEO in the wild” below).
  • As there was no publicly available dataset of generative engine-related queries, they created GEO-BENCH, consisting of 10k queries “from multiple sources, repurposed for generative engines”. Details are in the paper.
  • The methods were evaluated based on the Relative Improvement in Impression.
  • The modified content to be evaluated was generated by applying one of the methods noted above to the source; which method was applied was randomly selected.
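The bullet points above can be sketched as a single trial loop. Here `fetch_top_results`, `rewrite`, and `generate_answer` are hypothetical stand-ins for a search API and LLM calls (the authors used Google’s top five results and gpt-3.5-turbo), and the method descriptions are paraphrased; this is a sketch of the setup, not the authors’ code:

```python
import random

# Paraphrased instructions for three of the nine GEO methods (illustrative).
GEO_METHODS = {
    "cite_sources": "Add relevant citations from credible sources.",
    "quotation_addition": "Add quotations from credible sources.",
    "statistics_addition": "Rewrite to include quantitative statistics.",
}

def run_trial(query, fetch_top_results, rewrite, generate_answer, rng=random):
    """One experimental trial: fetch sources, optimize one at random with a
    randomly chosen GEO method, then generate baseline and optimized answers."""
    sources = fetch_top_results(query, k=5)       # step 1: retrieval
    target = rng.randrange(len(sources))          # the source being optimized
    method = rng.choice(sorted(GEO_METHODS))      # randomly selected method
    optimized = list(sources)
    optimized[target] = rewrite(optimized[target], GEO_METHODS[method])
    baseline = generate_answer(query, sources)    # step 2: LLM synthesis
    variant = generate_answer(query, optimized)
    return method, target, baseline, variant
```

Relative Improvement in Impression is then simply (impression of the target citation in the variant answer minus its impression in the baseline answer) divided by the baseline.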

The results of various generative engine optimization techniques

Let’s begin with the results before we discuss them. This is a table of how the methods perform on GEO-BENCH.

Table 1: Performance improvement of GEO methods on GEO-BENCH. Performance Measured on Two metrics and their sub-metrics. Compared to the baselines simple methods such as Keyword Stuffing traditionally used in SEO do not perform very well. However, our proposed methods such as Statistics Addition and Quotation Addition show strong performance improvements across all metrics considered. The best performing methods improve upon baseline by 41% and 29% on Position-Adjusted Word Count and Subjective Impression respectively. For readability, Subjective Impression scores are normalized with respect to Position-Adjusted Word Count resulting in baseline scores being similar across the metrics

We can see that of the nine methods tested, only two had no impact. Thank goodness we didn’t see keyword stuffing in the list of winners, or I would weep for what’s to come.

Worth pointing out here however is the lack of knowledge about modern SEO exemplified by the Table 1 caption including, “… simple methods such as Keyword Stuffing traditionally used in SEO …”

You will notice that in the table above, the results are broken out into two sections: Position-Adjusted Word Count and Subjective Impression. To address the fact that generative engines do not display results in an easily ranked way like a traditional search engine, the authors approach rank and visibility in two different ways.

These authors define them as:

  • Position-Adjusted Word Count: considers both the word count and the position of the citation in the generative engine’s response. In essence, the authors assert that the higher the word count associated with a citation, the higher its visibility. They assign a score based on this, and adjust it for the position it holds (i.e. the first citation holds more weight than the second, and so on).
  • Subjective Impression: incorporates multiple subjective factors to compute an overall impression score. The factors used are:
    1.) relevance of the cited material to the user query,
    2.) influence of the citation, which evaluates the degree to which the generated response depends on the citation,
    3.) uniqueness of the material presented by a citation,
    4.) subjective position, which measures how prominently it is positioned from the user’s perspective,
    5.) subjective count, which measures the amount of content presented from the citation as perceived by the user upon reading the citation,
    6.) probability of clicking the citation, and
    7.) diversity in the material presented.

Across both metrics, the high-performing methods beat the benchmarks.

“Cite Sources, Quotation Addition, and Statistics Addition, achieved a relative improvement of 30- 40% on the Position-Adjusted Word Count metric and 15-30% on the Subjective Impression metric compared to the baseline.”

It’s also interesting to notice that stylistic choices like fluency and ease of understanding also positively influenced the results.

While the authors assert that “This suggests that Generative Engines value not only content but also information presentation,” I would argue that a simpler conclusion may be that generative engines can simply make better use of language that is simpler and more commonly structured. That is, it’s not a factor unto itself, but simply a matter of a limitation of current LLMs.

Of course, one might say I sound like a Google rep now. “It’s not a ranking factor. It just acts like one.”

For context, here are some examples they gave illustrating content deletions and additions in their GEO optimization efforts:

Table 4: Representative examples of GEO methods optimizing source website. Additions are marked in green and Deletions in red. Without adding substantial new information, GEO significantly increase visibility.

Domain-specific generative engine optimization

Note: by “domain” we are referring to ranking in specific sectors and areas, not the website domain or TLD.

The authors found that specific techniques had greater impact in some sectors than others. For example:

  • Authoritativeness improved performance in debate-style questions and in the historical domain.
  • Citing sources benefited queries related to factual information.
  • The domains of law and government and opinion-based questions benefitted from relevant statistics.
  • The people and society, explanation and history domains gained increased exposure with the inclusion of quotes.

For optimizing in specific domains, the authors suggest (and it’s logically concluded) that site owners should consider and test what will likely work best for the query types they are targeting.

GEO in the wild

As noted above, the authors tested the same techniques on a live generative engine, and they attained similar results:

Table 5: Absolute impression metrics of GEO methods on GEO-bench with as GE. While SEO methods such as Keyword Stuffing perform poorly, our proposed GEO methods generalize well to multiple generative engines significantly improve content visibility.

The results overall were:

  • Quotation Addition performs best in Position-Adjusted Word Count with a 22% improvement over the baseline.
  • Cite Sources and Statistics Addition show improvements of up to 9% and 37% on the two metrics.
  • Keyword Stuffing performs 10% worse than the baseline.

The authors further looked into whether using multiple techniques worked better than a single one. Unsurprisingly the answer was generally “yes”.

Figure 4: Relative Improvement on using combination of GEO strategies. Using Fluency Optimization and Statistics Addition in conjunction results in maximum performance. The rightmost column shows using Fluency Optimization with other strategies is most beneficial.

The combinations differ in impact, and I suspect how that works varies greatly by domain and query type.

My overall take

As I’m sure is obvious by now, I believe the takeaways are worth investigating and valuable to understand. Many of the results make sense, and I’ve found with SEO that things that make sense tend to work over a longer period of time than things that do not. Perhaps because they’re more difficult to game over the long haul. You might be able to play around with crap citations or auto-generated ones through RAG systems, but so will everyone else. Those that cite the right things will win overall. Just like links in traditional SEO.

One important area that was outside the scope of the paper was the idea of working on 3rd party sites. The paper covered onsite optimization strategies, and that is important, but did not discuss this other aspect of generative engine optimization.

A consideration you will want to keep in mind is that generative engines are drawing from multiple sources to determine what content to generate in the first place. To be cited, you need to provide an authoritative answer that aligns with the generated content. You may need to have your information/answer available on multiple websites.

For example, if you want to be cited as one of the best restaurants in Seattle, it will likely not suffice to deploy some of these techniques on your site. For that, you’ll undoubtedly need to be referenced on multiple sites discussing the best restaurants. You need to be positioned such that when the engine looks at all the verbiage it was trained on and starts the answer,

“While what makes a restaurant the best in Seattle is subjective, ones that are frequently considered outstanding by people who visit and review them are ….”

you are included in the list that follows.

So not only do you have to think of new onsite optimization strategies, but offsite strategies as well.

But that’s a topic for a separate article and hopefully a new paper.