Savvy Search Marketer Series: Ad Copy Testing Tips, Part III

In the first part of the Savvy Search Marketer series on Ad Copy Testing Tips, we focused primarily on the different metrics you use, or could use, to help manage and evaluate your ad testing. In the second part, we focused more on the ad testing experiment itself and on some tips and tricks to avoid being "Fooled by Randomness" (a great book, by the way, by one of my favorite authors, Nassim Taleb).

For this third and final installment in the series on Ad Copy Testing Tips, we’ll continue with the examples presented in the previous posts. If you haven’t read them yet, please do so now so that you have the context needed to follow along in this wrap-up.

We’ll start with a segue into sample sizes. Let’s take the first example from Part 2 of this series and add slightly more complexity by looking at the ad-keyword (Ad-KW) combinations, to see whether they could be driving the CTR differences between the ads.


  • First off, notice the large differences in impression distribution for the same keywords across the two ads. Even Ad Rotation distributes impressions evenly only at the ad group level, and, in fact, only among the ads that enter the auction. This means advertisers can’t control how an ad’s impressions are distributed across keywords (or across any other variable).
  • When you compare the individual keywords for each ad (i.e. KW1 Ad1 vs. Ad2), each keyword shows a statistically significant difference, but recall that we did not have significance at the ad group level. Had impressions been distributed evenly by ad and keyword, we would have had ad group level significance as well (2.45% vs. 3.58% CTR). This shows how keyword/ad group organization can confound results that would otherwise have been significant and actionable.
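To make this kind of confounding concrete, here is a minimal Python sketch. It assumes a pooled two-proportion z-test (one common choice; the post does not say which test was used) and uses made-up click and impression counts rather than the Part 2 data. `DATA`, `two_proportion_z`, `compare_ads` and `even_split_ctr` are hypothetical names for illustration only.

```python
import math

# Hypothetical (clicks, impressions) per keyword and ad, made up for illustration.
# These are NOT the figures from the Part 2 example.
DATA = {
    "KW1": {"Ad1": (350, 7000), "Ad2": (210, 3000)},
    "KW2": {"Ad1": (30, 3000), "Ad2": (140, 7000)},
}


def two_proportion_z(clicks_a, imps_a, clicks_b, imps_b):
    """Pooled two-proportion z-test on CTR; returns (z, two-sided p-value)."""
    ctr_a = clicks_a / imps_a
    ctr_b = clicks_b / imps_b
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    z = (ctr_a - ctr_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


def compare_ads(data, ad_a="Ad1", ad_b="Ad2"):
    """Test ad_a vs. ad_b within each keyword, then on pooled ad group totals."""
    results = {}
    for kw, ads in data.items():
        results[kw] = two_proportion_z(*ads[ad_a], *ads[ad_b])
    # Ad group level: sum clicks and impressions across all keywords for each ad.
    totals = {}
    for ad in (ad_a, ad_b):
        totals[ad] = (
            sum(ads[ad][0] for ads in data.values()),
            sum(ads[ad][1] for ads in data.values()),
        )
    results["ad_group"] = two_proportion_z(*totals[ad_a], *totals[ad_b])
    return results


def even_split_ctr(data, ad):
    """Ad CTR if its impressions were split evenly across keywords
    (the unweighted mean of its per-keyword CTRs)."""
    ctrs = [ads[ad][0] / ads[ad][1] for ads in data.values()]
    return sum(ctrs) / len(ctrs)


if __name__ == "__main__":
    for name, (z, p) in compare_ads(DATA).items():
        print(f"{name}: z = {z:+.2f}, p = {p:.4f}")
    for ad in ("Ad1", "Ad2"):
        print(f"{ad} CTR with an even keyword split: {even_split_ctr(DATA, ad):.2%}")
```

With these made-up numbers, Ad2 beats Ad1 significantly on both keywords, but because Ad1 received most of its impressions on the higher-CTR KW1, the pooled ad group comparison is not significant and even leans toward Ad1. Re-weighting to an even keyword split puts Ad2 clearly ahead (roughly 4.5% vs. 3.0%).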


Takeaway: Many variables are mostly, if not entirely, outside the advertiser’s control during a test, and they should at least be considered as potential threats to the validity of any test (including tests that deliver insignificant results).

  • Luckily, keyword impression distribution is one variable that advertisers can control to a certain extent by organizing keywords into ad groups effectively. An ad test can therefore also help identify keywords that should be grouped together. When the same ad performs similarly across different keywords (i.e. no statistically significant difference), that can be a signal to keep those keywords grouped together. Conversely, keywords with different performance should be considered for restructuring.
  • Some of the other variables, many of which are visible through performance reporting, include:
    • ML vs. SB placement – the extent to which ads are distributed unevenly between the Mainline (ML) and the Sidebar (SB) has a significant impact on results, especially when the metric being evaluated is CTR. Since ML CTRs are 10X-50X higher than sidebar CTRs, even a slightly uneven distribution can create very different CTR results. Keep bid changes to a minimum (or avoid them altogether) to help control for this.
    • Match types – the more an ad group is exposed to Broad Match, the more likely it is that variances in query distribution are contributing to any CTR difference. Consider running ad tests with Exact Match only and then extrapolating those learnings, within reason, to other match types.
    • User location, demographics, etc. – given the variation in user behavior, and personalization algorithms that show ads differently to different types of users, the distribution of impressions across users can add uncertainty to ad tests. Consider randomly choosing different geographic or demographic targets to test with, to help control for this.
    • Marketplace differences – at any given time, dozens of experiments are running in the Bing Ads marketplace. These experiments can affect ranking and placement and thus impact keyword/ad combinations differently. Again, the smaller the sample size, the more likely it is that the distribution of impressions across these sets of traffic will invalidate the results of a test.
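The placement effect described above is simple arithmetic. The sketch below assumes hypothetical per-placement CTRs of 5% in the ML and 0.25% in the SB (a 20X ratio, inside the 10X-50X range above) and shows how two ads with identical per-placement performance end up with different overall CTRs when their ML share differs by five points. `blended_ctr` and `standardized_ctr` are illustrative helpers, not Bing Ads APIs; the second re-weights an ad’s own per-placement CTRs to a shared ML share (direct standardization), which is one way to compare ads like for like if placement-level data is available.

```python
# Hypothetical per-placement CTRs. The 20X ML/SB ratio is an assumption chosen
# from within the 10X-50X range, not a measured Bing Ads figure.
ML_CTR = 0.05
SB_CTR = 0.0025


def blended_ctr(ml_share, ml_ctr=ML_CTR, sb_ctr=SB_CTR):
    """Overall CTR when a fraction ml_share of impressions serve in the ML."""
    if not 0.0 <= ml_share <= 1.0:
        raise ValueError("ml_share must be between 0 and 1")
    return ml_share * ml_ctr + (1.0 - ml_share) * sb_ctr


def standardized_ctr(placement_stats, reference_ml_share):
    """Re-weight an ad's own ML and SB CTRs to a shared ML share.

    placement_stats maps "ML" and "SB" to (clicks, impressions) tuples.
    """
    ml_clicks, ml_imps = placement_stats["ML"]
    sb_clicks, sb_imps = placement_stats["SB"]
    return blended_ctr(reference_ml_share, ml_clicks / ml_imps, sb_clicks / sb_imps)


if __name__ == "__main__":
    # Same performance in each placement; only the ML share differs.
    ad_a = blended_ctr(0.50)
    ad_b = blended_ctr(0.55)
    print(f"Ad A: {ad_a:.2%}  Ad B: {ad_b:.2%}  apparent lift: {ad_b / ad_a - 1:+.1%}")
```

Here Ad B appears to beat Ad A by roughly 9% even though the two ads perform identically in each placement; comparing `standardized_ctr` values at a common reference share removes that artifact.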

On a regular basis, Bing Ads provides guidance on what types of ads to write depending on what type of advertiser you are or which keywords you bid on. Over the last year, we have done a number of in-depth analyses that control for many of the variables mentioned above, in order to identify the ad copy elements (i.e. phrases, words, symbols, etc.) that can help any advertiser improve their CTR and Quality Score.

As you can see in this article, there is substantial opportunity for advertisers in the Travel space to improve their ad copy without needing to test and test and test, or to risk getting the wrong signal from tests that may be invalidated by the variables mentioned above.

In this travel example, it bears mentioning that over 80% of the ads in the travel space are not using the ad copy elements highlighted in the article, yet those elements, when used as guided, can produce an average increase of ~110% in Ad Quality.

In a future post, we’ll dive deeper into how to use Bing Ads reporting to gather the data needed to assess the impact of some of the variables mentioned in this article, and we’ll provide more examples of vertical-specific ad copy opportunities for advertisers in the Bing Ads marketplace.

Have any thoughts or comments on this post, or on ad testing challenges and opportunities in general? Comment below and let’s discuss.

Thanks for reading!

Mike McMeekin and Vivian Li

Advertiser Insights and Analytics, Bing Ads
