Previously, we found that the association between subject line length and email open rate varies by industry. In this blog post, I will use the same generalized linear model (GLM) to examine the relationship between subject line words and open rates, and whether that relationship varies by industry as well.

Again, we'll be focusing on four industries within ThriveHive:

  1. Computers & Electronics
  2. Entertainment & Events
  3. Health & Fitness
  4. Professional Services

Our data shows that email open rates for the first two industries are below industry standards, while open rates for the last two are above industry standards.

There's a healthy debate about which words work and which words don't. For example, MailChimp's team suggests avoiding words like "Free" and using words like "Urgent". Yesware suggests using a term like "call" rather than a term like "calendar." In both cases, open rate is the metric used to judge a word's usefulness.

So let's see what words are associated with higher open rates in our sample:

Red words are associated with lower open rates, while blue words are associated with higher open rates.  The larger the word, the more usage it had in the specific industry.

From the visual above, we see that "sale" is frequently used by Computers & Electronics businesses. Meanwhile, "newsletter" and other time-oriented words are used by Professional Services. If we look at these words on their own, emails using the word "sale" are associated with lower-than-average open rates, and emails using "newsletter" are associated with higher-than-average open rates.

So does this mean that when writing email subject lines you shouldn't use words like "sale" and should use words like "newsletter"? Not necessarily. As mentioned above and examined in the previous post, there may be something about these industries that leads these words to perform better or worse. Or these words may simply be used more by one industry than another.

Subject Line Words + Industry

Using the model for this analysis proved tricky, because a number of terms were used heavily by some industries and not at all by others (e.g. "week" was used by Health & Fitness, but not by Computers & Electronics).
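To make the uneven-usage problem concrete, here is a minimal sketch of how one might find the words that every industry has in common. The subject lines, the `words_by_industry` and `shared_words` helpers, and the simple whitespace tokenizer are all made up for illustration; this is not the code or the data behind the analysis.

```python
from collections import defaultdict

# Hypothetical toy data: (industry, subject line) pairs, not the real sample.
emails = [
    ("Health & Fitness", "Your week of workouts"),
    ("Health & Fitness", "Spring newsletter"),
    ("Computers & Electronics", "Spring sale on laptops"),
    ("Computers & Electronics", "Newsletter: new arrivals"),
]

def words_by_industry(rows):
    """Map each industry to the set of lowercase words in its subject lines."""
    vocab = defaultdict(set)
    for industry, subject in rows:
        for token in subject.lower().split():
            word = token.strip(".,:;!?\"'")  # drop surrounding punctuation
            if word:
                vocab[industry].add(word)
    return dict(vocab)

def shared_words(vocab):
    """Return the words that appear in every industry's vocabulary."""
    return set.intersection(*vocab.values()) if vocab else set()

vocab = words_by_industry(emails)
common = shared_words(vocab)
print(sorted(common))
```

In this toy sample, "week" shows up only for Health & Fitness, so it drops out of the shared set, which is the situation described above.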

Focusing on our industries of interest (i.e. Computers & Electronics, Entertainment & Events, Health & Fitness, and Professional Services), I identified three words used by all four: "newsletter", "spring", and "time". On their own, all three showed a weak positive relationship with open rates. When each word was included in the GLM along with industry, the relationship all but disappeared; use of any of the three was not associated with a significant difference in open rates.
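Why would a word's effect vanish once industry enters the picture? The sketch below is not the GLM used here; it is a simpler stratified comparison on invented counts, meant only to show the mechanism. The `rows` data and the `open_rates` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical counts, invented for illustration:
# (industry, subject uses the word?, emails sent, emails opened)
rows = [
    ("Professional Services", True, 80, 24),
    ("Professional Services", False, 20, 6),
    ("Computers & Electronics", True, 20, 2),
    ("Computers & Electronics", False, 80, 8),
]

def open_rates(rows, by_industry):
    """Open rate with and without the word, pooled or split by industry."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sent, opened]
    for industry, uses_word, sent, opened in rows:
        key = (industry, uses_word) if by_industry else uses_word
        totals[key][0] += sent
        totals[key][1] += opened
    return {key: opened / sent for key, (sent, opened) in totals.items()}

pooled = open_rates(rows, by_industry=False)
stratified = open_rates(rows, by_industry=True)

print("pooled:", pooled)
for key, rate in sorted(stratified.items()):
    print(key, round(rate, 2))
```

With these made-up numbers, emails using the word open at 26% versus 14% overall, yet within each industry the rate is identical with or without the word. The pooled gap comes entirely from the higher-performing industry using the word more often, which is the kind of pattern that adding industry to a model can absorb.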

This is a bit at odds with other research, but much of the analysis on word choice is meant to produce general guidelines and is not done at the industry level. If those analyses controlled for industry, they might find smaller differences in open rates by word use.

Limitations of These Results

One potential issue is the size of our sample: 2,500 emails, compared with the 3 billion emails in Adestra's report. Where Adestra has greater statistical power, we have more complete information about senders (e.g. industry).

Additionally, I looked only at the 50 most frequent words across industries. This extracts words that represent our email sample, but what are we really interested in? Not necessarily which words are associated with open rates, but which concepts are associated with open rates. For example, a restaurant might have a "special" while a computer store might have a "sale".
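For readers curious what "most frequent words" looks like in practice, here is a rough sketch using Python's `collections.Counter`. The `top_words` helper, the stopword handling, and the sample subject lines are assumptions for illustration, not the exact procedure used for this post.

```python
from collections import Counter

def top_words(subjects, n=50, stopwords=frozenset()):
    """Return the n most frequent words across subject lines as (word, count)."""
    counts = Counter()
    for subject in subjects:
        for token in subject.lower().split():
            word = token.strip(".,:;!?\"'")  # drop surrounding punctuation
            if word and word not in stopwords:
                counts[word] += 1
    return counts.most_common(n)

# Hypothetical subject lines, not from the real sample.
subjects = [
    "Spring sale this week",
    "Spring newsletter",
    "Weekend special: spring menu",
]
top = top_words(subjects, n=2, stopwords={"this"})
print(top)
```

Note that a frequency count like this treats "special" and "sale" as unrelated words, even though they may describe the same kind of promotion.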

In another blog post, I will spend some time exploring how to extract words that are connected conceptually across industries, to see how those "concepts" are associated with open rates. Potentially, I will be able to suggest alternative words for the same promotion that would likely result in higher open rates (e.g. "deal" instead of "sale").