How Vividness Analysis Works

We define “vivid words” as those that evoke a specific sensory experience. Colors, textures, flavors, aromas, bodily sensations. Can you see it in your mind’s eye? Can you feel it in your fingertips? Then it’s vivid.

Shaxpir helps you write vivid prose by giving you a Vividness Highlighter, so that you can see the intensity of your vivid vocabulary, right on the page in bold color!

Every word on the page is highlighted according to its “vividness score”. But where do those scores come from?

Editorial Oversight

It all starts with an editorial opinion about what constitutes a vivid word.

We have personally hand-crafted a list of over 50,000 English language words that we think have some measure of vividness. This is a purely human endeavor, and has been an ongoing project for nearly ten years.

When we started this project, we did a lot of Google searches for things like:

  • “list of paint colors”

  • “list of exotic spices”

  • “words that describe motion”

  • “words that describe body parts and sensations”

  • “list of dishes from [ethnicity] cuisine”

  • “names of articles of [ethnicity] clothing”

Using this technique, we added thousands of words to our “vividness vocabulary” and we continue to add new words whenever we encounter them in our own personal reading. It’s a surprisingly analogue process!

In the course of building this vocabulary, we started identifying a set of subcategories that we could use to organize and reason about vividness:

  • OLFACTORY: Words that describe flavors and aromas, both pleasant and unpleasant, fall into this category. Foods and drinks, herbs and spices, as well as words like “petrichor” (a pleasant smell that frequently accompanies the first rain after a long period of warm, dry weather) or “acrid” (strong and bitter, causing a burning feeling in the throat).

  • AURAL: Words that describe sounds, both harmonious and cacophonous, get put into this category. Singing, laughing, crashing, banging, saxophones, and thunderclaps. This category includes human sounds, as well as sounds of animals, atmospheres, machines, and spooky sounds from paranormal realms!

  • ANIMAL: This category includes the names of animals (elephants, insects, amoebas) as well as their body parts (trunks, beaks, mandibles) and behaviors (galloping, flapping, swarming).

  • BODILY: Words that describe the human body, its parts and behaviors. Heads, shoulders, knees, and toes! Eyes, ears, mouth, and nose! Appendix, pancreas, tibia, fibula, etc, etc…

  • WARDROBE & SCENERY: This category includes all the costumes and architecture of the scenery. Hats and gloves and pants and skirts, as well as wallpaper, chandeliers, rugs, and chairs. Plus cars and trucks and trees and grass. In our early efforts, we tried to separate these concepts into distinct categories. But in our real-world observations, it was impossible to find neat divisions between them, because words like “ornate” (as well as all the color and texture words) are equally at home describing the embroidery on a dress or the rococo carvings of an antique picture frame.

  • MOTION: This category includes all the swirling, whirling, cartwheeling action-words that propel and animate all the other vivid noun-words. Jumping, thrashing, poking, prodding… This category is all about the vivid sensations of motion.

  • GROTESQUE: This is the sickly, slimy, nauseating category. The words here are putrid and ugly, because that ugliness represents a deeply ingrained and especially vivid sensory response. For a writer, these words are powerful tools for evoking intense reactions in readers!

  • SEXUALITY: Words about human sexuality get a special category in our taxonomy because they convey especially vivid concepts in written depictions of interpersonal relationships. Some of these words are tender and delicate, while others are crude or crass or sloppy… We give them all special attention in our algorithmic design.

As you review this list of subcategories, you can probably intuit that the boundaries between them are fuzzy and ambiguous. That’s perfectly fine in our model! A word like “galloping” has elements of the animal category (because you can see the horse in your mind’s eye) but it also has a strong aural element (because you can hear the sounds of those hooves clattering against the ground) as well as a strong sense of motion.

We apply these eight labels to thousands of different words in our vocabulary… some words get only one label, and some words get multiple labels. At this point in the process, the application of these labels is still 100% based on human judgement.

We don’t try to do this exhaustively, though. We don’t try to label every single word with every applicable category. We just try to label enough words to clearly establish a linguistic pattern in our labeling that defines the essence of the category.
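To make the multi-label idea concrete, here is a minimal Python sketch of how hand-applied labels might be stored. The `Category` enum, the `hand_labels` mapping, and the `words_with_label` helper are all hypothetical names chosen for illustration; only the eight category names and the example words come from the descriptions above.

```python
from enum import Enum


class Category(Enum):
    """The eight vividness subcategories described above."""
    OLFACTORY = "olfactory"
    AURAL = "aural"
    ANIMAL = "animal"
    BODILY = "bodily"
    WARDROBE_AND_SCENERY = "wardrobe & scenery"
    MOTION = "motion"
    GROTESQUE = "grotesque"
    SEXUALITY = "sexuality"


# Hypothetical storage format: each word maps to a set of labels,
# so a word can carry one label or several.
hand_labels = {
    "galloping": {Category.ANIMAL, Category.AURAL, Category.MOTION},
    "petrichor": {Category.OLFACTORY},
    "acrid": {Category.OLFACTORY},
    "mandibles": {Category.ANIMAL},
}


def words_with_label(labels, category):
    """Return the hand-labeled words for one category, sorted alphabetically."""
    return sorted(word for word, cats in labels.items() if category in cats)


olfactory_words = words_with_label(hand_labels, Category.OLFACTORY)
animal_words = words_with_label(hand_labels, Category.ANIMAL)
```

A set per word keeps the fuzzy boundaries cheap to represent: “galloping” simply appears under three categories, and words nobody has labeled yet are absent from the mapping.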

Machine Learning

Next, we use these labels to train a simple machine-learning model. We ask it to infer which labels apply to all the other words in our hand-curated “vividness vocabulary”, and to score each of those labels on a scale of 1 to 10 for each word in the vocabulary.
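The article does not say which model is used, so the following Python sketch swaps in a deliberately simple stand-in: nearest-seed similarity scoring. It assumes each word has a numeric feature vector (the toy two-dimensional vectors here are invented), compares an unlabeled word with the hand-labeled seed words for one category, and maps the best cosine similarity onto the 1-to-10 scale. The function names and the linear mapping are assumptions, not a description of the real system.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def infer_label_score(word_vec, seed_vecs):
    """Score one label for one word on a 1-to-10 scale.

    Assumption: the score grows linearly with the similarity to the
    closest hand-labeled seed word for that label.
    """
    best = max((cosine(word_vec, seed) for seed in seed_vecs), default=0.0)
    best = min(1.0, max(0.0, best))  # clamp negative or rounding overshoot
    return round(1 + 9 * best, 1)


# Toy, invented feature vectors for hand-labeled MOTION seed words.
motion_seeds = [(1.0, 0.0), (0.9, 0.1)]

score_close = infer_label_score((1.0, 0.0), motion_seeds)  # identical to a seed
score_far = infer_label_score((0.0, 1.0), motion_seeds)    # nearly orthogonal
```

A word that sits right on top of a seed gets the top score for that label, while a word far from every seed stays near the bottom of the scale.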

Armed with this expanded set of labels and scores, we define a “vividness formula” that incorporates all of these labels, as well as a few other parameters (including the intensity of the word’s Sentiment, either positive or negative).

This formula produces a new “vividness score” on a scale of 1 to 10. The score is a composite of all the individual subcategories, and it is our best approximation of what vividness means: words that evoke a specific sensory experience.
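The exact formula is not given here, so the Python sketch below only illustrates the general shape of such a composite. Everything specific in it is an assumption: the `vividness_score` name, taking the strongest subcategory score as the base, representing sentiment as a signed value between -1 and 1, and the 0.2 sentiment weight.

```python
def vividness_score(label_scores, sentiment, sentiment_weight=0.2):
    """Combine per-label scores (each 1 to 10) with sentiment intensity.

    label_scores: mapping of category name -> score on a 1-to-10 scale.
    sentiment: signed polarity in [-1, 1]; only its magnitude is used,
        so strongly negative and strongly positive words count equally.
    sentiment_weight: hypothetical share of the result given to sentiment.
    """
    if not label_scores:
        raise ValueError("a scored word needs at least one label score")
    strongest = max(label_scores.values())
    # Map sentiment intensity (0 to 1) onto the same 1-to-10 scale.
    sentiment_score = 1 + 9 * abs(sentiment)
    combined = (1 - sentiment_weight) * strongest + sentiment_weight * sentiment_score
    # Keep the result inside the published 1-to-10 range.
    return round(min(10.0, max(1.0, combined)), 1)


top = vividness_score({"MOTION": 10, "AURAL": 10}, sentiment=1.0)
bottom = vividness_score({"BODILY": 1}, sentiment=0.0)
negative = vividness_score({"GROTESQUE": 6}, sentiment=-0.5)
positive = vividness_score({"GROTESQUE": 6}, sentiment=0.5)
```

Using the absolute value of the sentiment reflects the point that intensity matters in either direction, positive or negative.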

It’s not a perfect measurement: It’s a composite of Human Editorial Opinion, enforced by a machine-learning algorithm. And every time we add new words to the vocabulary or tweak the formula, all the scores shift by a small amount. But even with the imperfections, the visualizations they enable are useful for storytellers working hard to hone their craft. Turning on the Vividness Highlighter and seeing the words leap off the page in bold color can help any author craft a more compelling narrative.

Score Labeling in Shaxpir

Having completed our human-centric labeling of data and our machine-learning scoring methodology, we apply a set of labeling ranges in the Shaxpir writing platform (e.g., in the Vividness Highlighter). Here’s how those scoring ranges work:

  • STRONGLY VIVID: scores between 7.0 and 10

  • MODERATELY VIVID: scores between 4.0 and 7.0

  • MILDLY VIVID: scores between 1.0 and 4.0

The Vividness Highlighter only applies coloring to the scored words in our hand-curated vocabulary, and all other words are left without colored highlighting.
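The ranges and the highlighting rule can be sketched in a few lines of Python. The function names and the sample scores are hypothetical. Because the listed ranges share their boundary values (7.0 and 4.0 each appear in two ranges), this sketch assumes a boundary score belongs to the higher band; the actual behavior may differ.

```python
from typing import Optional


def vividness_label(score: Optional[float]) -> Optional[str]:
    """Map a vividness score to its range label.

    Returns None for words with no score, i.e. words outside the
    hand-curated vocabulary, which get no highlighting.
    """
    if score is None:
        return None
    if not 1.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 7.0:  # assumption: 7.0 itself counts as strongly vivid
        return "STRONGLY VIVID"
    if score >= 4.0:  # assumption: 4.0 itself counts as moderately vivid
        return "MODERATELY VIVID"
    return "MILDLY VIVID"


def highlight(words, scores):
    """Pair each word with its label, or None when it is not in the vocabulary."""
    return [(word, vividness_label(scores.get(word.lower()))) for word in words]


# Invented example scores; real scores come from the vividness formula.
sample_scores = {"galloping": 8.2, "ornate": 5.0, "grass": 2.5}
result = highlight(["Galloping", "across", "ornate", "grass"], sample_scores)
```

Here “across” has no entry in the sample vocabulary, so it comes back with `None` and would be left without colored highlighting.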