How to Use an LLM Without Selling Your Soul

A guide for busy writers.


Around April of 2024, I was stuck in a terrifying situation for a technical writer: in need of a review and not a soul available to provide one. I had less than a month to finalize documentation for a complex product that didn’t even exist the prior December. Meanwhile, the subject matter experts were all busy subjecting matters to their expertise and anyone capable of a didactic review was thoroughly preoccupied. We were all frantically patching together a plane as it jockeyed down the runway, praying that the damn thing would take flight.

And so, in a fit of desperation, I found myself pleading with ChatGPT for an honest and professional review. Prompting a large language model (LLM) is spooky business. LLMs don’t think — not in the way you or I do, or, for that matter, any other biological organism inhabiting this mudball. How does one even communicate with such a thing? Brushing aside my trepidations, I decided to treat ChatGPT like a human:

Please review the following document for the provided target audience and stated objectives.

TARGET AUDIENCE:
  Data engineers familiar with Python, SQL, and the Snowflake Data Cloud.

OBJECTIVES:
  1) Set up a local development environment and install the RelationalAI
     (RAI) Python package.
  2) Provide a good mental model for developing with RAI.
  3) Introduce the fundamental concepts required to solve problems using
     RAI Python.

BEGIN DOCUMENT:

...

END DOCUMENT.

In your review, please answer the following questions.

1) What do you like?
2) What don't you like?
3) Is anything confusing or unclear?
4) Is anything missing?

I submitted my query, and before I’d lifted my finger from the RETURN key ChatGPT had conjured a response, each word appearing before my eyeballs had absorbed the photons of the last. ChatGPT certainly deserves points for punctuality. Even if the review contained only a single helpful comment, the speed with which it was produced could excuse whatever fluff I’d be forced to read to extract it.

In fact, the review contained several helpful comments. Answering my third question, for example, ChatGPT pointed out:

Placeholder Usage: Using placeholders such as <db>, <schema>, and <table> without additional context or examples might lead to confusion, especially for users unfamiliar with these conventions.

Touché. Indeed, I had used those placeholders without providing a concrete example of what to replace them with. I shuddered as I recalled struggling with a bash command early in my career because some StackOverflow author had made a similar omission. I’d sinned, but at least ChatGPT corrected me sans judgment; the soulless comment spared me the embarrassment of receiving the same feedback from a fellow fleshbag.

Other comments were only vaguely helpful:

Assumed Knowledge: Certain sections assume a level of familiarity with both SQL and Python that might not be present in all SQL users.

If I’d waited days or, as is sometimes the case, weeks to get such a comment from a coworker, I’d be fuming. But I’d waited mere seconds. Besides, I could ask it to elaborate and get a similarly speedy response. I spent some time conversing with my synthetic reviewer and made a number of beneficial edits. Within an hour I had a draft I wouldn’t be embarrassed to publish should I fail to get a timely human review.

Was it perfect? God, no. But it was a hell of a lot better than my original draft and, crucially, completed without that asynchronous dance known as Waiting For a Review that would have undoubtedly spanned multiple days and required at least one “friendly reminder.” I met my deadlines and we successfully launched our product with a solid docs foundation that I’ve been iterating on since.

As a result of the experiment’s success, not to mention the perpetual time crunch professional writers must endure, I continued to toy with techniques for generating feedback. I list a few of the most powerful ones here. Bear in mind these are all developed from the perspective of a technical writer, although they could easily apply to other types of writing with a little adaptation.

  1. Ask the LLM to give your draft a score from 1–10 for its effectiveness at meeting your objectives for your audience.¹ It helps to state that it should be difficult to achieve a high score and to request suggestions for improving the score. I’ll iterate on my drafts until I get a score of eight or better before opening them up for human review.
  2. Devise a scenario and ask the LLM to solve a problem using only the text and basic knowledge assumed by the target audience. If an untrue or mistaken claim is made, follow up by asking the LLM to explain itself by providing quotes from your draft to support its conclusion.
  3. Use the LLM to create avatars for your target audience with different experience levels, backgrounds, and reading styles.² Start new chats where the LLM assumes the role of an avatar, provide it with your draft, and ask it for feedback. At least one of the avatars should be contrarian, and you should spend a fair amount of time conversing with that one.
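The first technique can be captured as a reusable prompt template. Here is a minimal sketch in Python; the function name and exact wording are my own, and the resulting string would be pasted into (or sent to) whatever LLM chat interface you use.

```python
def scoring_prompt(draft: str, audience: str, objectives: list[str]) -> str:
    """Build a prompt asking an LLM to score a draft from 1-10.

    Following technique 1: state that a high score should be hard to
    achieve, and ask for concrete suggestions to raise the score.
    """
    goals = "\n".join(f"  {i}) {obj}" for i, obj in enumerate(objectives, 1))
    return (
        "Please score the following document from 1-10 for how effectively "
        "it meets the stated objectives for the target audience. A high "
        "score should be difficult to achieve. Justify your score and "
        "suggest specific changes that would raise it.\n\n"
        f"TARGET AUDIENCE:\n  {audience}\n\n"
        f"OBJECTIVES:\n{goals}\n\n"
        f"BEGIN DOCUMENT:\n\n{draft}\n\nEND DOCUMENT."
    )

# Example usage with the audience and objectives from earlier:
prompt = scoring_prompt(
    draft="...",
    audience="Data engineers familiar with Python, SQL, and Snowflake.",
    objectives=[
        "Install the RelationalAI (RAI) Python package.",
        "Provide a good mental model for developing with RAI.",
    ],
)
```

Keeping the template in one place makes it easy to iterate on the wording between drafts, which matters because scores are sensitive to prompt design.¹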

There are a couple of caveats to be aware of. First, a review generated by an LLM is not, nor will it ever be, a replacement for a review from a human subject matter expert. If you’re writing documentation for a novel system with proprietary components, no training set in the world contains enough data for an LLM to gauge technical accuracy.

The second caveat is frequent hallucination, a term the industry uses for an LLM’s propensity to lie unabashedly.³ This alone disqualifies LLMs as subject matter experts. Sure, a human might lie, too. But at least a human expert has a reputation to uphold. They have, quite literally, skin in the game. That’s not to say that hallucination is inherently bad. In fact, it’s a feature. Would LLM avatars be any good without some hallucinatory properties? I doubt it.

I like to think of ChatGPT as a sort of interactive notebook — a journal that talks back. It is a tool that accelerates experimentation so that I can more efficiently hone my thoughts. Have you ever submitted the same prompt to ChatGPT twice? You’ll often get vastly different results. Each variation offers you a new angle to contemplate and an opportunity to work out exactly what you think.

If LLMs are good at generating feedback, surely they’d excel at generating the final product, right? If we’re to believe the influencers on our social media feeds, generative AI will soon produce Pulitzer Prize-winning novels, Oscar-winning movies, and an endless stream of entertainment tailored to our individual preferences! And who knows, maybe it will. But I assert that such works will lack the unique and surprising inevitability of great art. They will, by definition, be derivative.

In All the Sentences ChatGPT Cannot Find, Pierz Newton-John calls ChatGPT a “cliché machine.” LLMs continue a given prompt by predicting the next most likely words and, as a result, regurgitate tropes and awkwardly mix metaphors. Pierz writes:

Predictability is what makes a text easy to read and understand. Yet it is the violations of the reader’s expectations that make a piece of writing feel vital, that make us sit up and pay attention.

For technical writers, creating text that is easy to read and understand is pretty much the whole job. And yet there are ways in which our uniqueness pokes through: in the examples we choose, the narratives we craft, and the way we present the content. Technical writers have style. We need it, because we don’t just explain things; we mold users’ perceptions of a product.

Great technical writing uses familiar language and symbols to expose a reader to unfamiliar ideas in a way that violates their expectation of what documentation is capable of. My favorite example: LEGO instruction booklets. Each one presents new and genius ways of snapping bricks together using few, if any, words. Some are so good they practically require you to tear apart the build every now and again to start anew and relive the thrill.⁴

Achieving this level of artistry requires far more time and effort than our deadlines often allow. We may be tempted to simply use an LLM to generate the text for us. But doing so comes at a cost to both our craft and our integrity. Instead, if we can understand what LLMs are capable of and harness hallucinations for our benefit, we can reduce the time it takes to produce better writing.

In the torrent of tired tropes and cheap language we may soon find ourselves swept up in, and with ever-tightening deadlines, learning to leverage LLMs without losing our souls may be vital for, if you’ll excuse the cliché, standing out in the crowd.

Notes

¹ Be careful taking LLM-generated scores at face value. In Can Large Language Models Automatically Score Proficiency of Written Essays?, Mansour et al. conclude that “despite the astonishing ability of LLMs to generate coherent and good-quality text, they struggle to distinguish between good and bad essays.” Their experiments indicate that scores are highly sensitive to the LLM used, the scoring task, and the prompt design.

² In my article Rethinking Developer Personas for Technical Documentation I discuss an evidence-based taxonomy for classifying how readers use documentation. Don’t just focus on the who.

³ Can an LLM really lie, though?

⁴ According to How we design our building instructions on LEGO’s website, the building instructions team builds a model “many, many times in different ways to come up with the best way of building.” This resonates with me. My first attempt at explaining a complex topic is rarely the best; it takes repeated attempts to find the right one. An LLM can accelerate the feedback loop required to test out various explanations.