How to use LLMs to Generate Coherent Long-Form Content using Hierarchical Expansion

Greg Nuttall

October 30, 2023


As impressive as they are, Large Language Models (LLMs) face difficulties when creating long-form content, primarily due to token limitations and inconsistencies in the output over time.

Together with Livy.ai, we developed a "Hierarchical Expansion" method to address these challenges and improve the quality, flow, and structure of the content produced. Read on to learn more.

In the last year, Large Language Models (LLMs) have taken centre stage in the realm of artificial intelligence thanks to the popularity of tools such as ChatGPT. These models represent the forefront of natural language processing technology, demonstrating unparalleled proficiency in understanding and generating human-like text.

LLMs are pre-trained on massive amounts of text, enabling them to generate coherent and contextually relevant content based on the prompts provided to them. GPT-3.5 and GPT-4, developed by OpenAI, as well as Meta’s Llama 2, have showcased remarkable capabilities in tasks ranging from creative writing and translation to coding and problem-solving. Companies are already using these LLMs effectively for customer-facing chatbots, generating short-form social media posts, emails, and even simply for idea generation.

However, despite the advancements and the promising applications of LLMs, generating coherent and continuous long-form content presents its own set of challenges. Companies wishing to write articles, blog posts and other marketing copy with the aid of today's available LLMs will often struggle with the maximum output length these models can work with. This limitation often results in the truncation of content, and further requests to continue the output introduce continuity errors and "hallucinations" as more text is written. This means the output needs to be heavily edited and often falls short of the quality required.

Recently, we came across this very problem with one of our customers, Livy.ai, who are pioneering long-form content generation with their AI-based movie script generator. Together with Livy.ai’s product engineering team, we were able to overcome the challenges presented above and develop several approaches which improved the quality, coherence, and structure of the generated movie scripts. In this blog post, we will explore one of the most effective approaches we developed: “Hierarchical Expansion” using the example of blog post generation.

Challenges of Long-Form Content Generation

Context Limits and Hallucinations

As mentioned previously, the maximum output length of models like GPT-3.5 and GPT-4 is often the primary bottleneck for these long-form generation use cases. The maximum output length is constrained by the token limit of the model’s context window, which dictates how much information the model can process and consider at a given time. Tokens are numerical representations of text that are used by LLMs, and in general, one token is equivalent to roughly 4 characters of English text, or about three quarters of a word. This means that with GPT-4's 8192 token limit, we get around 6000 words shared between the prompt and the output.* (There is now a significantly larger context window available for GPT-4 with the 32k model, but this comes at a higher cost per call.)

Often, we will want to give the model a lot of information in the prompt, especially if the subject we are writing about is something that the model has not been trained on. This is where the token limit becomes a major bottleneck, since the more information we feed into the prompt, the shorter the allowed output length will be.
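As a rough illustration of this trade-off, the sketch below estimates how many words of output remain once the prompt has used part of the window. The function name and the 0.75 words-per-token ratio are assumptions (a common rule of thumb for English, consistent with the figure of around 6000 words quoted above), not exact values; a real tokenizer should be used for precise counts.

```python
def remaining_output_words(context_tokens: int, prompt_tokens: int,
                           words_per_token: float = 0.75) -> int:
    """Estimate how many words of output still fit in the context window.

    words_per_token=0.75 is an assumed rule of thumb for English text.
    """
    if prompt_tokens > context_tokens:
        raise ValueError("prompt does not fit in the context window")
    return int((context_tokens - prompt_tokens) * words_per_token)


# With an empty prompt, an 8192-token window leaves roughly 6000 words.
print(remaining_output_words(8192, 0))
# A 6000-token prompt leaves far less room for the output.
print(remaining_output_words(8192, 6000))
```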

So how can we generate text content which is longer than that of the context window? At first the solution might seem simple: just ask the model to continue outputting the text. However, doing so will slide the context window forward, losing the information provided in the original prompt and leading to an output that can lose track of the original task - this is where it can begin to "hallucinate".
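The effect of the sliding window can be shown with a toy simulation. This is only a sketch: whole words stand in for tokens, and the helper name `visible_context` is made up for illustration.

```python
def visible_context(tokens: list[str], window_size: int) -> list[str]:
    """Return the last `window_size` tokens: all the model can still see."""
    if window_size <= 0:
        return []
    return tokens[-window_size:]


# Toy example: words stand in for tokens (a simplification).
prompt = "write about milk dark and white chocolate".split()
generated = ["word"] * 20  # text produced by repeated "continue" requests

window = visible_context(prompt + generated, window_size=10)
# Check whether any of the original instructions are still inside the window.
print("chocolate" in window)
```

Once the generated text is longer than the window, none of the original prompt remains visible to the model.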

For instance, an LLM might be generating a detailed research article, but due to the token limit, it might lose track of the initial thesis statement or key arguments presented in earlier sections, leading to incongruence and potential contradictions in later parts. It might also start introducing falsehoods into its writing, since the information provided in the initial prompt is now unavailable.

Addressing these challenges is essential to fully harness the capabilities of LLMs in generating coherent, consistent, and contextually relevant long-form content that holds up to the level of quality companies expect when representing their business. In the following sections, we will explore one of the approaches we used to help navigate these challenges effectively.

Hierarchical Expansion – A Solution

The Approach

“Hierarchical Expansion” is a technique for chaining together prompts to produce increasingly long pieces of text. This method involves creating a structured outline of the content, starting with a high-level summary that captures the essence of the entire piece. This summary is then broken down into smaller parts representing different components of the content. These parts are expanded into more detailed summaries, and finally these summaries are used to create the actual content.

The idea is inspired by this blog post from OpenAI. It shows how GPT-3 could be used to summarise a novel in stages, by summarising each page, and then summarising those into higher level summaries and so on, until the entire book is summarised into a single block of text. This approach aims to do this process in reverse to create arbitrarily long content.

This methodology is highly adaptable and can be applied universally across various forms of content. Sections could represent chapters in a book, or literal sections in an article or blog post, thereby providing a generalised approach for content expansion.
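The overall flow can be sketched as a small pipeline in which each stage is a pluggable function. This is a minimal sketch under assumptions: the function and parameter names are hypothetical, and the stand-in lambdas replace real model calls so that the example runs without an API key.

```python
from typing import Callable, List, Tuple

Section = Tuple[str, str]  # (title, short description)


def hierarchical_expansion(
    synopsis: str,
    make_outline: Callable[[str], List[Section]],
    expand_section: Callable[[str, str], str],
    write_section: Callable[[str], str],
) -> str:
    """Top-down generation: synopsis -> outline -> detailed summaries -> prose."""
    outline = make_outline(synopsis)
    written = []
    for title, description in outline:
        detailed_summary = expand_section(title, description)
        written.append(f"{title}\n\n{write_section(detailed_summary)}")
    return "\n\n".join(written)


# Stand-in stages so the sketch runs without a model.
demo = hierarchical_expansion(
    "types of chocolate",
    make_outline=lambda s: [("Introduction", f"Introduce {s}"), ("Conclusion", "Wrap up")],
    expand_section=lambda title, desc: f"- {desc}",
    write_section=lambda summary: f"[prose based on: {summary}]",
)
print(demo)
```

Because each stage is just a callable, a different model (or a different prompt) could be swapped in per stage.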

The main advantages of this approach are:

The overall structure of generated content is better
The prompt engineering work for generating the substance of the article is decoupled from the prompt for the writing style, allowing for more fine-grained control
We are no longer limited by the size of the context window and can therefore generate arbitrarily long pieces of content
This architecture does not depend on the type of model being used - cheaper or open source models can be used to save on cost - we can even use different models for different stages where appropriate

A working example

To make this approach more concrete, let’s explore using this technique to write an informative blog post for a chocolate company. This is just a simple example to illustrate the approach; for practical purposes we would want to spend more time improving the prompts, and we will discuss strategies for doing this later on.

Note: the full notebook for this example is available here.

Generating an outline

First, let’s use GPT-3.5 via the OpenAI Python library to generate an overall outline for our blog post, based on a short synopsis we will provide.

    import openai  # this example uses the pre-1.0 `openai.ChatCompletion` interface

    MODEL = "gpt-3.5-turbo"
    tone = "professional"
    synopsis = "The three different types of chocolate, milk, dark, and white. The blog post should be informative and educational and feature a comparison of the health benefits of each type of chocolate."
    type_of_content = "blog post"

    # system message: sets the role the model should play
    role_template = f"""You are a writer who specialises in outlining the structure of {type_of_content}."""

    # user message: describes the task and the expected output format
    prompt_template = f"""
    Develop a well-structured outline for a {tone} {type_of_content} about {synopsis}.
    The outline will be divided into sections, you will output the title of each section and give three bullet points describing what it will talk about.
    Make sure each section doesn't repeat things that have been said in other sections.
    Give your output in the following format: "Section Title:<title> Description:<description>\n".
    """

    # temperature=0 keeps the outline as deterministic as possible
    response = openai.ChatCompletion.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": role_template},
            {"role": "user", "content": prompt_template},
        ],
        temperature=0,
    )

Once we parse the response, we end up with the following outline:
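The parsing step itself is not shown in this post. One possible way to do it, assuming the model follows the requested "Section Title:<title> Description:<description>" format exactly, is a small regular expression. The names `SECTION_PATTERN` and `parse_outline` are hypothetical, and the sample string is made up for illustration; in practice the text would be the message content of the API response.

```python
import re

# Each section runs from one "Section Title:" marker to the next (or to the end).
SECTION_PATTERN = re.compile(
    r"Section Title:\s*(.*?)\s*Description:\s*(.*?)\s*(?=Section Title:|\Z)",
    re.DOTALL,
)


def parse_outline(text: str) -> list[tuple[str, str]]:
    """Split the raw model output into (title, description) pairs."""
    return [(title, description) for title, description in SECTION_PATTERN.findall(text)]


sample = (
    "Section Title: Introduction Description: Introduce the topic.\n"
    "Section Title: Milk Chocolate Description: Discuss its composition.\n"
    "- Explain how it is made.\n"
)
for title, description in parse_outline(sample):
    print(title, "->", description)
```

A model will not always follow the format, so a production parser would likely need to handle malformed output as well.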

Introduction
- Introduce the topic of the blog post and provide a brief overview of the three different types of chocolate.

Milk Chocolate
- Discuss the characteristics and composition of milk chocolate, including its higher sugar and milk content compared to other types of chocolate.
- Explain the process of making milk chocolate and its smooth and creamy texture.
- Highlight popular brands and products that use milk chocolate.

Dark Chocolate
- Explore the features and qualities of dark chocolate, known for its rich flavor and higher cocoa content.
- Discuss the health benefits of dark chocolate, such as its antioxidant properties and potential cardiovascular benefits.
- Explain the different percentages of cocoa in dark chocolate and how it affects taste and health benefits.

White Chocolate
- Examine the unique characteristics of white chocolate, which is technically not a true chocolate due to the absence of cocoa solids.
- Discuss the ingredients used in white chocolate, such as cocoa butter, sugar, and milk.
- Address the controversy surrounding white chocolate and its lack of health benefits compared to milk and dark chocolate.

Health Benefits Comparison
- Compare the health benefits of milk, dark, and white chocolate, highlighting their differences and similarities.
- Discuss the potential benefits of consuming moderate amounts of chocolate, such as improved mood and cognitive function.
- Address the importance of choosing high-quality chocolate and consuming it in moderation.

Conclusion
- Summarize the key points discussed in the blog post and emphasize the importance of understanding the differences between milk, dark, and white chocolate.
- Encourage readers to make informed choices when selecting chocolate products based on their preferences and health goals.
- Provide a final thought or recommendation related to the topic of chocolate consumption.

This seems like a sensible outline. Now we can expand each section of the outline using a different prompt and a new call to the API.

Expanding the summaries

To expand the sections we will use the following prompts:

    # first build the 'context' string
    context = ""
    context += f" the previous section was called {previous_section}," if previous_section else ""
    context += f" the section that you are writing is called {section_title},"
    context += f" the next section after this is going to be called {next_section}." if next_section else ""

    # build the role and prompt strings
    summary_role = f"You are a bot who expands brief summaries of a section of a {type_of_content} into bullet pointed summaries that a writer can work from. For context,{context} But you will be given an individual section summary to expand."
    summary_prompt = f"""In BULLET POINTS write a detailed summary for the following section summary:{section_title}\n{section_description}. Ensure that the summary is consistent with and fits into the synopsis of the whole {type_of_content}."""
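The prompts above rely on `previous_section`, `section_title` and `next_section` being set for each section. A minimal sketch of how these values could be derived from the list of section titles is shown below; the helper name `with_neighbours` is hypothetical and not part of the original notebook.

```python
from typing import Iterator, Optional, Sequence, Tuple


def with_neighbours(
    titles: Sequence[str],
) -> Iterator[Tuple[Optional[str], str, Optional[str]]]:
    """Yield (previous_section, section_title, next_section) for each title."""
    for i, title in enumerate(titles):
        previous_section = titles[i - 1] if i > 0 else None
        next_section = titles[i + 1] if i + 1 < len(titles) else None
        yield previous_section, title, next_section


titles = ["Introduction", "Milk Chocolate", "Dark Chocolate"]
for previous_section, section_title, next_section in with_neighbours(titles):
    print(previous_section, "|", section_title, "|", next_section)
```

The first section has no previous section and the last has no next section, which matches the `if previous_section` and `if next_section` checks in the prompt-building code.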

Running these prompts on the first section (Introduction) results in this expanded summary:

- The introduction of the blog post serves to introduce the topic of chocolate and provide a brief overview of the three different types of chocolate.
- It sets the stage for the rest of the blog post by giving readers a general understanding of what will be discussed.
- The introduction may include a brief history of chocolate and its significance in various cultures.
- It may also mention the popularity of chocolate as a sweet treat and its versatility in various culinary creations.
- The three different types of chocolate that will be explored in the blog post are milk chocolate, dark chocolate, and white chocolate.
- The introduction may touch upon the basic characteristics of each type, such as their ingredients and flavor profiles.
- It may also mention the different uses and preferences for each type of chocolate.
- Overall, the introduction aims to capture the readers' attention and provide a foundation for the subsequent sections that delve deeper into each type of chocolate.

Generating Written Content

Finally, we want to generate some writing using the expanded summary above (stored in the section_summary variable). Here are the prompts we will use for that:

    # system message: sets the writing role and guards against invented facts
    script_role = f"You are a {tone} {type_of_content} writer. You will be given a detailed summary of a section of a blog post and will output it in well-written, reader-friendly fashion. If you are unsure of something, leave gaps in the output for another writer to fill in. Do not make up any information."

    # user message: passes in the expanded summary and the style constraints
    script_prompt = f"""The section summary is as follows:\n{section_summary}\n\nWrite the section making sure you follow the points described in the summary. The writing should be engaging, entertaining and {tone} in tone, it can include the use of bullet points and gaps for images if appropriate. The section should be between 100 and 200 words long."""

And here is the resulting output for the introduction section:

Welcome to our blog post all about chocolate! In this article, we will be exploring the wonderful world of chocolate and delving into the three different types that exist. Whether you're a chocolate connoisseur or simply have a sweet tooth, this post is sure to satisfy your cravings for knowledge.

Now, let's dive right in and take a look at the three main types of chocolate:

1. Milk Chocolate: This is the most popular type of chocolate and is loved by people of all ages. It is made by combining cocoa solids, cocoa butter, sugar, and milk powder. The result is a smooth and creamy texture with a sweet and milky flavor. Milk chocolate is perfect for those who prefer a milder and less intense chocolate experience.
2. Dark Chocolate: If you're a fan of rich and intense flavors, then dark chocolate is for you. Made with a higher percentage of cocoa solids and less sugar, dark chocolate has a bittersweet taste that is often described as complex and sophisticated. It is also known for its health benefits, as it contains antioxidants that can help improve heart health.
3. White Chocolate: While some may argue that white chocolate isn't technically chocolate, it still deserves a spot on our list. Made from cocoa butter, sugar, and milk solids, white chocolate has a creamy and sweet flavor. It lacks the bitterness of dark chocolate and the depth of flavor found in milk chocolate, but it is loved by many for its smooth and melt-in-your-mouth texture.

So, whether you prefer the sweetness of milk chocolate, the richness of dark chocolate, or the creaminess of white chocolate, there is a type of chocolate out there to suit every taste bud. Stay tuned for the next section where we will be exploring the origins of chocolate and its fascinating history.

Evaluation

The generated introduction is quite good and gives an overview of what we will explore in the blog post. However, we can already see hints of the main issue with this approach. Notice how, despite being told in the summary prompt that the next section will be about “Milk Chocolate” (through the {next_section} variable), the introduction finishes by saying:

Stay tuned for the next section where we will be exploring the origins of chocolate and its fascinating history.

This is an example of the model “hallucinating”, and it shows that the information about the next section was lost between the stages of this approach. The subsequent sections suffer even more from this issue, each having clearly been written in isolation from the rest of the blog post. For example, the fifth section, which compares the health benefits of each chocolate, is not written in a way that acknowledges that each chocolate has just been discussed in the previous sections:

Health Benefits Comparison

When it comes to chocolate, there are three main types to choose from: milk, dark, and white. Each type has its own unique flavor profile and texture, but what about their health benefits? Let's take a closer look at how these chocolates compare and what potential benefits they offer.

First, let's talk about the benefits of consuming moderate amounts of chocolate in general. Chocolate, especially dark chocolate, is rich in antioxidants called flavonoids. These compounds have been linked to various health benefits, including improved mood and cognitive function. So, indulging in a little chocolate can actually be good for you!

Now, let's dive into the differences between milk, dark, and white chocolate. Milk chocolate is the sweetest of the three, thanks to its higher sugar and milk content. It's a crowd favorite for its creamy and smooth texture. Dark chocolate, on the other hand, has a higher cocoa content and less sugar. This gives it a more intense and slightly bitter taste. Finally, white chocolate is made from cocoa butter, sugar, and milk solids. It has a milder flavor and lacks the characteristic cocoa taste.

When it comes to health benefits, dark chocolate takes the lead. Its higher cocoa content means it contains more flavonoids, making it a healthier choice compared to milk and white chocolate. However, it's important to note that not all chocolate is created equal. To reap the maximum health benefits, it's crucial to choose high-quality chocolate that is minimally processed and has a high cocoa content.

In conclusion, while all types of chocolate can bring joy to our taste buds, dark chocolate stands out as the healthiest option. Its rich cocoa content and lower sugar levels make it a guilt-free indulgence. Remember, moderation is key when it comes to enjoying chocolate. So go ahead, savor a piece of your favorite chocolate, and reap the benefits it has to offer.

[Image placeholder: Image of a variety of chocolate bars]

Issues with this output

The main problems with the output can be summarised as:

The points made in a section sometimes repeat those made in the previous section
The generated text for a section sometimes continues beyond the summary it was given, often covering points which were supposed to be made in the next section

Keeping Continuity

The crux of the issue here is the lack of context provided to each model call about the rest of the piece. In this particular case, the context window of the GPT-3.5-turbo model is large enough that we could stuff the entire article into each prompt if we wanted. But we want to create a more general approach which can work for arbitrarily long content without relying on the size of the context window, which can vary between different popular Large Language Models. For instance, Llama was originally released with a mere 2048-token context window (around 1500 words), which would not be big enough to write a blog post.

The Role of Summarization

To refine the approach described above, we need to focus on maintaining continuity between the sections, whilst still being able to generate each one independently of the others. To do this, after each stage we can summarise the entire article generated so far and feed that summary into the prompt for the next section. To avoid hitting the context limits, we need this summary to be short but also as information-dense as possible; this is where the recently published “Chain of Density” prompting technique can be helpful.

In short, the “Chain of Density” technique is a powerful way to summarise a long piece of content into a short summary by identifying the key informational “entities” in the content and iteratively improving the summary by incorporating more entities in each run.
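To give a feel for the iterative idea, here is a loose sketch of a densification loop. It is not the published “Chain of Density” prompt: the prompt wording, the 80-word target, the function name and the use of one model call per round are all assumptions made for readability, and may differ from how the published prompts are structured. The `llm` argument is any callable that takes a prompt and returns text.

```python
from typing import Callable


def densify_summary(text: str, llm: Callable[[str], str], rounds: int = 3) -> str:
    """Iteratively rewrite a summary so it covers more entities at similar length."""
    summary = llm(f"Summarise the following text in about 80 words:\n{text}")
    for _ in range(rounds):
        summary = llm(
            "Identify 1-3 informative entities from the text that are missing "
            "from the summary, then rewrite the summary to include them "
            "without making it longer. Return only the new summary.\n\n"
            f"Text:\n{text}\n\nCurrent summary:\n{summary}"
        )
    return summary


# Stand-in model that only records how many times it was called.
calls = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary v{len(calls)}"


print(densify_summary("Some long article text...", fake_llm, rounds=2))
```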

To maintain contextual awareness and coherence across the piece we could build a summary of the article written so far using a version of the “Chain of Density” prompts and then incorporate this article summary into the prompt for generating new content. This architecture is illustrated below:
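In code, the loop described here might look roughly like the sketch below. This is our illustration rather than the original diagram or implementation: the function names are hypothetical, and `write_section` and `summarise` are stand-ins for model calls so that the example runs without a model.

```python
from typing import Callable, List, Tuple


def write_with_running_summary(
    section_summaries: List[Tuple[str, str]],
    write_section: Callable[[str, str, str], str],
    summarise: Callable[[str], str],
) -> List[str]:
    """Write sections in order, passing a summary of the article so far to each call."""
    article_so_far: List[str] = []
    running_summary = ""  # empty for the first section
    for title, section_summary in section_summaries:
        text = write_section(title, section_summary, running_summary)
        article_so_far.append(text)
        # Re-summarise everything written so far, ready for the next section.
        running_summary = summarise("\n\n".join(article_so_far))
    return article_so_far


# Stand-in functions so the sketch runs without a model.
sections = [("Introduction", "- overview"), ("Milk Chocolate", "- composition")]
result = write_with_running_summary(
    sections,
    write_section=lambda title, summary, so_far: f"{title} (context: {so_far or 'none'})",
    summarise=lambda text: f"{len(text.split())} words so far",
)
print(result)
```

For very long pieces, the article written so far may itself exceed the context window; summarising the previous summary together with the newest section is one possible variation.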

Further Improvements

The summarisation approach should go a long way in improving the overall output of the system. We can also address the issue of the generated text going beyond the given summary by providing clearer instructions in the generation prompt on when to stop writing.
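One way to make the stopping point explicit is to append an instruction to the generation prompt that names the next section. The helper below is a hypothetical sketch; the wording of the instruction is an assumption and would need the same trial and error as any other prompt.

```python
from typing import Optional


def add_stop_instruction(script_prompt: str, next_section: Optional[str]) -> str:
    """Append explicit guidance on where the section should end."""
    if next_section:
        stop = (
            f" The next section is called '{next_section}'; do not cover its "
            "content, and end once every point in the summary has been addressed."
        )
    else:
        stop = (
            " This is the final section; end once every point in the summary "
            "has been addressed."
        )
    return script_prompt + stop


print(add_stop_instruction("Write the section.", "Milk Chocolate"))
```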

Ultimately, any LLM-based system like this one will require a significant investment of time in engineering the prompts to produce the best outputs for your use case. This involves a degree of trial and error, but general-purpose prompting techniques are emerging which can improve outputs. The TELeR taxonomy’s “Level 5 prompt” is a good structure to aim for in developing effective prompts. This is a prompt which has the following characteristics:

Description of High-level goal
A detailed bulleted list of sub-tasks
An explicit statement asking LLM to explain its own output
A guideline on how LLM output will be evaluated / Few-Shot Examples
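The four characteristics listed above can be assembled into a prompt with a small helper. This is only an illustrative sketch: the function name, the labels such as "Goal:" and the example values are assumptions, not wording taken from the TELeR taxonomy.

```python
from typing import List


def build_structured_prompt(goal: str, sub_tasks: List[str],
                            evaluation: str, ask_for_explanation: bool = True) -> str:
    """Assemble a prompt from the four characteristics listed above."""
    parts = [f"Goal: {goal}", "Sub-tasks:"]
    parts += [f"- {task}" for task in sub_tasks]
    if ask_for_explanation:
        parts.append("Briefly explain the reasoning behind your output.")
    parts.append(f"Your output will be evaluated as follows: {evaluation}")
    return "\n".join(parts)


prompt = build_structured_prompt(
    goal="Write the 'Milk Chocolate' section of a blog post.",
    sub_tasks=["Describe the composition", "Explain how it is made"],
    evaluation="accuracy, no repetition of earlier sections, 100-200 words.",
)
print(prompt)
```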

Lastly, increasingly powerful open source LLMs are being released to the public at a growing rate. It is worth staying on top of this and experimenting with them as they come out to see if they can provide a higher quality output for your needs. Also, if prompting alone is not giving the best results, you may want to consider fine-tuning the model on your desired outputs. This comes with its own set of challenges, especially with gathering the training data, but it is a good way to give the model more information without being bound by the token limits in the prompt.

Conclusion

The approach described in this blog post lays out a framework for the development of generative long-form content systems where context length is a limiting factor. It aims to be agnostic of the underlying models used, and might even work best when implemented as a combination of different models, each focused on a different task.

Moreover, the post delves into the significance of summarisation as a critical tool in navigating the challenges posed by token limitations inherent in LLMs. The delicate equilibrium between brevity and detail is crucial, with techniques such as "Chain of Density" prompting leading the way.

By leveraging these approaches, we move closer to unlocking the full potential of LLMs in content generation, paving the way for advanced applications across various fields. Refining these techniques will surely advance generative content systems, driving further innovation and unveiling new technological possibilities.

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.
