Harnessing GPT-4 for Meeting Summarization: Zero-Shot and Aspect-Based Approaches

Matt Payne
August 21, 2023

Every day, your sales organization conducts countless meetings, each one brimming with valuable insights and action items that help move your deals forward. These insights can help save time and resources spent in the sales cycle and improve conversion rates. However, distilling these insights from the sea of conversation is no easy task. In this article, we delve into how OpenAI's GPT-4 can be utilized to summarize meeting transcripts, cutting through the noise to deliver the insights you need.

Exploring GPT-4's Capabilities in Meeting Summarization

There are three primary methods to utilize GPT-4 for meeting summarization: zero-shot, few-shot, and fine-tuned approaches. In this article, we will focus on the first two methods.

Zero-Shot Summarization

In zero-shot summarization, the meeting transcript is fed directly to GPT-4 with a task-focused prompt. We rely on the model to follow the prompt instructions that describe how to reach our goal output. For simple tasks, the prompt instructions are often sufficient. However, for more complex tasks, even detailed instructions may not be enough, necessitating the few-shot approach or prompt variables. We’ve even started using SOPs to help the model better understand how a manual operator would perform the task.
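As a minimal sketch of what this looks like in code: the prompt wording below and the `call_gpt4` helper are illustrative assumptions, not our production prompt. `call_gpt4` stands in for any thin wrapper around the model API.

```python
# Sketch of a zero-shot summarization call. The prompt text and the
# call_gpt4() helper are illustrative assumptions, not a production prompt.

ZERO_SHOT_PROMPT = (
    "You are a meeting summarization assistant.\n"
    "Summarize the meeting transcript below in 5-7 sentences, "
    "covering the goals, proposed solution, pricing, and next steps.\n\n"
    "Transcript:\n{transcript}"
)

def build_zero_shot_prompt(transcript: str) -> str:
    """Fill the task-focused instructions with the raw transcript."""
    return ZERO_SHOT_PROMPT.format(transcript=transcript)

def summarize_zero_shot(transcript: str, call_gpt4) -> str:
    """call_gpt4 is any callable that sends a prompt string to the model
    and returns the completion text."""
    return call_gpt4(build_zero_shot_prompt(transcript))
```

Everything the model needs lives in the single prompt; there are no examples for it to pattern-match against, which is exactly what makes the instruction wording so important.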

Few-Shot Summarization

In the few-shot approach, the prompt includes both instructions and input-output examples to guide GPT-4 towards our goal state output. The model can infer the desired output based on the patterns in these examples and apply these patterns to the target input meeting transcript. These task completion examples help steer the model towards what we consider a successful output for our input data.
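Using the chat message format, a few-shot request can be assembled as below. This is a sketch; the example pairs are placeholders, not real meeting data.

```python
# Sketch of few-shot prompting via the chat format: instructions plus
# worked input/output examples, then the target transcript. The example
# pairs passed in are placeholders, not real meeting data.

def build_few_shot_messages(instructions, examples, transcript):
    """examples is a list of (input_transcript, gold_summary) pairs."""
    messages = [{"role": "system", "content": instructions}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The real transcript goes last, so the model continues the pattern
    # set by the worked examples.
    messages.append({"role": "user", "content": transcript})
    return messages
```

Placing the gold summaries in `assistant` turns lets the model treat them as its own prior completions, which is what steers it toward the same output shape for the final transcript.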

In the following sections, we will explore zero-shot and aspect-based summarization environments in various meeting scenarios.

Use Case 1: Zero-Shot GPT-4 Summarization of Meeting Transcripts

For our first use case, we take the transcript of a client meeting. The meeting is about developing a summarization solution for a client in the sports industry.

Inference Time Transcript

The meeting audio was transcribed to text using the Whisper API. The full transcript is shown below with fake names to protect the actual identities and businesses.



Key Topics in the Transcript We’ve Extracted

Identifying the key topics manually helps us set a baseline for what we would consider good topics to cover in a summary. Since we worked with this customer and put together a successful proposal, we’ve got a pretty good idea of what information was important.

The key topics include:

  1. The customer's primary business goal of summarizing sports commentaries
  2. How the seller's summarization solution works and what the key pieces are
  3. Discussion on price estimates for the project based on past projects
  4. How the solution can be integrated with the client's workflows
  5. Concerns and questions surrounding data confidentiality and data ownership of the final project
  6. Questions surrounding the confidential data of one client leaking into the summaries of another client. This is key when building a multi-use summarization system with LLMs
  7. Final deliverables out of this call - project proposal and next steps

Zero-Shot Environment Extractive Summarization of the Meeting Transcript

Naturally, GPT-4 has a tendency to rephrase segments of the conversation when tasked with summarization. However, GPT-4 can also perform a more direct form of summarization if it is instructed to pick out sentences instead of creating new ones.
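One nice property of extractive summarization is that it can be verified mechanically: since the model is told to copy sentences, we can check each returned sentence against the transcript. The prompt wording below is an assumption for illustration.

```python
# Sketch of an extractive-summarization instruction plus a verbatim check.
# The prompt wording is an assumption, not the exact production prompt.

EXTRACTIVE_PROMPT = (
    "Select the {n} most important sentences from the transcript below.\n"
    "Copy each sentence exactly as written; do not paraphrase or "
    "combine sentences.\n\nTranscript:\n{transcript}"
)

def is_truly_extractive(transcript: str, summary_sentences: list) -> bool:
    """Return True only if every summary sentence appears verbatim
    in the transcript -- a cheap guard against silent paraphrasing."""
    return all(s.strip() in transcript for s in summary_sentences)
```

In practice this check catches the cases where the model drifts back into its default rephrasing behavior despite the instruction.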

We guide GPT-4 in this process by using the following prompt:

Zero-Shot Environment Extractive Summarization

We then evaluate the summary that this prompt generates from our meeting transcript.

Assessment & Evaluation of the Zero-Shot Summary

The output zero-shot summary is shown below:

results - Assessment & Evaluation of the Zero-Shot Summary

As per the instructions, the summary has maintained the original sentences from the transcript. It effectively highlights the service provider's role and clientele, the customer's requirements, and a high-level overview of the proposed solution. It also includes some specific details provided by the service provider about the solution's components.

When it comes to covering the main topics, this summary manages to touch upon five out of the seven key points.

However, the summary falls short in addressing the topic of data confidentiality. Despite the client dedicating a significant portion of the meeting to discuss concerns about data confidentiality and ownership, the summary only indirectly references this through a mention of potential legal obstacles. This important issue has been insufficiently covered.

Another topic that isn't fully covered is the integration of the proposed solution with the client's existing systems. The summary includes one sentence about the solution being incorporated into the client’s system, but it doesn't expand on this. To improve this, we could employ aspect-based summarization prompts that allow us to concentrate more on the specifics of the industry. Let's delve into how we can refine this process.

Multi-Task Prompting to Improve Zero-Shot Extractive Summarization

We can enhance GPT-4’s understanding of what information is important in our summary through a new multi-step prompt workflow. We first ask GPT-4 to provide us with a list of the key topics discussed in the meeting. This could be iterated on with an aspect-based or industry-specific prompt if you want to constrain the data variance available at the input.

key topics from the meeting as generated by gpt-4

Our topic identification process successfully highlighted the overlooked concerns around data confidentiality and ownership as a critical subject. It also efficiently captured the other topics we had already drawn out, ensuring we maintain a comprehensive context moving forward. This list can now be used as a new prompt variable for our extraction model, and it includes pieces of information that the extractive summarization step did not previously consider valuable. I love workflows like this: because we didn’t widen the data variance, they create very few new edge cases while improving the accuracy of our good results. In other words, results that were 1 out of 7 or 2 out of 7 rarely go down, but results that were 5 out of 7 and 6 out of 7 go up.

Next, we execute our standard direct summarization process with a minor modification to consider the key topics. We supply both the transcript and the list of topics to GPT-4. In our full production workflow, these two steps are executed in succession as a single workflow.
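A minimal sketch of that two-step chain, with `call_gpt4` standing in for the model call and both prompt texts assumed for illustration:

```python
# Sketch of the two-step workflow: extract key topics first, then pass
# the topic list back in as a prompt variable for extractive summarization.
# call_gpt4 is a stand-in for the model call; prompt text is illustrative.

TOPIC_PROMPT = "List the key topics discussed in this meeting:\n{transcript}"

SUMMARY_PROMPT = (
    "Select the most important sentences from the transcript below. "
    "Make sure the selected sentences cover these key topics:\n{topics}\n\n"
    "Transcript:\n{transcript}"
)

def topic_guided_summary(transcript, call_gpt4):
    # Step 1: let the model surface the key topics on its own.
    topics = call_gpt4(TOPIC_PROMPT.format(transcript=transcript))
    # Step 2: feed those topics back in as a prompt variable.
    return call_gpt4(
        SUMMARY_PROMPT.format(topics=topics, transcript=transcript)
    )
```

Because the topic list is generated from the same transcript, the second call never sees information from outside the input, which is why this pattern adds so few new edge cases.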

output summaries

The summarization now includes key sentences pertaining to AWS IAM Roles and data privacy concerns. While this is pretty good, there's still room for refinement. Some of these sentences could be replaced with others that provide more valuable information. However, for a zero-shot approach with a straightforward prompt, the results are really good. The prompt can be further enhanced with an aspect-based approach which we will discuss later. This approach would allow us to focus more on the specific industry and the use case at hand. Here's an example of a summary output from an aspect-based prompt that we've used specifically for transcripts revolving around business strategy development.

example summary of aspect-based summarization

Next, we’ll take a look at how abstractive summarization looks.

Zero-Shot Abstractive Summarization of the Meeting Transcript

For abstractive summarization of the same meeting transcript, we use this simple prompt that has removed the idea of exact sentences, while keeping the same goal state output definition.

prompt for Zero-Shot Abstractive Summarization of the Meeting Transcript

In client-specific products, this soft prompt could be run through our prompt optimization framework to create a much more dataset-specific prompt. Phrasing the instruction as “write a summary” instead of “generate a summary” also produces a much better output, one that reads like an article abstract.

Evaluation of the Abstractive Summary

The abstract summary output looks like this:

prompt example for Evaluation of the Abstractive Summary

This summary does a commendable job covering six out of the seven main topics. It successfully highlights the lengthy discussion on data confidentiality concerns and provides a snapshot of the proposed solution.

The coherence between sentences is impressive. Each sentence logically flows into the next, providing a clear and concise overview. In a zero-shot setting, abstractive summaries often have the upper hand over extractive summaries because they can weave together various sentences and reformulate key points. Abstractive summaries can process the data in a way the model finds more intuitive, while extractive summaries must present source sentences verbatim. This can sometimes result in a somewhat disjointed read, even if the model grasps the key topics fully. The only area for improvement would be a more in-depth exploration of the client's specific challenges in incorporating the solution into their existing systems.

Next, we explore the summarization of longer dialogues such as webinars or extended meetings.

Modern Chunking Architectures For Meeting Text Summarization

While the meeting transcripts we've covered so far, at 32-40 minutes, didn't necessitate breaking up the text for GPT-4, lengthier meetings or interviews present a different challenge. These longer dialogues exceed GPT-4's context window, necessitating a different approach.

In such cases, we use a technique called chunking to break the text into manageable pieces. Each segment is then processed and summarized individually by GPT-4. However, having multiple summaries for a single meeting based on each chunk isn't practical, so we employ an additional model at the end of our pipeline that focuses on creating a single combinator summary. This model's function is to intelligently merge the smaller summaries into one coherent summary that accurately represents the entire meeting.
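The chunk, summarize, and combine pipeline can be sketched as follows. Chunks here are split on a fixed word budget for simplicity (a production splitter would also weigh topics), and `call_gpt4` again stands in for the model; prompt wording is assumed.

```python
# Minimal sketch of the chunk -> summarize -> combine pipeline. Chunks
# are split on a fixed word budget; call_gpt4 stands in for the model.

def chunk_by_words(transcript: str, max_words: int = 2000) -> list:
    """Split the transcript into consecutive chunks of at most max_words."""
    words = transcript.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

def summarize_long_meeting(transcript, call_gpt4, max_words=2000):
    # Summarize each chunk independently.
    chunk_summaries = [
        call_gpt4(f"Summarize this meeting segment:\n{chunk}")
        for chunk in chunk_by_words(transcript, max_words)
    ]
    # Combinator step: merge the per-chunk summaries into one summary.
    joined = "\n".join(chunk_summaries)
    return call_gpt4(
        "Merge these partial summaries into one coherent summary "
        f"of the full meeting:\n{joined}"
    )
```

The final combinator call is the piece that keeps the output from reading like a stack of disconnected mini-summaries.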

To illustrate this workflow, here is an architecture diagram I created back in 2021, which provides a visual representation of the entire process.

full summarization pipeline

A common question I’m asked by clients revolves around the process and goals of meeting chunking. How large should the text chunks be relative to the model and the meeting? Should they be a static size, or sized by a sliding window based on content? How do you ensure each segment encapsulates as much relevant context as possible while avoiding repeating information across chunks or splitting an entire key topic between them? Chunking is a crucial component of the process and significantly influences downstream summarization performance. This is particularly true when merging summaries that are based on the key context found in smaller chunks or phrases.

Let’s look at a chunking framework that works well for meeting transcripts. The goal of this framework is:

goal of the chunking framework as generated by gpt-4

Note: This chunking approach works best with chunks that are a larger proportion of the entire meeting. For pretty much any document or transcript chunking use case I recommend using larger chunks. While extremely long chunks could run into comprehension issues due to generalization, this is mostly seen in chunks that come close to the context limits. This work is outlined in this amazing research paper.

Topic Infused Chunking for Meeting Summarization

chunking pipeline for topic infused chunking

First, we segment the meeting transcript into sizes based on a blended score of word count, relevant keywords or topics, and use case specific aspects (optional).

Next, we pinpoint the most relevant topic from the chunks immediately preceding and following the current one. This differs from the previous prompt: we want to communicate to the model that there's more to the meeting than the current chunk's primary topic, and that each neighboring topic either strongly aligns with the “current meeting chunk” or it does not. An alternative method would be to identify the most crucial topic from all chunks above and below and fuse them into a single “descriptive” style topic that resembles a more condensed summary. Either way, we can now inject a bit of context about the overall meeting into the chunk without overshadowing or dominating the key context of the current meeting chunk.

We utilize the blending prompt to infuse the key topic from the rest of the meeting transcript into the present meeting chunk. The GPT model generates new contextually relevant language that indicates whether the information has already been discussed or not, based on the content of the current meeting transcript chunk. This step assists the model in discerning which information is already deemed significant and whether it should be included in the current meeting transcript chunk during extractive summarization.

The result is an updated current meeting transcript chunk that provides more insight into the topics discussed in the preceding and following chunks. When executed correctly, this process significantly influences the selection of key topics for this particular chunk. This method proves to be more effective than the traditional sliding window approach of generating key topics from all preceding chunks plus the current chunk. This is because we now have context about what's discussed in the following chunk and we're not overwhelmed with a multitude of semantically similar topics from preceding chunks of text.
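The neighbor-topic injection step can be sketched like this. `get_top_topic` stands in for a model call that returns a single topic string for a chunk, and the "Previously discussed" / "Discussed later" framing is an illustrative assumption, not the exact blending prompt.

```python
# Sketch of topic-infused chunking: for each chunk, pull the most
# relevant topic from the neighboring chunks and prepend it as context.
# get_top_topic stands in for a model call returning one topic string.

def infuse_neighbor_topics(chunks, get_top_topic):
    infused = []
    for i, chunk in enumerate(chunks):
        context = []
        if i > 0:
            # Tell the model what the preceding chunk covered.
            context.append(f"Previously discussed: {get_top_topic(chunks[i - 1])}")
        if i < len(chunks) - 1:
            # ...and what the following chunk will cover.
            context.append(f"Discussed later: {get_top_topic(chunks[i + 1])}")
        header = "\n".join(context)
        infused.append(f"{header}\n\n{chunk}".strip())
    return infused
```

Keeping the injected context to one topic per neighbor is deliberate: it gives the extractive step enough signal to avoid repeats without drowning out the current chunk's own content.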

Use Case 2: Aspect-Based Summarization of Meeting Transcripts

When deployed effectively, aspect-based summarization is a powerful tool for drawing out distinct summaries from the same meeting transcript without altering the prompt structure. The only change required is the aspect of focus. This approach allows you to cater to a variety of summarization use cases or domain requirements without constructing a completely new system and prompt for each one. The result can be a set of summaries, each offering a unique perspective based on the chosen aspects.

Zero-Shot Extractive Aspect-Based Summarization of Meeting Transcripts

In this context, we present a prompt design that facilitates the easy substitution of the key aspect in the instructions. This is done without the need to awkwardly rewrite the language of the prompt.

The structure of the prompt is as follows:

prompt for Zero-Shot Extractive Aspect-Based Summarization of Meeting Transcripts
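A sketch of such a template, where only the aspect slot changes between use cases; the surrounding wording is an assumption for illustration.

```python
# Sketch of an aspect-based prompt template: only the {aspect} slot
# changes between use cases, so the same structure serves many domains.

ASPECT_PROMPT = (
    "Extract the {n} sentences from the transcript below that are most "
    "relevant to the following aspect: {aspect}.\n"
    "Copy each sentence verbatim and briefly explain each choice.\n\n"
    "Transcript:\n{transcript}"
)

def build_aspect_prompt(transcript, aspect, n=4):
    """Swap in a new aspect ('next steps', 'pricing', 'data privacy', ...)
    without touching the rest of the prompt."""
    return ASPECT_PROMPT.format(n=n, aspect=aspect, transcript=transcript)
```

Because the aspect is an isolated variable rather than woven through the instructions, swapping "next steps" for "data privacy concerns" never requires awkwardly rewording the rest of the prompt.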

Evaluation of the Zero-Shot Extractive Aspect-Based Summaries

Here’s a result that looks pretty good when provided a vague aspect.

output results for 4 key sentence extraction

Three of them are clearly related to the next steps and are centered around action items. The second point is a bit confusing as to why it's related to next steps; I’d expect that sentence to come back if my chosen aspect were more focused on key benefits to the customer. It's the sentence that comes directly after the first bullet point, which is probably why it was chosen. The sentence “I can also do the same on my end, but I feel like you'll probably better at the elevator pitch and then pricing information and anything you can think of in terms of what it would take to train in the timeline.” should have been used instead. Here was the model's reasoning for each of them:

The reasoning for number two is pretty interesting and provides a reason that takes into account the entire conversation up to that point.

2 key sentence extraction

Here I chose a much more granular aspect that is only briefly talked about in the transcript. This should be much harder to extract, as these are really the only two sentences related to exactly what the chunking algorithm focuses on trying to do. The model does a great job: it doesn't choose sentences about what we focus on, what summarization focuses on, or even what the results of chunking are. These two sentences are directly related to what the chunking algorithm focuses on improving in our entire pipeline.

Self Reflection To Improve Results


Here’s an interesting result that comes back when we let GPT-4 decide how many sentences to return instead of setting that value. This is commonly how summarization systems are used, where we don’t decide ahead of time how many sentences are relevant but ask the model to do it for us.

You’ll notice that the results that come back aren’t really correct. The first and third sentences provide context about the customer's product and the length of the sports commentaries, but do not specifically address their current summarization process.

By asking GPT-4 to “check its work,” we can actually improve the results when we open up the domain of possible outputs. This is sometimes called self-reflection, and it allows the model to review its results one more time before returning them. We can see that this fixes our issue on the first attempt.

results of self reflection on meeting transcript
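The self-reflection pass is just a second model call that audits the first. A sketch, with `call_gpt4` as the model stand-in and both prompt texts assumed:

```python
# Sketch of a self-reflection pass: ask the model to re-check its own
# extraction against the aspect before returning. call_gpt4 is a model
# stand-in; prompt text is an illustrative assumption.

REFLECT_PROMPT = (
    "You previously selected these sentences for the aspect "
    "'{aspect}':\n{draft}\n\n"
    "Re-read the transcript and remove any sentence that does not "
    "directly address the aspect. Return the corrected list.\n\n"
    "Transcript:\n{transcript}"
)

def extract_with_reflection(transcript, aspect, call_gpt4):
    # First pass: open-ended extraction (the model picks the count).
    draft = call_gpt4(
        f"Select the sentences most relevant to '{aspect}':\n{transcript}"
    )
    # Second pass: the model audits its own draft against the aspect.
    return call_gpt4(REFLECT_PROMPT.format(
        aspect=aspect, draft=draft, transcript=transcript))
```

The cost is one extra call per summary, which is usually a fair trade when the first pass is allowed to choose its own output size.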

Incorporating Meeting Summarization With GPT-4

Interested in integrating these workflows into your product? Let’s chat about how we can build this tool right into Zoom and Google Meet via their APIs.