🟡 Generated Knowledge

The idea behind the generated knowledge approach¹ is to ask the LLM to generate potentially useful information about a given question/prompt before generating a final response.

For example, say you want to write a small blog post about some animal, like spirit bears. Before asking the LLM to write the blog post, you can ask it to generate some facts about spirit bears. This will help the LLM write a more informative blog post. There are two simple ways of going about this.

Single prompt approach

The first approach asks the LLM to generate knowledge and then write the post, all in a single prompt.

Generate 4 facts about the Kermode bear, then use these facts to write a short blog post using the information:
The Kermode bear, also known as the spirit bear, is a rare subspecies of the American black bear that is found in the coastal regions of British Columbia, Canada. 

1. The Kermode bear is a unique subspecies of the American black bear, with approximately 10% of the population having white fur. 

2. The Kermode bear is a symbol of the First Nations people of British Columbia, and is seen as a spirit animal. 

3. The Kermode bear is an important part of the ecosystem, helping to disperse seeds and maintain the balance of the forest. 

4. The Kermode bear is a protected species, and its population is monitored closely by the provincial government.

The Kermode bear is a unique and special animal that is found in the coastal regions of British Columbia, Canada. Approximately 10% of the population of this subspecies of the American black bear have white fur, making them a rare sight. The Kermode bear is a symbol of the First Nations people of British Columbia, and is seen as a spirit animal. It is an important part of the ecosystem, helping to disperse seeds and maintain the balance of the forest. The Kermode bear is a protected species, and its population is monitored closely by the provincial government. It is a beautiful and majestic animal, and it is important that we do our part to protect and conserve this species.

Dual prompt approach

In this approach, we first prompt the LLM to generate facts about the bear:

Generate 10 facts about the Kermode bear:
The Kermode bear, also known as the Spirit Bear, is a rare subspecies of the American black bear found in British Columbia, Canada.

The Kermode bear has a unique white or cream-colored coat, which is caused by a recessive gene.

The Kermode bear is a symbol of hope and renewal for the First Nations people of British Columbia.

The Kermode bear is a protected species and is listed as a species of special concern by the Committee on the Status of Endangered Wildlife in Canada.

Then, we feed that information into another prompt to write the blog post:

1. The Kermode bear, also known as the Spirit Bear, is a rare subspecies of the American black bear found in British Columbia, Canada.

2. The Kermode bear has a unique white or cream-colored coat, which is caused by a recessive gene.

3. The Kermode bear is a symbol of hope and renewal for the First Nations people of British Columbia.

4. The Kermode bear is a protected species and is listed as a species of special concern by the Committee on the Status of Endangered Wildlife in Canada.

Use the above facts to write a one paragraph blog post about the Kermode bear:

The Kermode bear, also known as the Spirit Bear, is a rare subspecies of the American black bear found in British Columbia, Canada. This unique bear has a white or cream-colored coat, caused by a recessive gene, and is a symbol of hope and renewal for the First Nations people of British Columbia. The Kermode bear is a protected species and is listed as a species of special concern by the Committee on the Status of Endangered Wildlife in Canada, making it an important part of the region's biodiversity.

I recommend this second approach since it can produce longer content more reliably.
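The two-step flow can be sketched in a few lines of Python. This is only a minimal sketch: `complete` is a hypothetical callable standing in for whatever LLM client you use, `fake_model` is a stand-in so the example runs offline, and the helper names are assumptions made for illustration.

```python
from typing import Callable, List


def facts_prompt(topic: str, n_facts: int) -> str:
    """First prompt: ask only for facts."""
    return f"Generate {n_facts} facts about the {topic}:"


def post_prompt(facts: List[str], topic: str) -> str:
    """Second prompt: number the facts and ask for the blog post."""
    numbered = "\n\n".join(f"{i}. {fact}" for i, fact in enumerate(facts, start=1))
    return (
        f"{numbered}\n\n"
        f"Use the above facts to write a one paragraph blog post about the {topic}:"
    )


def dual_prompt_post(topic: str, n_facts: int, complete: Callable[[str], str]) -> str:
    """Run both prompts, feeding the first output into the second."""
    raw = complete(facts_prompt(topic, n_facts))
    # Treat every non-empty output line as one fact.
    facts = [line.strip() for line in raw.splitlines() if line.strip()]
    return complete(post_prompt(facts, topic))


def fake_model(prompt: str) -> str:
    """Hypothetical offline stand-in for a real LLM call."""
    if prompt.startswith("Generate"):
        return (
            "The Kermode bear is found in British Columbia, Canada.\n\n"
            "Its white coat is caused by a recessive gene."
        )
    return "BLOG POST <- " + prompt


post = dual_prompt_post("Kermode bear", 2, fake_model)
```

Keeping the two calls separate also lets you inspect, filter, or correct the generated facts before they reach the second prompt.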

Another use case

The generated knowledge approach was originally introduced for a different task: answering difficult questions. Consider the following question, which GPT-3 answers incorrectly:

If we first ask the LLM to generate facts about Congo and South Africa, we can then use that information to answer the question correctly. Conceptually, this is similar to chain-of-thought (CoT) prompting, since we are effectively getting the LLM to generate intermediate reasoning in the form of related facts.

Let's start with the first step, knowledge generation. We can ask the LLM to generate facts about Congo and South Africa:

Next, let's use that knowledge to answer the question correctly. This is the knowledge integration step!

A more technical discussion

Although the use case above is similar to the way generated knowledge was originally introduced, it is not exactly the same. The content below covers the more technical context in which the approach was introduced. It follows the same two-step pattern that we saw above: knowledge generation, then knowledge integration.

Generated Knowledge (Liu et al.)

Knowledge Generation

In the knowledge generation step, the LLM is asked to generate a set of facts about the question. The LLM is prompted in few-shot fashion as seen below. M different completions are generated using this same prompt (similar to the self-consistency approach).
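A rough sketch of this step in Python follows. The `Input:`/`Knowledge:` layout and the two demonstrations are illustrative assumptions, not the exact prompt from the paper, and `sample` is a hypothetical callable that returns one sampled completion per call.

```python
from typing import Callable, List, Tuple

# Illustrative (question, knowledge) demonstrations; assumed for this sketch.
DEMOS: List[Tuple[str, str]] = [
    ("Penguins have <mask> wings.", "Birds have two wings. A penguin is a kind of bird."),
    ("A typical car has <mask> wheels.", "Most passenger cars have four wheels."),
]


def few_shot_knowledge_prompt(question: str, demos: List[Tuple[str, str]]) -> str:
    """Format the demonstrations, then leave the final 'Knowledge:' slot open."""
    blocks = [f"Input: {q}\nKnowledge: {k}" for q, k in demos]
    blocks.append(f"Input: {question}\nKnowledge:")
    return "\n\n".join(blocks)


def generate_knowledge(
    question: str,
    sample: Callable[[str], str],
    m: int,
    demos: List[Tuple[str, str]] = DEMOS,
) -> List[str]:
    """Sample m completions from the same few-shot prompt."""
    prompt = few_shot_knowledge_prompt(question, demos)
    return [sample(prompt).strip() for _ in range(m)]


# Hypothetical offline stand-in for a sampling LLM call.
_canned = iter([
    "Kangaroos are marsupials that live in Australia.",
    "Kangaroos are marsupials that have 5 limbs.",
])


def fake_sample(prompt: str) -> str:
    return next(_canned)


statements = generate_knowledge("Most Kangaroos have <mask> limbs.", fake_sample, m=2)
```

With a real model, `sample` would need a non-zero sampling temperature; otherwise all M completions from the identical prompt would tend to be the same.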

Generated Knowledge Example (Liu et al.)

Knowledge Integration

Next, we generate "knowledge augmented" questions and prompt the LLM with them to get final answers. The easiest way to understand this is to go through an example.

Let's assume we are attempting to answer the question "Most Kangaroos have <mask> limbs". Assume that at the knowledge generation step we generated two knowledge statements (M = 2):

Knowledge 1: Kangaroos are marsupials that live in Australia.
Knowledge 2: Kangaroos are marsupials that have 5 limbs.

Now, we concatenate each knowledge statement with the question to generate knowledge-augmented questions:

Knowledge Augmented Question 1: Most Kangaroos have <mask> limbs. Kangaroos are marsupials that live in Australia.
Knowledge Augmented Question 2: Most Kangaroos have <mask> limbs. Kangaroos are marsupials that have 5 limbs.
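The concatenation above is simple enough to express directly. The sketch below follows the ordering used in this example (question first, then the knowledge); the function name `augment` is an assumption made for illustration.

```python
from typing import List


def augment(question: str, knowledge_statements: List[str]) -> List[str]:
    """Build one knowledge-augmented question per knowledge statement."""
    return [f"{question} {knowledge}" for knowledge in knowledge_statements]


augmented = augment(
    "Most Kangaroos have <mask> limbs.",
    [
        "Kangaroos are marsupials that live in Australia.",
        "Kangaroos are marsupials that have 5 limbs.",
    ],
)
```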

We then prompt the LLM with these knowledge augmented questions and get the final answer proposals:

Answer 1: 4
Answer 2: 5

We select the answer with the highest probability as the final answer. Here, the probability can be measured as the softmax probability of the answer token, or as the log probability of the answer token(s).
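As a minimal sketch of this selection rule, assume each proposal arrives with the log probabilities of its answer tokens (many LLM APIs can return these, but the data layout here is an assumption). The scores below are made up purely for illustration.

```python
import math
from typing import List, Tuple

# (knowledge statement, answer, log probabilities of the answer tokens)
Proposal = Tuple[str, str, List[float]]


def sequence_logprob(token_logprobs: List[float]) -> float:
    """Log probability of a multi-token answer is the sum over its tokens."""
    return sum(token_logprobs)


def select_answer(proposals: List[Proposal]) -> Tuple[str, str]:
    """Return (answer, knowledge) for the highest-scoring proposal."""
    knowledge, answer, _ = max(proposals, key=lambda p: sequence_logprob(p[2]))
    return answer, knowledge


# Made-up scores; a real run would take them from the model's output.
proposals: List[Proposal] = [
    ("Kangaroos are marsupials that live in Australia.", "4", [math.log(0.7)]),
    ("Kangaroos are marsupials that have 5 limbs.", "5", [math.log(0.4)]),
]

best_answer, best_knowledge = select_answer(proposals)
```

Because the logarithm is monotonic, ranking single-token answers by log probability gives the same winner as ranking them by softmax probability.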

Results

Liu et al.¹ report that this method improves performance on several commonsense reasoning benchmarks, including NumerSense, CommonsenseQA 2.0, and QASC.

Notes

The knowledge corresponding to the selected answer is called the selected knowledge.
In practice, you could take the most frequently occurring answer as the final one.
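The majority-vote alternative mentioned in the last note could look like the following sketch. The function name is an assumption; ties are broken in favor of the answer that appears first.

```python
from collections import Counter
from typing import List


def majority_answer(answers: List[str]) -> str:
    """Return the most frequent answer among the proposals."""
    return Counter(answers).most_common(1)[0][0]


final = majority_answer(["4", "5", "4"])
```

This variant needs no token probabilities, so it also works with APIs that return only text.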

¹ Liu, J., Liu, A., Lu, X., Welleck, S., West, P., Bras, R. L., Choi, Y., & Hajishirzi, H. (2021). Generated Knowledge Prompting for Commonsense Reasoning.
