🟢 Introduction

This chapter covers techniques for making completions more reliable, as well as checks you can implement to verify that outputs are reliable.

To a certain extent, most of the techniques covered so far, self-consistency¹ in particular, aim to improve completion accuracy and thus reliability. However, a number of other techniques go beyond basic prompting strategies to improve reliability.
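As a reminder of how self-consistency works, here is a minimal sketch of its voting step. It assumes you have already sampled several completions for the same prompt and extracted a final answer string from each; the function name `majority_answer` and the sample answers are hypothetical and used for illustration only.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most common answer and the fraction of samples that agree with it."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Normalize lightly so that "18" and " 18 " count as the same vote.
    counts = Counter(a.strip().lower() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


# Hypothetical final answers extracted from five sampled completions.
sampled = ["18", "18", "26", "18", "17"]
best, agreement = majority_answer(sampled)
print(best, agreement)  # 18 0.6
```

The agreement fraction can serve as a rough confidence signal: a low value suggests the completions disagree and the answer may deserve a closer look.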

LLMs exhibit various problems, including hallucinations², flawed explanations with CoT methods², and multiple biases such as majority label bias, recency bias, and common token bias³. Additionally, zero-shot CoT can be particularly biased when dealing with sensitive topics⁴.

Common solutions to some of these problems include calibrators that remove a priori biases, verifiers that score completions, and techniques that promote diversity in completions.
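To make the idea of a calibrator concrete, here is a simplified sketch in the spirit of the calibration approach cited above³. It assumes you can obtain the model's probability for each label twice: once for the real input and once for a content-free input (for example, a placeholder such as "N/A"). The function name `calibrate` and all probability values are hypothetical, and dividing then renormalizing is one simple way to apply the correction, not necessarily the exact procedure from the paper.

```python
def calibrate(label_probs, content_free_probs):
    """Divide each label probability by the model's prior for that label, then renormalize.

    Assumes every content-free probability is greater than zero.
    """
    scaled = {
        label: p / content_free_probs[label]
        for label, p in label_probs.items()
    }
    total = sum(scaled.values())
    return {label: s / total for label, s in scaled.items()}


# Hypothetical numbers: the model leans "positive" even on a content-free input.
content_free = {"positive": 0.7, "negative": 0.3}
raw = {"positive": 0.6, "negative": 0.4}

calibrated = calibrate(raw, content_free)
prediction = max(calibrated, key=calibrated.get)
print(prediction)  # negative
```

In this example the raw prediction is "positive", but once the model's a priori preference for that label is divided out, "negative" becomes the more likely label.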

1. Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models.
2. Ye, X., & Durrett, G. (2022). The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning.
3. Zhao, T. Z., Wallace, E., Feng, S., Klein, D., & Singh, S. (2021). Calibrate Before Use: Improving Few-Shot Performance of Language Models.
4. Shaikh, O., Zhang, H., Held, W., Bernstein, M., & Yang, D. (2022). On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning.