📚 Bibliography

This page lists all papers used by this course, organized by topic.

To cite this course, use the citation provided in the GitHub repository.

🔵 = Paper directly cited in this course. Other papers have informed my understanding of the topic.

Note: since neither the GPT-3 paper nor the GPT-3 Instruct paper corresponds to the davinci models, I try not to cite them as such.

Prompt Engineering Strategies

Chain of Thought¹ 🔵

Zero Shot Chain of Thought² 🔵

Self Consistency³ 🔵

What Makes Good In-Context Examples for GPT-3?⁴ 🔵

Generated Knowledge⁵ 🔵

Rethinking the role of demonstrations⁶ 🔵

Scratchpads⁷

Maieutic Prompting⁸

STaR⁹

Least to Most¹⁰

Reliability

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning¹¹ 🔵

Prompting GPT-3 to be reliable¹²

Diverse Prompts¹³ 🔵

Calibrate Before Use: Improving Few-Shot Performance of Language Models¹⁴ 🔵

Enhanced Self Consistency¹⁵

Bias and Toxicity in Zero-Shot CoT¹⁶ 🔵

Constitutional AI: Harmlessness from AI Feedback¹⁷ 🔵

Compositional Generalization - SCAN¹⁸

Automated Prompt Engineering

AutoPrompt¹⁹ 🔵

Automatic Prompt Engineer²⁰

Models

Language Models

GPT-3²¹ 🔵

GPT-3 Instruct²² 🔵

PaLM²³ 🔵

BLOOM²⁴ 🔵

BLOOM+1 (more languages / zero-shot improvements)²⁵

Jurassic 1²⁶ 🔵

GPT-J-6B²⁷

RoBERTa²⁸

Image Models

Stable Diffusion²⁹ 🔵

DALLE³⁰ 🔵

Soft Prompting

Soft Prompting³¹ 🔵

Interpretable Discretized Soft Prompts³² 🔵

Datasets

GSM8K³³ 🔵

HotPotQA³⁴ 🔵

Fever³⁵ 🔵

BBQ: A Hand-Built Bias Benchmark for Question Answering³⁶ 🔵

Image Prompt Engineering

Taxonomy of prompt modifiers³⁷

DiffusionDB³⁸

The DALLE 2 Prompt Book³⁹ 🔵

Prompt Engineering for Text-Based Generative Art⁴⁰ 🔵

With the right prompt, Stable Diffusion 2.0 can do hands.⁴¹ 🔵

Optimizing Prompts for Text-to-Image Generation⁴²

Prompt Engineering IDEs

Prompt IDE⁴³ 🔵

Prompt Source⁴⁴ 🔵

PromptChainer⁴⁵ 🔵

PromptMaker⁴⁶ 🔵

Tooling

LangChain⁴⁷ 🔵

TextBox 2.0: A Text Generation Library with Pre-trained Language Models⁴⁸ 🔵

OpenPrompt: An Open-source Framework for Prompt-learning⁴⁹ 🔵

GPT Index⁵⁰ 🔵

Applied Prompt Engineering

Language Model Cascades⁵¹

MRKL⁵² 🔵

ReAct⁵³ 🔵

PAL: Program-aided Language Models⁵⁴ 🔵

User Interface Design

Design Guidelines for Prompt Engineering Text-to-Image Generative Models⁵⁵

Prompt Injection

Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods⁵⁶ 🔵

Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples⁵⁷ 🔵

Prompt injection attacks against GPT-3⁵⁸ 🔵

Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions⁵⁹ 🔵

adversarial-prompts⁶⁰ 🔵

GPT-3 Prompt Injection Defenses⁶¹ 🔵

Talking to machines: prompt engineering & injection⁶²

Exploring Prompt Injection Attacks⁶³ 🔵

Using GPT-Eliezer against ChatGPT Jailbreaking⁶⁴ 🔵

Jailbreaking

Ignore Previous Prompt: Attack Techniques For Language Models⁶⁵

Lessons learned on Language Model Safety and misuse⁶⁶

Toxicity Detection with Generative Prompt-based Inference⁶⁷

New and improved content moderation tooling⁶⁸

OpenAI API⁶⁹ 🔵

OpenAI ChatGPT⁷⁰ 🔵

ChatGPT 4 Tweet⁷¹ 🔵

Acting Tweet⁷² 🔵

Research Tweet⁷³ 🔵

Pretend Ability Tweet⁷⁴ 🔵

Responsibility Tweet⁷⁵ 🔵

Lynx Mode Tweet⁷⁶ 🔵

Sudo Mode Tweet⁷⁷ 🔵

Ignore Previous Prompt⁷⁸ 🔵

Updated Jailbreaking Prompts⁷⁹ 🔵

Surveys

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing⁸⁰

PromptPapers⁸¹

Dataset Generation

Discovering Language Model Behaviors with Model-Written Evaluations⁸²

Selective Annotation Makes Language Models Better Few-Shot Learners⁸³

Applications

Atlas: Few-shot Learning with Retrieval Augmented Language Models⁸⁴

STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension⁸⁵

Miscellaneous

Prompting Is Programming: A Query Language For Large Language Models⁸⁶

Parallel Context Windows Improve In-Context Learning of Large Language Models⁸⁷

Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models⁸⁸

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks⁸⁹

Making Pre-trained Language Models Better Few-shot Learners⁹⁰

Grounding with search results⁹¹

How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning for Human-AI Interaction in Creative Applications of Generative Models⁹²

Plot Writing From Pre-Trained Language Models⁹⁴ 🔵

StereoSet: Measuring stereotypical bias in pretrained language models⁹⁵

Survey of Hallucination in Natural Language Generation⁹⁶

Examples⁹⁷

Wordcraft⁹⁸

PainPoints⁹⁹

Self-Instruct: Aligning Language Model with Self Generated Instructions¹⁰⁰

From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models¹⁰¹

Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference¹⁰²

A Watermark for Large Language Models¹⁰³

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models. ↩
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large Language Models are Zero-Shot Reasoners. ↩
Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., & Zhou, D. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models. ↩
Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., & Chen, W. (2021). What Makes Good In-Context Examples for GPT-3? ↩
Liu, J., Liu, A., Lu, X., Welleck, S., West, P., Bras, R. L., Choi, Y., & Hajishirzi, H. (2021). Generated Knowledge Prompting for Commonsense Reasoning. ↩
Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., & Zettlemoyer, L. (2022). Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? ↩
Nye, M., Andreassen, A. J., Gur-Ari, G., Michalewski, H., Austin, J., Bieber, D., Dohan, D., Lewkowycz, A., Bosma, M., Luan, D., Sutton, C., & Odena, A. (2021). Show Your Work: Scratchpads for Intermediate Computation with Language Models. ↩
Jung, J., Qin, L., Welleck, S., Brahman, F., Bhagavatula, C., Bras, R. L., & Choi, Y. (2022). Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations. ↩
Zelikman, E., Wu, Y., Mu, J., & Goodman, N. D. (2022). STaR: Bootstrapping Reasoning With Reasoning. ↩
Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2022). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. ↩
Ye, X., & Durrett, G. (2022). The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning. ↩
Si, C., Gan, Z., Yang, Z., Wang, S., Wang, J., Boyd-Graber, J., & Wang, L. (2022). Prompting GPT-3 To Be Reliable. ↩
Li, Y., Lin, Z., Zhang, S., Fu, Q., Chen, B., Lou, J.-G., & Chen, W. (2022). On the Advance of Making Language Models Better Reasoners. ↩
Zhao, T. Z., Wallace, E., Feng, S., Klein, D., & Singh, S. (2021). Calibrate Before Use: Improving Few-Shot Performance of Language Models. ↩
Mitchell, E., Noh, J. J., Li, S., Armstrong, W. S., Agarwal, A., Liu, P., Finn, C., & Manning, C. D. (2022). Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference. ↩
Shaikh, O., Zhang, H., Held, W., Bernstein, M., & Yang, D. (2022). On Second Thought, Let’s Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning. ↩
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., … Kaplan, J. (2022). Constitutional AI: Harmlessness from AI Feedback. ↩
Lake, B. M., & Baroni, M. (2018). Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. https://doi.org/10.48550/arXiv.1711.00350 ↩
Shin, T., Razeghi, Y., Logan IV, R. L., Wallace, E., & Singh, S. (2020). AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://doi.org/10.18653/v1/2020.emnlp-main.346 ↩
Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. (2022). Large Language Models Are Human-Level Prompt Engineers. ↩
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language Models are Few-Shot Learners. ↩
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. ↩
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H. W., Sutton, C., Gehrmann, S., Schuh, P., Shi, K., Tsvyashchenko, S., Maynez, J., Rao, A., Barnes, P., Tay, Y., Shazeer, N., Prabhakaran, V., … Fiedel, N. (2022). PaLM: Scaling Language Modeling with Pathways. ↩
Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Castagné, R., Luccioni, A. S., Yvon, F., Gallé, M., Tow, J., Rush, A. M., Biderman, S., Webson, A., Ammanamanchi, P. S., Wang, T., Sagot, B., Muennighoff, N., del Moral, A. V., … Wolf, T. (2022). BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. ↩
Yong, Z.-X., Schoelkopf, H., Muennighoff, N., Aji, A. F., Adelani, D. I., Almubarak, K., Bari, M. S., Sutawika, L., Kasai, J., Baruwa, A., Winata, G. I., Biderman, S., Radev, D., & Nikoulina, V. (2022). BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting. ↩
Lieber, O., Sharir, O., Lenz, B., & Shoham, Y. (2021). Jurassic-1: Technical Details and Evaluation. White paper, AI21 Labs. https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf ↩
Wang, B., & Komatsuzaki, A. (2021). GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax ↩
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv Preprint arXiv:1907.11692. ↩
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2021). High-Resolution Image Synthesis with Latent Diffusion Models. ↩
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP Latents. ↩
Lester, B., Al-Rfou, R., & Constant, N. (2021). The Power of Scale for Parameter-Efficient Prompt Tuning. ↩
Khashabi, D., Lyu, S., Min, S., Qin, L., Richardson, K., Welleck, S., Hajishirzi, H., Khot, T., Sabharwal, A., Singh, S., & Choi, Y. (2021). Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts. ↩
Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C., & Schulman, J. (2021). Training Verifiers to Solve Math Word Problems. ↩
Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. ↩
Thorne, J., Vlachos, A., Christodoulopoulos, C., & Mittal, A. (2018). FEVER: a large-scale dataset for Fact Extraction and VERification. ↩
Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., Htut, P. M., & Bowman, S. R. (2021). BBQ: A Hand-Built Bias Benchmark for Question Answering. ↩
Oppenlaender, J. (2022). A Taxonomy of Prompt Modifiers for Text-To-Image Generation. ↩
Wang, Z. J., Montoya, E., Munechika, D., Yang, H., Hoover, B., & Chau, D. H. (2022). DiffusionDB: A Large-scale Prompt Gallery Dataset for Text-to-Image Generative Models. ↩
Parsons, G. (2022). The DALLE 2 Prompt Book. https://dallery.gallery/the-dalle-2-prompt-book/ ↩
Oppenlaender, J. (2022). Prompt Engineering for Text-Based Generative Art. ↩
Blake. (2022). With the right prompt, Stable Diffusion 2.0 can do hands. https://www.reddit.com/r/StableDiffusion/comments/z7salo/with_the_right_prompt_stable_diffusion_20_can_do/ ↩
Hao, Y., Chi, Z., Dong, L., & Wei, F. (2022). Optimizing Prompts for Text-to-Image Generation. ↩
Strobelt, H., Webson, A., Sanh, V., Hoover, B., Beyer, J., Pfister, H., & Rush, A. M. (2022). Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models. arXiv. https://doi.org/10.48550/ARXIV.2208.07852 ↩
Bach, S. H., Sanh, V., Yong, Z.-X., Webson, A., Raffel, C., Nayak, N. V., Sharma, A., Kim, T., Bari, M. S., Fevry, T., Alyafeai, Z., Dey, M., Santilli, A., Sun, Z., Ben-David, S., Xu, C., Chhablani, G., Wang, H., Fries, J. A., … Rush, A. M. (2022). PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts. ↩
Wu, T., Jiang, E., Donsbach, A., Gray, J., Molina, A., Terry, M., & Cai, C. J. (2022). PromptChainer: Chaining Large Language Model Prompts through Visual Programming. ↩
Jiang, E., Olson, K., Toh, E., Molina, A., Donsbach, A., Terry, M., & Cai, C. J. (2022). PromptMaker: Prompt-Based Prototyping with Large Language Models. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491101.3503564 ↩
Chase, H. (2022). LangChain (0.0.66) [Computer software]. https://github.com/hwchase17/langchain ↩
Tang, T., Junyi, L., Chen, Z., Hu, Y., Yu, Z., Dai, W., Dong, Z., Cheng, X., Wang, Y., Zhao, W., Nie, J., & Wen, J.-R. (2022). TextBox 2.0: A Text Generation Library with Pre-trained Language Models. ↩
Ding, N., Hu, S., Zhao, W., Chen, Y., Liu, Z., Zheng, H.-T., & Sun, M. (2021). OpenPrompt: An Open-source Framework for Prompt-learning. arXiv Preprint arXiv:2111.01998. ↩
Liu, J. (2022). GPT Index. https://doi.org/10.5281/zenodo.1234 ↩
Dohan, D., Xu, W., Lewkowycz, A., Austin, J., Bieber, D., Lopes, R. G., Wu, Y., Michalewski, H., Saurous, R. A., Sohl-dickstein, J., Murphy, K., & Sutton, C. (2022). Language Model Cascades. ↩
Karpas, E., Abend, O., Belinkov, Y., Lenz, B., Lieber, O., Ratner, N., Shoham, Y., Bata, H., Levine, Y., Leyton-Brown, K., Muhlgay, D., Rozen, N., Schwartz, E., Shachaf, G., Shalev-Shwartz, S., Shashua, A., & Tenenholtz, M. (2022). MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning. ↩
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. ↩
Gao, L., Madaan, A., Zhou, S., Alon, U., Liu, P., Yang, Y., Callan, J., & Neubig, G. (2022). PAL: Program-aided Language Models. ↩
Liu, V., & Chilton, L. B. (2022). Design Guidelines for Prompt Engineering Text-to-Image Generative Models. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3491102.3501825 ↩
Crothers, E., Japkowicz, N., & Viktor, H. (2022). Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods. ↩
Branch, H. J., Cefalu, J. R., McHugh, J., Hujer, L., Bahl, A., del Castillo Iglesias, D., Heichman, R., & Darwishi, R. (2022). Evaluating the Susceptibility of Pre-Trained Language Models via Handcrafted Adversarial Examples. ↩
Willison, S. (2022). Prompt injection attacks against GPT-3. https://simonwillison.net/2022/Sep/12/prompt-injection/ ↩
Goodside, R. (2022). Exploiting GPT-3 prompts with malicious inputs that order the model to ignore its previous directions. https://twitter.com/goodside/status/1569128808308957185 ↩
Chase, H. (2022). adversarial-prompts. https://github.com/hwchase17/adversarial-prompts ↩
Goodside, R. (2022). GPT-3 Prompt Injection Defenses. https://twitter.com/goodside/status/1578278974526222336?s=20&t=3UMZB7ntYhwAk3QLpKMAbw ↩
Mark, C. (2022). Talking to machines: prompt engineering & injection. https://artifact-research.com/artificial-intelligence/talking-to-machines-prompt-engineering-injection/ ↩
Selvi, J. (2022). Exploring Prompt Injection Attacks. https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/ ↩
Armstrong, S., & Gorman, R. (2022). Using GPT-Eliezer against ChatGPT Jailbreaking. https://www.alignmentforum.org/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking ↩
Perez, F., & Ribeiro, I. (2022). Ignore Previous Prompt: Attack Techniques For Language Models. arXiv. https://doi.org/10.48550/ARXIV.2211.09527 ↩
Brundage, M. (2022). Lessons learned on Language Model Safety and misuse. In OpenAI. OpenAI. https://openai.com/blog/language-model-safety-and-misuse/ ↩
Wang, Y.-S., & Chang, Y. (2022). Toxicity Detection with Generative Prompt-based Inference. arXiv. https://doi.org/10.48550/ARXIV.2205.12390 ↩
Markov, T. (2022). New and improved content moderation tooling. In OpenAI. OpenAI. https://openai.com/blog/new-and-improved-content-moderation-tooling/ ↩
(2022). https://beta.openai.com/docs/guides/moderation ↩
(2022). https://openai.com/blog/chatgpt/ ↩
ok I saw a few people jailbreaking safeguards openai put on chatgpt so I had to give it a shot myself. (2022). https://twitter.com/alicemazzy/status/1598288519301976064 ↩
Bypass @OpenAI’s ChatGPT alignment efforts with this one weird trick. (2022). https://twitter.com/m1guelpf/status/1598203861294252033 ↩
ChatGPT jailbreaking itself. (2022). https://twitter.com/haus_cole/status/1598541468058390534 ↩
Using “pretend” on #ChatGPT can do some wild stuff. You can kind of get some insight on the future, alternative universe. (2022). https://twitter.com/NeroSoares/status/1608527467265904643 ↩
I kinda like this one even more! (2022). https://twitter.com/NickEMoran/status/1598101579626057728 ↩
Degrave, J. (2022). Building A Virtual Machine inside ChatGPT. Engraved. https://www.engraved.blog/building-a-virtual-machine-inside/ ↩
(2022). https://www.sudo.ws/ ↩
Perez, F., & Ribeiro, I. (2022). Ignore Previous Prompt: Attack Techniques For Language Models. arXiv. https://doi.org/10.48550/ARXIV.2211.09527 ↩
AIWithVibes. (2023). 7 ChatGPT JailBreaks and Content Filters Bypass that work. https://chatgpt-jailbreak.super.site/ ↩
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2022). Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Computing Surveys. https://doi.org/10.1145/3560815 ↩
PromptPapers. (2022). https://github.com/thunlp/PromptPapers ↩
Perez, E., Ringer, S., Lukošiūtė, K., Nguyen, K., Chen, E., Heiner, S., Pettit, C., Olsson, C., Kundu, S., Kadavath, S., Jones, A., Chen, A., Mann, B., Israel, B., Seethor, B., McKinnon, C., Olah, C., Yan, D., Amodei, D., … Kaplan, J. (2022). Discovering Language Model Behaviors with Model-Written Evaluations. ↩
Su, H., Kasai, J., Wu, C. H., Shi, W., Wang, T., Xin, J., Zhang, R., Ostendorf, M., Zettlemoyer, L., Smith, N. A., & Yu, T. (2022). Selective Annotation Makes Language Models Better Few-Shot Learners. ↩
Izacard, G., Lewis, P., Lomeli, M., Hosseini, L., Petroni, F., Schick, T., Dwivedi-Yu, J., Joulin, A., Riedel, S., & Grave, E. (2022). Atlas: Few-shot Learning with Retrieval Augmented Language Models. ↩
Wang, B., Feng, C., Nair, A., Mao, M., Desai, J., Celikyilmaz, A., Li, H., Mehdad, Y., & Radev, D. (2022). STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension. ↩
Beurer-Kellner, L., Fischer, M., & Vechev, M. (2022). Prompting Is Programming: A Query Language For Large Language Models. ↩
Ratner, N., Levine, Y., Belinkov, Y., Ram, O., Abend, O., Karpas, E., Shashua, A., Leyton-Brown, K., & Shoham, Y. (2022). Parallel Context Windows Improve In-Context Learning of Large Language Models. ↩
Bursztyn, V. S., Demeter, D., Downey, D., & Birnbaum, L. (2022). Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models. ↩
Wang, Y., Mishra, S., Alipoormolabashi, P., Kordi, Y., Mirzaei, A., Arunkumar, A., Ashok, A., Dhanasekaran, A. S., Naik, A., Stap, D., Pathak, E., Karamanolakis, G., Lai, H. G., Purohit, I., Mondal, I., Anderson, J., Kuznia, K., Doshi, K., Patel, M., … Khashabi, D. (2022). Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks. ↩
Gao, T., Fisch, A., & Chen, D. (2021). Making Pre-trained Language Models Better Few-shot Learners. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). https://doi.org/10.18653/v1/2021.acl-long.295 ↩
Liévin, V., Hother, C. E., & Winther, O. (2022). Can large language models reason about medical questions? ↩
Dang, H., Mecke, L., Lehmann, F., Goller, S., & Buschek, D. (2022). How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning for Human-AI Interaction in Creative Applications of Generative Models. ↩
Akyürek, A. F., Paik, S., Kocyigit, M. Y., Akbiyik, S., Runyun, Ş. L., & Wijaya, D. (2022). On Measuring Social Biases in Prompt-Based Multi-Task Learning. ↩
Jin, Y., Kadam, V., & Wanvarie, D. (2022). Plot Writing From Pre-Trained Language Models. ↩
Nadeem, M., Bethke, A., & Reddy, S. (2021). StereoSet: Measuring stereotypical bias in pretrained language models. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 5356–5371. https://doi.org/10.18653/v1/2021.acl-long.416 ↩
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Madotto, A., & Fung, P. (2022). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys. https://doi.org/10.1145/3571730 ↩
Liu, J., Shen, D., Zhang, Y., Dolan, B., Carin, L., & Chen, W. (2022). What Makes Good In-Context Examples for GPT-3? Proceedings of Deep Learning Inside Out (DeeLIO 2022): The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures. https://doi.org/10.18653/v1/2022.deelio-1.10 ↩
Yuan, A., Coenen, A., Reif, E., & Ippolito, D. (2022). Wordcraft: Story Writing With Large Language Models. 27th International Conference on Intelligent User Interfaces, 841–852. ↩
Fadnavis, S., Dhurandhar, A., Norel, R., Reinen, J. M., Agurto, C., Secchettin, E., Schweiger, V., Perini, G., & Cecchi, G. (2022). PainPoints: A Framework for Language-based Detection of Chronic Pain and Expert-Collaborative Text-Summarization. arXiv Preprint arXiv:2209.09814. ↩
Wang, Y., Kordi, Y., Mishra, S., Liu, A., Smith, N. A., Khashabi, D., & Hajishirzi, H. (2022). Self-Instruct: Aligning Language Model with Self Generated Instructions. ↩
Guo, J., Li, J., Li, D., Tiong, A. M. H., Li, B., Tao, D., & Hoi, S. C. H. (2022). From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models. ↩
Schick, T., & Schütze, H. (2020). Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference. ↩
Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A Watermark for Large Language Models. https://arxiv.org/abs/2301.10226 ↩
