Large Language Models: Test Your Knowledge

Question 1. How many 2-grams (bigrams) are present in the following phrase?

    they visited New York last week

- 3
- 4
- 5
- 6

Question 2. Which attributes of large language models help them make better predictions than other types of language models? Choose all that apply.

- LLMs contain many more parameters.
- LLMs capture more context.
- LLMs don't need to be trained on as much data.
- LLMs never hallucinate.

Question 3. True or false: a full Transformer consists of both an encoder and a decoder.

- True
- False

Question 4. An LLM is trained on a large corpus of data that includes the following example:

    My cousin's new fashion line is so cool!

What mechanism helps the LLM learn that in this sentence, "cool" most likely means "great" and does not refer to the temperature of the clothing?

- Prompt engineering
- Decoder
- Distillation
- Self-attention

Question 5. Which of the following statements about fine-tuning vs. distillation is true?

- Fine-tuning increases the number of parameters in the model, whereas distillation decreases the number of parameters in the model.
- Fine-tuning generally increases the quality of the model's predictions, whereas distillation generally decreases the quality of the model's predictions.
- Fine-tuning is performed on text models, whereas distillation is performed on image models.
- None of the above are true.
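For Question 1, the count follows from the definition: a bigram is a pair of adjacent words, so a phrase with n words yields n - 1 bigrams. A minimal Python sketch that enumerates them (the zip-based sliding window is one common idiom, not the only way to do this):

```python
# Enumerate the 2-grams (bigrams) of a phrase: each bigram is a pair
# of adjacent words, so an n-word phrase has n - 1 bigrams.
phrase = "they visited New York last week"
words = phrase.split()
bigrams = list(zip(words, words[1:]))  # sliding window of size 2

for bigram in bigrams:
    print(bigram)
print(f"{len(bigrams)} bigrams")  # 6 words -> 5 bigrams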
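Question 4 hinges on self-attention: each token's representation is re-weighted by its similarity to every other token in the sentence, which is how "cool" can pick up the disambiguating context of "fashion line". A minimal NumPy sketch of scaled dot-product self-attention, with toy embeddings invented for illustration; for simplicity it omits the learned query/key/value projections a real Transformer layer applies:

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over token embeddings X (n_tokens x d).

    Simplification: X serves directly as queries, keys, and values; a real
    Transformer layer would first apply learned projection matrices.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # pairwise token-similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X  # each output token is a context-weighted mixture

# Toy example: 4 "tokens" with random 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
print(self_attention(X).shape)  # (4, 8): same shape, now contextualized
```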