Large Language Models: Test Your Knowledge

Question 1. How many 2-grams (bigrams) are present in the following phrase?

    they visited New York last week

- 3
- 4
- 5
- 6

Question 2. Which attributes of large language models help them make better predictions than other types of language models? Choose all that apply.

- LLMs contain many more parameters.
- LLMs capture more context.
- LLMs don't need to be trained on as much data.
- LLMs never hallucinate.

Question 3. True or false: a full Transformer consists of both an encoder and a decoder.

- True
- False

Question 4. An LLM is trained on a large corpus of data that includes the following example:

    My cousin's new fashion line is so cool!

What mechanism helps the LLM learn that in this sentence, "cool" most likely means "great" and does not refer to the temperature of the clothing?

- Prompt engineering
- Decoder
- Distillation
- Self-attention

Question 5. Which of the following statements about fine-tuning vs. distillation is true?

- Fine-tuning increases the number of parameters in the model, whereas distillation decreases the number of parameters in the model.
- Fine-tuning generally increases the quality of the model's predictions, whereas distillation generally decreases the quality of the model's predictions.
- Fine-tuning is performed on text models, whereas distillation is performed on image models.
- None of the above are true.
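For Question 1, the count follows from the definition: a bigram is a pair of adjacent words, so a phrase with n words yields n - 1 bigrams. A minimal Python sketch that enumerates them (the zip-based sliding window is one common idiom, not the only way to do this):

```python
# Enumerate the 2-grams (bigrams) of a phrase: each bigram is a pair
# of adjacent words, so an n-word phrase has n - 1 bigrams.
phrase = "they visited New York last week"
words = phrase.split()
bigrams = list(zip(words, words[1:]))  # sliding window of size 2

for bigram in bigrams:
    print(bigram)
print(f"{len(bigrams)} bigrams")  # 6 words -> 5 bigrams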
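Question 4 hinges on self-attention: each token's representation is re-weighted by its similarity to every other token in the sentence, which is how "cool" can pick up the disambiguating context of "fashion line". A minimal NumPy sketch of scaled dot-product self-attention, with toy embeddings invented for illustration; for simplicity it omits the learned query/key/value projections a real Transformer layer applies:

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over token embeddings X (n_tokens x d).

    Simplification: X serves directly as queries, keys, and values; a real
    Transformer layer would first apply learned projection matrices.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # pairwise token-similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X  # each output token is a context-weighted mixture

# Toy example: 4 "tokens" with random 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
print(self_attention(X).shape)  # (4, 8): same shape, now contextualized
```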