[Figure. Legend: Large model, Small model, Chinchilla Optimal. Labels: Pre-training Loss; Non-emergent Downstream Capability; Emergent Downstream Capability; Generalization Type I, Type II, Type III.]