TinyLlama-1.1B-intermediate-step-1431k-3T#

README(From Huggingface)#

language:

en license: apache-2.0 datasets:
cerebras/SlimPajama-627B
bigcode/starcoderdata model-index:
name: TinyLlama-1.1B-intermediate-step-1431k-3T results:
- task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics:
  - type: acc_norm value: 33.87 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T name: Open LLM Leaderboard
- task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics:
  - type: acc_norm value: 60.31 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T name: Open LLM Leaderboard
- task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics:
  - type: acc value: 26.04 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T name: Open LLM Leaderboard
- task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics:
  - type: mc2 value: 37.32 source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T name: Open LLM Leaderboard
- task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics:
  - type: acc value: 59.51 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T name: Open LLM Leaderboard
- task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics:
  - type: acc value: 1.44 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T name: Open LLM Leaderboard

TinyLlama-1.1B#

Eval#

Model	Pretrain Tokens	HellaSwag	Obqa	WinoGrande	ARC_c	ARC_e	boolq	piqa	avg
Pythia-1.0B	300B	47.16	31.40	53.43	27.05	48.99	60.83	69.21	48.30
TinyLlama-1.1B-intermediate-step-50K-104b	103B	43.50	29.80	53.28	24.32	44.91	59.66	67.30	46.11
TinyLlama-1.1B-intermediate-step-240k-503b	503B	49.56	31.40	55.80	26.54	48.32	56.91	69.42	48.28
TinyLlama-1.1B-intermediate-step-480k-1007B	1007B	52.54	33.40	55.96	27.82	52.36	59.54	69.91	50.22
TinyLlama-1.1B-intermediate-step-715k-1.5T	1.5T	53.68	35.20	58.33	29.18	51.89	59.08	71.65	51.29
TinyLlama-1.1B-intermediate-step-955k-2T	2T	54.63	33.40	56.83	28.07	54.67	63.21	70.67	51.64
TinyLlama-1.1B-intermediate-step-1195k-2.5T	2.5T	58.96	34.40	58.72	31.91	56.78	63.21	73.07	53.86
TinyLlama-1.1B-intermediate-step-1431k-3T	3T	59.20	36.00	59.12	30.12	55.25	57.83	73.29	52.99
# Open LLM Leaderboard Evaluation Results
Detailed results can be found here

Model

Pretrain Tokens

HellaSwag

Obqa

WinoGrande

ARC_c

ARC_e

boolq

piqa

avg

Pythia-1.0B

300B

47.16

31.40

53.43

27.05

48.99

60.83

69.21

48.30

TinyLlama-1.1B-intermediate-step-50K-104b

103B

43.50

29.80

53.28

24.32

44.91

59.66

67.30

46.11

TinyLlama-1.1B-intermediate-step-240k-503b

503B

49.56

31.40

55.80

26.54

48.32

56.91

69.42

48.28

TinyLlama-1.1B-intermediate-step-480k-1007B

1007B

52.54

33.40

55.96

27.82

52.36

59.54

69.91

50.22

TinyLlama-1.1B-intermediate-step-715k-1.5T

1.5T

53.68

35.20

58.33

29.18

51.89

59.08

71.65

51.29

TinyLlama-1.1B-intermediate-step-955k-2T

54.63

33.40

56.83

28.07

54.67

63.21

70.67

51.64

TinyLlama-1.1B-intermediate-step-1195k-2.5T

2.5T

58.96

34.40

58.72

31.91

56.78

63.21

73.07

53.86

TinyLlama-1.1B-intermediate-step-1431k-3T

59.20

36.00

59.12

30.12

55.25

57.83

73.29

52.99

# Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	36.42
AI2 Reasoning Challenge (25-Shot)	33.87
HellaSwag (10-Shot)	60.31
MMLU (5-Shot)	26.04
TruthfulQA (0-shot)	37.32
Winogrande (5-shot)	59.51
GSM8k (5-shot)	1.44

Metric

Value

Avg.

36.42

AI2 Reasoning Challenge (25-Shot)

33.87

HellaSwag (10-Shot)

60.31

MMLU (5-Shot)

26.04

TruthfulQA (0-shot)

37.32

Winogrande (5-shot)

59.51

GSM8k (5-shot)

1.44

TinyLlama-1.1B-intermediate-step-1431k-3T

目录

TinyLlama-1.1B-intermediate-step-1431k-3T#

README(From Huggingface)#

TinyLlama-1.1B#

This Collection#

Eval#

Model Files#