happy-llm
Tokenizing the dataset
    def tokenize_function(examples):
        # Relies on a module-level variable named tokenizer.
        output = tokenizer([item for item in examples["text"]])
        return output
In practice, however, running this raises a "tokenizer is not defined" error, so the tokenizer is passed in explicitly as a default argument:
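    def tokenize_function(examples, tokenizer=tokenizer):
        # The default argument binds the tokenizer when the function is defined,
        # so the function no longer looks it up as a global at call time.
        output = tokenizer([item for item in examples["text"]])
        return output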
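A related way to hand the tokenizer over explicitly is to make it a required parameter and bind it with functools.partial. The sketch below only illustrates the binding pattern: ToyTokenizer is a hypothetical whitespace tokenizer standing in for the real one, and the batch dict imitates the kind of batch a batched map call would pass in.

```python
from functools import partial


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer: splits on whitespace."""

    def __init__(self):
        self.vocab = {}

    def __call__(self, texts):
        input_ids = []
        for text in texts:
            # Assign each new word the next free integer id.
            ids = [self.vocab.setdefault(w, len(self.vocab)) for w in text.split()]
            input_ids.append(ids)
        return {"input_ids": input_ids}


def tokenize_function(examples, tokenizer):
    # The tokenizer arrives as an argument, not as a global lookup.
    return tokenizer([item for item in examples["text"]])


# Bind the tokenizer once; the result takes only the batch.
tokenize_with_toy = partial(tokenize_function, tokenizer=ToyTokenizer())

batch = {"text": ["hello world", "hello again"]}
result = tokenize_with_toy(batch)
print(result)
```

If the function is applied with the Hugging Face datasets library, Dataset.map also accepts an fn_kwargs argument, which may serve the same purpose; whether it resolves the error in a given setup depends on how the worker processes are started.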