I am trying to use an AllenNLP model to parse a file and build a CCG dataset. As a student I can't afford the CCGBank dataset, but I still need a dataset to train a model that resolves syntactic ambiguities, and parsing the sentences into CCG format is an unavoidable step. I really need a model that I can load like predictor = Predictor.from_path(".02.10.tar.gz"), or, if you have a better option, I am willing to give it a try! My code is below:
import pandas as pd
from allennlp.predictors.predictor import Predictor
import allennlp_models.tagging
# Read the original CSV file
input_path = "validation.csv"  # replace with your local path
df = pd.read_csv(input_path)
sentences = df["sentence"].tolist()
# Load AllenNLP's pretrained CCG supertagger model
predictor = Predictor.from_path(".02.10.tar.gz")
# Define the prediction function: takes a sentence, returns a "word/category" sequence
def get_ccg_tags(sentence):
    output = predictor.predict(sentence=sentence)
    tokens = output["words"]
    tags = output["ccg_tags"]
    tagged = [f"{w}/{t}" for w, t in zip(tokens, tags)]
    return " ".join(tagged)
# Process every sentence and add a ccg_tags column
df["ccg_tags"] = df["sentence"].apply(get_ccg_tags)
# Save the results to a new file
output_path = "validation_with_allennlp_ccg.csv"
df.to_csv(output_path, index=False)
print(f"AllenNLP CCG tagging results saved to: {output_path}")
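
One thing that may be worth checking in the code above is the name of the key that holds the tags in the predictor's output, since I am not certain it is "ccg_tags" for every model archive. Below is a minimal sketch of a more defensive formatting step. It assumes only that the predictor exposes predict(sentence=...) as used above. The candidate key list TAG_KEYS, the helpers format_tagged and tag_sentences, and the FakePredictor stand-in are all hypothetical names introduced for illustration, not part of AllenNLP.

```python
from typing import Any, Dict, List, Optional, Sequence

# Assumed candidate names for the tag field; adjust after inspecting
# the keys your model actually returns.
TAG_KEYS: Sequence[str] = ("ccg_tags", "tags")


def format_tagged(output: Dict[str, Any],
                  tag_keys: Sequence[str] = TAG_KEYS) -> str:
    """Join words and tags as 'word/category' pairs, failing loudly
    if the expected keys are missing or the lengths differ."""
    if "words" not in output:
        raise KeyError(f"no 'words' key; available keys: {sorted(output)}")
    words = output["words"]
    for key in tag_keys:
        if key in output:
            tags = output[key]
            break
    else:
        raise KeyError(f"none of {list(tag_keys)} found; "
                       f"available keys: {sorted(output)}")
    if len(words) != len(tags):
        raise ValueError(f"{len(words)} words but {len(tags)} tags")
    return " ".join(f"{w}/{t}" for w, t in zip(words, tags))


def tag_sentences(predictor: Any,
                  sentences: Sequence[str]) -> List[Optional[str]]:
    """Tag each sentence; store None where the output cannot be formatted
    so that one bad sentence does not abort the whole run."""
    results: List[Optional[str]] = []
    for sentence in sentences:
        try:
            results.append(format_tagged(predictor.predict(sentence=sentence)))
        except (KeyError, ValueError):
            results.append(None)
    return results


class FakePredictor:
    """Hypothetical stand-in for a real predictor, for a dry run only."""

    def predict(self, sentence: str) -> Dict[str, Any]:
        words = sentence.split()
        return {"words": words, "tags": ["N"] * len(words)}


if __name__ == "__main__":
    print(tag_sentences(FakePredictor(), ["John runs"]))
```

With a real predictor, the list returned by tag_sentences(predictor, sentences) could be assigned to df["ccg_tags"] in place of the apply call, and any None entries would mark sentences whose output needs a closer look.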