French Reddit Discussions

Size: 629.79M
Linguistics, Demographics, Languages Classification


    README.md

LELú is a French dialog corpus that contains a rich collection of human-human, spontaneous written conversations, extracted from Reddit’s public dataset available through Google BigQuery. Our corpus is composed of 556,621 conversations with 1,583,083 utterances in total. The code to generate this dataset can be found in our [GitHub Repository][1].

The archive `spf.tar.gz` contains Reddit discussions in an XML file with the following format:

The tag attributes can be described as follows:

- `link_id`: ID of the parent Reddit post.
- `subreddit_id`: ID of the subreddit.
- `uid`: ID of the comment author.
- `comment_id`: ID of the Reddit comment.
- `parent_id`: ID of the parent Reddit comment.

We have split up the conversation trees into short sequential conversations using a heuristic described in our paper, [LELú: A French Dialog Corpus from Reddit][2]. However, the full conversation trees can be reconstructed using the `comment_id` and `parent_id` attributes of the `
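As a rough illustration of how the trees could be rebuilt from `comment_id` and `parent_id`, here is a minimal Python sketch. It rests on several assumptions that this README does not confirm: the element names in `SAMPLE` (`dialogs`, `s`, `utt`) are hypothetical placeholders, the utterance text is assumed to be the element's text content, and IDs are assumed to match once an optional Reddit type prefix such as `t1_` is removed. The code looks only at attributes, so it should not depend on the real tag names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def strip_prefix(reddit_id):
    """Drop a Reddit type prefix such as 't1_' or 't3_' if one is present.

    Assumption: the corpus may or may not store prefixed IDs; this keeps
    comment_id and parent_id comparable in either case.
    """
    head, sep, tail = reddit_id.partition("_")
    if sep and len(head) == 2 and head[0] == "t" and head[1].isdigit():
        return tail
    return reddit_id


def build_trees(xml_text):
    """Return (roots, children, texts) for all elements carrying both IDs."""
    root = ET.fromstring(xml_text)
    texts = {}
    parent_of = {}
    for el in root.iter():
        # Tag-name agnostic: any element with both attributes is an utterance.
        if "comment_id" in el.attrib and "parent_id" in el.attrib:
            cid = strip_prefix(el.attrib["comment_id"])
            # A comment repeated across split conversations is stored once.
            texts[cid] = (el.text or "").strip()
            parent_of[cid] = strip_prefix(el.attrib["parent_id"])

    children = defaultdict(list)
    roots = []
    for cid, pid in parent_of.items():
        if pid in texts:
            children[pid].append(cid)
        else:
            # Parent is the post itself or a comment absent from the corpus.
            roots.append(cid)
    return roots, dict(children), texts


def render(cid, children, texts, depth=0):
    """Return the subtree rooted at cid as a list of indented lines."""
    lines = ["  " * depth + texts[cid]]
    for child in children.get(cid, []):
        lines.extend(render(child, children, texts, depth + 1))
    return lines


# Hypothetical sample: two short conversations that share the comment c1.
SAMPLE = """<dialogs>
  <s link_id="p1" subreddit_id="s1">
    <utt uid="1" comment_id="c1" parent_id="p1">Salut !</utt>
    <utt uid="2" comment_id="c2" parent_id="c1">Bonjour !</utt>
  </s>
  <s link_id="p1" subreddit_id="s1">
    <utt uid="1" comment_id="c1" parent_id="p1">Salut !</utt>
    <utt uid="3" comment_id="c3" parent_id="c1">Coucou</utt>
  </s>
</dialogs>"""

if __name__ == "__main__":
    roots, children, texts = build_trees(SAMPLE)
    for r in roots:
        print("\n".join(render(r, children, texts)))
```

For the full corpus file, `ET.iterparse` would likely be a better fit than `ET.fromstring`, since it avoids loading the whole document into memory at once.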