
USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder Problem: Pat Data Set


Data structure: 135.92M

    README.md

    -- Creator: TopCoder, Inc.
    -- Released under Apache License, Version 2.0
    http://www.apache.org/licenses/LICENSE-2.0.html


    Data Set Information:

    USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder
      Problem: Patent Labeling


    Attribute Information:

    Dataset Information:
       -- This folder contains 4 groups of USPTO patent images, each with ground truth information.
    -- The 4 groups are 'train1', 'train2', 'test', and 'evaluation'.
    -- 'train1', 'test', and 'evaluation' contain the data used in the original 'USPTO Algorithm Challenge' for training, testing, and final evaluation, respectively.
    -- 'train2' contains additional data that was used in the 'USPTO Algorithm Followup Challenge'.
      Note that 'train2' includes some cover page images of patent documents, which are not included in the other groups.

       -- In each group, there are two folders containing the original images and the corresponding ground truth information.
    -- The original images are in 'jpeg' format.
    -- There are two types of ground truth: figure label ground truth and part label ground truth.
    -- The ground truth files are text files with the '.ans' extension.

       -- The structure of the ground truth files is described below:
    -- The first line is a single number indicating how many instances exist in the corresponding image.
    -- The following lines are polygon coordinates and the corresponding label contents. Each line corresponds to one figure label or part label, in the form 'N x1 y1 x2 y2 … xN yN x1 y1 content'.
    -- In each of those lines, the first number N indicates how many polygon vertices are recorded for the current instance.
    -- The following numbers are the x, y coordinates of those vertices.
    -- The final word in each line is the content of the figure label or part label.
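
The file layout described above could be read with a short parser along the following lines. This is a minimal sketch rather than the challenge's official tooling: the names `LabelInstance` and `parse_ans` are hypothetical, the sample text is made up for illustration, and coordinates are read as floats because the README does not state their type. Since the line format shows the first vertex repeated after the N-th one, the sketch accepts lines with or without that trailing repeat and drops it when present.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabelInstance:
    """One figure label or part label (hypothetical container, not from the README)."""
    declared_vertices: int                # the leading number N on the line
    vertices: List[Tuple[float, float]]   # polygon vertices as (x, y) pairs
    content: str                          # final word on the line


def parse_ans(text: str) -> List[LabelInstance]:
    """Parse the text of one '.ans' ground truth file."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []

    # First line: number of instances in the corresponding image.
    expected = int(lines[0].split()[0])

    instances = []
    for line in lines[1:]:
        tokens = line.split()
        if len(tokens) < 2:
            raise ValueError(f"line too short: {line!r}")
        n = int(tokens[0])
        # Assumption: the label content is a single word with no spaces.
        content = tokens[-1]
        coords = [float(t) for t in tokens[1:-1]]
        if len(coords) % 2 != 0:
            raise ValueError(f"odd number of coordinates: {line!r}")
        vertices = list(zip(coords[0::2], coords[1::2]))
        # Drop a trailing copy of the first vertex if the polygon is closed.
        if n >= 1 and len(vertices) == n + 1 and vertices[-1] == vertices[0]:
            vertices.pop()
        instances.append(LabelInstance(n, vertices, content))

    if len(instances) != expected:
        raise ValueError(f"expected {expected} instances, found {len(instances)}")
    return instances


# Made-up sample: two labels, the first with its first vertex repeated.
sample = "2\n4 10 20 30 20 30 40 10 40 10 20 FIG.1\n4 1 2 3 2 3 4 1 4 12a\n"
for inst in parse_ans(sample):
    print(inst.content, inst.vertices)
```

Taking the last token as the content follows the README's statement that the final word of each line is the label; a line whose label is empty would be misread by this sketch and would need separate handling.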


    Relevant Papers:

    Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Minetti, Jason Crusan, Ivan Metelsky, and Karim R. Lakhani, 'Detecting Figures and Part Labels in Patents: A Competition-based Development of Image Processing Algorithms', working paper.



    Citation Request:

    Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Minetti, Jason Crusan, Ivan Metelsky, and Karim R. Lakhani, 'Detecting Figures and Part Labels in Patents: A Competition-based Development of Image Processing Algorithms,' International Journal on Document Analysis and Recognition, 1-18, DOI 10.1007/s10032-016-0260-8
