Data structure: 135.92M
README.md
-- Creator: TopCoder, Inc.
-- Released under Apache License, Version 2.0
http://www.apache.org/licenses/LICENSE-2.0.html
Data Set Information:
USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder
Problem: Patent Labeling
Attribute Information:
Dataset Information:
-- This folder contains 4 groups of USPTO patent images including ground truth information.
-- The 4 groups are 'train1', 'train2', 'test', 'evaluation'.
-- 'train1', 'test' and 'evaluation' contain the data from the original 'USPTO Algorithm Challenge' for training, testing and final evaluation, respectively.
-- 'train2' contains additional data which was used in the 'USPTO Algorithm Followup Challenge.'
Note that 'train2' includes some cover page images of patent documents, which are not included in the other groups.
-- In each group, there are two folders containing the original images and the corresponding ground truth information.
-- The original images are in 'jpeg' format.
-- There are two types of ground truth: figure label ground truth and part label ground truth.
-- The ground truth files are text files with '.ans' extension.
-- The structure of the ground truth files is described below:
-- The first line is one number indicating how many instances exist in the corresponding image.
-- The following lines are polygon coordinates and corresponding label contents. Each line corresponds to a figure label or part label, in the form 'N x1 y1 x2 y2 ... xN yN x1 y1 content'.
-- In each of those lines, the first number N indicates how many polygon vertices are recorded for the current instance.
-- The following numbers are x, y coordinates of those vertices.
-- The final word in each line is the content of figure label or part label.
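The line format described above can be read with a few lines of Python. The following is a minimal sketch, not the challenge's official loader. The names `LabelInstance` and `parse_ans` are hypothetical, and parsing coordinates as floats is an assumption. Because the pattern ends with a repeated 'x1 y1', the sketch accepts lines with or without that closing vertex and drops it when present.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabelInstance:
    """One figure label or part label (hypothetical container type)."""
    vertices: List[Tuple[float, float]]  # polygon vertices as (x, y)
    content: str                         # label text, the final word on the line


def parse_ans(text: str) -> List[LabelInstance]:
    """Parse the text of one '.ans' ground truth file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []

    expected = int(lines[0])  # first line: number of instances
    instances = []
    for ln in lines[1:]:
        tokens = ln.split()
        n = int(tokens[0])    # declared number of polygon vertices
        content = tokens[-1]  # final word is the label content
        numbers = [float(t) for t in tokens[1:-1]]
        if len(numbers) % 2 != 0:
            raise ValueError(f"odd number of coordinates in line: {ln!r}")
        points = list(zip(numbers[0::2], numbers[1::2]))

        # Assumption: the first vertex may be repeated to close the polygon.
        if len(points) == n + 1 and points[0] == points[-1]:
            points = points[:-1]
        if len(points) != n:
            raise ValueError(f"expected {n} vertices, got {len(points)}: {ln!r}")
        instances.append(LabelInstance(points, content))

    if len(instances) != expected:
        raise ValueError(f"expected {expected} instances, got {len(instances)}")
    return instances
```

To use it on a file, read the file as text and pass the contents in, for example `parse_ans(open(path).read())`, where `path` points at any '.ans' file in one of the four groups. The strict count checks are a design choice for catching format surprises early and can be relaxed if real files deviate.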
Relevant Papers:
Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Minetti, Jason Crusan, Ivan Metelsky, and Karim R. Lakhani, 'Detecting Figures and Part Labels in Patents: A Competition-based Development of Image Processing Algorithms', working paper, [Web link].
Citation Request:
Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Minetti, Jason Crusan, Ivan Metelsky, and Karim R. Lakhani, 'Detecting Figures and Part Labels in Patents: A Competition-based Development of Image Processing Algorithms,' International Journal on Document Analysis and Recognition, 1-18, DOI 10.1007/s10032-016-0260-8.