Gastrointestinal Lesions in Regular Colonoscopy Data Set

    README.md

    Pablo Mesejo, pablomesejo '@' gmail.com, Inria, France
    Daniel Pizarro, dani.pizarro '@' gmail.com, University of Alcalá, Spain


    Data Set Information:

    This dataset contains features extracted from a database of colonoscopic videos showing gastrointestinal lesions. It also contains the ground truth collected from both expert image inspection and histology (in an xlsx file). There are feature vectors for 76 lesions of 3 types: hyperplastic, adenoma, and serrated adenoma. The classification problem can be treated as binary by combining adenoma and serrated adenoma into one class: hyperplastic lesions then belong to the 'benign' class, while the other two types belong to the 'malignant' class.

    The first line/row of the dataset holds the lesion name (text label). Every lesion appears twice because it was recorded under two types of light: white light (WL) and narrow band imaging (NBI). The second line/row gives the type of lesion (3 for adenoma, 1 for hyperplastic, and 2 for serrated adenoma). Finally, the third line/row gives the type of light used (1 for WL and 2 for NBI). All other rows are the raw features (without any kind of preprocessing), 698 per recording:
    422 2D TEXTURAL FEATURES
    - First 166 features: AHT: Autocorrelation Homogeneous Texture (Invariant Gabor Texture)
    - Next 256: Rotational Invariant LBP
    76 2D COLOR FEATURES
    - 16 Color Naming
    - 13 Discriminative Color
    - 7 Hue
    - 7 Opponent
    - 33 color gray-level co-occurrence matrix
    200 3D SHAPE FEATURES
    - 100 shapeDNA
    - 100 KPCA
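
    The layout described above (three header rows followed by one row per feature, with one column per recording) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the group names, the helper functions and the toy input are hypothetical, and no file name or delimiter from the real data file is assumed. Only the row order, the label codes and the feature counts come from the description above.

```python
# Feature-group sizes within one 698-element feature vector, in the order
# listed above. The group names are made up for this sketch.
GROUP_SIZES = [
    ("texture_aht", 166),
    ("texture_lbp", 256),
    ("color_naming", 16),
    ("color_discriminative", 13),
    ("color_hue", 7),
    ("color_opponent", 7),
    ("color_glcm", 33),
    ("shape_dna", 100),
    ("shape_kpca", 100),
]


def group_slices(sizes):
    """Return {group name: slice} for consecutive, 0-based feature ranges."""
    slices, start = {}, 0
    for name, size in sizes:
        slices[name] = slice(start, start + size)
        start += size
    return slices


SLICES = group_slices(GROUP_SIZES)
N_FEATURES = sum(size for _, size in GROUP_SIZES)

LESION_TYPES = {1: "hyperplastic", 2: "serrated", 3: "adenoma"}
LIGHT_TYPES = {1: "WL", 2: "NBI"}


def parse_columns(rows):
    """Turn the row-oriented layout into one record per recording.

    rows[0] holds lesion names, rows[1] lesion-type codes, rows[2]
    light-type codes, and every remaining row holds one feature.
    """
    names, types, lights = rows[0], rows[1], rows[2]
    records = []
    for col, name in enumerate(names):
        lesion_code = int(types[col])
        records.append({
            "name": name,
            "lesion": LESION_TYPES[lesion_code],
            "light": LIGHT_TYPES[int(lights[col])],
            # Binary target: 0 = benign (hyperplastic), 1 = malignant.
            "binary": 0 if lesion_code == 1 else 1,
            "features": [float(row[col]) for row in rows[3:]],
        })
    return records


# Synthetic toy input: one made-up lesion recorded under WL and NBI.
toy_rows = [["lesion_a", "lesion_a"], [3, 3], [1, 2]]
toy_rows += [[0.0, 1.0] for _ in range(N_FEATURES)]
records = parse_columns(toy_rows)
```

    With these slices, a single group can be selected from a record with, for example, records[0]["features"][SLICES["color_hue"]].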

    The main objective of this dataset is to study how good computers can be at diagnosing gastrointestinal lesions from regular colonoscopic videos. To compare the performance of machine learning methods with that of humans, we provide the file ground_truth.xlsx, which includes the ground truth after histopathology and the opinion of 7 clinicians (4 experts and 3 beginners). An automatic tissue classification approach could save clinicians time by avoiding chromoendoscopy, a time-consuming staining procedure using indigo carmine, and could help to assess the severity of individual lesions in patients with many polyps, so that the gastroenterologist can focus directly on those requiring polypectomy. A possible way of proceeding with the classification is to concatenate the information from the two types of light for each lesion, i.e. to create a single vector of 1396 elements (2 x 698) per lesion.
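
    The concatenation suggested above could look like the following sketch. The record structure (a dict with "name", "light" and "features" keys) and the ordering of WL before NBI in the combined vector are assumptions made for this example; the description above does not prescribe either.

```python
def concatenate_lights(records):
    """Build one vector per lesion: WL features followed by NBI features."""
    by_name = {}
    for rec in records:
        by_name.setdefault(rec["name"], {})[rec["light"]] = rec["features"]

    combined = {}
    for name, lights in by_name.items():
        if "WL" not in lights or "NBI" not in lights:
            raise ValueError(f"lesion {name!r} is missing a light type")
        # List concatenation keeps the per-light feature order intact.
        combined[name] = lights["WL"] + lights["NBI"]
    return combined


# Synthetic toy recordings with 698 features each (values are made up).
toy_records = [
    {"name": "lesion_a", "light": "WL", "features": [0.0] * 698},
    {"name": "lesion_a", "light": "NBI", "features": [1.0] * 698},
]
vectors = concatenate_lights(toy_records)
```

    Each entry of vectors then holds 1396 values, with the white-light block in positions 0 to 697 and the NBI block in positions 698 to 1395.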

    The technical goal is to maximize accuracy while minimizing false positives (lesions that do not need resection but are classified as if they do) and false negatives (lesions that do need resection but are classified as if they do not). In particular, we are especially interested in maximizing accuracy while reducing false negatives, i.e. minimizing the number of adenomas and serrated adenomas that are classified as hyperplastic. The opposite case, resecting a hyperplastic polyp because it was taken for an adenoma or serrated adenoma, is less serious. Another interesting experiment would consist of comparing the performance of the best machine learning method obtainable with that of human operators (experts and beginners).
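
    Counting false positives and false negatives requires one held-out prediction per lesion. Leave-one-out cross-validation, the protocol behind the results reported below, provides exactly that. The sketch below shows the bookkeeping only: it uses a 1-nearest-neighbour rule as a simple stand-in (not the Random Forest or SVM ensembles behind the reported results) and a synthetic 2-D toy set, so all names and data here are hypothetical.

```python
import math


def nearest_neighbour_predict(train_x, train_y, query):
    """Return the label of the training vector closest to `query`."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def leave_one_out(xs, ys, predict):
    """Hold out each sample once; return (tp, fn, fp, tn), 1 = resection."""
    tp = fn = fp = tn = 0
    for i in range(len(xs)):
        # Train on everything except sample i, then classify sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        guess = predict(train_x, train_y, xs[i])
        if ys[i] == 1:
            if guess == 1:
                tp += 1
            else:
                fn += 1  # lesion needing resection was missed
        else:
            if guess == 1:
                fp += 1  # unnecessary resection
            else:
                tn += 1
    return tp, fn, fp, tn


# Synthetic, well-separated toy data: 0 = no resection, 1 = resection.
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]]
ys = [0, 0, 0, 1, 1, 1]
tp, fn, fp, tn = leave_one_out(xs, ys, nearest_neighbour_predict)
```

    Any classifier with the same predict(train_x, train_y, query) shape could be passed in place of the stand-in.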

    The best results obtained so far in the binary case, using leave-one-out and Random Forest with 1000 trees (color+texture+3D with NBI), were an accuracy of ~89.5%, a sensitivity of ~94.5% and a specificity of ~76% (taking resection as the positive condition). This is the best confusion matrix found so far:
                     Classified as
                     Resection   No-Resection
    Resection        52          3
    No-Resection     5           16
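
    The reported percentages can be reproduced from this confusion matrix. The sketch below assumes that rows are the true classes and columns the predicted ones, which is the reading consistent with the reported sensitivity and specificity; the function name is made up for this example.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # resections correctly flagged
        "specificity": tn / (tn + fp),  # non-resections correctly cleared
    }


# Counts from the matrix above, with resection as the positive condition.
metrics = binary_metrics(tp=52, fn=3, fp=5, tn=16)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

    The 3 false negatives are the lesions that needed resection but were classified as not needing it, the error type the technical goal singles out as most serious.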

    The best results obtained in the multi-class case, using leave-one-out and Random Subspace of SVMs (color+texture+3D using WL), were as follows:
             Classified as
             Hyp.   Ser.   Ade.
    Hyp.     18     0      3
    Ser.     2      9      4
    Ade.     7      4      29

    Overall accuracy: 0.7368

             Accuracy   Sensitivity   Specificity
    Hyp.     0.84       0.86          0.84
    Ser.     0.87       0.6           0.93
    Ade.     0.76       0.725         0.81
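
    The per-class figures follow from the multi-class confusion matrix when each class is scored one-vs-rest. The sketch below assumes rows are true classes and columns predicted classes, a reading that reproduces the values listed above; the function and variable names are hypothetical.

```python
LABELS = ["Hyp.", "Ser.", "Ade."]
CONFUSION = [
    [18, 0, 3],   # true Hyp.
    [2, 9, 4],    # true Ser.
    [7, 4, 29],   # true Ade.
]


def one_vs_rest_metrics(matrix, k):
    """Score class k against all other classes combined."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # class k predicted as another class
    fp = sum(row[k] for row in matrix) - tp  # other classes predicted as k
    tn = total - tp - fn - fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


n_samples = sum(sum(row) for row in CONFUSION)
overall_accuracy = sum(CONFUSION[i][i] for i in range(len(LABELS))) / n_samples
per_class = {label: one_vs_rest_metrics(CONFUSION, k)
             for k, label in enumerate(LABELS)}
```

    Note that per-class accuracy counts true negatives as well, which is why it can exceed the overall accuracy for every class.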


    Attribute Information:

    First 422 attributes: 2D TEXTURAL FEATURES
    - 166 features: AHT: Autocorrelation Homogeneous Texture (Invariant Gabor Texture)
    - Next 256: Rotational Invariant LBP

    Next 76 attributes: 2D COLOR FEATURES
    - 16 Color Naming
    - 13 Discriminative Color
    - 7 Hue
    - 7 Opponent
    - 33 color gray-level co-occurrence matrix

    Last 200 attributes: 3D SHAPE FEATURES
    - 100 shapeDNA
    - 100 KPCA


    Relevant Papers:

    This dataset was gathered and released as part of the research published in P. Mesejo et al., 'Computer-Aided Classification of Gastrointestinal Lesions in Regular Colonoscopy,' in IEEE Transactions on Medical Imaging, vol. 35, no. 9, pp. 2051-2063, Sept. 2016.



    Citation Request:

    If you use this dataset, please cite the following research paper: P. Mesejo et al., 'Computer-Aided Classification of Gastrointestinal Lesions in Regular Colonoscopy,' in IEEE Transactions on Medical Imaging, vol. 35, no. 9, pp. 2051-2063, Sept. 2016.
