
Molecular Biology (Promoter Gene Sequences) dataset, for evaluating a hybrid learning algorithm (KBANN)


    README.md

    Data Set Information:

    This dataset has been developed to help evaluate a "hybrid" learning algorithm ("KBANN") that uses examples to inductively refine preexisting knowledge.  Using a "leave-one-out" methodology, the following errors were produced by various ML algorithms.  (See Towell, Shavlik, & Noordewier, 1990, for details.)

    System      Errors   Comments
    ----------------------------------------------------------------
    KBANN        4/106   a hybrid ML system
    BP           8/106   std backprop with one hidden layer
    O'Neill     12/106   ad hoc technique from the bio. lit.
    Near-Neigh  13/106   a nearest-neighbor algo (k=3)
    ID3         19/106   Quinlan's decision-tree builder
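
The leave-one-out procedure behind these error counts can be sketched as follows. This is only an illustration, not the code used by Towell, Shavlik, and Noordewier: the Hamming distance, the function names, and the seven toy sequences are all assumptions made for the example. Each example is held out in turn, a k=3 nearest-neighbor vote is taken over the rest, and the misclassifications are counted in the same "errors/total" form as the table.

```python
from collections import Counter

def hamming(a, b):
    # Number of positions at which two equal-length sequences differ.
    return sum(x != y for x, y in zip(a, b))

def knn_predict(train, query, k=3):
    # train is a list of (label, sequence) pairs; majority vote over the k nearest.
    nearest = sorted(train, key=lambda ex: hamming(ex[1], query))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_errors(examples, k=3):
    # Hold out each example once, train on the rest, count wrong predictions.
    errors = 0
    for i, (label, seq) in enumerate(examples):
        train = examples[:i] + examples[i + 1:]
        if knn_predict(train, seq, k) != label:
            errors += 1
    return errors

# Invented toy data (NOT taken from the dataset): 3 "+", 4 "-" examples.
toy = [
    ("+", "ttgaca"), ("+", "ttgact"), ("+", "ttgata"),
    ("-", "ccgcgg"), ("-", "ccgcga"), ("-", "cagcgg"),
    ("-", "ttgacg"),  # a negative that resembles the positives
]

n_errors = leave_one_out_errors(toy)
print(f"{n_errors}/{len(toy)}")  # prints 1/7
```

With 106 examples, the same loop would train 106 times, each time on the other 105 examples.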

    Type of domain: non-numeric, nominal (one of A, G, T, C)
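
Because the attributes are nominal rather than numeric, a learner that expects numeric inputs (such as a backprop network) needs some encoding of the four symbols. One common choice is a one-hot encoding, sketched below. The README does not say which encoding the systems listed above used, so the function name, the symbol order, and the encoding itself are assumptions.

```python
ALPHABET = "agtc"  # assumed symbol order; any fixed order works

def one_hot(sequence):
    # Map each nucleotide to four 0/1 indicators, one per symbol in ALPHABET.
    vector = []
    for base in sequence.lower():
        if base not in ALPHABET:
            raise ValueError(f"unexpected symbol: {base!r}")
        vector.extend(1 if base == symbol else 0 for symbol in ALPHABET)
    return vector

print(one_hot("ga"))  # [0, 1, 0, 0, 1, 0, 0, 0]
```

Under this encoding a 57-nucleotide sequence becomes a vector of 57 * 4 = 228 binary inputs.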


    Note: DNA nucleotides can be grouped into a hierarchy, as shown below:

                   X (any)
                /     \
    (purine) R           Y (pyrimidine)
            / \         / \
           A   G       T   C


    Here is that hierarchy in a text-friendly format:

    X (any)
    . R (purine)
    . . A
    . . G
    . Y (pyrimidine)
    . . T
    . . C
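
The hierarchy can also be written down as a small lookup structure. The sketch below is one possible representation, with hypothetical names (`PARENT`, `ancestors`, `matches`), and shows how a pattern symbol such as R or X could be tested against a concrete nucleotide.

```python
# Child -> parent links for the hierarchy; X (any) is the root.
PARENT = {"A": "R", "G": "R", "T": "Y", "C": "Y", "R": "X", "Y": "X"}

def ancestors(symbol):
    # Path from a symbol up to the root, e.g. A -> R -> X.
    path = [symbol]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def matches(pattern_symbol, nucleotide):
    # True if the nucleotide is the pattern symbol or falls beneath it.
    return pattern_symbol in ancestors(nucleotide.upper())

print(ancestors("A"))     # ['A', 'R', 'X']
print(matches("R", "g"))  # True: G is a purine
print(matches("Y", "a"))  # False: A is not a pyrimidine
```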


    Attribute Information:

    1.   One of {+/-}, indicating the class ("+" = promoter).
    2.   The instance name (non-promoters named by position in the 1500-long nucleotide sequence provided by T. Record).
    3-59.   The remaining 57 fields are the sequence, starting at position -50 (p-50) and ending at position +7 (p7). Each of these fields is filled by one of {a, g, t, c}.
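
A record with these 59 fields could be parsed along the following lines. The sketch assumes one comma-separated record per line in the order class, name, sequence, with the 57 nucleotides written as a single string; that layout is an assumption, since the file format is not described here. It also assumes there is no position 0, which is inferred from the counts: positions -50 to -1 (50 fields) plus +1 to +7 (7 fields) give exactly 57. The record used in the example is invented.

```python
def position_labels():
    # p-50 .. p-1 followed by p1 .. p7: 57 labels, assuming no position 0.
    return [f"p{i}" for i in range(-50, 0)] + [f"p{i}" for i in range(1, 8)]

def parse_record(line):
    # Split into class, instance name, and sequence; tolerate padding whitespace.
    label, name, sequence = (field.strip() for field in line.split(","))
    sequence = sequence.lower()
    if label not in {"+", "-"}:
        raise ValueError(f"unexpected class label: {label!r}")
    if len(sequence) != 57 or set(sequence) - set("agtc"):
        raise ValueError("sequence must be 57 characters from {a, g, t, c}")
    return {
        "is_promoter": label == "+",
        "name": name,
        "sequence": dict(zip(position_labels(), sequence)),
    }

# Invented record for illustration only (not a row from the dataset).
record = parse_record("+,EXAMPLE1,\t\t" + "acgt" * 14 + "a")
print(record["is_promoter"], record["name"], record["sequence"]["p-50"])
# True EXAMPLE1 a
```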


    Relevant Papers:

    Harley, C. and Reynolds, R. 1987.  "Analysis of E. Coli Promoter Sequences." Nucleic Acids Research, 15:2343-2361.

    Towell, G., Shavlik, J. and Noordewier, M. 1990. "Refinement of Approximate Domain Theories by Knowledge-based Artificial Neural Networks." In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90).



    Papers That Cite This Data Set:


    Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin. Linear dimensionality reduction using relevance weighted LDA. School of Electrical and Electronic Engineering, Nanyang Technological University. 2005.

    Wei-Chun Kao and Kai-Min Chung and Lucas Assun and Chih-Jen Lin. Decomposition Methods for Linear Support Vector Machines. Neural Computation, 16. 2004.

    Aik Choon Tan and David Gilbert. An Empirical Comparison of Supervised Machine Learning Techniques in Bioinformatics. APBC. 2003.

    Giorgio Valentini. Ensemble methods based on bias-variance analysis. Theses Series DISI-TH-2003. Dipartimento di Informatica e Scienze dell'Informazione. 2003.

    Zoubin Ghahramani and Hyun-Chul Kim. Bayesian Classifier Combination. Gatsby Computational Neuroscience Unit, University College London. 2003.

    Jinyan Li and Limsoon Wong. Using Rules to Analyse Bio-medical Data: A Comparison between C4.5 and PCL. WAIM. 2003.

    Michael G. Madden. Evaluation of the Performance of the Markov Blanket Bayesian Classifier Algorithm. CoRR, cs.LG/0211003. 2002.

    Mukund Deshpande and George Karypis. Evaluation of Techniques for Classifying Biological Sequences. PAKDD. 2002.

    Takashi Matsuda and Hiroshi Motoda and Tetsuya Yoshida and Takashi Washio. Mining Patterns from Structured Data by Beam-Wise Graph-based Induction. Discovery Science. 2002.

    Marina Meila and Michael I. Jordan. Learning with Mixtures of Trees. Journal of Machine Learning Research, 1. 2000.

    Jie Cheng and Russell Greiner. Comparing Bayesian Network Classifiers. UAI. 1999.

    Ismail Taha and Joydeep Ghosh. Symbolic Interpretation of Artificial Neural Networks. IEEE Trans. Knowl. Data Eng, 11. 1999.

    Cesar Guerra-Salcedo and L. Darrell Whitley. Genetic Approach to Feature Selection for Ensemble Creation. GECCO. 1999.

    Mark A. Hall and Lloyd A. Smith. Feature Selection for Machine Learning: Comparing a Correlation-based Filter Approach to the Wrapper. FLAIRS Conference. 1999.

    Mark A. Hall. Correlation-based Feature Selection for Machine Learning. Doctor of Philosophy thesis, Department of Computer Science, The University of Waikato, Hamilton, New Zealand. 1999.

    Creators:

    1.  promoter instances: C. Harley (CHARLEY '@' McMaster.CA) and R. Reynolds

    2. non-promoter instances and domain theory: M. Noordewier
    -- (non-promoters derived from work of lab of Prof. Tom Record, University of Wisconsin Biochemistry Department)

    Donor:

    M. Noordewier and J. Shavlik, {noordewi,shavlik}@cs.wisc.edu
