Iris Dataset Logistic Regression

Business, Earth and Nature, Computer Science Classification

    README.md

**Visualization of Iris Species Dataset:**

- The data has four features.
- Each subplot considers two features.
- From the figure it can be observed that the data points for Iris-setosa cluster together, while those for the other two species partly overlap.

**Classification using Logistic Regression:**

- There are 50 samples for each species. The data for each species is split into three sets: training, validation and test.
- The training data is prepared separately for the three species. For instance, if the species is Iris-setosa, the corresponding outputs are set to 1, and for the other two species they are set to 0.
- The three training sets are modeled separately, giving three sets of model parameters (theta). A sigmoid function is used to predict the output.
- Gradient descent is used to converge on theta by minimizing a cost function.

**Choosing best model:**

- Polynomial features are included to train the model better. Including more polynomial features will fit the training set better, but it may not give good results on the validation set. The cost on the training data decreases as more polynomial features are included.
- So, to find the best fit, the training set is first used to find the model parameters, which are then applied to the validation set. Whichever model gives the lowest cost on the validation set is chosen as the better fit to the data.
- A regularization term is included to keep overfitting in check as more polynomial features are added.

*Observations:*

- For Iris-setosa, including polynomial features did not do well on the cross-validation set.
- For Iris-versicolor, it seems more polynomial features need to be included to be more conclusive. However, polynomial features up to the third degree were already in use, so the idea of adding more features was dropped.

**Bias-Variance trade off:**

- A check is done to see if the model will perform better if more features are included.
- The number of training samples is increased in steps, and the corresponding model parameters and cost are calculated at each step. The model parameters obtained are then used to get the cost on the validation set.
- If the costs for both sets converge, it is an indication that the fit is good.

**Training error:**

- The heuristic function (the sigmoid output of the model) should ideally be 1 for positive outputs and 0 for negative outputs.
- It is acceptable if the heuristic function is >= 0.5 for positive outputs and < 0.5 for negative outputs.
- The training error is calculated for all the sets.

*Observations:* The model performs very well for Iris-setosa and Iris-virginica. Except for the validation set for Iris-versicolor, the rest have been modeled well.

**Accuracy:**

*The species whose model gives the highest probability (from the heuristic function) is taken as the prediction. The accuracy came out to be 93.33% on the validation data and, surprisingly, 100% on the test data.*

**Improvements that can be done:**

- A more sophisticated algorithm for finding the model parameters can be used instead of gradient descent.
- The training, validation and test data can be chosen randomly to get better performance.
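The one-vs-rest procedure described in this README can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function names, the learning rate `alpha`, the iteration count, the regularization strength `lam`, and the toy points are all assumptions, and the toy points are made up rather than taken from the Iris data. It uses only the Python standard library so it runs anywhere. A real implementation would more likely vectorize the loops with NumPy.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(theta, x):
    # theta[0] is the intercept; theta[1:] pairs with the features in x.
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))

def cost(theta, X, y, lam=0.0):
    # Regularized cross-entropy cost; the intercept is not penalized.
    m, eps = len(X), 1e-12
    total = 0.0
    for x, yi in zip(X, y):
        h = predict_proba(theta, x)
        total -= yi * math.log(h + eps) + (1 - yi) * math.log(1 - h + eps)
    return total / m + lam / (2 * m) * sum(t * t for t in theta[1:])

def fit(X, y, lam=0.0, alpha=0.1, iters=2000):
    # Batch gradient descent on the regularized cost, starting from zeros.
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(iters):
        grad = [0.0] * (n + 1)
        for x, yi in zip(X, y):
            err = predict_proba(theta, x) - yi
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        for j in range(n + 1):
            reg = lam * theta[j] if j > 0 else 0.0
            theta[j] -= alpha * (grad[j] + reg) / m
    return theta

def fit_one_vs_rest(X, labels, classes, **kwargs):
    # One binary model per class: that class is labeled 1, all others 0.
    return {c: fit(X, [1 if l == c else 0 for l in labels], **kwargs)
            for c in classes}

def classify(models, x):
    # Predict the class whose model reports the highest probability.
    return max(models, key=lambda c: predict_proba(models[c], x))

# Made-up 2-feature toy points in three clusters (NOT the Iris data).
X = [(0.0, 0.2), (0.3, 0.0), (0.1, 0.4),
     (4.0, 0.1), (3.8, 0.3), (4.2, 0.0),
     (2.0, 4.0), (2.1, 3.8), (1.9, 4.2)]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
models = fit_one_vs_rest(X, labels, ["A", "B", "C"], lam=0.01)
print(classify(models, (0.2, 0.1)))
```

To mirror the model-selection step, one could call `fit` on the training split for each candidate feature set (for example, with polynomial terms appended to each `x`) and compare `cost(theta, X_val, y_val)` on the validation split, keeping the candidate with the lowest validation cost.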