Red Wine Quality: a simple, clean practice dataset for regression or classification modeling

Beginner,Earth and Nature,Education,Alcohol Classification


    README.md

    Context

    The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

    These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).


    This dataset is also available from the UCI machine learning repository (https://archive.ics.uci.edu/ml/datasets/wine+quality); I just shared it on Kaggle for convenience. (If I am mistaken and the public license type disallows this, I will take it down if requested.)

    Content

    For more information, read [Cortez et al., 2009].

    Input variables (based on physicochemical tests):

    1 - fixed acidity

    2 - volatile acidity

    3 - citric acid

    4 - residual sugar

    5 - chlorides

    6 - free sulfur dioxide

    7 - total sulfur dioxide

    8 - density

    9 - pH

    10 - sulphates

    11 - alcohol

    Output variable (based on sensory data):

    12 - quality (score between 0 and 10)
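The column list above can be turned into a small loader. This is a minimal sketch using only the Python standard library; it assumes the CSV header uses exactly the names listed above, and the two sample rows are synthetic values made up for illustration, not rows from the dataset.

```python
import csv
import io

# Column names as listed above; the CSV header is assumed to match them.
FEATURES = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide",
    "density", "pH", "sulphates", "alcohol",
]
TARGET = "quality"


def load_wine_rows(handle, delimiter=","):
    """Parse rows into (features, quality) pairs, with features as floats."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [name for name in FEATURES + [TARGET] if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for record in reader:
        features = [float(record[name]) for name in FEATURES]
        rows.append((features, int(record[TARGET])))
    return rows


# Synthetic two-row sample, for illustration only (not taken from the dataset).
SAMPLE = io.StringIO(
    ",".join(FEATURES + [TARGET]) + "\n"
    "7.0,0.5,0.1,2.0,0.08,10,30,0.997,3.4,0.6,9.5,5\n"
    "8.0,0.3,0.4,2.5,0.07,15,40,0.996,3.2,0.8,11.0,7\n"
)
rows = load_wine_rows(SAMPLE)
```

To read a real file, pass an open file handle instead of `SAMPLE`. The file name and delimiter depend on where the copy came from, so check the header first and set `delimiter` accordingly.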

    Tips

    Aside from regression modelling, an interesting exercise is to set an arbitrary cutoff on the dependent variable (wine quality), e.g. classifying wines scored 7 or higher as 'good/1' and the remainder as 'not good/0'.
    This lets you practice hyperparameter tuning on, e.g., decision tree algorithms, evaluating them with the ROC curve and the AUC value.
    Without any feature engineering or overfitting, you should be able to get an AUC of 0.88 (without even using a random forest algorithm).
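The cutoff and the AUC evaluation described in this tip can be sketched in plain Python. The function names `binarize` and `roc_auc` are illustrative, and the quality values and model scores below are synthetic. The AUC here is computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, which is one standard way to define it.

```python
def binarize(quality_scores, cutoff=7):
    """Label a wine 1 ('good') if its score is at or above the cutoff, else 0."""
    return [1 if q >= cutoff else 0 for q in quality_scores]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Synthetic example: quality scores and hypothetical model scores.
quality = [5, 6, 7, 5, 8, 6]
model_scores = [0.2, 0.8, 0.7, 0.1, 0.9, 0.4]
labels = binarize(quality)           # [0, 0, 1, 0, 1, 0]
auc = roc_auc(labels, model_scores)  # 7 of 8 pairs ranked correctly: 0.875
```

The pairwise loop is quadratic in the number of rows, which is acceptable for a small practice dataset; a sort-based rank computation would be the usual choice for larger data.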

    KNIME is a great tool (GUI) that can be used for this.

    1 - File Reader (for CSV) to Linear Correlation node and to Interactive Histogram node for basic EDA.

    2 - File Reader to Rule Engine node to turn the 10-point scale into a dichotomous variable (good wine and the rest). The code to put in the Rule Engine is something like this:

    • $quality$ > 6.5 => "good"

    • TRUE => "bad"

    3 - Rule Engine node output to input of Column Filter node to filter out the original 10-point feature (this prevents leakage).

    4 - Column Filter node output to input of Partitioning node (your standard train/test split, e.g. 75%/25%; choose 'random' or 'stratified').

    5 - Partitioning node train split output to input of Decision Tree Learner node.

    6 - Partitioning node test split output to input of Decision Tree Predictor node.

    7 - Decision Tree Learner node output to input of Decision Tree Predictor node.

    8 - Decision Tree Predictor node output to input of ROC node (here you can evaluate your model based on the AUC value).
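For readers without KNIME, the data preparation part of this workflow (rule engine, column filter, and stratified partitioning) can be approximated in plain Python. This is a rough sketch, not a reproduction of the KNIME nodes: the function names are hypothetical, the rows are synthetic, and the tree learner, predictor, and ROC steps are left out.

```python
import random
from collections import defaultdict


def rule_engine(quality):
    """Mirror the rule above: quality > 6.5 is 'good', everything else 'bad'."""
    return "good" if quality > 6.5 else "bad"


def label_and_filter(rows, target="quality"):
    """Add the binary label and drop the original score to avoid leakage."""
    out = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k != target}
        kept["label"] = rule_engine(row[target])
        out.append(kept)
    return out


def stratified_split(rows, train_fraction=0.75, seed=0):
    """Split each label group separately so both parts keep the class ratio."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["label"]].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for members in groups.values():
        members = members[:]  # copy so the caller's lists are not shuffled
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Synthetic rows (illustrative values only): 8 'bad' wines and 4 'good' ones.
rows = [{"alcohol": 9.0 + i * 0.1, "quality": 5} for i in range(8)]
rows += [{"alcohol": 11.0 + i * 0.1, "quality": 7} for i in range(4)]
train, test = stratified_split(label_and_filter(rows))
```

With the 75%/25% split, the synthetic data yields 9 training rows (6 bad, 3 good) and 3 test rows (2 bad, 1 good), so the 2:1 class ratio is preserved in both parts. Stratifying matters here because the 'good' class is the minority.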

    Inspiration

    Use machine learning to determine which physicochemical properties make a wine 'good'!

    Acknowledgements

    This dataset is also available from the UCI machine learning repository (https://archive.ics.uci.edu/ml/datasets/wine+quality); I just shared it on Kaggle for convenience. (If I am mistaken and the public license type disallows this, I will take it down at first request. I am not the owner of this dataset.)

    Please include this citation if you plan to use this database: P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

    Relevant publication

    P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties.
    In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

