
Running Activities
Business,Sports,Weather and Climate,Running
Classification
Context
I have a desk job. A very interesting desk job, but nevertheless a desk job. Therefore I started doing sports a while ago, and now I can't stop. Like every other geek, I need a gadget for every hobby I have, and in this case it was a GPS sport smartwatch: the vivoactive and the vivoactive HR by Garmin. Garmin also offers a data analysis center, and as a data scientist I of course had to export the data and run some analyses that go beyond bar charts. This dataset is basically the bulk-downloaded, uncleaned dataset from that data center. For those interested, there is a nice GitHub project for bulk downloading from Garmin Connect: https://github.com/kjkjava/garmin-connect-export. The weather data has to be added by hand.

Content
You can find 155 samples, each one representing one sport activity, mainly from the Black Forest. There are a lot of useless columns, which either contain no data or the same value for every sample. You will have to identify and remove these columns in any case. The ZIP includes the GPX tracks of all activities, which can be used as well.

The two devices use different GPS sensors and, as far as I can tell, differ in precision and reliability (untested). Additionally, the HR device has had a lot of connectivity problems since summer 2017. The devices lost the signal during numerous runs, and therefore the distance value is not always correct. Usually the start and end timestamps are correct (except for one case), and the GPX files might help to figure out which track I was using. With a single exception, the start and end points are the same in all tracks. This means I started and ended recording at the same crossroads, not at exactly the same position. If you open the GPX files in a GIS, you should be able to repair the affected datasets. I did this using QGIS.

You will find one instance with two activities on a single day. This is actually the same activity, in which I went to the peak of Rosskopf in the Black Forest. Because my brain was undersupplied after running up there, I ended the activity instead of pausing it. That is why I ended up with two activities that have to be merged.

Soft data
For the dataset, there is also some soft data which might be helpful:
- I have commuted 130 km to work since June 2016, usually on Mon, Tue and Wed. Possibly I have spent less time on sports on these days since then.
- I wrote my Master's thesis from Sep 2015 until Mar 2016. Maybe I was doing more sports during the thesis (except for the last two weeks?).
- Since I started commuting, I would say I do sports less frequently, but over longer distances and higher elevation gains.
- I bought new shoes in August 2016, which are WAY more comfortable.

Acknowledgements
The bulk downloading script for Garmin Connect was really helpful: https://github.com/kjkjava/garmin-connect-export. Without this tool I most likely would not have created this dataset.

Inspiration
I am personally very interested in whether my running performance depends on specific weather conditions and is eventually predictable. Another interesting thing would be to see how other people rate the performance based on the given data. I also use this dataset in my teaching, in a Python class and a statistics class at university. I decided to upload the data and some of my teaching notebooks to Kaggle (in the near future) in order to give my students access to external comments on my kernels, give them the opportunity to upload their solutions to a bigger community, and eventually look through your kernels on the data.
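The Content section asks you to identify columns that contain no data or the same value for every sample. A minimal sketch of that check with the Python standard library might look like this. The column names and sample values below are hypothetical and are not taken from the actual export; in practice you would read the real CSV file with `csv.DictReader`.

```python
import csv
import io


def find_useless_columns(rows):
    """Return names of columns whose cells are identical in every row.

    This covers both cases from the description: a column that is empty
    everywhere and a column that repeats one value for every sample.
    """
    if not rows:
        return []
    useless = []
    for name in rows[0].keys():
        # Normalise missing cells (None or whitespace) to the empty string.
        values = {(row.get(name) or "").strip() for row in rows}
        if len(values) == 1:
            useless.append(name)
    return useless


# Hypothetical stand-in for the exported activity table.
SAMPLE = """id,distance_km,sport,notes
1,10.2,running,
2,8.7,running,
3,12.1,running,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
useless = find_useless_columns(rows)
cleaned = [{k: v for k, v in row.items() if k not in useless} for row in rows]

print(useless)
print(cleaned[0])
```

A column that mixes one value with missing cells is kept by this sketch, because it is not the same for every sample. Whether such columns are worth keeping is a judgment call for your own analysis.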
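Because the recorded distance is not always correct, one way to cross-check it is to recompute the track length from the GPX points. The sketch below is an illustration under assumptions, not the method used for this dataset (the repair described above was done in QGIS). It assumes track points are stored as `trkpt` elements with `lat` and `lon` attributes, as in standard GPX, and it sums great-circle distances with the haversine formula. The inline GPX snippet is made up for the example.

```python
import math
import xml.etree.ElementTree as ET

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def track_length_km(gpx_text):
    """Sum the distances between consecutive track points of a GPX document."""
    root = ET.fromstring(gpx_text.strip())
    # Match on the tag suffix so the GPX XML namespace does not matter.
    points = [
        (float(el.get("lat")), float(el.get("lon")))
        for el in root.iter()
        if el.tag.endswith("trkpt")
    ]
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))


# Made-up track: three points along the equator, one degree apart.
SAMPLE_GPX = """
<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1">
  <trk><trkseg>
    <trkpt lat="0.0" lon="0.0"/>
    <trkpt lat="0.0" lon="1.0"/>
    <trkpt lat="0.0" lon="2.0"/>
  </trkseg></trk>
</gpx>
"""

length = track_length_km(SAMPLE_GPX)
print(round(length, 2))
```

Note that a signal gap shows up here as one straight line between the last point before the gap and the first point after it, so the recomputed length will underestimate a winding trail. Elevation is ignored as well.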
Copyright information
- Data size: 37.15M
- Publisher: Mirko Mälicke
- Citation URL:
- License: CC BY-NC-SA 4.0