Religion and Belief Systems Classification

Context

[Hadith][1] (an Arabic word) refers to the words and actions of Prophet Mohammed. Hadiths were transmitted through generations of Muslim scholars until they were compiled and written down in large collections. The chain of narrators is a main area of study in Islamic scholarship because a single hadith may have multiple chains of narrators (that may or may not overlap). However, it has mainly remained a qualitative field in which Hadith scholars try to determine the authenticity of Hadiths by investigating and validating the chains of narrators who transmitted a given hadith. Further, the raw texts of Hadiths have not yet been used in quantitative approaches to data analysis. I hope this dataset makes it easier to make progress in this direction.

Content

The Hadith dataset contains the set of all [Hadiths][1] from the six primary hadith collections. The data is scraped from the database credited under Acknowledgements. Note that the chain_indx column refers to the scholar_indx column in the [Hadith Narrators Dataset][2].

This is a very early draft of the dataset, and it has not been validated. For example, the number of Hadiths in this dataset is much higher than the real number of Hadiths contained in those sources. This may be due to a bug in my script. I plan to clean up the dataset further. However, as it stands, it can be used to prototype certain analyses in those areas.

*Disclaimer: I scraped the data and I hold no responsibility for its accuracy or validation. Use at your own risk!*

[1]:
[2]:

Acknowledgements

This dataset wouldn't have been possible without the great people who transcribed this material from primary sources and bibliographies into the source database. I only scraped this database with a Python script and did very minimal cleanup.
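As an illustration of the link between chain_indx and scholar_indx mentioned under Content, here is a minimal sketch of how a chain could be resolved to narrator names. It rests on assumptions that the description does not confirm: that chain_indx is stored as a comma-separated string of scholar_indx values, and that the tables have `hadith_id` and `name` columns (both hypothetical names). The rows are placeholders, not real data, so adapt the parsing to whatever the actual files contain.

```python
# Toy stand-ins for the two tables. Every value here is a placeholder.
# Assumption: chain_indx is a comma-separated string of scholar_indx values.
# Assumption: "hadith_id" and "name" are hypothetical column names.
narrators = [
    {"scholar_indx": 1, "name": "Narrator A"},
    {"scholar_indx": 2, "name": "Narrator B"},
    {"scholar_indx": 3, "name": "Narrator C"},
]
hadiths = [
    {"hadith_id": 101, "chain_indx": "1, 2, 3"},
    {"hadith_id": 102, "chain_indx": "2, 3"},
    {"hadith_id": 103, "chain_indx": "1, 9"},  # 9 has no match in narrators
]


def parse_chain(chain_indx):
    """Split a chain_indx string into a list of integer scholar indices."""
    return [int(part) for part in chain_indx.split(",") if part.strip()]


def resolve_chain(chain_indx, names_by_index):
    """Map each index in a chain to a narrator name, or None if unknown."""
    return [names_by_index.get(i) for i in parse_chain(chain_indx)]


# Build a lookup from scholar_indx to name, then resolve every chain.
names_by_index = {row["scholar_indx"]: row["name"] for row in narrators}
resolved = {
    row["hadith_id"]: resolve_chain(row["chain_indx"], names_by_index)
    for row in hadiths
}
```

Unmatched indices come back as `None` rather than raising an error, which makes it easy to count how many chain entries fail to resolve. That kind of check may be useful given that the dataset has not been validated.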



