Over the past five decades, technological advancements have driven significant transformations in data science. While the rise of high-speed computing has played a key role, the development of specialized data science tools and packages has been even more crucial in shaping the field. Today, data practitioners increasingly rely on these tools to streamline analysis and automate processes. However, the widespread use of such tools can sometimes lead to over-automation, where important foundational principles and assumptions are overlooked. This can result in the application of incorrect statistical methods, potentially compromising the validity of the analysis.
This book addresses these challenges by demonstrating the most widely used data science methods through practical examples drawn from credit risk management. Using both R and Python, it walks readers through real-world scenarios, combining practical guidance on applying these methods with a thorough treatment of their theoretical underpinnings. The goal is to give readers a clear, step-by-step approach to analyzing data, interpreting the output of existing software, and making informed decisions based on those results.
The book is designed primarily for practitioners in credit risk management, though it is equally relevant to anyone interested in applied data science. It assumes that readers have a foundational knowledge of data science, particularly statistics and econometrics, along with some background in finance and banking. Familiarity with Internal Ratings-Based (IRB) models and International Financial Reporting Standard 9 (IFRS 9) is also recommended to better understand the concepts discussed.
A supporting GitHub repository, accessible here, complements the book by expanding its scope. The repository will be regularly updated with documents covering various modeling topics.