Regression Models for Data Science in R

A companion book for the Coursera Regression Models class

This book gives a brief but rigorous treatment of regression models, intended for practicing data scientists.

This book is available in multiple packages!

61,357 Readers

About the Book

The ideal reader for this book is quantitatively literate and has a basic understanding of statistical concepts and R programming. The student should have a basic understanding of statistical inference, such as that contained in https://leanpub.com/LittleInferenceBook/. The book gives a rigorous treatment of the elementary concepts of regression models from a practical perspective. After reading the book and watching the associated videos, students will be able to fit multivariable regression models and interpret the results.

License

Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License

Packages

Pick Your Package

All packages include the ebook in the following formats: PDF, EPUB, and Web

The Book+Videos+Code

Minimum price

Suggested price: $14.99

This package includes the book, the videos, and the video solutions. All of the videos are also available on YouTube. The GitHub repos for the book and the lecture notes are included as well.

$14.99

  • Video lectures
    These are the video lectures associated with the book. They are also available on YouTube and Coursera.
  • Lecture notes and code
    This is the GitHub repo zipped up as a single archive. You can also get it from GitHub if you'd like. It also includes the book repo.

The Book

Minimum price

Suggested price: $14.99

This is just the book.

Free!

    This book is also available in the following packages:

    • The Book+Code+Lecture Videos+Solution Videos

      This is the book, the GitHub repos (lecture notes and book), the video lectures, and the video HW solutions. All are available elsewhere for free (GitHub and YouTube).

      • Video lectures
        These are the video lectures associated with the book. They are also available on YouTube and Coursera.
      • Lecture notes and code
        This is the GitHub repo zipped up as a single archive. You can also get it from GitHub if you'd like. It also includes the book repo.
      • Video HW solutions
        These are the video homework solutions. They are also all available on YouTube.
      Minimum price
      $19.99
      Suggested price
      $24.99

    About the Author

    Brian Caffo

    Brian Caffo, PhD, is a professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. Along with Roger Peng and Jeff Leek, Dr. Caffo created the Data Science Specialization on Coursera. Dr. Caffo is a leading expert in statistics and biostatistics and is the recipient of the PECASE award, the highest honor given by the US Government for early career scientists and engineers.

    Leanpub Podcast

    Episode 21

    An Interview with Brian Caffo

    Table of Contents

    Preface

    1. About this book
    2. About the cover

    Introduction

    1. Before beginning
    2. Regression models
    3. Motivating examples
    4. Summary notes: questions for this book
    5. Exploratory analysis of Galton’s Data
    6. The math (not required)
    7. Comparing children’s heights and their parent’s heights
    8. Regression through the origin
    9. Exercises

    Notation

    1. Some basic definitions
    2. Notation for data
    3. The empirical mean
    4. The empirical standard deviation and variance
    5. Normalization
    6. The empirical covariance
    7. Some facts about correlation
    8. Exercises

    Ordinary least squares

    1. General least squares for linear equations
    2. Revisiting Galton’s data
    3. Showing the OLS result
    4. Exercises

    Regression to the mean

    1. A historically famous idea, regression to the mean
    2. Regression to the mean
    3. Exercises

    Statistical linear regression models

    1. Basic regression model with additive Gaussian errors.
    2. Interpreting regression coefficients, the intercept
    3. Interpreting regression coefficients, the slope
    4. Using regression for prediction
    5. Example
    6. Exercises

    Residuals

    1. Residual variation
    2. Properties of the residuals
    3. Example
    4. Estimating residual variation
    5. Summarizing variation
    6. R squared
    7. Exercises

    Regression inference

    1. Reminder of the model
    2. Review
    3. Results for the regression parameters
    4. Example diamond data set
    5. Getting a confidence interval
    6. Prediction of outcomes
    7. Summary notes
    8. Exercises

    Multivariable regression analysis

    1. The linear model
    2. Estimation
    3. Example with two variables, simple linear regression
    4. The general case
    5. Simulation demonstrations
    6. Interpretation of the coefficients
    7. Fitted values, residuals and residual variation
    8. Summary notes on linear models
    9. Exercises

    Multivariable examples and tricks

    1. Data set for discussion
    2. Simulation study
    3. Back to this data set
    4. What if we include a completely unnecessary variable?
    5. Dummy variables are smart
    6. More than two levels
    7. Insect Sprays
    8. Further analysis of the swiss dataset
    9. Exercises

    Adjustment

    1. Experiment 1
    2. Experiment 2
    3. Experiment 3
    4. Experiment 4
    5. Experiment 5
    6. Some final thoughts
    7. Exercises

    Residuals, variation, diagnostics

    1. Residuals
    2. Influential, high leverage and outlying points
    3. Residuals, Leverage and Influence measures
    4. Simulation examples
    5. Example described by Stefanski
    6. Back to the Swiss data
    7. Exercises

    Multiple variables and model selection

    1. Multivariable regression
    2. The Rumsfeldian triplet
    3. General rules
    4. R squared goes up as you put regressors in the model
    5. Simulation demonstrating variance inflation
    6. Summary of variance inflation
    7. Swiss data revisited
    8. Impact of over- and under-fitting on residual variance estimation
    9. Covariate model selection
    10. How to do nested model testing in R
    11. Exercises

    Generalized Linear Models

    1. Example, linear models
    2. Example, logistic regression
    3. Example, Poisson regression
    4. How estimates are obtained
    5. Odds and ends
    6. Exercises

    Binary GLMs

    1. Example Baltimore Ravens win/loss
    2. Odds
    3. Modeling the odds
    4. Interpreting Logistic Regression
    5. Visualizing fitting logistic regression curves
    6. Ravens logistic regression
    7. Some summarizing comments
    8. Exercises

    Count data

    1. Poisson distribution
    2. Poisson distribution
    3. Linear regression
    4. Poisson regression
    5. Mean-variance relationship
    6. Rates
    7. Exercises

    Bonus material

    1. How to fit functions using linear models
    2. Notes
    3. Harmonics using linear models
    4. Thanks!
