---
title: "Regression Methods: Multiple and Logistic RWA"
author: "Martin Chan"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Regression Methods: Multiple and Logistic RWA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%",
  error = FALSE,
  warning = FALSE,
  message = FALSE
)
```

```{r setup}
library(rwa)
library(dplyr)
library(ggplot2)
```

## Introduction

The `rwa` package provides two specialized functions for conducting Relative Weights Analysis depending on the nature of your outcome variable:

- **`rwa_multiregress()`**: For continuous outcome variables (standard multiple regression)
- **`rwa_logit()`**: For binary outcome variables (logistic regression)

The main `rwa()` function acts as a convenient wrapper that can automatically detect which method to use, but understanding these underlying functions gives you more control and insight into your analysis.

## When to Use Each Method

| Outcome Type | Function | Example Use Cases |
|--------------|----------|-------------------|
| Continuous | `rwa_multiregress()` | Predicting prices, scores, measurements |
| Binary (0/1) | `rwa_logit()` | Predicting yes/no, pass/fail, purchase/no purchase |

## Multiple Regression with `rwa_multiregress()`

### Basic Example with mtcars

The `mtcars` dataset contains continuous variables ideal for demonstrating multiple regression RWA.
Let's examine what factors most influence fuel efficiency (mpg):

```{r multiregress-basic}
# Direct use of rwa_multiregress()
result_multi <- rwa_multiregress(
  df = mtcars,
  outcome = "mpg",
  predictors = c("cyl", "disp", "hp", "wt")
)

# View results
result_multi$result
```

### Interpreting Multiple Regression Results

The output contains several key pieces of information:

```{r multiregress-interpret}
# R-squared: Total variance explained
cat("R-squared:", round(result_multi$rsquare, 4), "\n")
cat("This means", round(result_multi$rsquare * 100, 1),
    "% of variance in mpg is explained by these predictors.\n\n")

# Number of observations
cat("Sample size:", result_multi$n, "\n\n")

# Relative importance breakdown
cat("Relative Importance (Rescaled Weights sum to 100%):\n")
result_multi$result %>%
  arrange(desc(Rescaled.RelWeight)) %>%
  mutate(Rescaled.RelWeight = round(Rescaled.RelWeight, 2)) %>%
  select(Variables, Rescaled.RelWeight)
```

The **Rescaled.RelWeight** column shows the percentage of explainable variance attributed to each predictor. In this example, we can see that weight (`wt`) is the most important predictor of fuel efficiency, followed by displacement (`disp`).

### Using the `applysigns` Parameter

By default, relative weights are always positive (they represent variance contributions). Use `applysigns = TRUE` to see the direction of each relationship:

```{r multiregress-signs}
# With sign information
result_signed <- rwa_multiregress(
  df = mtcars,
  outcome = "mpg",
  predictors = c("cyl", "disp", "hp", "wt"),
  applysigns = TRUE
)

result_signed$result %>%
  select(Variables, Raw.RelWeight, Sign.Rescaled.RelWeight, Sign)
```

The `Sign` column indicates whether each predictor has a positive (+) or negative (-) relationship with the outcome. Negative signs for `cyl`, `disp`, `hp`, and `wt` make intuitive sense: more cylinders, larger displacement, more horsepower, and heavier weight all tend to decrease fuel efficiency.

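
To make the raw and rescaled weights discussed above more concrete, here is a minimal base-R sketch of the heuristic described in Johnson (2000). It is an illustration under stated assumptions, not necessarily the exact code path inside `rwa_multiregress()`; the object names (`sqrt_RXX`, `beta`, `raw_weights`, `rescaled_weights`) are chosen here for readability and are not part of the package.

```{r relweights-sketch}
# Illustrative sketch of Johnson's (2000) relative weights in base R.
# Assumption: this follows the published heuristic; the package internals
# may differ in detail.
predictors <- c("cyl", "disp", "hp", "wt")
RXX <- cor(mtcars[, predictors])              # predictor intercorrelations
RXY <- cor(mtcars[, predictors], mtcars$mpg)  # predictor-outcome correlations

# Symmetric square root of RXX: relates each predictor to a set of
# orthogonal (uncorrelated) counterparts
eig <- eigen(RXX)
sqrt_RXX <- eig$vectors %*% diag(sqrt(eig$values)) %*% t(eig$vectors)

# Standardized coefficients of the outcome on the orthogonal variables
beta <- solve(sqrt_RXX) %*% RXY

# Raw weights: squared loadings times squared coefficients
raw_weights <- as.vector((sqrt_RXX^2) %*% (beta^2))
names(raw_weights) <- predictors

# Rescaled weights: each raw weight as a percentage of their total
rescaled_weights <- raw_weights / sum(raw_weights) * 100

# The raw weights should add up to the R-squared of the ordinary regression
r2_lm <- summary(lm(mpg ~ cyl + disp + hp + wt, data = mtcars))$r.squared
all.equal(sum(raw_weights), r2_lm)

round(rescaled_weights, 2)
```

Because the raw weights partition R-squared, dividing by their sum is what makes the rescaled weights add up to 100%.
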
### Examining Correlation Structures

The function also returns the correlation matrices, which can help you understand relationships between predictors:

```{r multiregress-correlations}
# Correlation between predictors
cat("Predictor Correlation Matrix (RXX):\n")
round(result_multi$RXX, 3)

# Correlation of predictors with outcome
cat("\nPredictor-Outcome Correlations (RXY):\n")
round(result_multi$RXY, 3)
```

## Logistic Regression with `rwa_logit()`

### Creating a Binary Outcome

For logistic regression, we need a binary outcome variable. Let's create one from the `mtcars` dataset:

```{r logit-setup}
# Create binary outcome: high efficiency (1) vs low efficiency (0)
mtcars_binary <- mtcars %>%
  mutate(high_mpg = ifelse(mpg > median(mpg), 1, 0))

# Check distribution
table(mtcars_binary$high_mpg)
```

### Basic Logistic RWA

```{r logit-basic}
# Logistic regression RWA
result_logit <- rwa_logit(
  df = mtcars_binary,
  outcome = "high_mpg",
  predictors = c("cyl", "disp", "hp", "wt")
)

# View results
result_logit$result
```

### Interpreting Logistic RWA Results

The interpretation differs slightly from multiple regression:

```{r logit-interpret}
# Lambda (analogous to R-squared for logistic regression)
cat("Lambda (pseudo R-squared):", round(result_logit$lambda, 4), "\n")
cat("Sample size:", result_logit$n, "\n\n")

# Relative importance
cat("Relative Importance for Predicting High Fuel Efficiency:\n")
result_logit$result %>%
  arrange(desc(Rescaled.RelWeight)) %>%
  mutate(Rescaled.RelWeight = round(Rescaled.RelWeight, 2))
```

Like multiple regression, the **Rescaled.RelWeight** values sum to 100%, representing the percentage of predictable variance attributed to each predictor.

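
Since `rwa_logit()` expects a binary outcome, it can be worth confirming the coding and class balance before interpreting the weights. The sketch below uses only base R; `check_binary_outcome()` is a hypothetical helper written for this example and is not part of the `rwa` package.

```{r logit-outcome-check}
# Hypothetical helper (not part of rwa): confirm that an outcome is coded
# 0/1 and report how the two classes are balanced.
check_binary_outcome <- function(x) {
  values <- sort(unique(x[!is.na(x)]))
  list(
    is_binary = length(values) == 2 && all(values %in% c(0, 1)),
    counts = table(x, useNA = "ifany"),
    minority_share = min(prop.table(table(x)))
  )
}

# Same median split as used for mtcars_binary above
high_mpg <- ifelse(mtcars$mpg > median(mtcars$mpg), 1, 0)
outcome_check <- check_binary_outcome(high_mpg)

outcome_check$is_binary
outcome_check$counts
round(outcome_check$minority_share, 3)
```

A strict `>` comparison sends observations equal to the median to the 0 group, so a median split does not always produce two classes of identical size.
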
### Logistic RWA with Signs

```{r logit-signs}
# With direction information
result_logit_signed <- rwa_logit(
  df = mtcars_binary,
  outcome = "high_mpg",
  predictors = c("cyl", "disp", "hp", "wt"),
  applysigns = TRUE
)

result_logit_signed$result %>%
  select(Variables, Rescaled.RelWeight, Sign)
```

## Using the `rwa()` Wrapper Function

The main `rwa()` function provides a convenient interface that can automatically detect whether to use multiple or logistic regression based on your outcome variable.

### Auto-Detection of Binary Outcomes

```{r rwa-auto}
# For continuous outcome - automatically uses multiple regression
result_auto_multi <- rwa(
  df = mtcars,
  outcome = "mpg",
  predictors = c("cyl", "disp", "hp", "wt")
)

# For binary outcome - automatically uses logistic regression
result_auto_logit <- rwa(
  df = mtcars_binary,
  outcome = "high_mpg",
  predictors = c("cyl", "disp", "hp", "wt")
)
```

### Explicit Method Selection

You can also explicitly specify the method using the `method` parameter:

```{r rwa-explicit}
# Force multiple regression
result_explicit_multi <- rwa(
  df = mtcars,
  outcome = "mpg",
  predictors = c("cyl", "disp", "hp", "wt"),
  method = "multiple"
)

# Force logistic regression (requires binary outcome)
result_explicit_logit <- rwa(
  df = mtcars_binary,
  outcome = "high_mpg",
  predictors = c("cyl", "disp", "hp", "wt"),
  method = "logistic"
)
```

### Additional Features in `rwa()`

The wrapper function also provides sorting and visualization options:

```{r rwa-features}
# Sort results by importance
result_sorted <- rwa(
  df = mtcars,
  outcome = "mpg",
  predictors = c("cyl", "disp", "hp", "wt"),
  sort = TRUE
)

result_sorted$result

# Visualize with plot_rwa()
plot_rwa(result_sorted)
```

## Real-World Example: Iris Dataset

Let's apply these methods to the classic `iris` dataset.

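
Before running the analyses, it can help to see how strongly the four `iris` measurements are correlated with one another, since overlapping predictors are the situation relative weights are meant to handle. This is a minimal base-R sketch; the object names `iris_cor` and `off_diag` are illustrative only.

```{r iris-correlations}
# Pairwise correlations among the four iris measurements (base R only)
measurements <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
iris_cor <- cor(iris[, measurements])
round(iris_cor, 2)

# Largest absolute correlation between two different measurements
off_diag <- iris_cor[upper.tri(iris_cor)]
max(abs(off_diag))
```

Petal length and petal width in particular move closely together, so their ordinary regression coefficients alone are a poor guide to how much each contributes.
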
### Multiple Regression: Predicting Petal Length

```{r iris-multi}
# Predict petal length from other measurements
iris_result <- rwa_multiregress(
  df = iris,
  outcome = "Petal.Length",
  predictors = c("Sepal.Length", "Sepal.Width", "Petal.Width"),
  applysigns = TRUE
)

cat("R-squared:", round(iris_result$rsquare, 4), "\n\n")
iris_result$result

# Visualize
plot_rwa(iris_result)
```

### Logistic Regression: Predicting Species

For logistic regression, we need a binary outcome. Let's predict whether a flower is *Iris setosa* or not:

```{r iris-logit}
# Create binary outcome for setosa classification
iris_binary <- iris %>%
  mutate(is_setosa = ifelse(Species == "setosa", 1, 0))

# Logistic RWA
iris_logit <- rwa_logit(
  df = iris_binary,
  outcome = "is_setosa",
  predictors = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  applysigns = TRUE
)

cat("Pseudo R-squared:", round(iris_logit$rsquare, 4), "\n\n")
iris_logit$result
```

This analysis reveals which measurements are most important for distinguishing *Iris setosa* from the other species.

## Comparing Methods

Let's demonstrate how the same predictors can yield different importance rankings depending on the outcome type:

```{r compare-methods}
# Create comparison dataset
comparison_data <- mtcars %>%
  mutate(high_mpg = ifelse(mpg > median(mpg), 1, 0))

# Multiple regression on continuous mpg
multi_result <- rwa_multiregress(
  comparison_data,
  "mpg",
  c("cyl", "disp", "hp", "wt")
)

# Logistic regression on binary high_mpg
logit_result <- rwa_logit(
  comparison_data,
  "high_mpg",
  c("cyl", "disp", "hp", "wt")
)

# Compare rankings
comparison <- data.frame(
  Variable = multi_result$result$Variables,
  Multiple_Pct = round(multi_result$result$Rescaled.RelWeight, 1),
  Logistic_Pct = round(logit_result$result$Rescaled.RelWeight, 1)
)

comparison %>%
  arrange(desc(Multiple_Pct))
```

The relative importance of predictors may differ between continuous and binary outcomes because:

1. **Different relationships**: A predictor's linear relationship with the continuous outcome may differ from its relationship with the probability of the binary outcome.
2. **Threshold effects**: Binary outcomes are sensitive to whether predictors help distinguish cases near the classification boundary.

## Best Practices

### 1. Choose the Right Method

- Use `rwa_multiregress()` for continuous outcomes
- Use `rwa_logit()` for binary (0/1) outcomes
- Let `rwa()` auto-detect when unsure

### 2. Check Your Data

```{r best-practices}
# Always check outcome distribution for binary variables
table(mtcars_binary$high_mpg)

# Ensure reasonable sample size
cat("Sample size:", nrow(mtcars), "\n")
cat("Predictors:", 4, "\n")
cat("Observations per predictor:", nrow(mtcars) / 4, "\n")
```

A general guideline is to have at least 10-20 observations per predictor.

### 3. Consider Bootstrap for Inference

For statistical significance testing, combine with bootstrap methods (note: currently only available for multiple regression):

```{r bootstrap-multi}
# Bootstrap with multiple regression
result_boot <- rwa(
  df = mtcars,
  outcome = "mpg",
  predictors = c("cyl", "disp", "hp", "wt"),
  bootstrap = TRUE,
  n_bootstrap = 1000
)

result_boot$result
```

## Summary

| Function | Outcome Type | Key Output | Weights Sum To |
|----------|--------------|------------|----------------|
| `rwa_multiregress()` | Continuous | R², Raw & Rescaled Weights | 100% |
| `rwa_logit()` | Binary (0/1) | R², Raw & Rescaled Weights | 100% |
| `rwa()` | Either | Auto-detects + sorting + bootstrap | 100% |

Both methods provide valuable insights into predictor importance while accounting for multicollinearity. Choose the appropriate method based on your outcome variable type, and consider using bootstrap confidence intervals for formal statistical inference.

## References

* Johnson, J. W. (2000). A heuristic method for estimating the relative weight of predictor variables in multiple regression. *Multivariate Behavioral Research*, 35(1), 1-19.
* Tonidandel, S., & LeBreton, J. M. (2011). Relative importance analysis: A useful supplement to regression analysis. *Journal of Business and Psychology*, 26(1), 1-9.
* Tonidandel, S., & LeBreton, J. M. (2015). RWA Web: A free, comprehensive, web-based, and user-friendly tool for relative weight analyses. *Journal of Business and Psychology*, 30(2), 207-216.