Title: | Calculates Disproportionate Impact When Binary Success Data are Disaggregated by Subgroups |
---|---|
Description: | Implements methods for calculating disproportionate impact: the percentage point gap, proportionality index, and the 80% index. California Community Colleges Chancellor's Office (2017). Percentage Point Gap Method. <https://www.cccco.edu/-/media/CCCCO-Website/About-Us/Divisions/Digital-Innovation-and-Infrastructure/Research/Files/PercentagePointGapMethod2017.ashx>. California Community Colleges Chancellor's Office (2014). Guidelines for Measuring Disproportionate Impact in Equity Plans. <https://www.cccco.edu/-/media/CCCCO-Website/Files/DII/guidelines-for-measuring-disproportionate-impact-in-equity-plans-tfa-ada.pdf>. |
Authors: | Vinh Nguyen [aut, cre] |
Maintainer: | Vinh Nguyen <[email protected]> |
License: | GPL-3 |
Version: | 0.0.22.9000 |
Built: | 2025-01-27 04:47:31 UTC |
Source: | https://github.com/vinhdizzo/disimpact |
Calculate disproportionate impact per the 80% index method.
di_80_index( success, group, cohort, weight, data, di_80_index_cutoff = 0.8, reference_group = "hpg", check_valid_reference = TRUE )
success |
A vector of success indicators ( |
group |
A vector of group names of the same length as |
cohort |
(Optional) A vector of cohort names of the same length as |
weight |
(Optional) A vector of case weights of the same length as |
data |
(Optional) A data frame containing the variables of interest. If |
di_80_index_cutoff |
A numeric value between 0 and 1 that is used to determine disproportionate impact if the index comparing the success rate of the current group to the reference group falls below this threshold; defaults to 0.80. |
reference_group |
The reference group value in |
check_valid_reference |
Check whether |
This function determines disproportionate impact based on the 80% index method, as described in the reference cited below from the California Community Colleges Chancellor's Office. It assumes that a higher rate is good ("success"). For rates where a higher value is undesirable (e.g., the drop-out rate), consider analyzing the complementary outcome instead (e.g., non-drop-outs, where a higher rate is good) so that the function applies properly.
A data frame consisting of:
cohort (if used),
group,
n (sample size),
success (number of successes for the cohort-group),
pct (proportion of successes for the cohort-group),
reference_group (the reference group used to compare and determine disproportionate impact),
reference (the reference rate used for comparison, corresponding to reference_group),
di_80_index (ratio of pct to the reference),
di_indicator (1 if di_80_index < di_80_index_cutoff),
success_needed_not_di (the number of additional successes needed in order to no longer be considered disproportionately impacted as compared to the reference), and
success_needed_full_parity (the number of additional successes needed in order to achieve full parity with the reference).
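As a rough illustration of the main quantities in this return value, the following base R sketch computes the 80% index by hand on a small made-up data set. The toy data and the object names (toy, rates, cutoff) are assumptions for illustration and are not part of the package; the highest performing group ("hpg") serves as the reference, mirroring the default reference_group.

```r
# Hypothetical toy data: one row per student, success coded 1/0
toy <- data.frame(
  group   = rep(c("A", "B", "C"), times = c(100, 80, 50)),
  success = c(rep(1:0, times = c(60, 40)),   # A: 60 of 100 succeed
              rep(1:0, times = c(40, 40)),   # B: 40 of 80 succeed
              rep(1:0, times = c(20, 30)))   # C: 20 of 50 succeed
)

# Sample size and number of successes per group
n       <- tapply(toy$success, toy$group, length)
success <- tapply(toy$success, toy$group, sum)
rates <- data.frame(group = names(n), n = as.vector(n),
                    success = as.vector(success))
rates$pct <- rates$success / rates$n           # 0.6, 0.5, 0.4

# Reference: the highest performing group ("hpg")
rates$reference_group <- rates$group[which.max(rates$pct)]
rates$reference <- max(rates$pct)

# 80% index: ratio of each group's rate to the reference rate
cutoff <- 0.8
rates$di_80_index  <- rates$pct / rates$reference
rates$di_indicator <- as.integer(rates$di_80_index < cutoff)
rates
```

In this toy example only group C falls below the 0.8 cutoff (0.4 / 0.6 is about 0.67), so it is the only group flagged.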
California Community Colleges Chancellor's Office (2014). Guidelines for Measuring Disproportionate Impact in Equity Plans.
library(dplyr)
data(student_equity)
di_80_index(success=Transfer, group=Ethnicity, data=student_equity) %>% as.data.frame
Calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for data stored in a data.table object. This is the workhorse function leveraged by the di_iterate_dt function.
di_calc_dt( dt, success_var, group_var, cohort_var = "", weight_var = NULL, ppg_reference_group = "overall", min_moe = 0.03, use_prop_in_moe = FALSE, prop_sub_0 = 0.5, prop_sub_1 = 0.5, di_prop_index_cutoff = 0.8, di_80_index_cutoff = 0.8, di_80_index_reference_group = "hpg", filter_subset = "" )
dt |
A data frame of class data.table. If the object is not a data table, one could surround the object with as.data.table. |
success_var |
A character value specifying the success variable name. |
group_var |
A character value specifying the group (disaggregation) variable name. |
cohort_var |
(Optional) A character value specifying the cohort variable. If not specified, then a single cohort is assumed (defaults to an empty string, |
weight_var |
(Optional) A character variable specifying the weight variable if the input data set is summarized (i.e., the success variables specified in |
ppg_reference_group |
Either |
min_moe |
The minimum margin of error to be used in the PPG calculation; see di_ppg. |
use_prop_in_moe |
( |
prop_sub_0 |
Default is 0.50; see di_ppg. |
prop_sub_1 |
Default is 0.50; see di_ppg. |
di_prop_index_cutoff |
Threshold used for determining disproportionate impact using the proportionality index; see di_prop_index; defaults to 0.80. |
di_80_index_cutoff |
Threshold used for determining disproportionate impact using the 80% index; see di_80_index; defaults to 0.80. |
di_80_index_reference_group |
Either |
filter_subset |
A character value such as |
A data.table object with summarized results.
Generate SQL code that calculates disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for a specified table name, success variable, group variable, and cohort variable. This is the workhorse function leveraged by the di_iterate_sql function.
di_calc_sql( db_table_name, success_var, group_var, cohort_var = "", weight_var = 1, ppg_reference_group = "overall", min_moe = 0.03, use_prop_in_moe = FALSE, prop_sub_0 = 0.5, prop_sub_1 = 0.5, di_prop_index_cutoff = 0.8, di_80_index_cutoff = 0.8, di_80_index_reference_group = "hpg", before_with_statement = "", after_with_statement = "", end_of_select_statement = "", where_statement = "", select_statement_add = "" )
db_table_name |
A character value specifying a database table name. |
success_var |
A character value specifying the success variable name. |
group_var |
A character value specifying the group (disaggregation) variable name. |
cohort_var |
(Optional) A character value specifying the cohort variable. If not specified, then a single cohort is assumed (defaults to an empty string, |
weight_var |
(Optional) A character variable specifying the weight variable if the input data set is summarized (i.e., the success variables specified in |
ppg_reference_group |
Either |
min_moe |
The minimum margin of error to be used in the PPG calculation; see di_ppg. |
use_prop_in_moe |
( |
prop_sub_0 |
Default is 0.50; see di_ppg. |
prop_sub_1 |
Default is 0.50; see di_ppg. |
di_prop_index_cutoff |
Threshold used for determining disproportionate impact using the proportionality index; see di_prop_index; defaults to 0.80. |
di_80_index_cutoff |
Threshold used for determining disproportionate impact using the 80% index; see di_80_index; defaults to 0.80. |
di_80_index_reference_group |
Either |
before_with_statement |
Character value to be added to the SQL query to allow for modification. Defaults to |
after_with_statement |
Character value to be added to the SQL query to allow for modification. Defaults to |
end_of_select_statement |
Character value to be added to the SQL query to allow for modification. Defaults to |
where_statement |
Character value to be added to the SQL query to allow for modification. Defaults to |
select_statement_add |
Character value to be added to the SQL query to allow for modification. Defaults to |
A character value (SQL query) that could be executed on a database.
Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for many success variables, disaggregation variables, and scenarios.
di_iterate( data, success_vars, group_vars, cohort_vars = NULL, scenario_repeat_by_vars = NULL, exclude_scenario_df = NULL, weight_var = NULL, include_non_disagg_results = TRUE, ppg_reference_groups = "overall", min_moe = 0.03, use_prop_in_moe = FALSE, prop_sub_0 = 0.5, prop_sub_1 = 0.5, di_prop_index_cutoff = 0.8, di_80_index_cutoff = 0.8, di_80_index_reference_groups = "hpg", check_valid_reference = TRUE, parallel = FALSE, parallel_n_cores = parallel::detectCores(), parallel_split_to_disk = FALSE )
data |
A data frame for which to iterate DI calculations for a set of variables. |
success_vars |
A character vector of success variable names to iterate across. |
group_vars |
A character vector of group (disaggregation) variable names to iterate across. |
cohort_vars |
(Optional) A character vector of the same length as |
scenario_repeat_by_vars |
(Optional) A character vector of variables to repeat DI calculations for across all combinations of these variables. For example, the following variables could be specified:
Each combination of these variables (eg, full time, first time college students with an ed goal of degree/transfer as one combination) would constitute an iteration / sample for which to calculate disproportionate impact for outcomes listed in |
exclude_scenario_df |
(Optional) A data frame with variables that match |
weight_var |
(Optional) A character variable specifying the weight variable if the input data set is summarized (i.e., the success variables specified in |
include_non_disagg_results |
A logical variable specifying whether or not the non-disaggregated results should be returned; defaults to |
ppg_reference_groups |
Either |
min_moe |
The minimum margin of error to be used in the PPG calculation, passed to di_ppg. |
use_prop_in_moe |
Whether the estimated proportions should be used in the margin of error calculation by the PPG, passed to di_ppg. |
prop_sub_0 |
passed to di_ppg; defaults to 0.50. |
prop_sub_1 |
passed to di_ppg; defaults to 0.50. |
di_prop_index_cutoff |
Threshold used for determining disproportionate impact using the proportionality index; passed to di_prop_index; defaults to 0.80. |
di_80_index_cutoff |
Threshold used for determining disproportionate impact using the 80% index; passed to di_80_index; defaults to 0.80. |
di_80_index_reference_groups |
Either |
check_valid_reference |
Check whether |
parallel |
If |
parallel_n_cores |
The number of CPU cores to use if |
parallel_split_to_disk |
If |
Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for all combinations of success_vars, group_vars, and cohort_vars, for each combination of subgroups specified by scenario_repeat_by_vars.
A summarized data set (data frame) consisting of:
success_variable (elements of success_vars),
disaggregation (elements of group_vars),
cohort (values corresponding to the variables specified in cohort_vars),
di_indicator_ppg (1 if there is disproportionate impact per the percentage point gap method, 0 otherwise),
di_indicator_prop_index (1 if there is disproportionate impact per the proportionality index, 0 otherwise),
di_indicator_80_index (1 if there is disproportionate impact per the 80% index, 0 otherwise), and
other relevant fields returned from di_ppg, di_prop_index, and di_80_index.
library(dplyr)
data(student_equity)
# Multiple group variables
di_iterate(data=student_equity, success_vars=c('Transfer')
  , group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort')
  , ppg_reference_groups='overall')
Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for many success variables, disaggregation variables, and scenarios, using data.table and collapse.
di_iterate_dt( dt, success_vars, group_vars, cohort_vars = NULL, scenario_repeat_by_vars = NULL, exclude_scenario_df = NULL, weight_var = NULL, include_non_disagg_results = TRUE, ppg_reference_groups = "overall", min_moe = 0.03, use_prop_in_moe = FALSE, prop_sub_0 = 0.5, prop_sub_1 = 0.5, di_prop_index_cutoff = 0.8, di_80_index_cutoff = 0.8, di_80_index_reference_groups = "hpg", check_valid_reference = TRUE, parallel = FALSE, parallel_n_cores = parallel::detectCores()/2 )
dt |
A data frame of class data.table. If the object is not a data table, one could surround the object with as.data.table. |
success_vars |
A character vector of success variable names to iterate across. |
group_vars |
A character vector of group (disaggregation) variable names to iterate across. |
cohort_vars |
(Optional) A character vector of the same length as |
scenario_repeat_by_vars |
(Optional) A character vector of variables to repeat DI calculations for across all combinations of these variables. For example, the following variables could be specified:
Each combination of these variables (eg, full time, first time college students with an ed goal of degree/transfer as one combination) would constitute an iteration / sample for which to calculate disproportionate impact for outcomes listed in |
exclude_scenario_df |
(Optional) A data frame with variables that match |
weight_var |
(Optional) A character variable specifying the weight variable if the input data set is summarized (i.e., the success variables specified in |
include_non_disagg_results |
A logical variable specifying whether or not the non-disaggregated results should be returned; defaults to |
ppg_reference_groups |
Either |
min_moe |
The minimum margin of error to be used in the PPG calculation; see di_ppg. |
use_prop_in_moe |
( |
prop_sub_0 |
Default is 0.50; see di_ppg. |
prop_sub_1 |
Default is 0.50; see di_ppg. |
di_prop_index_cutoff |
Threshold used for determining disproportionate impact using the proportionality index; see di_prop_index; defaults to 0.80. |
di_80_index_cutoff |
Threshold used for determining disproportionate impact using the 80% index; see di_80_index; defaults to 0.80. |
di_80_index_reference_groups |
Either |
check_valid_reference |
( |
parallel |
If |
parallel_n_cores |
The number of CPU cores to use if |
Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for all combinations of success_vars, group_vars, and cohort_vars, for each combination of subgroups specified by scenario_repeat_by_vars, using data.table and collapse.
A summarized data set of class data.table, with variables as described in di_iterate.
Calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for a "long" and summarized data set with many success variables and disaggregation variables, where the success counts and disaggregation groups are stored in a single column or variable for each.
di_iterate_on_long( data, num_var, denom_var, disagg_var_col, group_var_col, disagg_var_col_2 = NULL, group_var_col_2 = NULL, cohort_var_col = NULL, summarize_by_vars = NULL, custom_reference_group_flag_var = NULL, ... )
data |
A data frame for which to iterate DI calculations for a set of variables. |
num_var |
A variable name (character value) from |
denom_var |
A variable name (character value) from |
disagg_var_col |
A variable name (character value) from |
group_var_col |
A variable name (character value) from |
disagg_var_col_2 |
(Optional) A variable name (character value) from |
group_var_col_2 |
(Optional) A variable name (character value) from |
cohort_var_col |
(Optional) A variable name (character value) from |
summarize_by_vars |
(Optional) A character vector of variable names in |
custom_reference_group_flag_var |
(Optional) A variable name (character value) from |
... |
(Optional) Other arguments such as |
Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for all combinations of success_vars, group_vars, and cohort_vars, for each combination of subgroups specified by scenario_repeat_by_vars.
A summarized data set (data frame) consisting of:
variables specified by summarize_by_vars, disagg_var_col, group_var_col, disagg_var_col_2, and group_var_col_2,
di_indicator_ppg (1 if there is disproportionate impact per the percentage point gap method, 0 otherwise),
di_indicator_prop_index (1 if there is disproportionate impact per the proportionality index, 0 otherwise),
di_indicator_80_index (1 if there is disproportionate impact per the 80% index, 0 otherwise), and
other relevant fields returned from di_ppg, di_prop_index, and di_80_index.
library(dplyr)
data(ssm_cohort)
di_iterate_on_long(data=ssm_cohort %>% filter(missingFlag==0) # remove missing data
  , num_var='value', denom_var='denom'
  , disagg_var_col='disagg1', group_var_col='subgroup1'
  , cohort_var_col='academicYear', summarize_by_vars=c('categoryLabel')
  , ppg_reference_groups='all but current' # PPG-1
  , di_80_index_reference_groups='all but current')
Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for many success variables, disaggregation variables, and scenarios, using SQL (for data stored in a database or in a parquet data file).
di_iterate_sql( db_conn, db_table_name, success_vars, group_vars, cohort_vars = NULL, scenario_repeat_by_vars = NULL, exclude_scenario_df = NULL, weight_var = NULL, include_non_disagg_results = TRUE, ppg_reference_groups = "overall", min_moe = 0.03, use_prop_in_moe = FALSE, prop_sub_0 = 0.5, prop_sub_1 = 0.5, di_prop_index_cutoff = 0.8, di_80_index_cutoff = 0.8, di_80_index_reference_groups = "hpg", check_valid_reference = TRUE, parallel = FALSE, parallel_n_cores = parallel::detectCores()/2, mssql_flag = FALSE, return_what = "data", staging_table = paste0("DisImpact_Staging_", paste0(sample(1:9, size = 5, replace = TRUE), collapse = "")), drop_staging_table = TRUE )
db_conn |
A database connection object, returned by dbConnect. |
db_table_name |
A character value specifying a database table name. |
success_vars |
A character vector of success variable names to iterate across. |
group_vars |
A character vector of group (disaggregation) variable names to iterate across. |
cohort_vars |
(Optional) A character vector of the same length as |
scenario_repeat_by_vars |
(Optional) A character vector of variables to repeat DI calculations for across all combinations of these variables. For example, the following variables could be specified:
Each combination of these variables (eg, full time, first time college students with an ed goal of degree/transfer as one combination) would constitute an iteration / sample for which to calculate disproportionate impact for outcomes listed in |
exclude_scenario_df |
(Optional) A data frame with variables that match |
weight_var |
(Optional) A character variable specifying the weight variable if the input data set is summarized (i.e., the success variables specified in |
include_non_disagg_results |
A logical variable specifying whether or not the non-disaggregated results should be returned; defaults to |
ppg_reference_groups |
Either |
min_moe |
The minimum margin of error to be used in the PPG calculation; see di_ppg. |
use_prop_in_moe |
( |
prop_sub_0 |
Default is 0.50; see di_ppg. |
prop_sub_1 |
Default is 0.50; see di_ppg. |
di_prop_index_cutoff |
Threshold used for determining disproportionate impact using the proportionality index; see di_prop_index; defaults to 0.80. |
di_80_index_cutoff |
Threshold used for determining disproportionate impact using the 80% index; see di_80_index; defaults to 0.80. |
di_80_index_reference_groups |
Either |
check_valid_reference |
( |
parallel |
If |
parallel_n_cores |
The number of CPU cores to use if |
mssql_flag |
User-specified logical flag ( |
return_what |
A character value specifying the return value for the function call. For |
staging_table |
A character value indicating the name of the staging or results table in the database for storing the disproportionate impact calculations. |
drop_staging_table |
|
Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for all combinations of success_vars, group_vars, and cohort_vars, for each combination of subgroups specified by scenario_repeat_by_vars, using SQL (calculations done on the database engine or duckdb for parquet files).
When return_what='data' (default), a long data frame is returned (see the return value for di_iterate). When return_what='SQL', a list object where each element is a query (character value) is returned.
Calculate disproportionate impact per the percentage point gap (PPG) method.
di_ppg( success, group, cohort, weight, reference = c("overall", "hpg", "all but current", unique(group)), data, min_moe = 0.03, use_prop_in_moe = FALSE, prop_sub_0 = 0.5, prop_sub_1 = 0.5, check_valid_reference = TRUE )
success |
A vector of success indicators ( |
group |
A vector of group names of the same length as |
cohort |
(Optional) A vector of cohort names of the same length as |
weight |
(Optional) A vector of case weights of the same length as |
reference |
Either
|
data |
(Optional) A data frame containing the variables of interest. If |
min_moe |
The minimum margin of error (MOE) used in the calculation of disproportionate impact; it is passed to ppg_moe. Defaults to |
use_prop_in_moe |
A logical value indicating whether or not the MOE formula should use the observed success rates ( |
prop_sub_0 |
For cases where |
prop_sub_1 |
For cases where |
check_valid_reference |
Check whether |
This function determines disproportionate impact based on the percentage point gap (PPG) method, as described in the reference cited below from the California Community Colleges Chancellor's Office. It assumes that a higher rate is good ("success"). For rates where a higher value is undesirable (e.g., the drop-out rate), consider analyzing the complementary outcome instead (e.g., non-drop-outs, where a higher rate is good) so that the function applies properly. Note that the margin of error (MOE) is calculated using 1.96*sqrt(0.25/n) (that is, with a proportion of 0.5), with min_moe used as the minimum by default.
A data frame consisting of:
cohort (if used),
group,
n (sample size),
success (number of successes for the cohort-group),
pct (proportion of successes for the cohort-group),
reference_group (reference group used in DI calculation),
reference (reference value used in DI calculation),
moe (margin of error),
pct_lo (lower 95% confidence limit for pct),
pct_hi (upper 95% confidence limit for pct),
di_indicator (1 if there is disproportionate impact, i.e., when pct_hi <= reference),
success_needed_not_di (the number of additional successes needed in order to no longer be considered disproportionately impacted as compared to the reference), and
success_needed_full_parity (the number of additional successes needed in order to achieve full parity with the reference).
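To make the PPG quantities above concrete, here is a minimal base R sketch on a made-up summarized data set. Everything in it is an assumption for illustration: the toy counts, the object names, the use of the overall rate as the reference, the standard normal-approximation margin of error with the proportion fixed at 0.5, and the construction of pct_lo and pct_hi as pct minus and plus the MOE. It is not the package's own implementation.

```r
# Hypothetical summarized data: one row per group
toy <- data.frame(
  group   = c("A", "B", "C"),
  n       = c(400, 300, 100),
  success = c(260, 150, 40)
)
toy$pct <- toy$success / toy$n                 # 0.65, 0.50, 0.40

# Reference: overall success rate across all groups
reference <- sum(toy$success) / sum(toy$n)     # 450 / 800 = 0.5625

# Conservative MOE (proportion fixed at 0.5), floored at min_moe
min_moe <- 0.03
toy$moe <- pmax(1.96 * sqrt(0.5 * 0.5 / toy$n), min_moe)

# Assumed interval limits and the DI rule stated in the return value
toy$pct_lo <- toy$pct - toy$moe
toy$pct_hi <- toy$pct + toy$moe
toy$di_indicator <- as.integer(toy$pct_hi <= reference)
toy
```

Under these assumptions, groups B and C are flagged because even the upper limit of their success rate stays at or below the overall rate, while group A is not.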
California Community Colleges Chancellor's Office (2017). Percentage Point Gap Method.
library(dplyr)
data(student_equity)
# Vector
di_ppg(success=student_equity$Transfer
  , group=student_equity$Ethnicity) %>% as.data.frame
# Tidy and column reference
di_ppg(success=Transfer, group=Ethnicity, data=student_equity) %>% as.data.frame
# Cohort
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort
  , data=student_equity) %>% as.data.frame
# With custom reference (single)
di_ppg(success=Transfer, group=Ethnicity, reference=0.54
  , data=student_equity) %>% as.data.frame
# With custom reference (multiple)
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort
  , reference=c(0.5, 0.55), data=student_equity) %>% as.data.frame
# min_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity
  , min_moe=0.02) %>% as.data.frame
# use_prop_in_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity
  , min_moe=0.02
  , use_prop_in_moe=TRUE) %>% as.data.frame
Iteratively calculate disproportionate impact via the percentage point gap (PPG) method for many disaggregation variables.
di_ppg_iterate( data, success_vars, group_vars, cohort_vars, reference_groups, repeat_by_vars = NULL, weight_var = NULL, min_moe = 0.03, use_prop_in_moe = FALSE, prop_sub_0 = 0.5, prop_sub_1 = 0.5 )
data |
A data frame for which to iterate DI calculation for a set of variables. |
success_vars |
A character vector of success variable names to iterate across. |
group_vars |
A character vector of group (disaggregation) variable names to iterate across. |
cohort_vars |
A character vector of cohort variable names to iterate across. |
reference_groups |
Either 'overall', 'hpg', or a character vector of the same length as 'group_vars' that indicates the reference group value for each group variable in 'group_vars'. |
repeat_by_vars |
A character vector of variables to repeat DI calculations for across all combinations of these variables, including '- All' as a group for each variable. The reference rate used for DI comparison differs for every combination of the variables listed here. |
weight_var |
A character scalar specifying the weight variable if the input data set is summarized (i.e., the success variables specified in 'success_vars' contain counts of successes). Weight here corresponds to the denominator when calculating the success rate. Defaults to 'NULL' for an input data set where each row describes each individual. |
min_moe |
The minimum margin of error to be used in the PPG calculation, passed to 'di_ppg'. |
use_prop_in_moe |
Whether the estimated proportions should be used in the margin of error calculation by the PPG, passed to 'di_ppg'. |
prop_sub_0 |
Passed to 'di_ppg'. |
prop_sub_1 |
Passed to 'di_ppg'. |
Iteratively calculate disproportionate impact via the percentage point gap (PPG) method for all combinations of 'success_vars', 'group_vars', and 'cohort_vars', for each combination of subgroups specified by 'repeat_by_vars'.
A data frame with all relevant returned fields from 'di_ppg' plus 'success_variable' (elements of 'success_vars'), 'disaggregation' (elements of 'group_vars'), and 'reference_group' (elements of 'reference_groups').
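The role of 'weight_var' for summarized input can be sketched in base R as follows. The two toy data frames and their column names are hypothetical; the point is only that a summarized table (success counts plus a weight acting as the denominator) carries the same success rates as the individual-level rows it was built from.

```r
# Individual-level data: one row per student, success coded 1/0
ind <- data.frame(
  group   = rep(c("A", "B"), times = c(5, 4)),
  success = c(1, 1, 1, 0, 0,   1, 0, 0, 0)
)
rate_ind <- tapply(ind$success, ind$group, mean)

# Summarized data: one row per group; 'success' holds counts of successes
# and 'weight' holds the denominators (group sizes)
agg <- data.frame(
  group   = c("A", "B"),
  success = c(3, 1),
  weight  = c(5, 4)
)
rate_agg <- agg$success / agg$weight

# Both representations give the same success rates per group
all.equal(as.vector(rate_ind), rate_agg)
```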
library(dplyr)
data(student_equity)
# Multiple group variables
di_ppg_iterate(data=student_equity, success_vars=c('Transfer')
  , group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort')
  , reference_groups='overall')
Calculate disproportionate impact per the proportionality index (PI) method.
di_prop_index(success, group, cohort, weight, data, di_prop_index_cutoff = 0.8)
success |
A vector of success indicators ( |
group |
A vector of group names of the same length as |
cohort |
(Optional) A vector of cohort names of the same length as |
weight |
(Optional) A vector of case weights of the same length as |
data |
(Optional) A data frame containing the variables of interest. If |
di_prop_index_cutoff |
A numeric value between 0 and 1 that is used to determine disproportionate impact if the proportionality index falls below this threshold; defaults to 0.80. |
This function determines disproportionate impact based on the proportionality index (PI) method, as described in the reference cited below from the California Community Colleges Chancellor's Office. It assumes that a higher rate is good ("success"). For rates where a higher value is undesirable (e.g., the drop-out rate), consider analyzing the complementary outcome instead (e.g., non-drop-outs, where a higher rate is good) so that the function applies properly.
A data frame consisting of:
cohort (if used),
group,
n (sample size),
success (number of successes for the cohort-group),
pct_success (proportion of successes attributed to the group within the cohort),
pct_group (proportion of sample attributed to the group within the cohort),
di_prop_index (ratio of pct_success to pct_group),
di_indicator (1 if di_prop_index < di_prop_index_cutoff),
success_needed_not_di (the number of additional successes needed in order to no longer be considered disproportionately impacted as compared to the reference), and
success_needed_full_parity (the number of additional successes needed in order to achieve full parity with the reference).
When di_prop_index < 1, there are signs of disproportionate impact.
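The index described above can be reproduced by hand, which may help when checking results. The following is a minimal base R sketch on made-up counts for a single cohort; the data frame counts and its values are hypothetical, and the code is not the package's implementation (it omits weights, cohorts, and the success_needed_* fields).

```r
# Hypothetical counts for three groups within one cohort (made-up data).
counts <- data.frame(
  group   = c("A", "B", "C"),
  n       = c(100, 60, 40),  # group sizes
  success = c(60, 30, 10)    # number of successes per group
)

cutoff <- 0.8  # same value as the default di_prop_index_cutoff

# Share of all successes, and share of the whole sample, held by each group.
counts$pct_success <- counts$success / sum(counts$success)
counts$pct_group   <- counts$n / sum(counts$n)

# Proportionality index and the resulting disproportionate impact flag.
counts$di_prop_index <- counts$pct_success / counts$pct_group
counts$di_indicator  <- as.integer(counts$di_prop_index < cutoff)

print(counts)
```

In this made-up example, group C holds 20% of the sample but only 10% of the successes, so its index is 0.5 and it is flagged; groups A and B are not.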
California Community Colleges Chancellor's Office (2014). Guidelines for Measuring Disproportionate Impact in Equity Plans.
library(dplyr)
data(student_equity)
di_prop_index(success=Transfer, group=Ethnicity, data=student_equity) %>% as.data.frame
Calculate the margin of error (MOE) for the percentage point gap (PPG) method.
ppg_moe(n, proportion, min_moe = 0.03, prop_sub_0 = 0.5, prop_sub_1 = 0.5)
n |
Sample size for the group of interest. |
proportion |
(Optional) The proportion of successes for the group of interest. If specified, then the proportion is used in the MOE formula. Otherwise, a default proportion of 0.50 is used (conservative and yields the maximum MOE). |
min_moe |
The minimum MOE returned even if the sample size is large. Defaults to 0.03. This equates to a minimum threshold gap for declaring disproportionate impact. |
prop_sub_0 |
For cases where 'proportion' is 0, substitute with |
prop_sub_1 |
For cases where 'proportion' is 1, substitute with |
The margin of error for the PPG given the specified sample size.
California Community Colleges Chancellor's Office (2017). Percentage Point Gap Method.
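To make the roles of n, proportion, and min_moe concrete, here is a minimal sketch that assumes the MOE is the usual 95% normal-approximation margin for a proportion, floored at min_moe. The helper name moe_sketch and the 1.96 multiplier are assumptions for illustration, not taken from the package source, and the substitution arguments prop_sub_0 and prop_sub_1 are left out.

```r
# Hypothetical helper (an assumption, not the package's code):
# 95% normal-approximation margin of error for a proportion,
# never smaller than min_moe.
moe_sketch <- function(n, proportion = 0.5, min_moe = 0.03) {
  moe <- 1.96 * sqrt(proportion * (1 - proportion) / n)
  pmax(moe, min_moe)  # apply the floor element-wise
}

# With the default proportion of 0.5 the MOE is at its maximum for each n;
# for large n the floor min_moe takes over.
moe_sketch(n = c(200, 800, 2000))

# A proportion away from 0.5 gives a smaller MOE; min_moe = 0 removes the floor.
moe_sketch(n = 800, proportion = 0.20, min_moe = 0)
```

Under this assumption, a sample of 2000 has a raw margin below 0.03, so the floor is returned instead.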
ppg_moe(n=800)
ppg_moe(n=c(200, 800, 1000, 2000))
ppg_moe(n=800, proportion=0.20)
ppg_moe(n=800, proportion=0.20, min_moe=0)
ppg_moe(n=c(200, 800, 1000, 2000), min_moe=0.01)
Sample data downloaded from the California Community Colleges Chancellor's Office Student Success Metrics dashboard.
data(ssm_cohort)
A data frame with summarized data:
Success count (numerator).
Group size (denominator).
Metric or outcome.
Academic year for given data.
Different levels of disaggregation.
Groups corresponding to each disaggregation in disagg1.
Second level of disaggregation: 'None' or 'Gender'.
Groups corresponding to each disaggregation in disagg2.
Not actually a cohort, but the time window for the outcome in categoryLabel.
College name.
ID for current metric.
Title of visualization.
ID for categoryLabel.
value / denom.
All are 'Percent'.
1 if missing.
1 if FERPA-suppressed.
Ignore.
Ignore.
Ignore.
data(ssm_cohort)
Data randomly generated to illustrate the use of the package.
data(student_equity)
A data frame with 20,000 rows:
ethnicity (one of: Asian, Black, Hispanic, Multi-Ethnicity, Native American, White).
gender (one of: Male, Female, Other).
year student first enrolled in any credit course at the institution (one of: 2017, 2018).
1 or 0 indicating whether or not a student transferred within 2 years of first enrollment (Cohort).
year student first enrolled in a math course at the institution; could be NA if the student has not attempted math.
1 or 0 indicating whether or not a student completed transfer-level math within 1 year of their first math attempt (Cohort_Math); could be NA if the student has not attempted math.
year student first enrolled in an English course at the institution; could be NA if the student has not attempted English.
1 or 0 indicating whether or not a student completed transfer-level English within 1 year of their first English attempt (Cohort_English); could be NA if the student has not attempted English.
student's educational goal (one of: Deg/Transfer, Other).
student's educational status (one of: First-time College, Other).
student's unique identifier.
1 (yes) or 0 (no) indicating whether or not a student self-identifies as Asian.
1 (yes) or 0 (no) indicating whether or not a student self-identifies as Black.
1 (yes) or 0 (no) indicating whether or not a student self-identifies as Hispanic.
1 (yes) or 0 (no) indicating whether or not a student self-identifies as Native American.
1 (yes) or 0 (no) indicating whether or not a student self-identifies as Pacific Islander.
1 (yes) or 0 (no) indicating whether or not a student self-identifies as White.
1 (yes) or 0 (no) indicating whether or not a student self-identifies as Caribbean.
1 (yes) or 0 (no) indicating whether or not a student self-identifies as East Asian.
1 (yes) or 0 (no) indicating whether or not a student self-identifies as Southeast Asian.
1 (yes) or 0 (no) indicating whether or not a student self-identifies as Southwest Asian / North African (SWANA).
1 (yes) or 0 (no) indicating whether or not a student self-identifies as Asian-American or Native American Pacific Islander (AANAPI).
1 (yes) or 0 (no) indicating whether or not a student self-identifies as Unknown.
1 (yes) or 0 (no) indicating whether or not a student self-identifies as two or more races.
data(student_equity)
Function used internally by di_calc_sql and di_iterate_sql to surround variable names by double quotes in SQL queries in order to support non-alphanumeric characters in variable names.
surround_quote_if_needed(value)
value |
A character vector. |
A character vector with double quotes surrounding value if the first and last characters of value aren't yet double quotes. For a value that is already surrounded by double quotes, nothing is changed.
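The behavior described above can be sketched in a few lines of base R. The function quote_sketch below is a hypothetical re-implementation written for illustration only; it is not the package's source, and the internal function may handle edge cases differently.

```r
# Hypothetical re-implementation for illustration; not the package's source.
quote_sketch <- function(value) {
  # TRUE where the string already starts and ends with a double quote.
  already_quoted <- startsWith(value, '"') & endsWith(value, '"')
  # Leave quoted values alone; wrap everything else in double quotes.
  ifelse(already_quoted, value, paste0('"', value, '"'))
}

quote_sketch(c('Ethnicity', 'First Gen', '"Cohort"'))
```

Here the first two names gain surrounding double quotes, which lets a name containing a space be used as an SQL identifier, while the third is returned unchanged.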