What is the Subset (%) Significance Test?
User Journey: Subscriber > Project > Survey > Analyse V2 > Settings > Subset Significance Test
User Story: As an analysis user,
- If my table's Values are set to Row %, then I want to see if each row's percentages are significantly higher (green) or lower (red) compared to the Total (%) row.
- However, if my table's Values are set to Column %, then I want to see if each column's percentages are significantly higher (green) or lower (red) compared to the Total (%) column.
Subset Sig-Test
Determines whether two overlapping percentages are significantly different from each other. The test requires Row/Column 1 to be a subset of Row/Column 2, i.e. Row/Column 2 is the total. For example, Row/Column 1 is all males and Row/Column 2 is the entire population (both males and females).
Q: Is Subset Row/Column % significantly different from Total Row/Column %?
1: Row/Column 1: Subset n / Subset Base = XX%
2: Row/Column 2: Total n / Total Base = YY%
3: Variance: ( (Subset n + Total n) / (Subset Base + Total Base) * (1- (Subset n + Total n) / (Subset Base + Total Base) ) ) / (1 - 1/ (Subset Base + Total Base) )
4: T-Value: (ABS (Subset XX% - Total YY%) ) / SQRT (Variance * (1 / Subset Base - 1 / Total Base) )
5: P-Value: 2 * ( 1 - NORMSDIST ( ABS ( T-Value ) ) )
6: Significance Level = 1 - P-Value
7: Confidence Level = (80%, 90%, 95%)
A: If... Significance Level > Confidence Level, Then... Significantly Different = TRUE AND
- If... Row/Column % < Total R/C %, Then... highlight Red OR
- If... Row/Column % > Total R/C %, Then... highlight Green
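The steps above can be sketched in Python as follows. This is a minimal illustration rather than the product's actual implementation: the names subset_sig_test, SigResult and normsdist are made up for this sketch, and normsdist is built from math.erf as a stand-in for the spreadsheet NORMSDIST function.

```python
import math
from dataclasses import dataclass


def normsdist(z: float) -> float:
    """Standard normal CDF (stand-in for the spreadsheet NORMSDIST function)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


@dataclass
class SigResult:
    t_value: float
    p_value: float
    significance: float
    highlight: str  # "green", "red" or "none"


def subset_sig_test(subset_n, subset_base, total_n, total_base, confidence=0.95):
    """Hypothetical helper following steps 1 to 7 of the Subset Sig-Test."""
    subset_pct = subset_n / subset_base                      # step 1
    total_pct = total_n / total_base                         # step 2
    pooled = (subset_n + total_n) / (subset_base + total_base)
    variance = pooled * (1 - pooled) / (1 - 1 / (subset_base + total_base))  # step 3
    t_value = abs(subset_pct - total_pct) / math.sqrt(
        variance * (1 / subset_base - 1 / total_base)        # step 4
    )
    p_value = 2 * (1 - normsdist(abs(t_value)))              # step 5
    significance = 1 - p_value                               # step 6
    if significance > confidence:                            # step 7 and answer
        highlight = "green" if subset_pct > total_pct else "red"
    else:
        highlight = "none"
    return SigResult(t_value, p_value, significance, highlight)


# Counts taken from Example 1: 18-40yrs vs Total, "No strong feelings".
result = subset_sig_test(46, 393, 150, 1003)
print(f"T-Value={result.t_value:.3f}, highlight={result.highlight}")
```

The sketch assumes the subset base is strictly smaller than the total base; it does not guard against equal bases, where step 4 would divide by zero.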
Example 1
Q: Is 18-40yrs Column % significantly different from Total Column % for the choice: "No strong feelings"?
1: Column 1: 18-40yrs n=46 / 393 = 12%
2: Column 2: Total n=150 / 1003 = 15%
3: Variance = ( (46 + 150) / (393 + 1003) * (1 - (46 + 150) / (393 + 1003) ) ) / (1 - 1/ (393 + 1003) ) = 0.12
4: T-Value = (ABS (12% - 15%) ) / SQRT (0.12 * (1 / 393 - 1 / 1003) ) = 2.377
5: P-Value = 2 * ( 1 - NORMSDIST ( ABS ( 2.377 ) ) ) = 1.74%
6: Significance Level = 1 - 1.74% = 98.26%
7: Confidence Level = (80%, 90%, 95%) = 95%
A: Column 1: 18-40yrs (12%) is significantly less [RED] than Column 2: Total (15%) for the choice: "No strong feelings"
Example 2: Let's do that again!
Q: Is 18-40yrs Column % significantly different from Total Column % for the choice: "A little passionate"?
1: Column 1: 18-40yrs n=238 / 393 = 61%
2: Column 2: Total n=566 / 1003 = 56%
3: Variance = ( (238 + 566) / (393 + 1003) * (1 - (238 + 566) / (393 + 1003) ) ) / (1 - 1/ (393 + 1003) ) = 0.24
4: T-Value = (ABS (61% - 56%) ) / SQRT (0.24 * (1 / 393 - 1 / 1003) ) = 2.123
5: P-Value = 2 * ( 1 - NORMSDIST ( ABS ( 2.123 ) ) ) = 3.37%
6: Significance Level = 1 - 3.37% = 96.63%
7: Confidence Level = (80%, 90%, 95%) = 95%
A: Column 1: 18-40yrs (61%) is significantly more [GREEN] than Column 2: Total (56%) for the choice: "A little passionate"
Example 3: And again!
Q: Is 41-60yrs Column % significantly different from Total Column % for the choice: "No strong feelings"?
1: Column 1: 41-60yrs n=49 / 344 = 14%
2: Column 2: Total n=150 / 1003 = 15%
3: Variance = ( (49 + 150) / (344 + 1003) * (1 - (49 + 150) / (344 + 1003) ) ) / (1 - 1/ (344 + 1003) ) = 0.13
4: T-Value = (ABS (14% - 15%) ) / SQRT (0.13 * (1 / 344 - 1 / 1003) ) = 0.458
5: P-Value = 2 * ( 1 - NORMSDIST ( ABS ( 0.458 ) ) ) = 64.67%
6: Significance Level = 1 - 64.67% = 35.33%
7: Confidence Level = (80%, 90%, 95%) = 95%
A: Column 1: 41-60yrs (14%) is NOT significantly different [NO COLOUR] from Column 2: Total (15%) for the choice: "No strong feelings".
Example 4: Last 1!!
Q: Is 41-60yrs Column % significantly different from Total Column % for the choice: "A little passionate"?
1: Column 1: 41-60yrs n=198 / 344 = 58%
2: Column 2: Total n=566 / 1003 = 56%
3: Variance = ( (198 + 566) / (344 + 1003) * (1 - (198 + 566) / (344 + 1003) ) ) / (1 - 1/ (344 + 1003) ) = 0.25
4: T-Value = (ABS (58% - 56%) ) / SQRT (0.25 * (1 / 344 - 1 / 1003) ) = 0.520
5: P-Value = 2 * ( 1 - NORMSDIST ( ABS ( 0.520 ) ) ) = 60.27%
6: Significance Level = 1 - 60.27% = 39.73%
7: Confidence Level = (80%, 90%, 95%) = 95%
A: Column 1: 41-60yrs (58%) is NOT significantly different [NO COLOUR] from Column 2: Total (56%) for the choice: "A little passionate".
To recap...
Significance Testing V1 should be applied in 2 cases:
- when Values = Row %, then each row is sig tested against the Total (%) row at the bottom
- when Values = Column %, then each column is sig tested against the Total (%) column at the far-right
Sig testing will work not only for tables (green/red cells) but also for both grouped bar charts (green/red arrows) and stacked bar charts (green/red dots?)
Note: Sig testing when Values = Average will be developed next.
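The two cases in the recap come down to the same loop: every non-Total slice (a column when Values = Column %, a row when Values = Row %) is compared cell by cell against the Total slice. The sketch below shows that loop under stated assumptions. The function names highlight and sig_test_table and the dictionary layout are hypothetical, and the counts are the 18-40yrs and Total figures from Examples 1 and 2.

```python
import math


def normsdist(z):
    """Standard normal CDF (stand-in for the spreadsheet NORMSDIST function)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def highlight(subset_n, subset_base, total_n, total_base, confidence=0.95):
    """Return "green", "red" or "none" for one cell versus its Total cell."""
    if not 0 < subset_base < total_base:
        return "none"  # not a strict subset of the total, so no colour
    pooled = (subset_n + total_n) / (subset_base + total_base)
    variance = pooled * (1 - pooled) / (1 - 1 / (subset_base + total_base))
    denom = math.sqrt(variance * (1 / subset_base - 1 / total_base))
    if denom == 0:
        return "none"  # every or no respondent chose this option
    diff = subset_n / subset_base - total_n / total_base
    significance = 1 - 2 * (1 - normsdist(abs(diff) / denom))
    if significance <= confidence:
        return "none"
    return "green" if diff > 0 else "red"


def sig_test_table(counts, bases, total_key="Total", confidence=0.95):
    """Test each non-Total slice against the Total slice, cell by cell."""
    colours = {}
    for slice_name, cells in counts.items():
        if slice_name == total_key:
            continue
        for label, n in cells.items():
            colours[(slice_name, label)] = highlight(
                n, bases[slice_name], counts[total_key][label],
                bases[total_key], confidence,
            )
    return colours


# Column % case: the slices are columns, keyed by filter name.
counts = {
    "18-40yrs": {"No strong feelings": 46, "A little passionate": 238},
    "Total": {"No strong feelings": 150, "A little passionate": 566},
}
bases = {"18-40yrs": 393, "Total": 1003}
for cell, colour in sig_test_table(counts, bases).items():
    print(cell, colour)
```

For the Row % case the same function would be called with rows as the slices and the Total (%) row as the total_key entry.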
Compatible Question Types
Sig Testing will work with ANY question which has Values set to either Row % or Column %. That includes Choice, Matrix, Rank, Scale, Numeric Scale, NPS, Score, Hidden Variables and Constant Sum questions, as long as the Values slot is set to Row % or Column %. Looped questions will also be sig-tested.
Sig Testing will only work if the base size of the column / row being compared is less than (not equal to nor greater than) the base size of the total column / row being compared against, i.e. it's a subset of the total.
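The reason for the strict "less than" rule can be read off the T-Value formula: the term (1 / Subset Base - 1 / Total Base) sits under a square root in the denominator. A small sketch, using the hypothetical helper names base_term and can_sig_test, shows what happens to that term in each case.

```python
def base_term(subset_base, total_base):
    """The (1 / Subset Base - 1 / Total Base) term from the T-Value formula."""
    return 1 / subset_base - 1 / total_base


def can_sig_test(subset_base, total_base):
    """True only when the compared row/column is a strict subset of the total."""
    return 0 < subset_base < total_base


# Subset smaller than total: the term is positive, so the test can run.
print(can_sig_test(393, 1003), base_term(393, 1003) > 0)

# Equal bases: the term is exactly zero, so the T-Value would divide by zero.
print(can_sig_test(1003, 1003), base_term(1003, 1003))

# Subset larger than total: the term is negative, so SQRT has no real result.
print(can_sig_test(1200, 1003), base_term(1200, 1003) < 0)
```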
Note: Sig Testing is best used when Filters are in either the Table Rows or Columns but will still work even if no Filters are in the Rows or Columns.
Here are a few more examples.
Matrix Default
In this example, because the base size of each row (1003) equals the base size of the Total row (1003), the (1 / Subset Base - 1 / Total Base) term in the T-Value formula is zero, so the sig testing formula generates an error and no cells are highlighted green or red. However, had the base size, i.e. Total (n), of each row been less than that of the Total row, a whole bunch of cells would have turned red/green.
Matrix Crosstab
If we target the statement "Lunch" and cut by age filters, we can see that 18-40yrs were sig more likely to add hot sauce 3 to 7 times per week while 61+yrs were sig less likely to select those same choices.
Rank Crosstab
In this rank question, for the target choice "Price" cut by age and gender filters, we see that 41-60yrs were sig more likely to rank Price as 4th while 61+yrs were sig less likely to rank Price as 1st. Meanwhile, Males were sig more likely to rank Price as 1st compared to total (15% vs. 12%) while Females were sig less likely.
Looped Matrix
Finally, sig testing will work just fine with Looped questions as well.