Libraries.Compute.Statistics.Tests.CompareCounts Documentation

This class conducts a Pearson's chi-squared test on a DataFrame. Pearson's chi-squared test is used to assess three types of comparison:

  • Goodness of fit
  • Test of independence
  • Test of homogeneity (not implemented)

More information about this kind of statistical test can be found here: https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test

This class was adapted from the ChiSquareTest class in Apache Commons Math, then expanded to simplify the library and add a variety of helper actions that were missing. More information about ChiSquareTest can be found on its documentation page: https://commons.apache.org/proper/commons-math/javadocs/api-3.6.1/index.html
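All three comparisons share Pearson's statistic. It is shown below in its standard textbook form for reference, not copied from the library's source; the symbols are this page's own notation: O_i is the observed count in category (or table cell) i, E_i is the expected count for it, and the sum runs over all k categories or cells.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

Large values mean the observed counts sit far from the expected ones; the p-value is then read from the chi-squared distribution with the appropriate degrees of freedom.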

Example Code

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts

DataFrame frame
frame:Load("data.csv")
frame:AddSelectedColumns(0)
frame:AddSelectedColumns(1)

CompareCounts compare = frame:CompareSelectedCounts()
output compare:GetSummary()

Inherits from: Libraries.Compute.Statistics.DataFrameCalculation, Libraries.Compute.Statistics.Tests.StatisticalTest, Libraries.Language.Object, Libraries.Compute.Statistics.Inputs.ColumnInput, Libraries.Compute.Statistics.Inputs.FactorInput

Actions Documentation

AddColumn(integer column)

This action adds a column index to the end of the list of columns used by the test.

Parameters

  • integer column

AddFactor(integer column)

This action adds a factor column index to the end of the list of factors used by the test.

Parameters

  • integer column

Calculate(Libraries.Compute.Statistics.DataFrame frame)

Compare(Libraries.Language.Object object)

This action compares the hash codes of two objects and returns an integer: 1 if this object's hash code is larger than that of the object passed as a parameter, 0 if they are equal, and -1 if it is smaller. This action was changed in Quorum 7 to return an integer instead of a CompareResult object, because the previous implementation was causing efficiency issues.

Parameters

  • Libraries.Language.Object object: the object to compare against.

Return

integer: The Compare result: -1 (smaller), 0 (equal), or 1 (larger).

Example

Object o
Object t
integer result = o:Compare(t) //1 (larger), 0 (equal), or -1 (smaller)

EmptyColumns()

This action empties the list of columns, clearing out all of the items contained within it.

EmptyFactors()

This action empties the list of factors, clearing out all of the items contained within it.

Equals(Libraries.Language.Object object)

This action determines if two objects are equal based on their hash code values.

Parameters

  • Libraries.Language.Object object: the object to compare against.

Return

boolean: True if the hash codes are equal and false if they are not equal.

Example

use Libraries.Language.Object
use Libraries.Language.Types.Text
Object o
Text t
boolean result = o:Equals(t)

GetColumn(integer index)

This action gets the column index stored at a given location in the column list.

Parameters

  • integer index

Return

integer: The item at the given location.

GetColumnIterator()

This action gets an iterator over the column list and returns that iterator.

Return

Libraries.Containers.Iterator: Returns the iterator for an object.

GetColumnSize()

This action gets the number of column indices in the column list.

Return

integer: The number of columns.

GetDegreesOfFreedom()

This returns the degrees of freedom if only one result exists.

Return

number: the degrees of freedom.
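In the standard Pearson formulation, the degrees of freedom depend on the kind of test. The formulas below are textbook background rather than a description of the library's internals (its handling of edge cases, such as a column with a single category, is not documented on this page). Here k is the number of categories, and r and c are the numbers of distinct values in the two columns of a test of independence.

```latex
\begin{aligned}
\text{goodness of fit:} &\quad df = k - 1 \\
\text{test of independence:} &\quad df = (r - 1)(c - 1)
\end{aligned}
```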

GetExpected()

This returns the expected frame if only one result exists.

Return

Libraries.Compute.Statistics.DataFrame: the expected frame.

GetFactor(integer index)

This action gets the factor column index stored at a given location in the factor list.

Parameters

  • integer index

Return

integer: The item at the given location.

GetFactorIterator()

This action gets an iterator over the factor list and returns that iterator.

Return

Libraries.Containers.Iterator: Returns the iterator for an object.

GetFactorSize()

This action gets the number of factor column indices in the factor list.

Return

integer: The number of factors.

GetFactorText()

Return

text

GetFormalSummary()

This action summarizes the results and places them into formal academic language, in APA format. For more information: https://apastyle.apa.org/instructional-aids/numbers-statistics-guide.pdf

Return

text: The results summarized in APA format.

GetGroups(Libraries.Compute.Statistics.DataFrame frame)

Gets the fully factored samples/groups as a collection of DataFrames. Using several DataFrames instead of a single DataFrame helps with multivariate cases.

Parameters

  • Libraries.Compute.Statistics.DataFrame frame

Return

Libraries.Containers.HashTable:

GetHashCode()

This action gets the hash code for an object.

Return

integer: The integer hash code of the object.

Example

Object o
integer hash = o:GetHashCode()

GetObserved()

This returns the observed frame if only one result exists.

Return

Libraries.Compute.Statistics.DataFrame: the observed frame.

GetProbabilityValue()

This returns the probability if only one result exists.

Return

number: the P-Value.
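A p-value is usually read against the test's significance level. The sketch below compares the two using only GetProbabilityValue and GetSignificanceLevel. It assumes a file named data.csv whose first column holds categorical values, and the decision messages are illustrative wording, not the library's own output.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts

// Assumes data.csv exists and its first column holds categorical values
DataFrame frame
frame:Load("data.csv")
frame:AddSelectedColumns(0)

// One selected column, as in the GoodnessOfFit example, gives one result
CompareCounts compare = frame:CompareSelectedCounts()

number p = compare:GetProbabilityValue()
number alpha = compare:GetSignificanceLevel()

// Reject H0 only when the p-value falls below the significance level
if p < alpha
    output "Reject H0: p = " + p
else
    output "Fail to reject H0: p = " + p
end
```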

GetResiduals()

This returns the residuals frame if only one result exists.

Return

Libraries.Compute.Statistics.DataFrame: the residuals frame.
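This page does not say which kind of residual GetResiduals reports. If it follows the common Pearson definition (an assumption; the library might instead use raw or standardized residuals), each category or cell gets the value below, written in the same O_i and E_i notation as the chi-squared statistic.

```latex
r_i = \frac{O_i - E_i}{\sqrt{E_i}}, \qquad \chi^2 = \sum_{i} r_i^2
```

Under that definition the squared residuals add up to the test statistic, so the cells with the largest absolute residuals are the ones driving a significant result.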

GetResult()

This returns a result if only one exists. If more than one result exists, this action returns undefined.

Return

Libraries.Compute.Statistics.Reporting.CompareCountsResult: the CompareCountsResult.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
use Libraries.Compute.Statistics.Reporting.CompareCountsResult

DataFrame frame
frame:Load("Data/Data.csv")
frame:AddSelectedColumns("region")
CompareCounts compare = frame:CompareSelectedCounts()

CompareCountsResult result = compare:GetResult()

GetResult(integer columnIndex)

This returns a result on one particular column. If no such result exists, this action returns undefined.

Parameters

  • integer columnIndex

Return

Libraries.Compute.Statistics.Reporting.CompareCountsResult: the CompareCountsResult for one group.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
use Libraries.Compute.Statistics.Reporting.CompareCountsResult

DataFrame frame
frame:Load("Data/Data.csv")
frame:AddSelectedColumns(0)
CompareCounts compare = frame:CompareSelectedCounts()

CompareCountsResult result = compare:GetResult(0)

GetResult(text column1Name, text column2Name)

This returns a result between two particular columns. If no such result exists, this action returns undefined.

Parameters

  • text column1Name
  • text column2Name

Return

Libraries.Compute.Statistics.Reporting.CompareCountsResult: the CompareCountsResult between two groups.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
use Libraries.Compute.Statistics.Reporting.CompareCountsResult

DataFrame frame
frame:Load("Data/Data.csv")
frame:AddSelectedColumns("region")
frame:AddSelectedColumns("age")
frame:AddSelectedColumns("bmi")
CompareCounts compare = frame:CompareSelectedCounts()

CompareCountsResult result = compare:GetResult("age", "region")

GetResult(integer column1Index, integer column2Index)

This returns a result between two particular columns. If no such result exists, this action returns undefined.

Parameters

  • integer column1Index
  • integer column2Index

Return

Libraries.Compute.Statistics.Reporting.CompareCountsResult: the CompareCountsResult between two groups.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
use Libraries.Compute.Statistics.Reporting.CompareCountsResult

DataFrame frame
frame:Load("Data/Data.csv")
frame:AddSelectedColumns(0)
frame:AddSelectedColumns(1)
frame:AddSelectedColumns(2)
CompareCounts compare = frame:CompareSelectedCounts()

CompareCountsResult result = compare:GetResult(0, 1)

GetResult(text columnName)

This returns a result on one particular column. If no such result exists, this action returns undefined.

Parameters

  • text columnName

Return

Libraries.Compute.Statistics.Reporting.CompareCountsResult: the CompareCountsResult for one group.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
use Libraries.Compute.Statistics.Reporting.CompareCountsResult

DataFrame frame
frame:Load("Data/Data.csv")
frame:AddSelectedColumns("region")
CompareCounts compare = frame:CompareSelectedCounts()

CompareCountsResult result = compare:GetResult("region")

GetResults()

This returns the results between all computed columns.

Return

Libraries.Containers.Array: the CompareCountsResults.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
use Libraries.Compute.Statistics.Reporting.CompareCountsResult
use Libraries.Containers.Array

DataFrame frame
frame:Load("Data/Data.csv")

CompareCounts compare
compare:AddColumn(0)
compare:AddColumn(1)
compare:AddColumn(2)
frame:Calculate(compare)

Array<CompareCountsResult> results = compare:GetResults()

GetSignificanceLevel()

This action returns the significance level of the test (default is 0.05).

Return

number: the significance level.

GetStatisticalFormatting()

GetSummary()

This action summarizes the results and lists them informally.

Return

text: An informal summary of the results.

GetTestStatistic()

This returns the χ² (chi-squared) test statistic if only one result exists.

Return

number: the χ² test statistic.
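The test statistic is normally reported together with its degrees of freedom. The sketch below prints both using GetTestStatistic and GetDegreesOfFreedom. It assumes a file named data.csv whose first two columns hold categorical values; the output format is only an illustration.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts

// Assumes data.csv exists and its first two columns hold categorical values
DataFrame frame
frame:Load("data.csv")
frame:AddSelectedColumns(0)
frame:AddSelectedColumns(1)

// Two selected columns, as in the TestOfIndependence example, give one result
CompareCounts compare = frame:CompareSelectedCounts()

number statistic = compare:GetTestStatistic()
number degrees = compare:GetDegreesOfFreedom()

// Report the statistic in the usual x2(df) = value form
output "x2(" + degrees + ") = " + statistic
```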

GoodnessOfFit(Libraries.Compute.Statistics.DataFrame frame)

This action runs a chi-squared goodness of fit test on the selected columns of data. It calculates the observed values by counting the frequencies of unique items, then calculates the expected counts (assuming an equal distribution across categories) and compares the two to get the χ² value.

H0: The population fits a uniform distribution.
Ha: The population does not fit a uniform distribution.

Parameters

  • Libraries.Compute.Statistics.DataFrame frame

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts

DataFrame frame
frame:Load("data.csv")
frame:AddSelectedColumns(0)

CompareCounts compare = frame:CompareSelectedCounts()
output compare:GetSummary()
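As a hypothetical worked example (the counts are invented for illustration, not taken from any data set), suppose the selected column holds N = 60 values spread over k = 3 categories, with observed counts 10, 20 and 30. Under the uniform null hypothesis each expected count is N/k:

```latex
\begin{aligned}
E_i &= \frac{N}{k} = \frac{60}{3} = 20 \\
\chi^2 &= \frac{(10-20)^2}{20} + \frac{(20-20)^2}{20} + \frac{(30-20)^2}{20} = 5 + 0 + 5 = 10 \\
df &= k - 1 = 2
\end{aligned}
```

With 2 degrees of freedom the upper-tail probability is e^{-χ²/2}, so p = e^{-5} ≈ 0.0067, well below 0.05 (the 0.05 critical value is about 5.99), and H0 would be rejected.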

GoodnessOfFitAgainstExpectedCounts(Libraries.Compute.Statistics.DataFrame frame, Libraries.Compute.Statistics.DataFrame expected)

This action runs a chi-squared goodness of fit test on a single column of data. It calculates the observed values by counting the frequencies of unique items, then compares them with the user-supplied expected counts.

H0: The population fits the given distribution.
Ha: The population does not fit the given distribution.

Parameters

  • Libraries.Compute.Statistics.DataFrame frame
  • Libraries.Compute.Statistics.DataFrame expected

Example

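use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
use Libraries.Compute.Statistics.Columns.TextColumn
use Libraries.Compute.Statistics.Columns.NumberColumn
use Libraries.Compute.Statistics.Inputs.ColumnInput

DataFrame frame
frame:Load("Data/Data.csv")
frame:AddSelectedColumns("smoker")

// One row per category: the category name and its expected count
TextColumn category
category:Add("yes")
category:Add("no")

NumberColumn count
count:Add(60)
count:Add(50)

DataFrame expected
expected:AddColumn(category)
expected:AddColumn(count)

// Copy the selected columns into the test, then run it
CompareCounts compare
frame:GetSelection():CopyTo(cast(ColumnInput, compare))
compare:GoodnessOfFitAgainstExpectedCounts(frame, expected)
output compare:GetSummary()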

GoodnessOfFitAgainstExpectedPercents(Libraries.Compute.Statistics.DataFrame frame, Libraries.Compute.Statistics.DataFrameColumn percents)

This action runs a chi-squared goodness of fit test on one or more columns of data, with the expected percentages supplied as a single column. For each column, it calculates the observed values by counting the frequencies of unique items, then compares them with the user-supplied expected percentages. The percentages must add up to 1.0, and there must be one percent for each category.

H0: The population fits the given distribution.
Ha: The population does not fit the given distribution.

Parameters

  • Libraries.Compute.Statistics.DataFrame frame
  • Libraries.Compute.Statistics.DataFrameColumn percents

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
use Libraries.Compute.Statistics.Columns.NumberColumn
use Libraries.Compute.Statistics.Inputs.ColumnInput

DataFrame frame
frame:Load("Data/Data.csv")
frame:AddSelectedColumns("smoker")

// One expected proportion per category; the values must add up to 1.0
NumberColumn percent
percent:Add(0.4)
percent:Add(0.6)

CompareCounts compare
frame:GetSelection():CopyTo(cast(ColumnInput, compare))
compare:GoodnessOfFitAgainstExpectedPercents(frame, percent)
output compare:GetSummary()
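In the standard approach (assumed here, since this page does not describe the internal conversion), each expected proportion p_i is turned into an expected count by scaling it by the number of observations N, after which the usual χ² statistic applies. With the example's percents 0.4 and 0.6 for the first and second categories, and a hypothetical N = 200 values in the smoker column:

```latex
E_i = N \, p_i, \qquad \sum_i p_i = 1
\qquad\Rightarrow\qquad
E_1 = 200 \times 0.4 = 80, \quad E_2 = 200 \times 0.6 = 120
```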

GoodnessOfFitAgainstExpectedPercents(Libraries.Compute.Statistics.DataFrame frame, Libraries.Compute.Statistics.DataFrame percents)

This action runs a chi-squared goodness of fit test on one or more columns of data, with the expected percentages supplied as a DataFrame that pairs each category with its percent. For each column, it calculates the observed values by counting the frequencies of unique items, then compares them with the user-supplied expected percentages. The percentages must add up to 1.0, and there must be one percent for each category.

H0: The population fits the given distribution.
Ha: The population does not fit the given distribution.

Parameters

  • Libraries.Compute.Statistics.DataFrame frame
  • Libraries.Compute.Statistics.DataFrame percents

Example

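use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
use Libraries.Compute.Statistics.Columns.TextColumn
use Libraries.Compute.Statistics.Columns.NumberColumn
use Libraries.Compute.Statistics.Inputs.ColumnInput

DataFrame frame
frame:Load("Data/Data.csv")
frame:AddSelectedColumns("smoker")

// One row per category: the category name and its expected proportion
TextColumn category
category:Add("yes")
category:Add("no")

NumberColumn percent
percent:Add(0.4)
percent:Add(0.6)

DataFrame expected
expected:AddColumn(category)
expected:AddColumn(percent)

// Copy the selected columns into the test, then run it
CompareCounts compare
frame:GetSelection():CopyTo(cast(ColumnInput, compare))
compare:GoodnessOfFitAgainstExpectedPercents(frame, expected)
output compare:GetSummary()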

IsEmptyColumns()

This action returns a boolean value, true if the container is empty and false if it contains any items.

Return

boolean: Returns true when the container is empty and false when it is not.

IsEmptyFactors()

This action returns a boolean value, true if the container is empty and false if it contains any items.

Return

boolean: Returns true when the container is empty and false when it is not.

RemoveColumn(integer column)

This action removes the first occurrence of the given column index from the column list.

Parameters

  • integer column

Return

boolean: Returns true if the item was removed and false if it was not removed.
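The column list behaves like a small container of integers. The sketch below uses only the column actions documented on this page to show that RemoveColumn looks for a value, while RemoveColumnAt works by position. The column indices 0 and 2 are arbitrary illustration values.

```quorum
use Libraries.Compute.Statistics.Tests.CompareCounts

CompareCounts compare
compare:AddColumn(0)
compare:AddColumn(2)

// RemoveColumn removes the first occurrence of the value 2,
// not the item stored at index 2 (that is what RemoveColumnAt does)
boolean removed = compare:RemoveColumn(2)
if removed
    output "Removed column index 2"
end
output "Columns left: " + compare:GetColumnSize()

// EmptyColumns clears every remaining column index
compare:EmptyColumns()
if compare:IsEmptyColumns()
    output "The column list is now empty"
end
```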

RemoveColumnAt(integer index)

This action removes the item at the given index in the column list and returns that item.

Parameters

  • integer index

RemoveFactor(integer column)

This action removes the first occurrence of the given factor column index from the factor list.

Parameters

  • integer column

Return

boolean: Returns true if the item was removed and false if it was not removed.

RemoveFactorAt(integer index)

This action removes the item at the given index in the factor list and returns that item.

Parameters

  • integer index

SetSignificanceLevel(number significanceLevel)

Sets the significance level of the test (default is 0.05).

Parameters

  • number significanceLevel: the significance level between 0 and 1.
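A stricter threshold can be set before the test runs. The sketch below follows the manual AddColumn and Calculate pattern from the GetResults example; it assumes a file named data.csv whose first column holds categorical values, and 0.01 is only an illustrative choice.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts

// Assumes data.csv exists and its first column holds categorical values
DataFrame frame
frame:Load("data.csv")

CompareCounts compare
// Use a stricter threshold than the default of 0.05, before calculating
compare:SetSignificanceLevel(0.01)
compare:AddColumn(0)
frame:Calculate(compare)

output "Significance level: " + compare:GetSignificanceLevel()
output compare:GetFormalSummary()
```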

SetStatisticalFormatting(Libraries.Compute.Statistics.Reporting.StatisticsFormatting formatting)

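This action sets the formatting options used when the results are reported.

Parameters

  • Libraries.Compute.Statistics.Reporting.StatisticsFormatting formatting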

TestOfIndependence(Libraries.Compute.Statistics.DataFrame frame)

This action runs a pairwise chi-squared test of independence on two columns of data. It calculates the observed values by counting the frequencies of unique items, then calculates the expected counts and compares the two to get the χ² value.

H0: The two variables are independent.
Ha: The two variables are not independent.

Parameters

  • Libraries.Compute.Statistics.DataFrame frame

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts

DataFrame frame
frame:Load("data.csv")
frame:AddSelectedColumns(0)
frame:AddSelectedColumns(1)

CompareCounts compare = frame:CompareSelectedCounts()
output compare:GetSummary()
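For a test of independence, the textbook expected count for the cell in row i and column j of the contingency table is its row total R_i times its column total C_j, divided by the grand total N. The totals below describe a hypothetical 2×2 table (row totals 30 and 70, column totals 40 and 60, N = 100) chosen only to illustrate the arithmetic; the notation is this page's own.

```latex
\begin{aligned}
E_{ij} &= \frac{R_i \, C_j}{N} \\
E_{11} &= \frac{30 \times 40}{100} = 12, \quad E_{12} = \frac{30 \times 60}{100} = 18 \\
E_{21} &= \frac{70 \times 40}{100} = 28, \quad E_{22} = \frac{70 \times 60}{100} = 42 \\
df &= (2 - 1)(2 - 1) = 1
\end{aligned}
```

The expected counts reproduce the row and column totals exactly; the χ² statistic then measures how far the observed cells stray from them.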

UseFactor()

Return

boolean