Libraries.Compute.Statistics.Tests.Regression Documentation

This class conducts an Ordinary Least Squares regression on a DataFrame. By default, an intercept is calculated and included in the model. More information about this kind of statistical test can be found here: https://en.wikipedia.org/wiki/Ordinary_least_squares. It was adapted from the same model in Apache Commons, but was expanded upon to simplify the library and add a variety of helper actions that were missing. More information about the original Apache Commons class can be found on its documentation page: https://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/stat/regression/OLSMultipleLinearRegression.html

Example Code

    use Libraries.Compute.Statistics.DataFrame
    use Libraries.Compute.Statistics.Columns.NumberColumn
    use Libraries.Containers.Array
    use Libraries.Compute.Statistics.DataFrameColumn
    use Libraries.Compute.Statistics.Tests.Regression

    DataFrame frame
    NumberColumn column0
    column0:SetHeader("y")
    column0:Add("1")
    column0:Add("2")
    column0:Add("3")
    column0:Add("4")
    column0:Add("5")
    column0:Add("6")

    NumberColumn column2
    column2:SetHeader("2")
    column2:Add("12.0")
    column2:Add("6")
    column2:Add("-4")
    column2:Add("1")
    column2:Add("97")
    column2:Add("65")

    NumberColumn column3
    column3:SetHeader("3")
    column3:Add("-51.0")
    column3:Add("167")
    column3:Add("24")
    column3:Add("2")
    column3:Add("120")
    column3:Add("69")

    NumberColumn column4
    column4:SetHeader("4")
    column4:Add("4")
    column4:Add("-68")
    column4:Add("-41")
    column4:Add("3")
    column4:Add("159")
    column4:Add("73")

    Array<DataFrameColumn> columns
    columns:Add(column0)
    columns:Add(column2)
    columns:Add(column3)
    columns:Add(column4)

    frame:SetColumns(columns)
    Regression regression
    regression:SetPredictedColumn("y")
    regression:AddPredictorColumn("2")
    regression:AddPredictorColumn("3")
    regression:AddPredictorColumn("4")
    frame:Calculate(regression)

    //Output a series of attributes about the regression
    output "Beta: " + regression:GetCoefficients():ToText()
    output "Beta p-values: " + regression:GetCoefficientProbabilityValues():ToText()
    output "Residuals: " + regression:GetResiduals():ToText()
    output "Residual Sum of Squares: " + regression:GetResidualSumOfSquares()
    output "Total Sum of Squares: " + regression:GetTotalSumOfSquares()
    output "F = " + regression:GetCriticalValue()
    output "p = " + regression:GetProbabilityValue()
    output "R^2: " + regression:GetEffectSize()

Inherits from: Libraries.Compute.Statistics.DataFrameCalculation, Libraries.Language.Object

Summary

Actions Summary Table

Action | Description
AddPredictorColumn(text name) | This action adds a column to the set of columns that predict the outcome variable.
Calculate(Libraries.Compute.Statistics.DataFrame frame) |
CalculateAdjustedEffectSize(Libraries.Compute.Matrix predictors, number r2, boolean intercept) | This action calculates an adjusted effect size.
CalculateCriticalValue(Libraries.Compute.Matrix predictors, number r2) | This action returns the critical value for the matrix and the given effect size (R^2).
CalculateDenominatorDegreesOfFreedom(Libraries.Compute.Matrix predictors) | This calculates the degrees of freedom of the denominator in the F-ratio.
CalculateEffectSize(number residualSumOfSquares, number totalSumOfSquares) | This action returns the effect size for the calculation.
CalculateErrorVariance(Libraries.Compute.Matrix predictors, Libraries.Compute.Vector residuals) | This action calculates the total error variance from the residuals.
CalculateNumeratorDegreesOfFreedom(Libraries.Compute.Matrix predictors) | This calculates the degrees of freedom of the numerator in the F-ratio.
CalculatePValues(Libraries.Compute.Matrix predictors, Libraries.Compute.MatrixTransform.OrthonormalTriangularDecomposition ortho, Libraries.Compute.Vector residuals, Libraries.Compute.Vector betas) | This action calculates the probability values for each beta-coefficient in the model.
CalculateProbabilityValue(Libraries.Compute.Matrix predictors, number criticalValue) | This action returns the probability value (p-value) for the overall regression.
CalculateRegressionParametersStandardErrors(Libraries.Compute.Matrix predictors, Libraries.Compute.MatrixTransform.OrthonormalTriangularDecomposition ortho, Libraries.Compute.Vector residuals) | This action calculates the standard errors from the residuals.
CalculateResidualSumOfSquares(Libraries.Compute.Vector residuals) | This action calculates the sum of squares of the residuals.
CalculateResiduals(Libraries.Compute.Vector y, Libraries.Compute.Matrix predictors, Libraries.Compute.Vector b) | This action calculates the residuals.
CalculateTotalSumOfSquares(Libraries.Compute.Statistics.DataFrameColumn column, boolean intercept) | This action calculates the sum of squares for an instance of this regression.
CalculateVarianceCovarianceMatrix(Libraries.Compute.Matrix predictors, Libraries.Compute.MatrixTransform.OrthonormalTriangularDecomposition ortho) | This action calculates the variance-covariance matrix.
Compare(Libraries.Language.Object object) | This action compares two object hash codes and returns an integer.
Equals(Libraries.Language.Object object) | This action determines if two objects are equal based on their hash code values.
GetAdjustedEffectSize() | Returns the total adjusted effect size, in statistics typically termed adjusted R^2 (R-squared).
GetCoefficientProbabilityValues() | Returns the probability values for the beta coefficients.
GetCoefficients() | Returns the total beta coefficients.
GetCriticalValue() | Returns the critical value.
GetEffectSize() | Returns the total effect size, in statistics typically termed R^2 (R-squared).
GetFormalSummary() | This action summarizes the result and places it into formal academic language, in APA format.
GetHashCode() | This action gets the hash code for an object.
GetPredictedColumn() | Returns the column name to be predicted (y).
GetProbabilityValue() | Returns the probability value.
GetResidualSumOfSquares() | Returns the total residual sum of squares.
GetResiduals() | Returns the residuals.
GetTotalSumOfSquares() | Returns the total sum of squares.
HasIntercept() | Returns whether or not this regression includes an intercept.
SetHasIntercept(boolean hasIntercept) | Sets whether or not this regression includes an intercept.
SetPredictedColumn(text predictedColumn) | Sets the column name to be predicted (y).

Actions Documentation

AddPredictorColumn(text name)

This action adds a column to the set of columns that predict the outcome variable.

Parameters

Calculate(Libraries.Compute.Statistics.DataFrame frame)

Parameters

CalculateAdjustedEffectSize(Libraries.Compute.Matrix predictors, number r2, boolean intercept)

This action calculates an adjusted effect size. This adjustment accounts for the number of predictors included in the model.
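The arithmetic behind such an adjustment can be sketched as follows. This is a minimal illustration that assumes the common textbook formula for a model with an intercept, 1 - (1 - R^2) * (n - 1) / (n - p), where n is the number of rows and p is the number of columns in the predictor matrix. Whether this class uses exactly this form is an assumption, and the values below are hypothetical.

```quorum
//Hypothetical values, chosen for illustration only
number r2 = 0.75
number rows = 6.0       //n: the number of observations
number columns = 4.0    //p: columns in the predictor matrix, assumed to include the intercept

//Textbook adjustment for a model with an intercept (an assumption, not confirmed for this class)
number adjusted = 1.0 - (1.0 - r2) * (rows - 1.0) / (rows - columns)
output "Adjusted R^2: " + adjusted
```

In practice, calculating the regression and calling GetAdjustedEffectSize is likely the simpler route.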

Parameters

Return

number: The adjusted R^2

CalculateCriticalValue(Libraries.Compute.Matrix predictors, number r2)

This action returns the critical value for the matrix and the given effect size (R^2). In statistics this value is typically termed an "F-value," which describes where a result falls on the F-distribution. We calculate this value by the following equation:

F = (R^2 / (p - 1)) / ((1 - R^2) / (n - p))

where n is the number of rows and p is the number of columns in the predictor matrix. While an example is included on how to calculate this value, it is complicated and we highly recommend calculating the regression and just calling GetCriticalValue instead.

Example Code

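        use Libraries.Compute.Matrix
        use Libraries.Compute.Vector
        use Libraries.Compute.MatrixTransform.OrthonormalTriangularDecomposition
        use Libraries.Compute.Statistics.DataFrame
        use Libraries.Compute.Statistics.DataFrameColumn
        use Libraries.Compute.Statistics.Tests.Regression

        //The outcome column is read from this frame. The name "y" is a placeholder.
        DataFrame frame
        frame:Load("Data/Data.csv")
        DataFrameColumn column = frame:GetColumn("y")

        //Placeholder: a frame assumed to hold only the predictor columns
        //(including a column for the intercept, if the model has one).
        DataFrame transformed
        transformed:Load("Data/Predictors.csv")
        Matrix predictorMatrix = transformed:ConvertToMatrix()

        Regression regression
        OrthonormalTriangularDecomposition decomp
        decomp:Calculate(predictorMatrix)

        //Only continue if the outcome column can be represented as a vector
        if column:CanConvertToVector()
            Vector predicted = column:ConvertToVector()

            //Solve for the beta coefficients, then derive the residuals
            Vector beta = decomp:Solve(predicted)
            Vector residuals = regression:CalculateResiduals(predicted, predictorMatrix, beta)
            number residualSumOfSquares = regression:CalculateResidualSumOfSquares(residuals)
            number totalSumOfSquares = regression:CalculateTotalSumOfSquares(column, regression:HasIntercept())

            //R^2 first, then the F-value computed from it
            number r2 = regression:CalculateEffectSize(residualSumOfSquares, totalSumOfSquares)
            number fValue = regression:CalculateCriticalValue(predictorMatrix, r2)
            output fValue
        end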

Parameters

Return

number: This returns the critical value, typically called "F" in statistics.

CalculateDenominatorDegreesOfFreedom(Libraries.Compute.Matrix predictors)

This calculates the degrees of freedom of the denominator in the F-ratio. It is equivalent to the number of rows in the matrix minus the number of columns.

Parameters

Return

number: The number of rows in the matrix minus the number of columns.

CalculateEffectSize(number residualSumOfSquares, number totalSumOfSquares)

This action returns the effect size for the calculation. The technical name for this effect size is "R^2" and it is calculated as 1 - (the residual sum of squares divided by the total sum of squares).
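As a small illustration of the calculation described above, the sketch below passes two hypothetical sums of squares to the action. By the stated formula, a residual sum of squares of 20 and a total sum of squares of 80 should give 1 - 20 / 80 = 0.75. The input values are made up for the example.

```quorum
use Libraries.Compute.Statistics.Tests.Regression

Regression regression

//Hypothetical sums of squares, chosen for illustration only
number residualSumOfSquares = 20.0
number totalSumOfSquares = 80.0

number r2 = regression:CalculateEffectSize(residualSumOfSquares, totalSumOfSquares)
output "R^2: " + r2
```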

Parameters

Return

number: The effect size (R^2)

CalculateErrorVariance(Libraries.Compute.Matrix predictors, Libraries.Compute.Vector residuals)

This action calculates the total error variance from the residuals.

Parameters

Return

number:

CalculateNumeratorDegreesOfFreedom(Libraries.Compute.Matrix predictors)

This calculates the degrees of freedom of the numerator in the F-ratio. It is equivalent to the number of columns in the matrix minus 1.
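The degrees of freedom can be worked through by hand. The sketch below uses the shape of the data in the class-level example (6 rows and 3 predictors) and assumes the predictor matrix carries one extra column for the intercept, giving 4 columns. Under that assumption the numerator is 4 - 1 = 3 and the denominator (rows minus columns) is 6 - 4 = 2.

```quorum
//Shape of the predictor matrix, assuming it includes a column for the intercept
integer rows = 6
integer columns = 4

//Numerator: the number of columns minus 1
integer numerator = columns - 1

//Denominator: the number of rows minus the number of columns
integer denominator = rows - columns

output "Numerator degrees of freedom: " + numerator
output "Denominator degrees of freedom: " + denominator
```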

Parameters

Return

number: The number of columns in the matrix minus 1.

CalculatePValues(Libraries.Compute.Matrix predictors, Libraries.Compute.MatrixTransform.OrthonormalTriangularDecomposition ortho, Libraries.Compute.Vector residuals, Libraries.Compute.Vector betas)

This action calculates the probability values for each beta-coefficient in the model. This is a complex test and should not be used externally unless you really know what you are doing.

Parameters

Return

Libraries.Compute.Vector: a vector of the probability values

CalculateProbabilityValue(Libraries.Compute.Matrix predictors, number criticalValue)

This action returns the probability value (p-value) for the overall regression. This is sometimes called an "omnibus p-value."

Example Code

        use Libraries.Compute.Matrix
        use Libraries.Compute.Statistics.DataFrame
        use Libraries.Compute.Statistics.Tests.Regression

        //This example assumes every column in the file is a predictor
        DataFrame frame
        frame:Load("Data/Data.csv")
        Matrix predictorMatrix = frame:ConvertToMatrix()

        Regression regression
        //0.9 is an arbitrary critical value (F), used only for illustration
        number p = regression:CalculateProbabilityValue(predictorMatrix, 0.9)
        output p

Parameters

Return

number: The probability value.

CalculateRegressionParametersStandardErrors(Libraries.Compute.Matrix predictors, Libraries.Compute.MatrixTransform.OrthonormalTriangularDecomposition ortho, Libraries.Compute.Vector residuals)

This action calculates the standard errors from the residuals.

Parameters

Return

Libraries.Compute.Vector:

CalculateResidualSumOfSquares(Libraries.Compute.Vector residuals)

This action calculates the sum of squares of the residuals.

Parameters

Return

number: returns the sum of squares from the residuals

CalculateResiduals(Libraries.Compute.Vector y, Libraries.Compute.Matrix predictors, Libraries.Compute.Vector b)

This action calculates the residuals.

Parameters

Return

Libraries.Compute.Vector: returns the residuals

CalculateTotalSumOfSquares(Libraries.Compute.Statistics.DataFrameColumn column, boolean intercept)

This action calculates the sum of squares for an instance of this regression. It uses the raw sum of squares if there is no intercept and the second moment (the sum of squared deviations from the mean) if there is one.

Example Code

        use Libraries.Compute.Statistics.DataFrame
        use Libraries.Compute.Statistics.DataFrameColumn
        use Libraries.Compute.Statistics.Tests.Regression
    
        DataFrame frame
        frame:Load("Data/Data.csv")

        DataFrameColumn column = frame:GetColumn("DT")

        Regression regression
        number value = regression:CalculateTotalSumOfSquares(column, false)
        output value

Parameters

Return

number: the total sum of squares

CalculateVarianceCovarianceMatrix(Libraries.Compute.Matrix predictors, Libraries.Compute.MatrixTransform.OrthonormalTriangularDecomposition ortho)

This action calculates the variance-covariance matrix.

Parameters

Return

Libraries.Compute.Matrix:

Compare(Libraries.Language.Object object)

This action compares two object hash codes and returns an integer. The result indicates whether this object's hash code is larger than, smaller than, or equal to that of the object passed as a parameter. In this case, -1 means smaller, 0 means equal, and 1 means larger. This action was changed in Quorum 7 to return an integer, instead of a CompareResult object, because the previous implementation was causing efficiency issues.

Example Code

        Object o
        Object t
        integer result = o:Compare(t) //1 (larger), 0 (equal), or -1 (smaller)

Parameters

Return

integer: The Compare result: -1 (smaller), 0 (equal), or 1 (larger).

Equals(Libraries.Language.Object object)

This action determines if two objects are equal based on their hash code values.

Example Code

        use Libraries.Language.Object
        use Libraries.Language.Types.Text
        Object o
        Text t
        boolean result = o:Equals(t)

Parameters

Return

boolean: True if the hash codes are equal and false if they are not equal.

GetAdjustedEffectSize()

Returns the total adjusted effect size, in statistics typically termed adjusted R^2 (R-squared). This action returns 0 unless the regression has been calculated.

Return

number: The adjusted R^2 if the regression is calculated

GetCoefficientProbabilityValues()

Returns the probability values for the beta coefficients

Return

Libraries.Compute.Vector:

GetCoefficients()

Returns the total beta coefficients. This action returns 0 unless the regression has been calculated.

Return

Libraries.Compute.Vector: The beta coefficients

GetCriticalValue()

Returns the critical value

Return

number:

GetEffectSize()

Returns the total effect size, in statistics typically termed R^2 (R-squared). This action returns 0 unless the regression has been calculated.

Return

number: The R^2 if the regression is calculated

GetFormalSummary()

This action summarizes the result and places it into formal academic language, in APA format.
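A typical use is to calculate the regression and then output the summary. The sketch below follows the same pattern as the class-level example. The file path and the column names "y", "x1", and "x2" are placeholders, not part of the library.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.Regression

DataFrame frame
frame:Load("Data/Data.csv")

//Placeholder column names; replace them with headers from your own data
Regression regression
regression:SetPredictedColumn("y")
regression:AddPredictorColumn("x1")
regression:AddPredictorColumn("x2")
frame:Calculate(regression)

//The summary is requested only after the regression has been calculated
output regression:GetFormalSummary()
```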

Return

text:

GetHashCode()

This action gets the hash code for an object.

Example Code

        Object o
        integer hash = o:GetHashCode()

Return

integer: The integer hash code of the object.

GetPredictedColumn()

Returns the column name to be predicted (y)

Return

text:

GetProbabilityValue()

Returns the probability value

Return

number:

GetResidualSumOfSquares()

Returns the total residual sum of squares. This action returns 0 unless the regression has been calculated.

Return

number: The residual sum of squares

GetResiduals()

Returns the residuals. This action returns 0 unless the regression has been calculated.

Return

Libraries.Compute.Vector: The residuals

GetTotalSumOfSquares()

Returns the total sum of squares. This action returns 0 unless the regression has been calculated.

Return

number: The total sum of squares

HasIntercept()

Returns whether or not this regression includes an intercept.

Return

boolean:

SetHasIntercept(boolean hasIntercept)

Sets whether or not this regression includes an intercept.
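Because an intercept is included by default, this action is mainly useful for turning it off. The sketch below assumes the setting must be applied before the regression is calculated. The file path and column names are placeholders.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.Regression

DataFrame frame
frame:Load("Data/Data.csv")

//Placeholder column names; replace them with headers from your own data
Regression regression
regression:SetPredictedColumn("y")
regression:AddPredictorColumn("x1")

//Fit the model through the origin, with no intercept term
regression:SetHasIntercept(false)
frame:Calculate(regression)

if regression:HasIntercept()
    output "The model includes an intercept."
else
    output "The model does not include an intercept."
end
output "R^2: " + regression:GetEffectSize()
```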

Parameters

SetPredictedColumn(text predictedColumn)

Sets the column name to be predicted (y)

Parameters