Libraries.Compute.Statistics.DataFrame Documentation

The DataFrame class is a collection of columns and rows, like a spreadsheet, that can be used for statistics and other calculations. By default, it can load comma separated files. Other file types can be supported using the Load action with a file loader for the custom type. DataFrame objects can also be transformed using the Transform action, which is useful for sorting, filtering, or other operations. Transforms generally make a copy of the data frame and act on that copy, not the original.

Example Code

use Libraries.Compute.Statistics.DataFrame

//Load a comma separated file
DataFrame frame
frame:Load("Data.csv")

Inherits from: Libraries.Language.Object

Actions Documentation

Add(Libraries.Compute.Statistics.DataFrameSelectionListener listener)

Classes can register as listeners of the selection in the DataFrame.

Parameters

AddColumn(integer index, Libraries.Compute.Statistics.DataFrameColumn column)

This action adds a column to the data frame. It is destructive in that it changes the existing DataFrame without making a copy.

Parameters

Example

//We need the DataFrame class to load in files for Data Science operations.
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Columns.NumberColumn
use Libraries.Containers.Array
use Libraries.Compute.Statistics.DataFrameColumn

//Create a DataFrame, which is essentially a table that understands 
//more information about the data that is being loaded.
DataFrame frame

//This creates a NumberColumn, which contains numbers
NumberColumn column
column:SetHeader("My Column")
column:Add(1)
column:Add(2)
column:Add(3)
column:Add(4)
column:Add(5)
column:Add(6)
frame:AddColumn(0, column)

//Output the DataFrame as text to the console to check that the column was added.
output frame:ToText()

AddColumn(Libraries.Compute.Statistics.DataFrameColumn column)

This action adds a column to the data frame. It is destructive in that it changes the existing DataFrame without making a copy.

Parameters

Example

//We need the DataFrame class to load in files for Data Science operations.
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Columns.NumberColumn
use Libraries.Containers.Array
use Libraries.Compute.Statistics.DataFrameColumn

//Create a DataFrame, which is essentially a table that understands 
//more information about the data that is being loaded.
DataFrame frame

//This creates a NumberColumn, which contains numbers
NumberColumn column
column:SetHeader("My Column")
column:Add(1)
column:Add(2)
column:Add(3)
column:Add(4)
column:Add(5)
column:Add(6)
frame:AddColumn(column)

//Output the DataFrame as text to the console to check that the column was added.
output frame:ToText()

AddColumn(text column, text source)

This action takes a Quorum expression as a text value, evaluates it for each row, and adds the result to the DataFrame as a new column. The expression follows the normal rules for Quorum, using the DataFrame's columns as the allowable variables. For example, if a DataFrame has an integer column named Group, the expression Group * 2 takes the value of Group in each row and multiplies it by 2. If a row has an invalid type, an undefined value is placed at that position. The AddColumn(text, text) call is not destructive to the existing data: it adds a new column to the DataFrame but does not change the values already in it.

Parameters

  • text column: the header of the new column
  • text source: the Quorum expression to evaluate for each row

Example

use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("file.csv")
//Add a column named Tripled holding each row's Group value multiplied by 3
frame:AddColumn("Tripled", "Group * 3")
output frame:ToText()

AddColumnOnLoad(integer index, Libraries.Compute.Statistics.DataFrameColumn column)

This action registers a column that will be used to process a particular column when the DataFrame is loaded. This allows the loader to use customized type information specific to a particular file or situation.

Parameters

  • integer index: the position of the column when the file is loaded. For example, an index of 0 means the column at index 0, if one is loaded.
  • Libraries.Compute.Statistics.DataFrameColumn column: the DataFrameColumn to use and enter into the DataFrame.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Columns.NumberColumn

DataFrame frame
NumberColumn column
frame:AddColumnOnLoad(0, column)

frame:Load("Data/Sheet.csv")
output frame:ToText()

AddSelectedCell(Libraries.Containers.Support.Pair<integer> cell)

This adds a cell to the selected range.

Parameters

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Containers.Support.Pair

DataFrame frame
frame:Load("Data.csv")
Pair<integer> cell
cell:Set(0,0)
frame:AddSelectedCell(cell)
output frame:ToText()

AddSelectedCell(integer x, integer y)

This adds a cell to the selected range.

Parameters

  • integer x: the x coordinate of the cell to add
  • integer y: the y coordinate of the cell to add

Example

use Libraries.Compute.Statistics.DataFrame
    
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedCell(0,0)
output frame:ToText()

AddSelectedColumn(integer index)

This adds a column to the selected range.

Parameters

  • integer index: the column index of the column to add

Example

use Libraries.Compute.Statistics.DataFrame
    
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
output frame:ToText()

AddSelectedColumnRange(integer start, integer finish)

This adds columns to the selected range, starting at start and ending at finish, inclusive. Calculations are then conducted across this entire range.

Parameters

  • integer start: the start of the range
  • integer finish: the end of the range

Example

use Libraries.Compute.Statistics.DataFrame
    
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumnRange(0, 2)
output frame:ToText()
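
The inclusive range can be checked with a small, self-contained sketch. Rather than loading Data.csv, it builds a frame in memory from three NumberColumn objects (the headers A, B, and C are illustrative only) and then selects columns 0 through 2. Assuming the range behaves inclusively as described above, GetSelectedColumnSize should count all three columns, the same result as calling AddSelectedColumn for indices 0, 1, and 2 one at a time.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Columns.NumberColumn

//Build a frame from three small number columns (illustrative headers)
DataFrame frame
NumberColumn a
a:SetHeader("A")
a:Add(1)
NumberColumn b
b:SetHeader("B")
b:Add(2)
NumberColumn c
c:SetHeader("C")
c:Add(3)
frame:AddColumn(a)
frame:AddColumn(b)
frame:AddColumn(c)

//Select columns 0 through 2; both ends of the range are included
frame:AddSelectedColumnRange(0, 2)
output frame:GetSelectedColumnSize()
```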

AddSelectedColumns(text headers)

This action reads a comma separated list of header names, determines the matching column indices, and adds those columns to the selection. This action is strict: if parsing fails, the headers are not unique, or the list has other problems, it throws an error.

Parameters

  • text headers: a comma separated list of the header names of the columns to select

Example

use Libraries.Compute.Statistics.DataFrame
    
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumns("name1,name2")
output frame:GetSelectedColumnSize()
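
Because AddSelectedColumns is strict, a caller may want to handle the failure case. The sketch below, which assumes Data.csv has a column named name1, deliberately repeats a header so that the list is not unique, and uses Quorum's check/detect blocks to report the error instead of stopping the program.

```quorum
use Libraries.Compute.Statistics.DataFrame

DataFrame frame
frame:Load("Data.csv")

//A repeated header is not unique, so this selection should throw an error
check
    frame:AddSelectedColumns("name1,name1")
detect e
    //Report the problem instead of letting the error stop the program
    output "Could not select columns: " + e:GetErrorMessage()
end
```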

AddSelectedFactor(integer index)

This adds the column at a particular index to the selected factors.

Parameters

  • integer index: the index of the factor to add

Example

use Libraries.Compute.Statistics.DataFrame
    
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedFactor(0)
output frame:ToText()

AddSelectedFactorRange(integer start, integer finish)

This adds factors to the selected range, starting at start and ending at finish, inclusive. Calculations are then conducted across this entire range.

Parameters

  • integer start: the start of the range
  • integer finish: the end of the range

Example

use Libraries.Compute.Statistics.DataFrame
    
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedFactorRange(0, 2)
output frame:ToText()

AddSelectedFactors(text headers)

This action reads a comma separated list of header names, determines the matching column indices, and adds those columns to the selected factors. This action is strict: if parsing fails, the headers are not unique, or the list has other problems, it throws an error.

Parameters

  • text headers: a comma separated list of the header names of the factors to select

Example

use Libraries.Compute.Statistics.DataFrame
    
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedFactors("name1,name2")
output frame:GetSelectedColumnSize()

AddSelectedRow(integer index)

This adds a row to the selected range.

Parameters

  • integer index: the row index of the row to add

Example

use Libraries.Compute.Statistics.DataFrame
    
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedRow(0)
output frame:ToText()

BarChart()

This action creates a BarChart from the current column selection in the DataFrame. By default, it uses the first column in the selection as the x-axis and the second column as the y-axis. This can be reversed by changing the selection order.

Return

Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart

DataFrame frame
frame:Load("Data.csv")
frame:SetSelectedColumnRange(0,1)
BarChart chart = frame:BarChart()
chart:SetTitle("My Awesome Title")
chart:SetXAxisTitle("Time")
chart:Display()
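
Since the x-axis comes from the first selected column, the axes can be swapped by selecting the columns in the opposite order. This sketch assumes, as the description above suggests, that AddSelectedColumn preserves the order in which columns are added; it reuses the Data.csv file from the example.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart

DataFrame frame
frame:Load("Data.csv")

//Select column 1 first so it becomes the x-axis, then column 0 for the y-axis
frame:AddSelectedColumn(1)
frame:AddSelectedColumn(0)
BarChart chart = frame:BarChart()
chart:Display()
```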

BarChartByColumn()

This action creates a BarChart from the sum of values of selected columns grouped by the selected factors.

Return

Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart

DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)

BarChart chart = frame:BarChartByColumn()
chart:Display()

BarChartByColumnMaximum()

This action creates a BarChart from the maximum of the values of the selected columns grouped by the selected factors.

Return

Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart

DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)

BarChart chart = frame:BarChartByColumnMaximum()
chart:Display()

BarChartByColumnMean()

This action creates a BarChart from the mean of values of selected columns grouped by the selected factors.

Return

Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart

DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)

BarChart chart = frame:BarChartByColumnMean()
chart:Display()

BarChartByColumnMinimum()

This action creates a BarChart from the minimum of the values of the selected columns grouped by the selected factors.

Return

Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart

DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)

BarChart chart = frame:BarChartByColumnMinimum()
chart:Display()

BarChartByColumnSum()

This action creates a BarChart from the sum of values of selected columns grouped by the selected factors.

Return

Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart

DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)

BarChart chart = frame:BarChartByColumnSum()
chart:Display()

BoxPlot()

This action creates a BoxPlot from the current column selection in the DataFrame. By default, each selected column is plotted as a separate set of values. Multiple columns result in multiple plots of different colors labeled along the x-axis. If a factor is given, the plots are grouped by that factor.

Return

Libraries.Interface.Controls.Charts.BoxPlot: a chart that can be displayed or placed into a user interface or game.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BoxPlot

DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(1)
BoxPlot chart = frame:BoxPlot()
chart:SetTitle("My Awesome Title")
chart:SetXAxisTitle("Time")
chart:Display()
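
To group the plots, a factor can be selected along with the value column. In this sketch, column 0 of Data.csv is assumed to hold group labels and column 1 the numeric values to plot; both indices are illustrative.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BoxPlot

DataFrame frame
frame:Load("Data.csv")

//Plot the values in column 1, grouped by the factor in column 0
frame:AddSelectedColumn(1)
frame:AddSelectedFactor(0)
BoxPlot chart = frame:BoxPlot()
chart:Display()
```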

BoxPlotByColumn()

This action creates a BoxPlot from the values of the selected columns grouped by the selected factors.

Return

Libraries.Interface.Controls.Charts.BoxPlot: a BoxPlot chart that can be displayed or placed into a user interface or game.

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BoxPlot

DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)

BoxPlot chart = frame:BoxPlotByColumn()
chart:Display()

Calculate(Libraries.Compute.Statistics.DataFrameCalculation calculation)

This action runs a calculation on the data frame. Calculations are not intended to be destructive to the original data.

Parameters

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.System.File

//Load a comma separated file
DataFrame frame
File file
file:SetPath("Data.csv")
frame:Load(file)

//A DataFrameCalculation can then be passed to frame:Calculate to run on the loaded data

CalculateColumn(text source)

This action takes a Quorum expression as a text value, evaluates it for each row, and returns the result as a new column without adding it to the DataFrame. The expression follows the normal rules for Quorum, using the DataFrame's columns as the allowable variables. For example, if a DataFrame has an integer column named Group, the expression Group * 2 takes the value of Group in each row and multiplies it by 2. If a row has an invalid type, an undefined value is placed at that position. The CalculateColumn(text) call is not destructive, meaning it does not change the original frame.

Parameters

  • text source: the Quorum expression to evaluate for each row

Return

Libraries.Compute.Statistics.DataFrameColumn: the calculated column, which is not added to the DataFrame

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.DataFrameColumn
DataFrame frame
frame:Load("file.csv")
DataFrameColumn col = frame:CalculateColumn("Group * 3")
output col:ToText()
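
The difference between CalculateColumn and adding a column can be seen in a self-contained sketch that builds a frame with a single Group column in memory. The calculated column only becomes part of the frame in the final, explicit AddColumn call. The header Tripled is illustrative, and the sketch assumes that SetHeader, shown earlier on NumberColumn, is also available on the DataFrameColumn returned by CalculateColumn.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.DataFrameColumn
use Libraries.Compute.Statistics.Columns.NumberColumn

//Build a frame with a single number column named Group
DataFrame frame
NumberColumn group
group:SetHeader("Group")
group:Add(1)
group:Add(2)
group:Add(3)
frame:AddColumn(group)

//CalculateColumn returns a new column but leaves the frame unchanged
DataFrameColumn tripled = frame:CalculateColumn("Group * 3")
tripled:SetHeader("Tripled")
output tripled:ToText()

//Adding the result is a separate, explicit step that changes the frame
frame:AddColumn(tripled)
output frame:ToText()
```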

CalculateMaximumRows()

This action calculates the total number of rows in the data frame. To do this, it traverses the columns, finds the column with the max row count, and returns that integer.

Return

integer: the row count of the column with the largest number of rows

Example

use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("file.csv")
output frame:CalculateMaximumRows()
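
As a sketch of the traversal described above, the following builds a frame in memory from two columns of different lengths, assuming a DataFrame accepts columns with unequal row counts. Following the description, CalculateMaximumRows should report the size of the longer column, five rows in this case. The headers Shorter and Longer are illustrative only.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Columns.NumberColumn

DataFrame frame

//A column with three values
NumberColumn shorter
shorter:SetHeader("Shorter")
shorter:Add(1)
shorter:Add(2)
shorter:Add(3)

//A column with five values
NumberColumn longer
longer:SetHeader("Longer")
longer:Add(1)
longer:Add(2)
longer:Add(3)
longer:Add(4)
longer:Add(5)

frame:AddColumn(shorter)
frame:AddColumn(longer)

//The result is the row count of the column with the most rows
output frame:CalculateMaximumRows()
```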

CheckReducibility()

Checks that at least some of the variables have significant correlation, a prerequisite for factor analysis. The CheckReducibility object returned gives information back in several formats, including text formatted in the American Psychological Association (APA) style. The test that runs depends on how many columns are selected:

  • 2 or more columns: Bartlett’s Test of Sphericity

Return

Libraries.Compute.Statistics.Tests.CheckReducibility: an object representing the test

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CheckReducibility
    
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumnRange(0,4)
CheckReducibility test = frame:CheckReducibility()
output test:GetFormalSummary()

CheckReducibilityStrength()

Measures sampling adequacy for each variable in the model and for the complete model, a prerequisite for factor analysis to work. The CheckReducibilityStrength object returned gives information back in several formats, including text formatted in the American Psychological Association (APA) style. The test that runs depends on how many columns are selected:

  • 2 or more columns: Kaiser-Meyer-Olkin Measure Of Sampling Adequacy

Return

Libraries.Compute.Statistics.Tests.CheckReducibilityStrength: an object representing the test

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CheckReducibilityStrength
    
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumnRange(0,4)
CheckReducibilityStrength test = frame:CheckReducibilityStrength()
output test:GetFormalSummary()

Compare(Libraries.Language.Object object)

This action compares the hash codes of two objects and returns an integer indicating whether this object's hash code is smaller than, equal to, or larger than that of the object passed as a parameter: -1 means smaller, 0 means equal, and 1 means larger. This action was changed in Quorum 7 to return an integer instead of a CompareResult object, because the previous implementation caused efficiency issues.

Parameters

Return

integer: the comparison result: -1 (smaller), 0 (equal), or 1 (larger).

Example

Object o
Object t
integer result = o:Compare(t) //1 (larger), 0 (equal), or -1 (smaller)

CompareCounts()

This action uses the selection to conduct a count comparison between one or more columns. The CompareCounts object returned gives information back in several formats, including text formatted in the American Psychological Association (APA) style. The test that runs depends on how many columns are selected:

  • 1 column: Chi-Squared Goodness Of Fit vs uniform expected counts
  • 2 columns: Chi-Squared Test Of Independence
  • 3 or more columns: Pairwise Chi-Squared Test Of Independence

Return

Libraries.Compute.Statistics.Tests.CompareCounts: an object representing the comparison

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
    
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumnRange(0,2)
CompareCounts compare = frame:CompareCounts()
output compare:GetSummary()
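
Selecting only one column runs the Chi-Squared Goodness Of Fit test against uniform expected counts described above. This sketch assumes column 0 of Data.csv holds categorical values whose counts are to be compared.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts

DataFrame frame
frame:Load("Data.csv")

//With a single column selected, its counts are compared against uniform expected counts
frame:AddSelectedColumn(0)
CompareCounts compare = frame:CompareCounts()
output compare:GetSummary()
```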

CompareCounts(Libraries.Compute.Statistics.Tests.ExperimentalDesign design)

This action uses an experimental design to pick and conduct the appropriate CompareCounts test.

Parameters

Return

Libraries.Compute.Statistics.Tests.CompareCounts: an object representing the comparison

Example

use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
use Libraries.Compute.Statistics.Tests.ExperimentalDesign
    
ExperimentalDesign design
design:AddBetweenSubjectsFactor("Group")
design:AddDependentVariable("Answer")

DataFrame frame
frame:Load("Data.csv")
CompareCounts compare = frame:CompareCounts(design)
output compare:GetFormalSummary()

CompareCountsPairwise(Libraries.Compute.Statistics.Tests.ExperimentalDesign design)

This action uses the selection to conduct a comparison between groups. The CompareCountsPairwise object returned gives information back in several formats, including text formatted in the American Psychological Association (APA) style. This action runs a test based on the selections made in the design.

Parameters

Return

Libraries.Compute.Statis