Libraries.Compute.Statistics.DataFrame Documentation
The DataFrame class is a collection of columns and rows, like a spreadsheet, that can be used for statistics and other calculations. By default, it can load comma separated files. Other file types can be supported using the Load action with a file loader for the custom type. DataFrame objects can also be transformed using the Transform action, which is useful for sorting, filtering, or other operations. Transforms generally make a copy of the data frame and act on that copy, not the original.
Example Code
use Libraries.Compute.Statistics.DataFrame
//Load a comma separated file
DataFrame frame
frame:Load("Data.csv")
Inherits from: Libraries.Language.Object
Actions Documentation
Add(Libraries.Compute.Statistics.DataFrameSelectionListener listener)
Classes can register as listeners of the selection in the DataFrame.
Parameters
- Libraries.Compute.Statistics.DataFrameSelectionListener: the listener to register.
AddColumn(integer index, Libraries.Compute.Statistics.DataFrameColumn column)
This action adds a column to the data frame. It is destructive in that it changes the existing DataFrame without making a copy.
Parameters
- integer index: the position of the column
- Libraries.Compute.Statistics.DataFrameColumn: The column to add.
Example
//We need the DataFrame class to load in files for Data Science operations.
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Columns.NumberColumn
use Libraries.Containers.Array
use Libraries.Compute.Statistics.DataFrameColumn
//Create a DataFrame, which is essentially a table that understands
//more information about the data that is being loaded.
DataFrame frame
//This creates a NumberColumn, which contains numbers
NumberColumn column
column:SetHeader("My Column")
column:Add(1)
column:Add(2)
column:Add(3)
column:Add(4)
column:Add(5)
column:Add(6)
frame:AddColumn(0, column)
//The frame now holds the column we added. We can output it as a text value to the console, if we want that.
output frame:ToText()
AddColumn(Libraries.Compute.Statistics.DataFrameColumn column)
This action adds a column to the data frame. It is destructive in that it changes the existing DataFrame without making a copy.
Parameters
- Libraries.Compute.Statistics.DataFrameColumn: The column to add.
Example
//We need the DataFrame class to load in files for Data Science operations.
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Columns.NumberColumn
use Libraries.Containers.Array
use Libraries.Compute.Statistics.DataFrameColumn
//Create a DataFrame, which is essentially a table that understands
//more information about the data that is being loaded.
DataFrame frame
//This creates a NumberColumn, which contains numbers
NumberColumn column
column:SetHeader("My Column")
column:Add(1)
column:Add(2)
column:Add(3)
column:Add(4)
column:Add(5)
column:Add(6)
frame:AddColumn(column)
//The frame now holds the column we added. We can output it as a text value to the console, if we want that.
output frame:ToText()
AddColumn(text column, text source)
This action takes a Quorum expression text value and then creates a new column in the DataFrame. The expression follows the normal rules for Quorum, using the DataFrame's columns as the allowable variables. For example, if a DataFrame has an integer column named Group, then an expression like Group * 2 takes the value of Group in each row and multiplies it by 2. If a row holds an invalid type, an undefined value is placed at that position. The AddColumn(text, text) call is not destructive, meaning it adds to the DataFrame, but does not change the original data.
Parameters
- text column
- text source
Example
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("file.csv")
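//This overload takes two text values. The first is assumed here to be the header of the new column; the second is the expression.
frame:AddColumn("Tripled", "Group * 3")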
output frame:ToText()
AddColumnOnLoad(integer index, Libraries.Compute.Statistics.DataFrameColumn column)
This action registers a column that will be used to process the column at a particular position when the DataFrame is loaded. This allows the loader to use customized type information specific to a particular file or situation.
Parameters
- integer index: the position of the column on loading. For example, an index of 0 means the column at index 0, if one is loaded.
- Libraries.Compute.Statistics.DataFrameColumn: the DataFrameColumn to use and enter into the DataFrame.
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Columns.NumberColumn
DataFrame frame
NumberColumn column
frame:AddColumnOnLoad(0, column)
frame:Load("Data/Sheet.csv")
output frame:ToText()
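AddSelectedCell(Libraries.Containers.Support.Pair<integer> cell)
This adds a cell to the selected range.
Parameters
- Libraries.Containers.Support.Pair<integer>: the coordinates of the cell to add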
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Containers.Support.Pair
DataFrame frame
frame:Load("Data.csv")
Pair<integer> cell
cell:Set(0,0)
frame:AddSelectedCell(cell)
output frame:ToText()
AddSelectedCell(integer x, integer y)
This adds a cell to the selected range.
Parameters
- integer x: the x coordinate of the cell to add
- integer y: the y coordinate of the cell to add
Example
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedCell(0,0)
output frame:ToText()
AddSelectedColumn(integer index)
This adds a column to the selected range.
Parameters
- integer index: the column index of the column to add
Example
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
output frame:ToText()
AddSelectedColumnRange(integer start, integer finish)
This adds columns to the selected range, starting from start and ending at finish, inclusive. This means that calculations will be conducted across this entire range.
Parameters
- integer start: the start of the range
- integer finish: the end of the range
Example
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumnRange(0, 2)
output frame:ToText()
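Because the range is inclusive, selecting 0 through 2 should select three columns. The following is a minimal sketch, assuming that Data.csv has at least three columns and that nothing was selected beforehand. It uses only actions documented on this page.

```quorum
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("Data.csv")
//Select columns 0, 1, and 2 (both endpoints are included)
frame:AddSelectedColumnRange(0, 2)
//Report how many columns are now selected
output frame:GetSelectedColumnSize()
```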
AddSelectedColumns(text headers)
This action reads a comma separated list of header names and determines the indices from this list. This action is strict: if the parsing fails, the headers are not unique, or there are other issues in the list, it throws an error.
Parameters
- text headers: the columns to select
Example
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumns("name1,name2")
output frame:GetSelectedColumnSize()
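Since this action throws an error on a bad header list, a program may want to guard the call. The following is a minimal sketch using Quorum's check and detect blocks. The header names name1 and name2 are placeholders, and the wording of the message is an assumption made for illustration.

```quorum
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("Data.csv")
check
    //This line throws an error if a header is missing or not unique
    frame:AddSelectedColumns("name1,name2")
    output frame:GetSelectedColumnSize()
detect e
    //Report the problem instead of stopping the program
    output "Could not select the columns: " + e:GetErrorMessage()
end
```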
AddSelectedFactor(integer index)
This adds the factor at a particular index to the selection.
Parameters
- integer index: the index of the factor to add
Example
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedFactor(0)
output frame:ToText()
AddSelectedFactorRange(integer start, integer finish)
This adds factors to the selected range, starting from start and ending at finish, inclusive. This means that calculations will be conducted across this entire range.
Parameters
- integer start: the start of the range
- integer finish: the end of the range
Example
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedFactorRange(0, 2)
output frame:ToText()
AddSelectedFactors(text headers)
This action reads a comma separated list of header names and determines the indices from this list. This action is strict: if the parsing fails, the headers are not unique, or there are other issues in the list, it throws an error.
Parameters
- text headers: the columns to select as factors
Example
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedFactors("name1,name2")
output frame:GetSelectedColumnSize()
AddSelectedRow(integer index)
This adds a row to the selected range.
Parameters
- integer index: the row index of the row to add
Example
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedRow(0)
output frame:ToText()
BarChart()
This action creates a BarChart from the current column selection in the DataFrame. By default, it uses the first column in the selection as the x-axis and the second column as the y-axis. This can be reversed by changing the selection order.
Return
Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart
DataFrame frame
frame:Load("Data.csv")
frame:SetSelectedColumnRange(0,1)
BarChart chart = frame:BarChart()
chart:SetTitle("My Awesome Title")
chart:SetXAxisTitle("Time")
chart:Display()
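To reverse the axes, the columns can be selected one at a time in the opposite order. This is a sketch only: it assumes that AddSelectedColumn keeps columns in the order they were added, which the description above suggests but does not state outright.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart
DataFrame frame
frame:Load("Data.csv")
//Select column 1 first so that it is used for the x-axis
frame:AddSelectedColumn(1)
//Select column 0 second so that it is used for the y-axis
frame:AddSelectedColumn(0)
BarChart chart = frame:BarChart()
chart:Display()
```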
BarChartByColumn()
This action creates a BarChart from the sum of values of selected columns grouped by the selected factors.
Return
Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)
BarChart chart = frame:BarChartByColumn()
chart:Display()
BarChartByColumnMaximum()
This action creates a BarChart from the maximum of the values of selected columns grouped by the selected factors.
Return
Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)
BarChart chart = frame:BarChartByColumnMaximum()
chart:Display()
BarChartByColumnMean()
This action creates a BarChart from the mean of values of selected columns grouped by the selected factors.
Return
Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)
BarChart chart = frame:BarChartByColumnMean()
chart:Display()
BarChartByColumnMinimum()
This action creates a BarChart from the minimum of the values of selected columns grouped by the selected factors.
Return
Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)
BarChart chart = frame:BarChartByColumnMinimum()
chart:Display()
BarChartByColumnSum()
This action creates a BarChart from the sum of values of selected columns grouped by the selected factors.
Return
Libraries.Interface.Controls.Charts.BarChart: a BarChart chart that can be displayed or placed into a user interface or game.
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BarChart
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)
BarChart chart = frame:BarChartByColumnSum()
chart:Display()
BoxPlot()
This action creates a BoxPlot from the current column selection in the DataFrame. By default, it uses all columns in the selection as separate values in the chart area. Multiple columns will result in multiple plots of different colors labeled along the x-axis. If a factor is given, the plots will be grouped by that factor.
Return
Libraries.Interface.Controls.Charts.BoxPlot: a chart that can be displayed or placed into a user interface or game.
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BoxPlot
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(1)
BoxPlot chart = frame:BoxPlot()
chart:SetTitle("My Awesome Title")
chart:SetXAxisTitle("Time")
chart:Display()
BoxPlotByColumn()
This action creates a BoxPlot from the values of selected columns grouped by the selected factors.
Return
Libraries.Interface.Controls.Charts.BoxPlot: a BoxPlot chart that can be displayed or placed into a user interface or game.
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Interface.Controls.Charts.BoxPlot
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumn(0)
frame:AddSelectedFactor(3)
BoxPlot chart = frame:BoxPlotByColumn()
chart:Display()
Calculate(Libraries.Compute.Statistics.DataFrameCalculation calculation)
This action runs a calculation on the data frame. Calculations are not intended to be destructive to the original data.
Parameters
- Libraries.Compute.Statistics.DataFrameCalculation: The calculation we want conducted on this frame.
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.System.File
//Load a comma separated file
DataFrame frame
File file
file:SetPath("Data.csv")
frame:Load(file)
CalculateColumn(text source)
This action takes a Quorum expression text value and then creates a new column without adding it to the DataFrame. The expression follows the normal rules for Quorum, using the DataFrame's columns as the allowable variables. For example, if a DataFrame has an integer column named Group, then an expression like Group * 2 takes the value of Group in each row and multiplies it by 2. If a row holds an invalid type, an undefined value is placed at that position. The CalculateColumn(text) call is not destructive, meaning it does not change the original frame.
Parameters
- text source
Return
Libraries.Compute.Statistics.DataFrameColumn: the newly calculated column
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.DataFrameColumn
DataFrame frame
frame:Load("file.csv")
DataFrameColumn col = frame:CalculateColumn("Group * 3")
output col:ToText()
CalculateMaximumRows()
This action calculates the total number of rows in the data frame. To do this, it traverses the columns, finds the column with the max row count, and returns that integer.
Return
integer: the row count of the column with the largest number of rows
Example
use Libraries.Compute.Statistics.DataFrame
DataFrame frame
frame:Load("file.csv")
output frame:CalculateMaximumRows()
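The traversal described above matters when columns differ in length. The following is a minimal sketch that builds a frame by hand from two columns of unequal size. It assumes the frame accepts such columns, which the description implies, and the names shortColumn and longColumn are made up for this illustration.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Columns.NumberColumn
DataFrame frame
//A column with two rows
NumberColumn shortColumn
shortColumn:SetHeader("Short")
shortColumn:Add(1)
shortColumn:Add(2)
//A column with four rows
NumberColumn longColumn
longColumn:SetHeader("Long")
longColumn:Add(1)
longColumn:Add(2)
longColumn:Add(3)
longColumn:Add(4)
frame:AddColumn(shortColumn)
frame:AddColumn(longColumn)
//Reports the row count of the longer of the two columns
output frame:CalculateMaximumRows()
```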
CheckReducibility()
This action checks that at least some of the variables have significant correlation, a prerequisite for factor analysis. The CheckReducibility object returned gives information back in several formats, including text formatted in the American Psychological Association (APA) style. This action runs a test based on how many columns are selected:
- 2+ columns: Bartlett’s Test of Sphericity
Return
Libraries.Compute.Statistics.Tests.CheckReducibility: an object representing the test
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CheckReducibility
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumnRange(0,4)
CheckReducibility test = frame:CheckReducibility()
output test:GetFormalSummary()
CheckReducibilityStrength()
This action measures sampling adequacy for each variable in the model and for the complete model, a prerequisite for factor analysis to work. The CheckReducibilityStrength object returned gives information back in several formats, including text formatted in the American Psychological Association (APA) style. This action runs a test based on how many columns are selected:
- 2+ columns: Kaiser-Meyer-Olkin Measure Of Sampling Adequacy
Return
Libraries.Compute.Statistics.Tests.CheckReducibilityStrength: an object representing the test
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CheckReducibilityStrength
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumnRange(0,4)
CheckReducibilityStrength test = frame:CheckReducibilityStrength()
output test:GetFormalSummary()
Compare(Libraries.Language.Object object)
This action compares the hash codes of two objects and returns an integer: -1 if this object's hash code is smaller than that of the object passed as a parameter, 0 if the two are equal, and 1 if it is larger. This action was changed in Quorum 7 to return an integer, instead of a CompareResult object, because the previous implementation was causing efficiency issues.
Parameters
- Libraries.Language.Object: The object to compare to.
Return
integer: The Compare result: -1 (smaller), 0 (equal), or 1 (larger).
Example
Object o
Object t
integer result = o:Compare(t) //1 (larger), 0 (equal), or -1 (smaller)
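The integer result is typically used in a conditional. The following is a minimal sketch of branching on the three possible values. The output messages are made up for this illustration, and which branch runs depends on the hash codes of the two objects.

```quorum
use Libraries.Language.Object
Object o
Object t
integer result = o:Compare(t)
if result < 0
    output "o has the smaller hash code"
elseif result = 0
    output "the hash codes are equal"
else
    output "o has the larger hash code"
end
```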
CompareCounts()
This action uses the selection to conduct a count comparison between one or more columns. The CompareCounts object returned gives information back in several formats, including text formatted in the American Psychological Association (APA) style. This action runs a test based on how many columns are selected:
- 1 column: Chi-Squared Goodness Of Fit vs uniform expected counts
- 2 columns: Chi-Squared Test Of Independence
- 3+ columns: Pairwise Chi-Squared Test Of Independence
Return
Libraries.Compute.Statistics.Tests.CompareCounts: an object representing the comparison
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
DataFrame frame
frame:Load("Data.csv")
frame:AddSelectedColumnRange(0,2)
CompareCounts compare = frame:CompareCounts()
output compare:GetSummary()
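Since the test depends on the number of selected columns, selecting a single column should run the goodness of fit test instead. The following is a minimal sketch, assuming that column 0 of Data.csv holds the categories to count and that nothing else is selected.

```quorum
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
DataFrame frame
frame:Load("Data.csv")
//With only one column selected, the counts are compared against uniform expected counts
frame:AddSelectedColumn(0)
CompareCounts compare = frame:CompareCounts()
output compare:GetSummary()
```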
CompareCounts(Libraries.Compute.Statistics.Tests.ExperimentalDesign design)
This action uses an experimental design to pick and conduct the appropriate CompareCounts test.
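Parameters
- Libraries.Compute.Statistics.Tests.ExperimentalDesign: The design holding the factors and variables for the test.
Return
Libraries.Compute.Statistics.Tests.CompareCounts: an object representing the comparison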
Example
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Tests.CompareCounts
use Libraries.Compute.Statistics.Tests.ExperimentalDesign
ExperimentalDesign design
design:AddBetweenSubjectsFactor("Group")
design:AddDependentVariable("Answer")
DataFrame frame
frame:Load("Data.csv")
CompareCounts compare = frame:CompareCounts(design)
output compare:GetFormalSummary()
CompareCountsPairwise(Libraries.Compute.Statistics.Tests.ExperimentalDesign design)
This action uses the selection to conduct a comparison between groups. The CompareCountsPairwise object returned gives information back in several formats, including text formatted in the American Psychological Association (APA) style. This action runs a test based on the selections made in the design.
Parameters
- Libraries.Compute.Statistics.Tests.ExperimentalDesign: This is the design of the model holding all the factors and variables.