Replacing Undefined Values

This tutorial demonstrates how to replace values in our datasets

Replacing Undefined Values

In addition to removing undefined values, sometimes we want to replace them with an alternative. For example, if we had a DataFrame with integers, and a few were undefined, then we might want to replace them with zero. While we should always use caution when replacing data in a data set, as a practical matter, sometimes replacing undefined values is desirable in the field of data science.

To do this, there are two actions we can use, ReplaceUndefined(text) and ReplaceUndefinedFromSelectedColumns(text). They both do essentially the same operation, which is to ask the column to replace any undefined values with a value represented via the text.

Replacing Undefined Values Helper Functions
Class / ActionDescriptionUsage
dataFrameObject:ReplaceUndefined(text replacedElement)This action will replace any empty element within the dataset with the text value passed across the whole DataFrame.frame:ReplaceUndefined("0")
dataFrameObject:ReplaceUndefinedSelectedColumns(text replacedElement)This action will replace any empty element within the dataset with the text value passed given a specified numerical or non-numerical column.frame:ReplaceUndefinedSelectedColumns("0")

For example, suppose we have an integer column and we send it the text value of 'Dog.' In such a case, this is not a valid integer. With invalid values, any column that the text value is not related to automatically ignores the request. If, however, the value is valid for a particular column, like it would be in a text column, then the undefined values are all replaced. For the actions, the difference between them is that one does the replacement across the entire DataFrame, while the other does it for selected columns only.

The reason why we want to replace bad data values is to increase the usability of a dataset. Typically, in data science, columns with values that are all of the same type and filed in are typically more reliable. Why this is important is because when we need to summarize our data (whether it be through a chart or using some form of descriptive statistics), we would not want the risk of any corrupted data (data that is unable to be calculated by any actions we use to manipulate our DataFrame) or in a case where major outliers are present.

In this example, we are going to take a look at the 'Other monetary comp' column within the AskAManager.csv. Here is a snippet of this column from our dataset.

Snippet of AskAManager.csv
Other monetary comp

As we can see here within the first few rows, some data entries are missing, and we are going to clean that up. We will be calling the action ReplaceUndefinedSelectedColumns(text) and replace any empty boxes with the number '0' so we have a more complete dataset.

We would first want to read in our file, AskAManager.csv using the Load(text file) action and pass in the file location of our CSV file. In this case, the file is located within the main 'Data' folder in a subfolder called 'Miscellaneous.'

DataFrame frame

Next, we want to select the column number of the column we want to grab; note that columns start off from 0 which means the 'Other monetary comp' would be column 6. Finally we would want to call our action, ReplaceUndefinedFromSelectedColumns(text replacedElement) and pass the value we want to replace any undefined values as. This means any blanks in our dataset would be replaced with the number 0 in its place for that column.


Below is the full code example:

use Libraries.Compute.Statistics.DataFrame

DataFrame frame
//Add the first column in the set to the list of selected columns

//perhaps this replaces an undefined in an IntegerColumn or a TextColumn
output frame:ToText()

Try it Yourself!

Press the blue run button to execute the code in the code editor. Press the red stop button to end the program. Your program will work when the console outputs "Build Successful!"

To view the program we have made, we can download the program file

Next Tutorial

In the next tutorial, we will discuss filtering rows, which describes filtering rows.