Tutorial: Data Science Introduction

This is an introduction to data science and why it is important.

Introduction

This guide provides an overview of the data science libraries in Quorum and their documentation. In it, we discuss the major features, such as loading data, analyzing it, and generating charts from it. The goal of this set of tutorials is practical: it shows how to use the libraries and interpret the results, but it is not intended as a full data science curriculum or a statistics tutorial. While we include some equations describing how the system works, the goal here is to describe how to use the system, not to provide a deep dive into the math.

Thus, perhaps the first question to answer is what this set of tutorials is for, which in turn raises the question of what data science is for. In plain English, data science, statistics, and related areas help us answer certain kinds of questions. Namely, we have a data set containing some kind of information, structured in any way we can imagine, and the goal is to answer a question about that data. In a sense, we want to turn that data into information we can put to practical use.

To better understand what we do in data science, it helps to have some data to work with. While we could gather our own, and generating data in scientific ways is fun and exciting, here we will rely on https://www.data.gov. Specifically, we will look at data on several topics, and in each case our goal is to learn something different about data science. Across these tutorials, we will load data, filter it, adjust it, test it, and do whatever else helps us understand it. The datasets we use are all free and public. Once we have finished with these tasks, we will examine how to manipulate our data further, such as incorporating it into statistics or charts. All examples in this track are stored on GitHub in the Quorum curriculum repository.

Overview of Saving and Loading DataFrames

In any kind of data science application, whether we want to answer a question, create a model, or create charts, we first need to load our data. There are many ways to load data, and Quorum has adopted an "opinionated" approach, similar to the Tidyverse in the R language. We did this in part because the approach stores data consistently, but also because loading every data set the same way makes analysis easier. To understand this, we first need to talk about data frames, which are essentially fancy tables that we can load from a spreadsheet.

Next Tutorial

In the next tutorial, Introduction to Data Frames, we will discuss data frames in more detail.