Data Science At Scale
In the last Part, we discussed statistical tests, including the t-test, chi-square test, KS test, and ANOVA, on a couple of features in our analytic data set. We compared and contrasted the resulting p-values and traced the downstream effect of each feature in a simple logistic regression. This illustrated the importance of Exploratory Data Analysis (EDA). However, today's data scientist has to contend with data sets consisting of hundreds or thousands of features. In this Part we consider the scalability of our analytics: how do we perform Exploratory Data Analysis on hundreds of features in a dataset?
We begin by defining three new classes of features arising from the Demographics (Section 16.1 Demographic Features), Labs (Section 16.2 Lab Features), and Examination (Section 16.3 Examination Features) domains. Most of this is a review of the use of `case_when`; however, the Labs do present some challenges, which we examine in Section 16.2.1 Mapping Column Issues.
Our primary focus will be learning how to functionalize some of our processes in R: we have over 100 columns to analyze, and copying and pasting code is ineffective.
In Chapter #sec-functional-dbplyr-purrr-and-furrr we will
- introduce several concepts, including `enquo` and variable resolution with `!!`
- showcase the `comparedf` and `tableby` functions in `arsenal`
- use `purrr` and `furrr` to iterate and speed up functions
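As a preview of the first and last items, here is a minimal sketch of a function that takes bare column names, captures them with `enquo`, and resolves them with `!!`, followed by a `purrr` call that applies one summary to every column. The helper name `mean_by` and the use of the built-in `mtcars` data are assumptions made for illustration only; they are not taken from the chapters themselves.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: mean of one column within groups of another.
# enquo() captures the bare column names supplied by the caller;
# !! unquotes them so the dplyr verbs see the actual columns.
mean_by <- function(data, group_col, value_col) {
  group_col <- enquo(group_col)
  value_col <- enquo(value_col)

  data %>%
    group_by(!!group_col) %>%
    summarise(mean_value = mean(!!value_col, na.rm = TRUE), .groups = "drop")
}

# One call per question, with no copying and pasting of the pipeline
res <- mean_by(mtcars, cyl, mpg)

# purrr::map_dbl() applies the same function to every column at once
# and returns a named numeric vector
col_means <- map_dbl(mtcars, mean, na.rm = TRUE)
```

Writing the pipeline once as a function means that a change to the summary logic is made in one place, rather than in every pasted copy.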
In Chapter 9 Exploratory Data Analysis at Scale, as we continue to analyze the data, we will
- discuss variants of normal `dplyr` functions with their `_at` brethren: `mutate_at`, `summarise_at`, `filter_at`, and others
- discuss missing data analytics and mean value imputation
- showcase a few different packages that look for correlated features and discuss why we want to look for correlated features.
- review Principal Component Analysis and k-means clustering as means of data reduction.
- give many examples of how to easily define useful functions to aid in your analysis
- showcase `DataExplorer`, `skimr`, and `GGplot` as packages to assist with automation of EDA tasks
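To make the `_at` variants and mean value imputation concrete, the sketch below applies one imputation function to several columns in a single `mutate_at` call. The toy data and the helper name `impute_mean` are made up for illustration and are not the data set or code used in the chapter.

```r
library(dplyr)

# Toy data (invented for this example) with missing values in two columns
toy <- tibble(
  id     = 1:4,
  height = c(150, NA, 170, 160),
  weight = c(NA, 60, 80, 70)
)

# Hypothetical helper: replace each NA with the mean of the observed values
impute_mean <- function(x) {
  ifelse(is.na(x), mean(x, na.rm = TRUE), x)
}

# mutate_at() applies the same function to every column selected in vars()
imputed <- toy %>%
  mutate_at(vars(height, weight), impute_mean)
```

In `dplyr` 1.0.0 and later the scoped `_at` verbs are superseded by `across()`; the equivalent call would be `mutate(toy, across(c(height, weight), impute_mean))`.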