Data preparation, calculation of numerical summaries, graphical representations
Topic 1. Data manipulation and preparation. Introduction. Correct data format for
proper processing. Good practices for structuring a data analysis project:
folders, variable names.
Topic 2. Calculation of numerical summaries by groups. Data aggregation. Identification and
processing of missing data.
Topic 3. Graphical representations. Information visualization.
Prediction and classification
Topic 4. Prediction problems: multiple linear regression. Cost function, gradient descent
algorithm, numerical minimization and explicit minimization. Normal statistical model for
multiple linear regression. Confidence and prediction intervals.
Topic 5. Classification: logistic regression. Cost function. Decision boundaries. Precision and
sensitivity of a binary classification algorithm. 'One versus all' strategy for multiclass
problems.
Evaluating and simplifying prediction or classification
Topic 6. Feature selection. Overfitting problem, regularization of the cost
function, statistical methods for variable selection.
Topic 7. Evaluation of an algorithm. Splitting the available data set into training,
validation and test subsets. Estimation of the computational cost of an algorithm.
Dimension reduction
Topic 8. Principal component analysis. Principles of principal component
analysis, decomposition of the covariance or correlation matrix. Techniques for selecting
the number of components.