1. Understanding Data Structures
   - Variables (Attributes): Properties being measured (e.g., income, age).
   - Data Items (Observations): Individual instances in the dataset (e.g., survey respondents, patient records).

2. Data Collection and Extraction
   - If data is already in table form, consider using software like Excel for organization.
   - For web data, tools like Outwit Hub or Python can assist with scraping and automation.

3. Data Cleaning (Wrangling)
   - Addressing Missing Values: Impute values by averaging, using nearby values, or applying statistical models.
   - Standardization and Normalization:
     - Min-Max Normalization: Scale values to a specific range.
     - Z-Score Normalization: Center data around the mean and scale by the standard deviation.
   - Outlier Management: Detect and handle outliers through binning or using a logarithmic scale.

4. Dealing with Noisy Data
   - Filtering: Methods like Gaussian or box filtering to smooth data.
   - Noise Reduction: Through binning, clustering, or regression analysis to smooth and clean datasets.

5. Data Integration and Fusion
   - Combining datasets with common keys (e.g., zip code) for a comprehensive view.
   - Be mindful of synonymy (same meaning, different names) and polysemy (same name, different meanings).

6. Privacy and Anonymization
   - k-Anonymity: Generalizing or binning data (e.g., age groups) to protect individual identities.

7. Data Reduction Techniques
   - Sampling: Choose representative subsets (random or stratified sampling).
   - Binning and Clustering: Group similar values together for summarization.
   - Dimensionality Reduction: Techniques like PCA (covered in future lectures) to reduce data complexity.

8. Data Augmentation
   - Techniques to create synthetic data (e.g., image transformations, jittering for numerical data), which can be useful for deep learning applications.
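
The cleaning steps in section 3 (mean imputation, min-max normalization, z-score normalization) can be illustrated with a minimal sketch. This is not taken from the course material: the function names, the `ages` list, and the choice of the population standard deviation are all assumptions made for illustration, and only the Python standard library is used.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def z_score_normalize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No spread in the data: every value sits exactly at the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical ages with one missing entry (illustrative data only).
ages = [20, 30, None, 40, 50]

filled = impute_mean(ages)                 # missing value replaced by the mean
scaled = min_max_normalize(filled)         # values rescaled into [0, 1]
standardized = z_score_normalize(filled)   # mean 0, unit standard deviation

print(filled)
print(scaled)
print(standardized)
```

Mean imputation is only one of the options the outline lists; using nearby values or a statistical model would replace `impute_mean` without changing the two normalization steps.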


SBU CSE 332 - Midterm Guide 3
