3 Fundamental Statistical Techniques used in Data Analysis
There are many techniques that data analysts and data scientists regularly use to understand the data they are working with. Which combination they use depends on what kind of data they have and what exactly they are trying to get out of it. Among those, here are 3 fundamental statistical techniques used in exploratory data analysis:
Descriptive Statistics: Descriptive statistics are methods of summarizing and describing the main features of a dataset to gain a better understanding of its characteristics. Some common measures of descriptive statistics are:
- Measures of Central Tendency: These indicate the average or typical value of the data, such as mean, median, and mode.
- Measures of Variability: These show how much the data vary or differ from each other, such as standard deviation, range, and interquartile range (IQR).
- Measures of Position: These indicate the relative position or rank of the data within the dataset, such as percentiles and quartiles.
Descriptive statistics are essential for providing an initial overview of the data, which can help identify outliers, patterns, and general trends in the dataset.
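As a minimal sketch of the measures listed above, the snippet below uses only Python's standard `statistics` module. The function names `describe` and `iqr_outliers` are hypothetical helpers for illustration. The outlier check assumes Tukey's 1.5×IQR rule, which is one common convention, not the only one.

```python
import statistics


def describe(data):
    """Summarize a numeric sample with common descriptive statistics."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        # Measures of central tendency
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # Measures of variability
        "stdev": statistics.stdev(data),  # sample standard deviation
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        # Measures of position
        "q1": q1,
        "q3": q3,
    }


def iqr_outliers(data, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]
    print(describe(sample))
    print(iqr_outliers(sample + [40]))  # the appended 40 falls outside the fences
```

Note that `statistics.quantiles` defaults to the "exclusive" interpolation method; other tools (spreadsheets, NumPy) may use different conventions and report slightly different quartiles for the same data.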
Inferential Statistics: Inferential statistics allow us to make predictions or inferences about a population using a sample of data. The aim is to extend our conclusions beyond the data we have at hand. This is especially helpful when studying an entire population is not feasible or practical. Some techniques of inferential statistics are:
- Hypothesis Testing: Evaluating whether a difference or relationship observed in the sample is statistically significant, meaning it is unlikely to be explained by chance alone and can reasonably be attributed to the population. It involves stating a null hypothesis (typically "no difference" or "no relationship") and an alternative hypothesis, then performing a statistical test to decide whether the evidence is strong enough to reject the null hypothesis in favor of the alternative.
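- Confidence Intervals: Estimating a range of plausible values for a population parameter (such as the mean) from the sample data, at a stated confidence level. A 95% confidence interval, for example, is built by a procedure that captures the true parameter in about 95% of repeated samples.
- Sampling Methods: Methods for choosing a sample that represents the population well, such as simple random sampling, stratified sampling, and cluster sampling.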
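The sketch below illustrates all three ideas with Python's standard library only. For hypothesis testing it swaps in a permutation test (an assumption-light test of a difference in means) rather than a classic t-test, and the confidence interval assumes a normal approximation, which is reasonable for large samples but too narrow for small ones, where a t-based interval is usual. The function names and the synthetic population are hypothetical, for illustration only.

```python
import random
import statistics
from statistics import NormalDist


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: both groups come from the same distribution, so the
    group labels are interchangeable. Returns an estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels
        left, right = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(left) / n_a - sum(right) / len(right))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate from being exactly 0
    return (extreme + 1) / (n_resamples + 1)


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumes a large enough sample for the normal approximation to hold.
    """
    n = len(sample)
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5       # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic population, for illustration only
    population = [rng.gauss(50, 10) for _ in range(100_000)]
    sample = rng.sample(population, 200)  # simple random sample
    print("95% CI for the mean:", mean_confidence_interval(sample))
    print("p-value:", permutation_test(sample[:100], sample[100:], n_resamples=2_000))
```

A small p-value (commonly below 0.05) is read as evidence against the null hypothesis; a large one means the data are consistent with it, not that it is proven true.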
Regression Analysis: Regression analysis is a statistical method that models how a dependent variable (also known as the response or outcome) relates to one or more independent variables (also known as predictors or features). The aim is to build a model that can estimate the dependent variable from the independent variables. Its most basic form, linear regression, comes in two variants:
- Simple Linear Regression: This involves only one independent variable.
- Multiple Linear Regression: This involves two or more independent variables.
Regression analysis is often used for prediction, measuring the strength and direction of relationships, finding important predictors, and testing hypotheses.
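Both variants can be fitted by ordinary least squares. The sketch below is a from-scratch illustration, assuming small, well-conditioned data: it solves the normal equations with Gaussian elimination, so the same code handles simple (one predictor) and multiple regression, and it reports R² as one measure of fit strength. The names `fit_linear_regression`, `predict`, and `r_squared` are hypothetical. In practice you would typically reach for a library such as statsmodels or scikit-learn, or `statistics.linear_regression` (Python 3.10+) for the single-predictor case.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares fit of y ~ b0 + b1*x1 + ... + bk*xk.

    X is a list of rows (one list of predictor values per observation).
    Returns [b0, b1, ..., bk] by solving (A^T A) b = A^T y.
    """
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    p = len(A[0])
    # Normal equations: M = A^T A, v = A^T y
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    v = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        v[col], v[pivot] = v[pivot], v[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (v[i] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b


def predict(coefs, row):
    """Estimate the dependent variable for one row of predictors."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


def r_squared(coefs, X, y):
    """Share of the variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coefs, row)) ** 2 for row, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Simple linear regression: one predictor
    print(fit_linear_regression([[1], [2], [3], [4]], [3, 5, 7, 9]))
    # Multiple linear regression: two predictors
    print(fit_linear_regression([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8]))
```

The sign of each coefficient gives the direction of a predictor's relationship with the outcome, holding the other predictors fixed; its size is only comparable across predictors measured on similar scales.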
I hope you enjoyed it and found it useful. Do you want to see more of this kind of content? Let me know in the comments below. And if you’re interested in a more detailed post on this topic, I’d love to hear that too. I appreciate your feedback and support!