What are the Steps in EDA?

Today, we’ll be talking about the general steps involved in Exploratory Data Analysis (EDA).

Step 1: Generate questions

  • The first step in EDA is to generate questions about your data. Just like a detective, you want to know everything there is to know about the data. What is it about? What kind of patterns might you see? What types of relationships might exist between different variables?

Step 2: Apply visualization

  • The next step is to apply visualization techniques to the data to help you answer those questions. You can use charts, tables, graphs, and other tools to help you visualize the data and identify any patterns or relationships that might exist.

Step 3: Transform and model data to look for answers

  • Once you have some visualizations, you can start to explore the data more deeply. This might involve transforming the data, for example by rescaling a variable or summarizing it by group, or fitting a statistical model to look for relationships between different variables.
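
As a rough illustration of Step 3, the sketch below transforms a variable and then fits a simple model to it. The data, the variable names x and y, and the helper fit_line are all made up for this example; the log transform and the least-squares line are just one possible choice of transformation and model.

```python
from math import log
from statistics import mean

# Hypothetical paired measurements, invented for illustration:
# y grows roughly exponentially with x.
x = [1, 2, 3, 4, 5]
y = [2.7, 7.4, 20.1, 54.6, 148.4]

# Transform: taking logs turns exponential growth into a straight line.
log_y = [log(v) for v in y]

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Model: fit a line to the transformed variable.
slope, intercept = fit_line(x, log_y)
print(f"log(y) is roughly {intercept:.2f} + {slope:.2f} * x")
```

If the fitted line describes the transformed data well, that suggests a new question to carry into Step 4, such as why y grows multiplicatively with x.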

Step 4: Use what you learn to refine questions or generate new questions

  • Finally, you can use what you learn from your exploratory analysis to refine your questions or generate new ones, which takes you back to Step 1. For example, you might find an unexpected relationship between two variables, which leads you to ask new questions about that relationship.

Now, let’s talk about some common ways to explore data.

One way is to identify and understand the variables in your data. A variable is a characteristic recorded for each observation, such as one column in a table. For example, if you have data on students, the variables might include their age, gender, and test scores.
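
To make this concrete, here is a minimal sketch that lists the variables in a small data set and notes whether each one is numeric or categorical. The student records and the helper describe_variables are hypothetical, invented only for this example.

```python
# Hypothetical student records, invented purely for illustration.
students = [
    {"age": 15, "gender": "F", "test_score": 82.5},
    {"age": 16, "gender": "M", "test_score": 74.0},
    {"age": 15, "gender": "F", "test_score": 91.0},
    {"age": 17, "gender": "M", "test_score": 68.5},
]

def describe_variables(records):
    """Return {variable name: summary dict} for a list of dict records."""
    summary = {}
    for name in records[0]:
        values = [row[name] for row in records]
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric variable: report its range.
            summary[name] = {"kind": "numeric", "min": min(values), "max": max(values)}
        else:
            # Anything else is treated as categorical: report its distinct levels.
            summary[name] = {"kind": "categorical", "levels": sorted(set(values))}
    return summary

variable_summary = describe_variables(students)
for name, info in variable_summary.items():
    print(name, info)
```

Knowing which variables are numeric and which are categorical helps you decide which charts and comparisons make sense later on.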

Another way is to visualize the data with charts and graphs. For example, you might use a bar chart to show how many students got each grade on a test, or a scatter plot to show the relationship between a student’s age and their test scores.
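
The bar chart idea can be sketched even without a plotting library. The example below draws a text-based bar chart of grade counts using only Python's standard library; in practice you would more likely use a plotting package. The grades and the helper text_bar_chart are hypothetical, invented for this example.

```python
from collections import Counter

# Hypothetical letter grades for one test, invented for illustration.
grades = ["B", "A", "C", "B", "B", "D", "A", "C", "B", "F"]

def text_bar_chart(values):
    """Return one line per category, with a bar of '#' as long as its count."""
    counts = Counter(values)
    lines = []
    for category in sorted(counts):
        bar = "#" * counts[category]
        lines.append(f"{category} | {bar} ({counts[category]})")
    return lines

chart_lines = text_bar_chart(grades)
print("\n".join(chart_lines))
```

Even a chart this simple shows at a glance which grade is most common and how the counts are spread across the grades.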

Finally, you might study the relationship between different variables. For example, you might look at whether there is a relationship between a student’s gender and their test scores, or whether there is a relationship between the number of hours a student studies and their test scores.
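
Both kinds of comparison can be sketched in a few lines. The example below computes a Pearson correlation between hours studied and test scores, and then compares mean scores across groups. The observations and the helper pearson_r are made up for illustration, so the numbers say nothing about real students.

```python
from math import sqrt
from statistics import mean

# Hypothetical (hours studied, test score, gender) observations,
# invented for illustration.
observations = [
    (1, 52, "F"), (2, 61, "M"), (3, 63, "F"),
    (4, 72, "M"), (5, 74, "F"), (6, 81, "M"),
]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Numeric vs numeric: correlation between study hours and scores.
hours = [h for h, _, _ in observations]
scores = [s for _, s, _ in observations]
r = pearson_r(hours, scores)
print(f"correlation between hours and score: {r:.2f}")

# Numeric vs categorical: compare the mean score within each group.
mean_score_by_gender = {}
for group in sorted({g for _, _, g in observations}):
    group_scores = [s for _, s, g in observations if g == group]
    mean_score_by_gender[group] = mean(group_scores)
print(mean_score_by_gender)
```

A correlation close to 1 or -1 points to a strong linear relationship, but it does not show that one variable causes the other, and a difference in group means on a handful of rows is only a prompt for further questions.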

In conclusion, EDA is a fun and creative way to explore data. By following the steps of generating questions, applying visualization, transforming and modeling data, and refining or generating new questions, you can learn a lot about your data. And by identifying variables, exploring charts, and studying the relationship between variables, you can gain even more insights into the data.