Data Wrangling: Data Enrichment

Here are three important ways to achieve data enrichment: feature engineering, deriving new variables, and integrating external data sources. 
 
1) Feature Engineering: 
Think of feature engineering as a way to get the most out of your data by creating new, more informative attributes or “features”. Let’s say you own a coffee shop, and you have a database with information about the age of your customers, the number of cups of coffee purchased, and the total amount spent. While these features are helpful, we can create new ones that might be even more insightful. 
 
For example, we could compute the average spend per cup by dividing each customer’s total amount spent by the number of cups they purchased. This new feature could help us identify customers who spend more per cup, perhaps on premium coffee, and target promotions or marketing to them. 
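
The per-cup calculation described above can be sketched in plain Python. This is only a minimal illustration: the record layout, the key names `cups_purchased` and `total_spent`, the helper name `add_spend_per_cup`, and the sample values are all assumptions made for the example, not part of any real dataset.

```python
def add_spend_per_cup(customers):
    """Return new records with a derived spend_per_cup feature."""
    enriched = []
    for row in customers:
        cups = row["cups_purchased"]
        # Avoid dividing by zero for customers with no purchases yet.
        per_cup = row["total_spent"] / cups if cups else None
        # Build a new dict so the original record is left unchanged.
        enriched.append({**row, "spend_per_cup": per_cup})
    return enriched


# Hypothetical sample data, not real sales figures.
customers = [
    {"age": 34, "cups_purchased": 10, "total_spent": 35.0},
    {"age": 52, "cups_purchased": 4, "total_spent": 22.0},
    {"age": 27, "cups_purchased": 0, "total_spent": 0.0},
]

for row in add_spend_per_cup(customers):
    print(row)
```

Returning `None` for customers with zero cups is one possible design choice; it keeps the record in the data while making it clear that the feature is undefined for that customer.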
 
2) Deriving New Variables: 
Deriving new variables involves transforming or combining existing variables to surface new information. Going back to the coffee shop example, suppose you have data on the time each customer visited the shop. From the visit time, you could create new variables like time of day (morning, afternoon, evening), day of the week, or weekend vs. weekday.  
 
These derived variables could help you spot patterns in customer preferences or even shape decisions like adjusting staff schedules on busier days or offering special promotions on slower days. Derived variables can reveal hidden patterns in your data, contributing to more informed decision-making. 
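
As a rough sketch of how such variables might be derived from a visit timestamp, the snippet below uses Python’s standard `datetime` module. The function name `derive_visit_features` and the hour cutoffs for morning, afternoon, and evening are assumptions chosen for illustration; a real shop would pick boundaries that match its own opening hours.

```python
from datetime import datetime

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def derive_visit_features(visit_time):
    """Derive time of day, day of week, and a weekend flag from a timestamp."""
    hour = visit_time.hour
    # Assumed cutoffs: morning before 12:00, afternoon before 17:00,
    # evening for everything later.
    if hour < 12:
        time_of_day = "morning"
    elif hour < 17:
        time_of_day = "afternoon"
    else:
        time_of_day = "evening"

    # weekday() returns 0 for Monday through 6 for Sunday.
    weekday_index = visit_time.weekday()
    return {
        "time_of_day": time_of_day,
        "day_of_week": DAY_NAMES[weekday_index],
        "is_weekend": weekday_index >= 5,
    }


# Hypothetical visit timestamp.
features = derive_visit_features(datetime(2024, 3, 9, 8, 30))
print(features)
```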
 
3) Integrating External Data Sources: 
Sometimes the key to enriching your data lies outside your own data sources. Imagine your coffee shop is located downtown, and you’ve noticed fluctuations in sales. To better understand the contributing factors, you could integrate external data sources like local events, weather data, or even nearby competitor opening hours. 
 
Adding this external data could help you draw more meaningful conclusions about why sales might be spiking on certain days. For example, you could find that offering umbrellas on rainy days or hosting live music during a popular local event boosts sales. 
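
One simple way to picture this integration is as a join on a shared key such as the date. The sketch below assumes daily sales records and a weather lookup that both use the same date strings. The function name `enrich_with_weather`, the field names `rainy` and `temp_c`, and all values are hypothetical and exist only to illustrate the idea.

```python
def enrich_with_weather(daily_sales, weather_by_date):
    """Attach external weather fields to each daily sales record by date."""
    enriched = []
    for row in daily_sales:
        # Keep the sales row even when no weather record exists for that date.
        weather = weather_by_date.get(row["date"], {})
        enriched.append({
            **row,
            "rainy": weather.get("rainy"),
            "temp_c": weather.get("temp_c"),
        })
    return enriched


# Hypothetical internal sales data.
daily_sales = [
    {"date": "2024-03-08", "sales": 410.0},
    {"date": "2024-03-09", "sales": 655.0},
    {"date": "2024-03-10", "sales": 380.0},
]

# Hypothetical external weather data keyed by date.
weather_by_date = {
    "2024-03-08": {"rainy": False, "temp_c": 14},
    "2024-03-09": {"rainy": True, "temp_c": 9},
}

for row in enrich_with_weather(daily_sales, weather_by_date):
    print(row)
```

Days without a matching weather record keep their sales figures and get `None` for the weather fields, so gaps in the external source do not silently drop internal data.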
 

In conclusion, data enrichment involves making your data more valuable and insightful by adding new features, deriving new variables, and integrating external data sources. Through these techniques, you can uncover hidden trends and make better-informed decisions.  

