Data Digest

Publications focused on data and technology

Venture into Golden Datasets using Power BI

Enterprise data architecture today covers data lakehouses, data fabrics, data meshes, data products, data sharing, self-service, governance, integration, discovery zones, and more. While the landscape of methodologies and frameworks is endless, the golden dataset still has a place throughout. 

In the ever-evolving landscape of data management and business intelligence, the creation of a golden dataset stands as a critical milestone for organizations aiming to harness the full potential of their data. Power BI, with its comprehensive suite of tools, offers a robust platform for designing such a dataset. This blog serves as a guide to navigating the process of creating your first golden dataset using Power BI, ensuring a solid foundation for data-driven decision-making within your organization. 

 

Understanding the Golden Dataset 

Before diving into the technicalities, let’s clarify what a golden dataset is. In essence, it’s a meticulously curated, authoritative source of data that serves as the single point of truth for your organization. It combines data from various sources into a cohesive, well-structured model, facilitating seamless access and analysis. In Power BI, this translates to a Power BI Semantic Model designed for optimal performance and accessibility. 

For more information on what golden datasets are and why they are essential to improving reporting efforts in Power BI, check out the previous article, The Case for Golden Datasets in Power BI. If you’re convinced already, here are the steps towards implementing a golden dataset at your organization: 

 

Steps to Implementing a Golden Dataset

Step 1: Start small. Build only what you need to solve the immediate problem. Don't think of it as just one golden dataset; understand that you can always add to it (and you will if it's successful). 

Step 2: Select a business unit or area that can benefit from having golden data. Look for one that is experiencing pain bringing data together from multiple sources, has data that is difficult to understand, operates under strict standards or compliance requirements that demand precision and consistency, and has well-defined key measures and metrics that must be governed. 

To demonstrate a golden dataset in action, consider this user story: “As a quality engineer in a manufacturing plant, I need a comprehensive understanding of the causes of defects in our production process to decrease downtime, reduce waste, and improve product quality.” 

Step 3: Determine the data model needed to support the output. Data modeling involves defining the relationships between different data tables, creating calculated columns and measures, and setting up hierarchies. This step structures your data in a way that's intuitive for users and efficient for analysis. To start, the model can be simple, even a single table, or it can have multiple tables with a few one-to-many relationships. If the data is more complex, consider using a dimensional model. A dimensional model is a safe default: it fits nearly any analytical workload, allows for endless extensibility, and serves you well if the golden dataset expands to the enterprise level. 
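The one-to-many shape described in Step 3 can be sketched outside of Power BI as well. The Python snippet below (all table and column names are hypothetical) mimics how a fact table of defect events resolves its foreign keys against two dimension tables, the same relationship structure a Power BI semantic model enforces when you relate tables.

```python
# Minimal star-schema sketch: one fact table related to two dimensions.
# All table and column names are hypothetical stand-ins for a real model.

dim_vendor = {  # one row per vendor (the "one" side of the relationship)
    1: {"vendor_name": "Acme Supply"},
    2: {"vendor_name": "Globex Materials"},
}

dim_material = {  # one row per material type
    10: {"material_type": "Raw Steel"},
    11: {"material_type": "Polymer Resin"},
}

fact_defects = [  # many defect events (the "many" side), keyed to dimensions
    {"vendor_id": 1, "material_id": 10, "defect_qty": 4, "downtime_min": 30},
    {"vendor_id": 1, "material_id": 11, "defect_qty": 2, "downtime_min": 0},
    {"vendor_id": 2, "material_id": 10, "defect_qty": 7, "downtime_min": 45},
]

def enrich(fact_rows, vendors, materials):
    """Resolve foreign keys to dimension attributes, like a model relationship."""
    return [
        {**row,
         "vendor_name": vendors[row["vendor_id"]]["vendor_name"],
         "material_type": materials[row["material_id"]]["material_type"]}
        for row in fact_rows
    ]

for row in enrich(fact_defects, dim_vendor, dim_material):
    print(row["vendor_name"], row["material_type"], row["defect_qty"])
```

In Power BI you would express the same structure by relating the fact table to each dimension on its key column; the point of the sketch is only that every fact row points at exactly one row in each dimension.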

Figure 1. Semantic model of manufacturing quality defects, adapted from Obvience.

 

Step 4: Identify the key metrics, dimensions, and KPIs that are crucial for your business operations. This step involves close collaboration with stakeholders across departments to ensure that the dataset encompasses all necessary perspectives and needs. 

For the manufacturing quality example, the Quality Department provided the following KPIs: 

  • Defect Rate: The number of defective products produced per batch or time period. 
  • Defect Type Frequency: The occurrence of specific types of defects by vendor, material type, and manufacturing plant. 
  • Production Downtime: The amount of time production is halted due to defects or issues. 
  • Scrap Rate: The percentage of produced items that are discarded due to defects. 
  • Rework Time: The amount of time spent on fixing defective products. 
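Inside Power BI these KPIs would be implemented as DAX measures; the arithmetic itself is simple. As an illustration, here is a minimal Python sketch of the defect-rate, scrap-rate, and downtime calculations over hypothetical defect records (all field names and values are invented for the example).

```python
# KPI arithmetic sketch over hypothetical defect records.
# In Power BI these would be DAX measures; the math is the same.

records = [
    {"units_produced": 500, "defective": 12, "scrapped": 3, "downtime_min": 20},
    {"units_produced": 450, "defective": 9,  "scrapped": 1, "downtime_min": 0},
    {"units_produced": 520, "defective": 15, "scrapped": 5, "downtime_min": 35},
]

def defect_rate(rows):
    """Defective units as a share of total units produced."""
    return sum(r["defective"] for r in rows) / sum(r["units_produced"] for r in rows)

def scrap_rate(rows):
    """Share of produced units discarded as scrap."""
    return sum(r["scrapped"] for r in rows) / sum(r["units_produced"] for r in rows)

def total_downtime(rows):
    """Minutes of production halted due to defects."""
    return sum(r["downtime_min"] for r in rows)

print(f"Defect rate: {defect_rate(records):.2%}")
print(f"Scrap rate:  {scrap_rate(records):.2%}")
print(f"Downtime:    {total_downtime(records)} min")
```

Because measures like these live in the golden dataset rather than in individual reports, every report built on the model computes them the same way.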

 

The above semantic model covers all the required KPIs except for scrap rate. However, since the model has been thoughtfully designed, a column that flags scrap defects can easily be added. You'll notice the defect table has both a fact and a conforming dimension associated with it. The defect dimension can be used to join to future fact tables, or "drill across" in dimensional modeling terminology. Perhaps word gets out that the Quality Department is taking positive strides forward with their data transparency initiatives. The Supply Chain Department wants in! Again, since the dataset is designed with growth in mind, facts such as material purchase orders or inventory can be seamlessly integrated. 
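The drill-across pattern can be sketched in a few lines: two fact tables that share a conformed defect dimension are summarized independently and then combined on the shared key. The tables, keys, and department fields below are hypothetical.

```python
# Drill-across sketch: two facts summarized over a shared (conformed) dimension.
# Table and column names are hypothetical.

dim_defect = {100: "Scratch", 101: "Crack"}  # conformed defect dimension

fact_quality = [  # defect events recorded by the Quality Department
    {"defect_id": 100, "defect_qty": 6},
    {"defect_id": 101, "defect_qty": 2},
    {"defect_id": 100, "defect_qty": 3},
]

fact_returns = [  # supplier returns recorded by the Supply Chain Department
    {"defect_id": 100, "returned_qty": 4},
    {"defect_id": 101, "returned_qty": 1},
]

def summarize(rows, value_col):
    """Group a fact table by the conformed defect key."""
    totals = {}
    for r in rows:
        totals[r["defect_id"]] = totals.get(r["defect_id"], 0) + r[value_col]
    return totals

def drill_across(dim, *summaries):
    """Combine per-fact summaries on the shared dimension key."""
    return {dim[key]: tuple(s.get(key, 0) for s in summaries) for key in dim}

quality = summarize(fact_quality, "defect_qty")
returns = summarize(fact_returns, "returned_qty")
print(drill_across(dim_defect, quality, returns))
# → {'Scratch': (9, 4), 'Crack': (2, 1)}
```

In Power BI, relating both fact tables to the same defect dimension gives you this behavior automatically: any visual sliced by the defect dimension pulls measures from both facts side by side.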

Step 5: Incorporate your golden dataset into your organization’s data governance process. If you don’t have governance, ensure there is oversight of changes as the golden dataset grows. 

Step 6: Visualize using the golden dataset. With your golden dataset in place, you're now ready to visualize the data and share insights with your organization. Create sample reports and dashboards that show how it can solve problems more quickly than before, and use them as a training opportunity for others. 

 

Figure 2. Sample visual of total downtime by vendor and material category, shared by Obvience. This figure is property of Obvience LLC and has been shared solely for the purpose of demonstrating Power BI functionality with industry sample data. 

 

Conclusion 

Designing your first golden dataset in Power BI is a significant step towards unlocking the full potential of your organization’s data. By following these steps, you establish a strong foundation for data-driven decision-making. Remember, the journey to data mastery is iterative and collaborative. Engage with your stakeholders, learn from your data, and continuously strive to refine and enhance your golden dataset. 

If you need expert guidance on your data journey, Datalere’s consulting services are here to help. Our team of professionals specializes in data architecture, engineering, and analytics, ensuring you get the most out of your data investments. Contact us today to learn how we can support your organization’s data initiatives and drive impactful results. 

 

The data model used throughout this article was adapted from Obvience, an Independent Software Vendor and Intellectual Property Incubator focused on Microsoft Business Intelligence. Check out the sample dataset yourself here: Supplier Quality Analysis sample for Power BI. 