Demystifying data fabrics – bridging the gap between data sources and workloads

The term “data fabric” is widely used in the technology industry, but its definition and implementation vary. I have seen this across vendors: in autumn last year, British Telecom (BT) spoke about its data fabric during an analyst event; meanwhile, in storage, NetApp has rebranded around intelligent infrastructure but previously used the term. Application platform provider Appian has a data fabric product, and database vendor MongoDB has also talked about data fabric and similar ideas.

Fundamentally, a data fabric is a unified architecture that abstracts and integrates disparate data sources to create a transparent data layer. The principle is to create a unified, synchronized layer between disparate data sources and the workloads that need access to the data – your applications, your workloads and, increasingly, your AI algorithms or machine learning engines.

There are plenty of reasons to want such an overlay. The data fabric acts as a generalized integration layer, connecting to different data sources and adding advanced capabilities that make access easier for applications, workloads and models, while keeping those sources synchronized.

So far, so good. The challenge, however, is the gap between the principle of a data fabric and its actual implementation. People use the term to mean different things. To return to our four examples:

  • BT defines data fabric as a network overlay designed to optimize the transmission of data over long distances.
  • NetApp's interpretation (even under the term intelligent data infrastructure) emphasizes storage efficiency and centralized management.
  • Appian positions its data fabric product as a tool for unifying data at the application layer, enabling faster development and customization of user-facing tools.
  • MongoDB (and other vendors of structured data solutions) consider data fabric principles in the context of the data management infrastructure.

How do we cut through all of this? One answer is to accept that it can be approached from several angles. You can talk about data fabric conceptually – recognizing the need to bring data sources together, but without overreaching. You don’t need a universal “uber-fabric” that covers absolutely everything. Instead, focus on the specific data you need to manage.

If we rewind a few decades, we can see similarities with the principles of service-oriented architecture (SOA), which sought to decouple the provision of services from the underlying database systems. At the time, we discussed the difference between services, processes and data. The same applies here: you can request a service, or request data as a service, focusing on what your workload needs. Create, read, update and delete remain the simplest of data services!
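To make the "data as a service" idea concrete, here is a minimal sketch in Python. The class and names (`RecordService`, `_store`) are hypothetical illustrations, not any vendor's API: the point is simply that the workload talks to a create/read/update/delete interface rather than directly to the database behind it.

```python
# A minimal "data as a service" sketch: workloads call a service
# interface; the backing store behind it could be anything.
# All names here are invented for illustration.

class RecordService:
    """Exposes create/read/update/delete over an underlying store."""

    def __init__(self):
        self._store = {}    # stand-in for any backing database
        self._next_id = 1

    def create(self, data):
        record_id = self._next_id
        self._next_id += 1
        self._store[record_id] = dict(data)
        return record_id

    def read(self, record_id):
        return self._store.get(record_id)

    def update(self, record_id, data):
        if record_id in self._store:
            self._store[record_id].update(data)
            return True
        return False

    def delete(self, record_id):
        return self._store.pop(record_id, None) is not None


# The workload only ever sees the service, never the store behind it.
svc = RecordService()
rid = svc.create({"name": "sensor-1", "status": "active"})
svc.update(rid, {"status": "idle"})
record = svc.read(rid)
```

Swapping the dictionary for a real database would not change the workload's view of the data, which is exactly the decoupling SOA was after.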

I am also reminded of the origins of network acceleration, which used caching to speed up data transfers by holding local copies of data rather than going back to the source repeatedly. Akamai built its business on moving unstructured content such as music and films efficiently over long distances.
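The caching principle behind that generation of network acceleration can be sketched in a few lines. This is a generic read-through cache under assumed names (`ReadThroughCache`, `fetch_from_source`), not a description of any particular product: serve the local copy on a hit, and pay the cost of reaching the source only once per item.

```python
# A minimal read-through cache, in the spirit of early network
# acceleration: serve a local copy when available, fetch from the
# (slower, remote) source only on a miss. Names are illustrative.

class ReadThroughCache:
    def __init__(self, fetch_from_source):
        self._fetch = fetch_from_source   # callable that hits the origin
        self._cache = {}
        self.source_hits = 0              # how often we paid the full cost

    def get(self, key):
        if key not in self._cache:        # miss: go to the source once
            self.source_hits += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]           # hit: served locally


# Simulated origin: an expensive lookup we want to avoid repeating.
origin = {"track-42": b"...audio bytes..."}
cache = ReadThroughCache(lambda k: origin[k])

cache.get("track-42")
cache.get("track-42")
cache.get("track-42")   # three reads, but only one trip to the origin
```

Real systems add expiry and invalidation to keep copies synchronized, which is precisely the hard part a data fabric also has to solve.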

This is not to suggest that data fabrics reinvent the wheel. Technologically, we are in a different, cloud-based world; moreover, they bring new aspects, especially around metadata management, lineage tracking, compliance and security features. These are particularly essential for AI workloads, where data governance, quality and provenance have a direct impact on model performance and reliability.

If you are planning to deploy a data fabric, the best starting point is to think about what you want the data for. Not only will this help orient you towards the most appropriate type of data fabric, but it also helps avoid the trap of trying to manage all the data in the world. Instead, you can prioritize the most valuable subset of data and consider the level of data fabric that works best for your needs:

  1. Network level: to integrate data across multi-cloud, on-premise and edge environments.
  2. Infrastructure level: if your data is centralized with a storage vendor, focus on the storage layer to serve consistent data pools.
  3. Application level: to bring disparate data sets together for specific applications or platforms.
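To illustrate the application-level option, here is a sketch of a fabric facade that presents one query interface over several disparate sources and merges what each knows about an entity. The sources, field names and function (`query_fabric`) are invented for illustration; the design point is that provenance is preserved – the unified view still records which source said what.

```python
# A sketch of an application-level fabric: one query interface over
# several disparate sources, merged into a single view. The sources
# and field names here are invented for illustration.

def query_fabric(sources, entity_id):
    """Collect everything each registered source knows about an entity."""
    view = {}
    for name, lookup in sources.items():
        record = lookup(entity_id)
        if record:
            view[name] = record   # keep provenance: which source said what
    return view


# Two stand-in silos with different shapes, as silos usually have.
crm = {"c-1": {"name": "Acme Ltd", "owner": "sales-team-2"}}
billing = {"c-1": {"plan": "enterprise", "overdue": False}}

sources = {
    "crm": crm.get,
    "billing": billing.get,
}

unified = query_fabric(sources, "c-1")
```

Adding a new silo means registering one more lookup, without touching the applications that consume the unified view.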

For example, BT found internal value in using its own data fabric to consolidate data from several sources. This reduces duplication and helps streamline operations, making data management more effective. It is clearly a useful tool for consolidating silos and streamlining applications.

Ultimately, a data fabric is not a monolithic, one-size-fits-all solution. It is a strategic conceptual layer, backed by products and features, that you can apply wherever it makes the most sense to add flexibility and improve data delivery. Deploying a data fabric is not a “set and forget” exercise: it requires ongoing effort to extend, deploy and maintain – not only the software itself but also the configuration and integration of the data sources.

Although a data fabric can exist conceptually in several places, it is important not to duplicate delivery efforts unnecessarily. So, whether you bring data together at the network, infrastructure or application level, the principles remain the same: use it where it best fits your needs, and allow it to evolve with the data it serves.
