What is Data Fabric?
Data fabric is all the threads of architecture and technologies woven together to mitigate the complexities of managing the myriad varieties of digital data.
It uses multiple database management systems across a variety of platforms to orchestrate the integration of silos of complex digital data.
The data fabric is a new methodology to manage and integrate data that promises to unlock the power of data in ways that shatter the limits of previous generations of technology, such as data warehouses and data lakes. Because it is based on a graph data model, the data fabric is able to absorb, integrate, and maintain vast quantities of data in any of their formats.
Data Fabric Definition
Data Fabric is an architecture and set of data services that provide consistent capabilities across a choice of endpoints, spanning on-premises and multiple cloud environments.
Data Fabric simplifies and integrates data management across cloud and on-premises environments to accelerate digital transformation.
The Key Components of Data Fabric
Now let’s examine the “threads that make up this fabric”: AI, APIs, Analytics, MicroServices, Kubernetes, Docker, Mixed Clouds, Big Data, and Edge IoT.
Each of these unique topics deserves its own blog, so I will discuss each one in greater detail, focusing on one topic, or thread, per post.
Today, we will focus on the concept of data fabric.
The Challenge of Managing Distributed Enterprise Data
Today’s enterprise has data deployed everywhere: a variety of structured, unstructured, and semi-structured data types spread across on-premises systems and multiple clouds (public, private, and hybrid).
This data lives in flat files, tagged files, SQL and NoSQL databases, Big Data repositories, graph databases, and so on.
The expanding variety of tools, technologies, platforms, and data types makes it difficult to manage the processing, access, security, and integration of data across multiple platforms.
Data Fabric Capabilities and Access Protocols
Technology is emerging that creates a converged platform that supports the storage, processing, analysis, and management of diverse data.
Data maintained in existing files, tables, streams, objects, images, IoT data, and container-based applications can all be accessed through a range of standard interfaces.
Supported Interfaces in Data Fabric
A data fabric makes it possible for applications and tools to access data through many interfaces, such as:
- NFS (Network File System)
- POSIX (Portable Operating System Interface)
- REST API (Representational State Transfer)
- HDFS (Hadoop Distributed File System)
- ODBC (Open Database Connectivity)
- Apache Kafka (for real-time streaming data)
The data fabric must also allow support for future standards as they develop.
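To make this concrete, here is a minimal Python sketch of how one logical dataset might be reached through three of these interfaces. The mount path, URL, and topic name are hypothetical placeholders, not the API of any particular data fabric product.

```python
# Minimal sketch: the same logical dataset reached through three interfaces.
# All endpoints, paths, and topic names below are hypothetical placeholders.

import json
import requests                    # REST API access
from kafka import KafkaConsumer    # Apache Kafka streaming access

# 1. POSIX / NFS: the fabric exposes the dataset as an ordinary mounted file.
with open("/mnt/fabric/sales/orders.json") as f:
    orders_from_file = json.load(f)

# 2. REST API: the same records served over HTTP.
resp = requests.get("https://fabric.example.com/api/v1/datasets/sales/orders")
resp.raise_for_status()
orders_from_rest = resp.json()

# 3. Kafka: new order events consumed as a real-time stream.
consumer = KafkaConsumer(
    "sales.orders",
    bootstrap_servers="fabric.example.com:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # each event is one order record
```

The point is that each application chooses whichever interface fits its workload, while the fabric presents the same underlying data behind all of them.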
Requirements for an Effective Data Fabric
There are a number of requirements a data fabric must address:
- Speed, scale, and reliability: Access to data maintained within the data fabric must meet business requirements for speed, scale, and reliability across multiple computing environments without requiring trade-offs.
- Centralized service level management: SLAs related to response times, availability, reliability, and risk containment must be measured, monitored, and managed with the same process for all data.
- Consolidated data protection: Data security, backup, and disaster recovery (DR) methods are built into the data fabric framework and applied consistently across the infrastructure for all data, whether in cloud, multi-cloud, hybrid, or on-premises deployments.
- Infrastructure elasticity: Decoupling data management processes and practices from specific deployment technologies makes for a more resilient infrastructure when adopting edge IoT or any future technology innovation.
- Multiple locations: Access to data from the network edge, the enterprise data center, and multiple clouds (public, private, or hybrid).
- Unified data management: A single framework to manage data across multiple and disparate deployments, reducing the complexity of data management (a minimal policy sketch follows this list).
- Files must be easy to locate and access.
- Security must be maintained at the highest level.
- Files must be compressed to reduce storage needs.
- Snapshots of the data must be provided for backups.
- Multi-tenant (multiple company) computing environments must be supported.
- High reliability and availability: The environment must be highly reliable, self-managing, and self-healing, providing high-availability services to meet mission-critical needs.
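As a rough illustration of what a single management framework might look like, the Python sketch below defines one policy object covering SLAs, protection, snapshots, compression, and multi-tenancy, then applies it uniformly to several deployments. The class and field names are invented for this example and do not come from any vendor’s API.

```python
# Illustrative only: a single policy definition applied to every deployment.
# The dataclass and field names are invented for this sketch.

from dataclasses import dataclass

@dataclass
class DataPolicy:
    max_response_ms: int          # SLA: response-time target
    availability_pct: float       # SLA: availability target
    encrypted_at_rest: bool       # consolidated data protection
    snapshot_interval_hours: int  # snapshots for backups
    compression: str              # reduce storage needs
    multi_tenant: bool            # shared (multiple company) environments

# One policy, defined once...
policy = DataPolicy(
    max_response_ms=200,
    availability_pct=99.95,
    encrypted_at_rest=True,
    snapshot_interval_hours=6,
    compression="zstd",
    multi_tenant=True,
)

# ...applied uniformly to on-premises, cloud, and edge deployments.
deployments = ["on-prem-datacenter", "public-cloud-east", "edge-iot-gateway"]
for deployment in deployments:
    print(f"Applying {policy} to {deployment}")
```

The value of this pattern is that SLA monitoring, protection, and tenancy rules are expressed once and enforced the same way everywhere, rather than re-implemented per platform.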
Overcoming the Challenges of Modern Data Management
The complexities of today’s enterprise data management are growing at an ever-accelerating rate as new technologies, new kinds of data, and new platforms emerge.
This data is increasingly distributed across on-premises and mixed cloud environments.
The process of moving, storing, protecting, and accessing data can become fragmented, depending on where data is located and the technologies used.
Having to update data management methods with each technological change is difficult, disruptive, and expensive.
As technology innovation accelerates, this approach can quickly become unsustainable.
Data fabric solutions can serve to minimize this disruption by creating a highly adaptable data management environment that can be quickly adjusted as technology evolves.
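A common way to achieve that adaptability is to hide the storage technology behind a thin abstraction so applications do not have to change when the backend does. The sketch below shows the idea in Python, using made-up class names rather than a real fabric product’s interface.

```python
# Sketch of decoupling applications from specific storage technologies.
# Class names are made up for illustration.

from abc import ABC, abstractmethod

class DataStore(ABC):
    """What the application sees, regardless of where the data lives."""
    @abstractmethod
    def read(self, key: str) -> bytes: ...

class LocalFileStore(DataStore):
    """Backend for data on an on-premises file system."""
    def read(self, key: str) -> bytes:
        with open(f"/data/{key}", "rb") as f:
            return f.read()

class ObjectStore(DataStore):
    """Stand-in for a cloud object store; a real one would call its SDK."""
    def __init__(self, bucket: str):
        self.bucket = bucket
    def read(self, key: str) -> bytes:
        raise NotImplementedError("call the object store SDK here")

def load_report(store: DataStore) -> bytes:
    # Application code stays the same when the backend is swapped.
    return store.read("reports/q3.csv")

# Usage: the same call works no matter which backend is plugged in.
# report = load_report(LocalFileStore())
# report = load_report(ObjectStore("analytics-bucket"))
```

When a new platform arrives, only a new backend class is added; the applications that read and write data are untouched, which is the disruption the data fabric is meant to absorb.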
Choosing the Right Data Fabric Platform: Tapestry or Burlap?
So the answer to the question “Tapestry or Burlap” is: It depends.
It depends on the threads you choose and the loom you use. The loom is the Data Fabric platform you choose, such as Azure Service Fabric, a Platform as a Service (PaaS), or the Cambridge Semantics Enterprise Data Fabric. The threads you have available are Docker, Kubernetes, APIs, MicroServices, Mixed Clouds, Big Data, Analytics, Edge IoT, and AI. How you weave them together will determine the effectiveness of your solution.
The key to long-term success is to be open to the new and disruptive technologies that are fast approaching. As more and more organizations deploy data fabric solutions, more holes will be exposed, and the solutions created to fill them will make an ever-tighter and more lustrous fabric.
The goal of making “all data” available for “any purpose” at “any time” and “anywhere” is in sight, but we are not there yet.
Legacy systems and other creators of data silos combined with the different types of data and the increasing varieties of data usage seem to be constantly moving the goalposts.
There is hope that AI and better-designed APIs will mitigate the complexity of modern data management, allowing better automation of the huge number of tasks required to make all of this look easy, so that we don’t scare the people writing the checks.