Data Fabric vs Data Virtualization

What is the best way to connect data across multiple sources while supporting large data sets, complex data structures, and real-time needs?

How do Data Fabric and Data Virtualization compare as ways to speed up data and analytics work? What do they have in common, and how do they differ?

Let’s define the two before diving into the differences, pros, and cons.

What is Data Fabric?

Forrester defines data fabric as a platform for “orchestrating disparate data sources intelligently and securely in a self-service and automated manner... to deliver a unified, trusted, and comprehensive real-time view of customer and business data across the enterprise.”

What is Data Virtualization?

Data virtualization is a logical data layer that integrates enterprise data siloed across disparate systems. It manages and unifies that data for centralized security and governance, and it delivers the data to business users in real time.
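The following minimal Python sketch makes the idea of a "logical data layer" more concrete. It assumes that in-memory lists stand in for the source systems. The `VirtualLayer` class and its methods are hypothetical illustrations, not the API of any product. The layer stores connector functions and view definitions instead of copies of data, so every query reads the sources at request time.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Reader = Callable[[], List[Row]]


class VirtualLayer:
    """Hypothetical logical layer: holds connectors and view definitions, never copies of data."""

    def __init__(self) -> None:
        self._sources: Dict[str, Reader] = {}
        self._views: Dict[str, Callable[["VirtualLayer"], List[Row]]] = {}

    def register_source(self, name: str, reader: Reader) -> None:
        # A reader is any callable that fetches the current rows from a source system.
        self._sources[name] = reader

    def read(self, name: str) -> List[Row]:
        # Data is fetched from the source on every call; nothing is stored in the layer.
        return self._sources[name]()

    def define_view(self, name: str, definition: Callable[["VirtualLayer"], List[Row]]) -> None:
        self._views[name] = definition

    def query(self, view: str) -> List[Row]:
        # Evaluating the view definition pulls live data from the underlying sources.
        return self._views[view](self)


# Hypothetical stand-ins for two siloed systems.
crm_rows: List[Row] = [{"customer_id": 1, "name": "Acme"}]
billing_rows: List[Row] = [{"customer_id": 1, "balance": 250.0}]

layer = VirtualLayer()
layer.register_source("crm", lambda: list(crm_rows))
layer.register_source("billing", lambda: list(billing_rows))


def customer_balances(v: VirtualLayer) -> List[Row]:
    # Join CRM customers to billing balances on customer_id at query time.
    balances = {r["customer_id"]: r["balance"] for r in v.read("billing")}
    return [
        {"name": c["name"], "balance": balances.get(c["customer_id"], 0.0)}
        for c in v.read("crm")
    ]


layer.define_view("customer_balances", customer_balances)
```

`query` re-reads the sources each time. So if you append a row to `crm_rows` and call `layer.query("customer_balances")` again, the new customer appears immediately, with a balance of 0.0 if billing has no record for it yet.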

How do they compare?

Functionality	Data Fabric	Data Virtualization
Data Catalog	Yes	Limited
Data Pipeline	Yes	Limited
Data Modeling	Yes	Limited
Data Types	Structured, Unstructured, Semi-Structured	Primarily Structured Data
Data Connectivity	Extensive	Extensive
Data Preparation	Yes	No
Push-Down	Yes	No
Caching (In-Memory)	Yes	Optional
Data Security and Governance	Yes	Yes
Natural Language Processing (NLP)	Yes	No
AI/ML-Based Automation (actively uses metadata)	Yes	No
Composable, Reusable Components	Yes	No
Self-Service Data (Governed)	Yes	No
Self-Service Analytics	Yes	No
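The Push-Down row describes sending work such as filters to the source system, so the integration layer does not have to pull every row first. The sketch below shows this general technique. It uses Python's built-in `sqlite3` module as a stand-in for a remote source. The `orders` table and both function names are hypothetical. The two functions return the same rows, but the push-down version transfers only the rows that match.

```python
import sqlite3
from typing import List, Tuple

OrderRow = Tuple[int, str, float]


def make_source() -> sqlite3.Connection:
    # In-memory SQLite database standing in for a remote source system.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("EU", 120.0), ("US", 80.0), ("EU", 45.5), ("APAC", 300.0)],
    )
    conn.commit()
    return conn


def query_without_pushdown(conn: sqlite3.Connection, region: str) -> Tuple[List[OrderRow], int]:
    # Pull every row across, then filter in the integration layer.
    rows = conn.execute("SELECT id, region, amount FROM orders ORDER BY id").fetchall()
    return [r for r in rows if r[1] == region], len(rows)


def query_with_pushdown(conn: sqlite3.Connection, region: str) -> Tuple[List[OrderRow], int]:
    # Push the filter into the source query so only matching rows are transferred.
    rows = conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ? ORDER BY id", (region,)
    ).fetchall()
    return rows, len(rows)


conn = make_source()
plain, plain_transferred = query_without_pushdown(conn, "EU")
pushed, pushed_transferred = query_with_pushdown(conn, "EU")
```

With four rows the difference is trivial. On a large remote table, transferring only the matching rows is what makes push-down worthwhile.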

What are the Pros of Data Virtualization?

Provides a virtual approach to accessing and delivering data
Helps to integrate data siloed across enterprise systems
Returns the integrated information in real time to the applications business users work in

What are the Cons of Data Virtualization?

Incomplete solution compared to Data Fabric
Limited data pipeline capabilities
Limited data catalog capabilities
No built-in data preparation
No Natural Language Processing for running queries across datasets
No AI- or ML-based automation

What are the Pros of Data Fabric?

Data does not have to be moved; you can access it where it lives
Ingest, transform and integrate data on the fly without needing to persist data to a data lake or warehouse first
See results in real time at each step without waiting for the full transformation to finish
Save money by minimizing data duplication
Persist data when needed for performance or other reasons
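The points above about on-the-fly transformation, step-by-step results, and optional persistence can be illustrated with a small Python sketch. The `Pipeline` class is hypothetical and is not a vendor API. Transformations run against the live source, `preview` shows intermediate results after any step, and `persist` materializes a snapshot only when you explicitly ask for one.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


class Pipeline:
    """Hypothetical on-the-fly pipeline with opt-in persistence."""

    def __init__(self, source: Callable[[], List[Row]]) -> None:
        self.source = source
        self.steps: List[Step] = []
        self._snapshot: Optional[List[Row]] = None

    def add(self, step: Step) -> "Pipeline":
        self.steps.append(step)
        self._snapshot = None  # a new step invalidates any persisted result
        return self

    def preview(self, upto: int) -> List[Row]:
        # Run only the first `upto` steps against live source data.
        rows = self.source()
        for step in self.steps[:upto]:
            rows = step(rows)
        return rows

    def run(self) -> List[Row]:
        # Serve the persisted snapshot if one exists; otherwise compute on the fly.
        if self._snapshot is not None:
            return self._snapshot
        return self.preview(len(self.steps))

    def persist(self) -> List[Row]:
        # Materialize the full result, e.g. for performance on repeated reads.
        self._snapshot = self.preview(len(self.steps))
        return self._snapshot

    def refresh(self) -> None:
        # Drop the snapshot so the next run reads live data again.
        self._snapshot = None


live: List[Row] = [{"sku": "A", "qty": 3}, {"sku": "B", "qty": 0}]
pipe = (
    Pipeline(lambda: [dict(r) for r in live])
    .add(lambda rows: [r for r in rows if r["qty"] > 0])        # drop out-of-stock items
    .add(lambda rows: [{**r, "in_stock": True} for r in rows])  # derive a column
)
```

Persistence is opt-in, so by default nothing is copied. That is the duplication saving the list describes.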

What are the Cons of Data Fabric?

The traditional approach to building a Data Fabric is to buy several tools and stitch them together, which means long, expensive system integration projects
The way vendors market Data Fabric is causing confusion (read What is a Data Fabric)
Teams often go too big on day one instead of targeting a smaller, achievable outcome

What is a business use case of a Data Fabric?

Let’s say your business is in the beverage industry. You have data in Salesforce, Excel, and Oracle. The trick is that data about your corporate accounts lives in Salesforce, data about the account managers who maintain relationships with vendors lives in Excel, and data about supply-chain updates lives in Oracle. Data Fabric connects all three. It also models the relationships between the sources and lets you run queries across them through Natural Language Processing, all without moving any of the data.
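The relationship modeling in this example can be sketched in Python. The sketch uses stand-ins for each system: a list of dicts for a Salesforce query result, CSV text for the Excel sheet, and an in-memory `sqlite3` database for Oracle. All function names and data are hypothetical. The natural-language querying is not shown. The sketch covers only joining the three sources at query time, assuming one supply-chain update per account.

```python
import csv
import io
import sqlite3
from typing import Dict, List


def fetch_salesforce_accounts() -> List[Dict[str, str]]:
    # Hypothetical stand-in for a Salesforce query result.
    return [
        {"account_id": "A1", "account": "Northwind Grocers", "manager_id": "M1"},
        {"account_id": "A2", "account": "Harbor Cafe Group", "manager_id": "M2"},
    ]


# Hypothetical stand-in for the Excel sheet, exported as CSV text.
MANAGERS_CSV = """manager_id,manager,vendor
M1,Dana Lee,Citrus Co
M2,Raj Patel,Bottle Works
"""


def read_excel_managers() -> Dict[str, Dict[str, str]]:
    # Index account managers by manager_id for lookups.
    reader = csv.DictReader(io.StringIO(MANAGERS_CSV))
    return {row["manager_id"]: row for row in reader}


def open_oracle_stand_in() -> sqlite3.Connection:
    # In-memory SQLite database standing in for the Oracle supply-chain tables.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE supply_updates (account_id TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO supply_updates VALUES (?, ?)",
        [("A1", "shipment delayed"), ("A2", "on schedule")],
    )
    return conn


def unified_view(conn: sqlite3.Connection) -> List[Dict[str, str]]:
    # Relationships: account.manager_id -> manager, account.account_id -> supply update.
    managers = read_excel_managers()
    # Simplification: assumes at most one supply update per account.
    status = dict(conn.execute("SELECT account_id, status FROM supply_updates").fetchall())
    out = []
    for acct in fetch_salesforce_accounts():
        mgr = managers.get(acct["manager_id"], {})
        out.append({
            "account": acct["account"],
            "manager": mgr.get("manager", ""),
            "vendor": mgr.get("vendor", ""),
            "supply_status": status.get(acct["account_id"], "unknown"),
        })
    return out
```

Calling `unified_view(open_oracle_stand_in())` returns one row per account, with manager, vendor, and supply status resolved from the other two sources at query time.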

Learn how the Promethium Data Fabric connects 400+ data sources in a single data analytics platform, saving go-to-market time and over 91% of integration costs.