June 27, 2023

Rethinking Data Catalogs: Addressing the Unique Demands of Generative AI and Large Language Models

Kaycee Lai

Founder

As our technological world continues to evolve, the tools we use must adapt in kind. The advent of Generative AI and Large Language Models (LLMs) presents a challenge to traditional data catalog methods. For data stewards and governance practitioners, it is time to reassess the adequacy of existing data catalog infrastructure in meeting the demands of these advanced models.

Rethinking Data Catalogs for Generative AI and Large Language Models

Understanding the Shortcomings of Traditional Data Catalogs

Traditional data catalogs are designed to catalog, organize, and govern data assets across an organization. They give users a way to discover, understand, and use their data efficiently. While these catalogs have served us well so far, the growing dominance of Generative AI and LLMs has exposed significant gaps.

Generative AI and LLMs present a distinctive set of challenges. To begin with, LLMs often struggle to interpret organization-specific tags and taxonomies. They cannot reliably choose the right data asset when several assets share the same name. Moreover, they cannot accurately judge which of several data assets is most relevant, because they have no access to the underlying usage patterns. As a result, the conventional data catalog falls short of what LLMs need, undermining their efficiency and performance.

A Paradigm Shift: The New Generation of Data Catalogs

To overcome these hurdles, a fundamental shift is required. The new breed of data catalog should be equipped with Natural Language Processing (NLP) capabilities, enabling an understanding of data that goes beyond simple tags and names. NLP allows the catalog to comprehend, interpret, and even learn from textual data, thereby improving the LLMs' comprehension of organization-specific tags and taxonomies.
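To make this concrete, here is a minimal Python sketch, under stated assumptions, of one thing NLP-aware cataloging can mean: expanding organization-specific tags into plain language before matching a question against catalog entries. The glossary, the tag codes (`FIN-RR`, `CX-CHURN`), the asset names, and the `Asset` and `rank_assets` helpers are all hypothetical, and simple word overlap stands in for the embeddings or language models a production catalog would more likely use.

```python
from dataclasses import dataclass, field

# Hypothetical glossary mapping internal tag codes to plain-language meanings.
GLOSSARY = {
    "FIN-RR": "recognized revenue finance",
    "CX-CHURN": "customer churn cancellations",
}


@dataclass
class Asset:
    name: str
    description: str
    tags: list[str] = field(default_factory=list)

    def context_text(self) -> str:
        # Replace each internal tag with its glossary meaning (unknown tags pass through).
        expanded = [GLOSSARY.get(tag, tag) for tag in self.tags]
        return " ".join([self.name, self.description, *expanded]).lower()


def tokenize(text: str) -> set[str]:
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    words = (w.strip(".,?!") for w in text.lower().split())
    return {w for w in words if w}


def rank_assets(question: str, assets: list[Asset]) -> list[tuple[str, int]]:
    # Score each asset by how many question words appear in its expanded context.
    q = tokenize(question)
    scored = [(a.name, len(q & tokenize(a.context_text()))) for a in assets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


assets = [
    Asset("sales.monthly", "monthly totals by region", ["FIN-RR"]),
    Asset("crm.accounts", "account master records", ["CX-CHURN"]),
]
print(rank_assets("What was recognized revenue by region?", assets))
# [('sales.monthly', 4), ('crm.accounts', 0)]
```

With the glossary expansion, `sales.monthly` matches four words of the revenue question; without it, the opaque tag `FIN-RR` contributes nothing and only "by" and "region" would match.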

Furthermore, this new data catalog should not only provide metadata about where the data resides but also grant access to the data, preferably via data virtualization. This capability means the catalog can offer not just static, descriptive information but also access to real-time, operational data. In turn, it helps LLMs better judge the relevance of competing data assets and discern how those assets are actually used.
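The following Python sketch illustrates, under stated assumptions, how data access combined with usage signals could address the same-name problem described earlier: when two sources both expose an asset called `customers`, the catalog picks the one that is queried most and reads it from its source at request time. The `CatalogEntry` fields, the 30-day query counts, and the lambda "connectors" are hypothetical stand-ins; a real data virtualization layer would push a query down to the source system rather than call a Python function.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass
class CatalogEntry:
    name: str                        # logical name; may collide across sources
    source: str                      # system where the data actually lives
    query_count_30d: int             # hypothetical usage signal tracked by the catalog
    fetch: Callable[[], list[Row]]   # stand-in for a virtualization connector


def resolve(name: str, entries: list[CatalogEntry]) -> CatalogEntry:
    candidates = [e for e in entries if e.name == name]
    if not candidates:
        raise KeyError(f"no asset named {name!r}")
    # Break name collisions using observed usage rather than guessing.
    return max(candidates, key=lambda e: e.query_count_30d)


def read_live(name: str, entries: list[CatalogEntry]) -> tuple[str, list[Row]]:
    entry = resolve(name, entries)
    # Rows come from the source at request time instead of a stored copy.
    return entry.source, entry.fetch()


entries = [
    CatalogEntry("customers", "legacy_dw", 3, lambda: [{"id": 1, "status": "inactive"}]),
    CatalogEntry("customers", "crm", 412, lambda: [{"id": 1, "status": "active"}]),
]
print(read_live("customers", entries))
# ('crm', [{'id': 1, 'status': 'active'}])
```

Choosing the most-queried asset is only one possible heuristic; a catalog could also weigh recency, ownership, or certification status when several assets share a name.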

Why the Change is Crucial

The implications of this shift are profound. It enables LLMs to operate at their full potential, promoting better outcomes across the many applications of Generative AI. More efficient data handling, improved comprehension of organizational semantics, and access to underlying usage patterns all translate into greater accuracy and richer insights.

For data stewards and governance practitioners, embracing this new approach is not just about keeping pace with technological progress. It represents an opportunity to fundamentally transform the way we manage and leverage our data assets. By harnessing the power of NLP and data virtualization, we can ensure that our data infrastructure is not merely reacting to the demands of Generative AI and LLMs, but actively enabling their success.

Final Thoughts

The era of Generative AI and LLMs necessitates a fresh perspective on how we catalog and access our data. To fully unlock the potential of these advanced models, we must rethink our data catalogs. This involves enriching them with NLP capabilities and ensuring real-time data access through data virtualization. As we stand on the cusp of this exciting frontier, it is incumbent upon us, as data stewards and governance practitioners, to lead the charge towards more intelligent, adaptive, and effective data catalogs.