
June 27, 2023

Rethinking Data Catalogs: Addressing the Unique Demands of Generative AI and Large Language Models


Kaycee Lai

Founder

As our technological world continues to evolve, the tools we use must adapt in kind. The advent of Generative AI and Large Language Models (LLMs) presents a challenge to traditional data catalog methods. For data stewards and governance practitioners, it is time to reassess the adequacy of existing data catalog infrastructure in meeting the demands of these advanced models.



Understanding the Shortcomings of Traditional Data Catalogs

Traditional data catalogs are designed to catalog, organize, and govern data assets across an organization. They provide a means for users to discover, understand, and utilize their data efficiently. While these catalogs have served us well thus far, the growing dominance of Generative AI and LLMs has illuminated significant deficiencies.

Generative AI and LLMs present a distinctive set of challenges. LLMs often fail to interpret organization-specific tags and taxonomies, cannot reliably choose the right asset when several share the same name, and cannot gauge the relative relevance of data assets without access to underlying usage patterns. As a result, the conventional data catalog falls short of what LLMs need, undermining their efficiency and performance.
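As a toy illustration of the name-collision problem (all asset names, schemas, and usage counts below are hypothetical), consider two assets that share the label `revenue`. A name-only lookup is ambiguous, while even simple usage metadata resolves the tie:

```python
# Hypothetical catalog entries: two assets share the name "revenue".
# A lookup keyed on name alone is ambiguous; usage metadata breaks the tie.
catalog = [
    {"name": "revenue", "schema": "finance", "queries_last_30d": 1450},
    {"name": "revenue", "schema": "sandbox", "queries_last_30d": 12},
    {"name": "customers", "schema": "crm", "queries_last_30d": 980},
]

def find_by_name(name):
    """Name-only lookup: returns every match, leaving the caller (or LLM) to guess."""
    return [a for a in catalog if a["name"] == name]

def find_best(name):
    """Usage-aware lookup: prefer the asset queried most often."""
    matches = find_by_name(name)
    return max(matches, key=lambda a: a["queries_last_30d"]) if matches else None

print(len(find_by_name("revenue")))    # 2 ambiguous candidates
print(find_best("revenue")["schema"])  # finance
```

A catalog that exposes this kind of usage signal lets an LLM make the same choice a seasoned analyst would.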

A Paradigm Shift: The New Generation of Data Catalogs

To overcome these hurdles, a fundamental shift is required. This new generation of data catalog should be equipped with Natural Language Processing (NLP) capabilities, facilitating an intelligent understanding of data that goes beyond simple tags and names. NLP allows the catalog to comprehend, interpret, and even learn from textual data, thereby enhancing the LLMs' comprehension of organization-specific tags and taxonomies.
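The idea of matching a question against descriptions rather than names can be sketched in a few lines. This is a deliberately minimal stand-in: a production catalog would use real embedding models, whereas the bag-of-words cosine similarity, table names, and descriptions here are all hypothetical:

```python
import math
from collections import Counter

# Toy stand-in for NLP-enriched catalog search: bag-of-words cosine similarity
# over free-text descriptions. All entries and the query are hypothetical.
entries = {
    "tbl_cust_ltv": "lifetime value of each customer aggregated monthly",
    "tbl_churn":    "customers who cancelled their subscription last quarter",
    "tbl_gl_post":  "general ledger postings for the finance close process",
}

def vectorize(text):
    """Turn text into a word-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query):
    """Return the catalog entry whose description best matches the query."""
    qv = vectorize(query)
    return max(entries, key=lambda k: cosine(qv, vectorize(entries[k])))

print(search("which customers cancelled their subscription"))  # tbl_churn
```

Note that the query never mentions the table name `tbl_churn`; the match comes entirely from the description, which is exactly the gap a name-and-tag-only catalog leaves open.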

Furthermore, this new data catalog should not just provide metadata about where the data resides but also grant data access – preferably via data virtualization. This capability ensures that the catalog can provide not just static, descriptive information but also access to real-time, operational data. As such, it empowers LLMs with the ability to understand the relevance of multiple data assets better and to discern usage patterns.
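The difference between a metadata-only entry and a virtualized one can be sketched as a catalog record that carries a live query handle alongside its descriptive fields. In this sketch an in-memory SQLite database stands in for a remote operational source, and all names and figures are hypothetical:

```python
import sqlite3

# SQLite in-memory database standing in for a remote operational source.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 32.5)])

class CatalogEntry:
    """A catalog record that pairs metadata with a live access path."""

    def __init__(self, name, description, connection):
        self.name = name                # metadata: what the asset is called
        self.description = description  # metadata: what it contains
        self._conn = connection         # virtualization: live query handle

    def query(self, sql):
        """Execute against the underlying source instead of returning only metadata."""
        return self._conn.execute(sql).fetchall()

entry = CatalogEntry("orders", "customer orders with amounts", source)
print(entry.query("SELECT COUNT(*), SUM(amount) FROM orders"))  # [(2, 42.5)]
```

Because the entry answers queries rather than merely describing a location, an LLM working through the catalog can ground its responses in current, operational data instead of stale descriptions.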

Why the Change is Crucial

The implications of this shift are profound. It enables LLMs to operate at their fullest potential, promoting superior outcomes across various applications of Generative AI. More efficient data handling, improved comprehension of organizational semantics, and access to underlying usage patterns all translate into enhanced accuracy and richer insights.

For data stewards and governance practitioners, embracing this new approach is not just about keeping pace with technological progress. It represents an opportunity to fundamentally transform the way we manage and leverage our data assets. By harnessing the power of NLP and data virtualization, we can ensure that our data infrastructure is not merely reacting to the demands of Generative AI and LLMs, but actively enabling their success.

Final Thoughts

The era of Generative AI and LLMs necessitates a fresh perspective on how we catalog and access our data. To fully unlock the potential of these advanced models, we must rethink our data catalogs. This involves enriching them with NLP capabilities and ensuring real-time data access through data virtualization. As we stand on the cusp of this exciting frontier, it is incumbent upon us, as data stewards and governance practitioners, to lead the charge towards more intelligent, adaptive, and effective data catalogs.
