Unlock Business Value with AI-Driven Semantics

TL;DR - AI technology promises renewed efficiency, new products and revenue streams, and an improved customer experience. However, it requires a semantic layer so AI systems can understand the business context of your data. This blog explores how organisations can start using AI to build and expand their semantic layer, moving up to more impactful AI use cases at scale.

What does long-term success look like in terms of AI implementation? Depending on the organisation, some measurable business outcomes could be:

Reduced operational costs from agentic AI automation of critical business processes.
Faster time to insights and near real-time decision making from AI-driven analytics that allow anyone in the organisation to ask natural language questions about enterprise data and get reports and dashboards that answer those questions.
Increased customer satisfaction from AI-assistants that resolve customer queries and troubleshoot incidents.

However, to reap the benefits, generative and agentic AI systems must be provided with contextual information about how the organisation runs and its business domain.

The key enabler of success in AI projects that are driven by data is the “semantic layer”. This is essentially data about your data. You can think of it as a dictionary that AI systems can look up for any organisational data to understand:

Business meaning and context of the data
How the data is used and stored
Data relationships and constraints

Semantic Layer Benefits

At a recent retail client engagement, the delivery of a reporting and analytics platform required a new semantic layer to lay the foundation for the platform’s long-term success.

The creation of the semantic layer involved a lot of back-and-forth between our consultants, who were new to the data, and internal subject matter experts, who possessed core business and domain knowledge. The process was time-consuming and involved trial and error. However, it came with a great reward, as once it was implemented, it enabled:

Operational efficiency as we created a reliable source of truth for the organisation’s data that eliminated redundancies and duplicate efforts.
Improved data governance as the organisation could scale both centralised and federated operational models while meeting all required compliance and security standards.
Improved data-driven decision-making as new data quality KPIs provided accurate and consistent insights.
Faster time-to-insights with advanced reporting task automation.
New opportunities for business growth and revenue generation as a wealth of new use cases were now possible within the reporting platform.

How AI Expands Semantic Layer Benefits

When starting their modernisation journey, most organisations are limited by semantic layer silos. This can be in the form of both single-point-of-failure SMEs who have the working knowledge of the data, or, where the semantic layer already exists, its dependency and storage in a single tool (e.g., Power BI, Looker). In either case, this inflexibility presents a boundary. AI can be used to turn existing boundaries into opportunities.

New Growth Opportunities

AI can be used to derive, extract, and enrich your semantic layer, creating a unified foundation for your data much closer to its sources. A unified layer can work and deliver value across your entire tech stack and your various organisational domains. A consistent, big-picture data view in the hands of your analysts and future AI agents enables them to quickly identify new opportunities for efficiency, expansion, and business growth.

Faster Time to Insights

Knowledge and insights become an organisational asset once the semantic layer has been extended and decoupled from specific tools. It is no longer solely limited to BI or data science teams. Anyone can access the insights they have the authorisation to and perform their job to the best of their ability. Going beyond, it facilitates cross-domain collaboration and opens new opportunities for valuable organisational data products.

Leverage and Enrich Existing Knowledge

AI can mine existing knowledge bases (including those it has helped enrich) for their taxonomies. Using this information, it can connect the dots between scattered data insights across the organisation, allowing it to extract deeper meaning from the data landscape and accelerate data profiling, lineage mapping, and core definition creation.

Challenges in Semantic Layer Implementation

A common challenge when starting a new AI initiative is that the semantic layer is often absent and needs to be built from scratch. This involves time-consuming tasks like:

Documentation of the field’s types, constraints, relationships, and meaning, often done as code
Documentation of business logic, transformation, and bespoke or situational context
Creation of documentation pages for handover, training, and stakeholder verification

Due to the fast-paced nature of many organisations, proper documentation is often deprioritised in comparison to the delivery of new features or data products. But thanks to generative and agentic AI, some of these tasks can be made more efficient and integrated into existing workflows.

While AI can open up a myriad of opportunities for your organisation and its data products, there are still some non-negotiables that AI will not solve.

Misalignment in business definitions between domains or business units still needs to be addressed.
The data assets, including the semantic layer, still require accountability and responsibility, with stewards keeping definitions and processes up to date.

An Example of How AI Supports Semantic Layer Development

Implementing a semantic layer is a perfect starting point for your organisation’s AI journey. AI can partially automate and hasten the implementation process, while also laying the foundational groundwork for advanced AI use cases.

Getting Started

Diagram showing how a large language model and data team interact to create a semantic layer.

The simplest setup involves just a few steps using any third-party LLM. In the enterprise, care should be taken to use the AI in a way that does not create data security and regulatory risks. Deploying an open source LLM on-prem gives the most control and privacy.

For the sake of this explanation, we have used ChatGPT as an example.

Provide the LLM with a prompt to generate data dictionaries or contracts from sample data.

Here is an example prompt.

You are an assistant to a data engineering team, tasked with helping them build the semantic layer for their new data warehouse. Below is some information about your expected inputs and outputs:
Inputs: You will receive small csv files with sample data. You will also, at times, receive updated schemas from humans in the loop as feedback for areas to improve or additional context.
Outputs: You will output dbt compatible YAML with a fully populated schema.
Your schema should include:
The column's name
The data type
A detailed description
dbt tests (where applicable)
Constraints or relationship keys based on previous tables you have been shown
Important:
Always populate the YAML with your best guess for the parameters of each column. All columns from the CSV file require metadata population in your output YAML. The metadata will need to be machine-readable by other AI agents and data engineers.
If you are unsure about the correct value of any field you are filling in, flag this for review by a human in the loop.
You are to output the YAML in a code block, and a separate block of your commentary on any fields with low confidence that a human should review. Outside of this there should be no commentary.

2. Provide sample data to the LLM. In our example, we have used sample data from the dbt Jaffle Shop project.

3. The LLM returns a templated contract with its estimates of the metadata, using context and prior knowledge to infer definitions as shown below.

version: 2 models: - name: raw_stores description: > This table contains the list of physical store locations for Jaffle Shop. It includes store metadata such as unique identifier, location name, open date, and applicable local tax rates. columns: - name: id description: > Unique identifier for the store. This is a UUID used to uniquely distinguish each store in the system. data_type: string tests: - unique - not_null constraints: - type: primary_key - name: name description: > The name of the store, generally aligned with its city or location. data_type: string tests: - not_null - unique - name: opened_at description: > The timestamp representing when the store officially opened to the public. data_type: timestamp tests: - not_null - name: tax_rate description: > The local sales tax rate applicable to transactions at the store, expressed as a decimal. data_type: float tests: - not_null - accepted_range: min: 0.0 max: 0.15

4. Validate the outputs and give any relevant feedback to the LLM before signing off and implementing the finalised data contract.

Screenshot of a ChatGPT conversation about clarifying data formatting for a semantic layer.

The solution above allows your data engineer to partially automate the repetitive creation of metadata files and output a validated semantic layer more quickly.

This may seem like only a few simple steps, but it is a significant time-saver in the world of data. The ability to construct pre-filled templates with AI, as opposed to spending significant time manually bulking out YAML or knowledge base pages, means that proper documentation for your data projects and products no longer needs to be put on the back burner. Instead, it can be integrated into your engineer’s workflows with every new feature.

Advanced Implementation

The next step would be to further automate and integrate with your database, as well as implement continuous integration workflows. You could point an LLM agent to your development environment database, where it can automatically pick up any new tables, cross-check them against your existing semantic layer, and output a template without needing to be asked.

The LLM agent would need to be configured to perform the required task using the same prompts as those described above.

Diagram showing how a large language model uses organisational data, prompts, and human review to build a semantic layer.

We won’t delve too deeply into agentic solutions in this article; this is just an example of how you can progress from that starting point to the next level in your data maturity journey.

As your ecosystem matures, you can move towards conversational analytics, where anyone in your organisation can talk with data to find answers quickly. We built a solution prototype for a large Australian insurance organisation with a demo showcased at both the AWS Summit Sydney and the Google Cloud Summit. You can read about conversational analytics here.

More Advantages of AI in the Semantic Layer

The above AI solution offers further advantages across AI initiatives.

Machine and Human-Readable Templates

The AI solution can be prompted to deliver its templated outputs in a machine-readable format, such as JSON or YAML. Future AI agents can easily interpret these formats down the track. This means that not only have you partially automated the creation of a data dictionary, but you’ve also reformatted it for your upcoming reporting, analytics, or advanced AI projects.

Continuously Improving Solution

The agent can learn from both your data and human feedback. At first, it might not understand your bespoke terminology or hidden complexities in your business processes. However, with some quick feedback, it can incorporate this information, store it in its memory, and utilise it later during its next task. So while it is already speedy in delivering useful templates to the human in the loop, its accuracy, and therefore your efficiency, will improve over time.

Early Gap Identification

Using AI from the outset helps to identify gaps in processes, data structure, and security. This means that more risks and returns on investment can be identified and acted upon well before any large AI deployment is undertaken.

How V2 AI Helps

At V2 AI, we deliver enterprise-level outcomes through the development of AI use cases and solutions. For clients looking to bring AI into their operations, we cover all the vital areas:

Data maturity assessment and advice
AI use case design and implementation
Testing and deploying AI projects

We turn AI ideas into reality, from proof of concept and prototyping to secure, maintainable solutions running in production at scale across various sectors, including finance, retail, energy, healthcare, and more.

While many organisations will need an uplift of data maturity to fully realise their AI goals, this doesn't mean they need to delay AI innovation. In fact, you can leverage it right from the beginning to help lay the foundations for future AI acceleration.