Why Has Generative Artificial Intelligence (Gen-AI) Become So Popular?

The idea of artificial intelligence (AI) has existed in science fiction, philosophy, and literature since time immemorial, but the field itself emerged in the mid-20th century. Alan Turing's 1950 paper "Computing Machinery and Intelligence", which opens with the question "Can machines think?" and proposes the Turing Test for judging whether a machine exhibits human-level intelligence, can be considered the milestone of AI's evolution. The main goal of AI was to develop algorithms and systems that mimic aspects of human intelligence: competencies such as problem solving, learning, perception, language, and even emotion.

In the early 1960s, rule-based systems were the simplest form of artificial intelligence. In the 1980s, machine learning gained momentum, and the field evolved from simple rule-based structures to more complex algorithms. The most visible moment in AI's earlier history, however, was undoubtedly 1997, when IBM's Deep Blue defeated world chess champion Garry Kasparov. The match resonated so widely because it broke AI out of corporate research and development laboratories, where only insiders could follow it, and put it in front of the general public through the media.

The main reason Generative AI became popular worldwide so quickly was OpenAI opening its ChatGPT application to the whole world. AI work, which until then had almost always been B2B (business to business), was offered to B2C (business to consumer) general consumers for the first time. Like the Google search engine, or Facebook, Twitter, and TikTok in the social media revolution, anyone could access it through a simple interface and start using its capabilities easily. The speed of adoption tells the story: ChatGPT reached 100 million monthly active users in just two months, a milestone that took TikTok nine months and Instagram two and a half years.

The "surprisingly" good content Gen-AI can produce, its ability to readily understand what is written to it, and its interactive, iterative, conversational way of working have undoubtedly sustained its popularity.

Gen-AI Challenges in the Enterprise

With the "Foundation Models" that form the infrastructure of generative intelligence, namely LLMs (Large Language Models) and MLLMs (Multimodal Large Language Models) that have entered our lives in recent months, it no longer accepts only text as input, but started to receive a wide variety of inputs such as images, audio, video, database structured data, unstructured data lake data. It was able to make connections between the various inputs and started to create similarly diverse outputs, and these advances took place in weeks to months.

These developments, which created excitement at the end-user level, naturally attracted the attention of organizations, and many began working on how to benefit from Gen-AI. Looking at the experience of organizations that wanted to move quickly in this field, several challenges come to the fore:

Large Language/Foundation Models Are Good in General but Inadequate in the Particular

The first problem with (M)LLMs in enterprise use is that while they can answer almost any question better than the average person, deep knowledge of a specific topic is hard to come by. What organizations wanted was, for example, a co-pilot that could assist professionals working in law. A general-purpose model could not easily master every detail of a country's legal rules and regulations, examine all past cases, and draw the correct correlations. The experiments produced "average performers at the general level": models that wrote poetry imitating Shakespeare, if not as well as Shakespeare, and gave generic answers about the law.

Boundary Setting, Authorization, and Authentication

Without a proper authorization and authentication layer, any user could anonymously access as much information as they wanted. For example, it was difficult to ensure that salary information could be seen by HR personnel but not by users from other departments. A similar problem arose in setting boundaries. Because the "giant" language models in use were neither subject-specific nor clearly bounded, they responded to topics they should not have. In one typical example, a chatbot developed for customer service began answering questions on completely unrelated topics at users' prompting, damaging the reputation of the organization it represented.

Scalability Issues

Because the LLMs underlying Gen-AI applications consist of billions of parameters and consume very serious resources, scaling and managing increasing load without performance degradation is a challenge. The horizontal and vertical scaling that came with cloud computing proved insufficient for such large, monolithic applications.

Explainability and Transparency

The requirement that results be explainable and transparent, which is manageable for classical machine learning and deep learning models, becomes a serious risk with LLMs because of the complexity of models built from billions of parameters. Understanding from which sources and by what line of reasoning a given output was produced emerged as a major challenge.

Performance Issues

High concurrent usage led to latency and downtime, degrading the user experience. In other words, systems struggled to serve the large numbers of simultaneous requests coming from clients and users within an organization.


Single Point of Failure and Security Concerns

A single entry point through which the organization's sensitive and critical data could also be reached became a single point of vulnerability for data theft. Malicious users abusing this entry point could easily reach data the organization did not want to share.

Bias Reduction

Bias problems arose because LLMs are "pre-trained, off-the-shelf" AI models. Trained on terabytes of internet data, these language models naturally reflect that data. But the internet is a network where majority opinion and misinformation travel together, and training an LLM on it directly, without checking the quality, consistency, and accuracy of the training data, led to biased results.

Trust in Artificial Intelligence

At the point where Gen-AI outputs are integrated into the decision-making steps of an operational system, concerns about the consistency and explainability of those outputs kept them out of critical systems. Unlike rule-based systems or classical data science models, LLMs operate with a motivation of "give as plausible an answer as possible, even if it is not correct" rather than "do not react, do not produce a result"; this created serious trust issues for organizations, such as corporations, that aim for zero errors.

Resource Consumption

Because (M)LLM applications are complex, with very large parameter counts and vector-computing infrastructures, they required large amounts of GPU resources, as deep learning workloads do. In on-prem data centers this meant large CAPEX outlays; even in pay-as-you-go models, cloud-based organizations faced financially challenging OPEX due to higher-than-expected resource consumption.


Talent Acquisition

In a world where even data scientists are difficult to retain, the scarcity of AI engineers competent in Gen-AI has made it very hard to build sustainable teams and drive new development in this field.

A Solution Approach for Enterprise Level Generative Artificial Intelligence Applications

To summarize, using huge language models without establishing the necessary governance mechanisms has led to problems in many areas, including trust, performance, and competence. The solution approach we have developed at KPMG to overcome these problems consists of the following elements:

  • Instead of huge multimodal/unimodal language models, enterprise use calls for turning foundation models with a basic level of competence into "customized" language models by training them on corporate data: "medium/small/micro" language models with fewer parameters but deepened expertise, such as an ERP-LM that has learned ERP data very well, an HR-LM that has learned HR data very well, or a Customer-LM that has learned customer data very well.
  • Because specialized small language models are expert in only some subjects, more than one customized LM must work together, with orchestrating/aggregating language models that combine and filter their results into a single answer.
  • A "routing" language model is needed to determine which custom-LM an incoming request should go to (see the sketch after this list).
  • A "control" language model is needed that can check whether a requesting user or system falls within the scope of the custom-LM ecosystem (eliminating out-of-competence or security-breach requests), authenticate the user, and verify that the user is authorized both to make the request and to receive its result.
  • In an ecosystem with many custom small LMs, "data governance" practices are needed for data access and management. When one LM's data sets are accessed by another LM, a sound data governance approach must be able to make permit/deny decisions.
  • To handle concurrent or heavy client load well, custom-LMs should be micro-architected as far as possible, easing both horizontal and vertical scaling, and should benefit from cloud computing and container architectures. To take a telecom example: a campaign-LM developed to decide which campaign, through which channel, and with which offer details should be presented to 35 million subscribers should itself be divided into micro-LMs such as finding the right campaign, finding the right channel, and finding the right product, so that resources are used effectively and the infrastructure scales.
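To make the routing and control ideas above concrete, here is a minimal Python sketch. The model names, roles, and the generate() call are hypothetical placeholders; a real implementation would delegate authentication to an identity provider and inference to actual model services.

```python
# Minimal sketch of the "control + routing" layer described above.
# Model names, roles, and the generate() call are hypothetical
# placeholders, not a specific product API.
from dataclasses import dataclass, field

@dataclass
class CustomLM:
    name: str            # e.g. "HR-LM", "ERP-LM"
    domains: set         # topics this model is specialized in
    allowed_roles: set   # roles authorized to query this model

    def generate(self, prompt: str) -> str:
        # Placeholder for the actual model inference call.
        return f"[{self.name}] answer to: {prompt}"

@dataclass
class ControlAndRouter:
    registry: list = field(default_factory=list)

    def handle(self, user_role: str, domain: str, prompt: str) -> str:
        # 1. Control: reject requests outside the ecosystem's scope.
        in_scope = [lm for lm in self.registry if domain in lm.domains]
        if not in_scope:
            return "Rejected: request is outside the LM ecosystem's scope."
        # 2. Control: check the caller's authorization for this domain.
        authorized = [lm for lm in in_scope if user_role in lm.allowed_roles]
        if not authorized:
            return "Rejected: user is not authorized for this domain."
        # 3. Routing: dispatch to the first matching custom-LM.
        return authorized[0].generate(prompt)

router = ControlAndRouter(registry=[
    CustomLM("HR-LM", {"salary", "leave"}, {"hr_specialist"}),
    CustomLM("ERP-LM", {"orders", "inventory"}, {"operations"}),
])
print(router.handle("hr_specialist", "salary", "Average salary by grade?"))
print(router.handle("operations", "salary", "Show all salaries."))  # rejected
```

The order of checks matters: scope first, authorization second, routing last, so out-of-scope or unauthorized requests never reach a model.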

As professionals who have held software development and architecture roles in the IT industry for years, we can see how well the evolution from monolithic applications to microservice and data-mesh architectures supports the solution here.

In fact, typical operational application development architectures and AI application development approaches, which seem to diverge in many respects, can create synergy at this point.

To explain the main reason for the divergence with an analogy: classic enterprise applications (such as CRM and ERP) resemble our organs and the tissues and cells beneath them. Methods, at the atomic level, are like cells; object-oriented business logic composed of methods is like tissue; applications composed of business logic are like organs. What they have in common is that they perform specific tasks. The heart pumps blood; the stomach grinds the food we eat. The heart does not suddenly decide on its own to grind food instead of pumping blood: it has no consciousness of its own, no autonomous decision-making. The technology that comes with generative AI and AGI (artificial general intelligence), by contrast, is more like cells that we hope are benign but cannot be 100% sure of. Because we give it the ability to develop autonomously, it can surprise us greatly and, with its creativity, raise productivity dramatically. An organ such as the "brain" must therefore constantly monitor these cells and bring them under control when they head in the wrong direction. To keep this difference from becoming a risk, we need to build a system suitable for two different cell types. Returning to application architecture: we need to build Gen-AI applications that we allow to develop in a controlled manner, without hindering their efficiency and performance, on architectures we can control.

In order to explain the right architectural solution, it is first necessary to explain the logic of generative AI, customized multimodal language models, and microservice and data-mesh architectures:

Multimodal Language Models (MLLM)

Multimodal Language Models are artificial intelligence systems trained on various datasets, including text, images, video, audio, and structured database data. They can understand and generate information across these different modalities, thus providing a level of interaction that more closely mirrors human communication.

In enterprise applications, MLLMs can offer transformative capabilities:

- Customer Interaction: Improving customer service by interpreting and responding to queries with both text and visual content.

- Content Creation: Automating the creation of marketing materials that combine written and visual elements.

- Training and Development: Providing rich, interactive training materials that combine educational text with supporting images and videos.

Customized Multimodal Language Models (CMLLM)

Based on deep learning and natural language processing technologies, these models are capable of understanding and making connections between various types of data, including text, images, and audio. Specific MLLMs are trained for a particular domain or functionality; for example, they may be optimized for medical imaging and reporting, customer service dialogues, or creative design processes.

Unlike general-purpose MLLMs, custom-trained models are customized in terms of training data sets, algorithms, and targeted outputs. This enables them to better understand and interpret the jargon, visual aesthetics, or tone of voice of a particular industry. For example, a custom MLLM developed for a law firm will be much more capable than a standard model in understanding and interpreting legal terms and documents.

This customization enables models to produce results with higher accuracy and in context, so they can be directly integrated into workflows and provide solutions to the specific challenges organizations face. This deepened understanding brought by specific MLLMs allows businesses to personalize customer interactions, improve decision-making processes of automated systems, and make data analytics results more precise.

The most important feature of CMLLMs in our solution is that they are competent, flexible, and manageable enough to be loaded into microservices, in place of complex, monolithic, huge language models consisting of billions of parameters. The key approach in the solution architecture is to transform huge language models into micro-language models, creating a CMLLM pool of atomic-level competencies.
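As an illustration of how one such specialized model might be produced, the following hedged sketch adapts a small open-source causal LM to a domain corpus using the Hugging Face transformers and datasets libraries. The model name, corpus file, and hyperparameters are examples only; in practice a parameter-efficient method such as LoRA would usually be preferred over full fine-tuning for a model of this size.

```python
# Hedged sketch: turning a small open-source LM into a domain expert
# ("HR-LM") by continued training on a corporate corpus. Model name,
# file path, and hyperparameters are illustrative only.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset

model_name = "mistralai/Mistral-7B-v0.1"      # any small open model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:               # needed for batching
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Domain corpus (e.g. anonymized HR policies) as plain-text lines.
dataset = load_dataset("text", data_files={"train": "hr_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="hr-lm", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()              # produces the specialized "HR-LM"
trainer.save_model("hr-lm")
```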

The Role of Microservices

Microservices architecture involves structuring an application as a collection of loosely coupled services, each responsible for executing a specific business function. This design principle is modular in nature and fosters an agile development environment where services can be updated, deployed and scaled independently.

The microservice architecture is ideal for deploying MLLMs because of its inherent scalability and flexibility. Each microservice can be independently developed, deployed and scaled, which is crucial for MLLMs that require significant computational resources.

This architecture supports continuous integration and delivery, enabling rapid iteration and deployment of AI capabilities. It also facilitates resilience, as the failure of one microservice will not disable the entire application.
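As a minimal sketch of what "one model per microservice" can look like, the snippet below wraps a single specialized model behind its own HTTP service with FastAPI. The endpoint names are our own choice, and the model call is stubbed.

```python
# Illustrative sketch: one specialized model behind its own
# microservice. The model call is a stub; endpoint names are
# examples, not a prescribed interface.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="hr-lm-service")

class Query(BaseModel):
    prompt: str

def hr_lm_generate(prompt: str) -> str:
    # Placeholder for loading and serving the actual HR-LM.
    return f"HR-LM response to: {prompt}"

@app.post("/generate")
def generate(query: Query) -> dict:
    return {"model": "HR-LM", "answer": hr_lm_generate(query.prompt)}

@app.get("/health")
def health() -> dict:
    # Liveness probe so an orchestrator (e.g. Kubernetes) can restart
    # this one service without touching the rest of the application.
    return {"status": "ok"}
```

Run with `uvicorn service:app`; each CMLLM gets its own such service, so it can be scaled, updated, or restarted independently.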

The Role of Data Mesh

Data-mesh is a socio-technical approach to data architecture and organizational design. It emphasizes the decentralized, distributed nature of data ownership and architecture, with individual teams acting as stewards of their own data as a product. The concept is built on the principles of domain-oriented ownership, data as a product, self-serve data infrastructure, and federated computational governance.


Benefits of Data Mesh for Data Governance

- Autonomy: Teams have the autonomy to manage and optimize their data while ensuring that it is discoverable and interoperable with the rest of the organization's data ecosystem.

- Quality: Data products are built with quality by design, with clear accountability for the accuracy, format and use of data.

- Speed: By treating data as a product, teams can iterate and deliver data improvements quickly and gain faster access to trusted data.

- Compliance: Data-mesh facilitates compliance with governance standards and regulations by clearly defining ownership and control mechanisms for data products.

A data-mesh complements microservices by organizing decentralized data ownership and management. It essentially acts as connective tissue between the various microservices and ensures that they can access the data they need while adhering to the organization's data policies and standards.

In a microservice architecture, each service manages its own data, but Data-mesh provides an interoperable platform that provides consistent, managed access to that data across the entire system. This is particularly important for MLLMs, which require the integration of different types of data from various sources to operate effectively.
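One way to picture the "data as a product" contract is a descriptor that each domain team publishes alongside its data. The following sketch is illustrative only; the field names are our own and are not part of any data-mesh standard.

```python
# Hedged sketch of a "data as a product" contract: each domain team
# publishes a descriptor that makes its data discoverable and
# governed. Field names are illustrative, not a data-mesh standard.
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProduct:
    name: str                 # e.g. "hr.salaries.monthly"
    owner_team: str           # accountable domain team
    schema_version: str       # consumers pin against this
    classification: str       # e.g. "confidential", "internal"
    allowed_consumers: tuple  # LMs/services permitted to read this product

    def grants_access(self, consumer: str) -> bool:
        return consumer in self.allowed_consumers

salaries = DataProduct(
    name="hr.salaries.monthly",
    owner_team="hr-data",
    schema_version="1.2.0",
    classification="confidential",
    allowed_consumers=("HR-LM",),
)
print(salaries.grants_access("HR-LM"))        # True
print(salaries.grants_access("Customer-LM"))  # False: governance denies
```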

With microservices and Data-mesh working together, enterprises can build a robust, scalable and flexible AI infrastructure that can adapt to the changing needs of the business.

Future Trends in Artificial Intelligence, Microservices, and Data-Mesh

- Advances in Artificial Intelligence and Machine Learning: As computational power and algorithms improve, we can expect MLLMs to become even more sophisticated with improved understanding and generative capabilities across modalities.

- Expansion of Microservices: Microservices are likely to become more granular, leading to even more specialized services that can operate with greater efficiency and flexibility.

- Advancement of Data-mesh: The concept of data-mesh will mature and more organizations will adopt it as a standard for data architecture. This will include improvements in automated governance and quality control tools.

- Convergence of AI and Operations: The integration of AI into IT operations, known as AIOps, will become more widespread, enabling real-time data processing and decision making.

- Edge-AI: With the growth of IoT and edge computing, AI processing is expected to move closer to the data source, reducing latency and bandwidth utilization.

- CoPilot & AutoPilot: One of the next steps in Gen-AI is thought to be "capabilities" built on language models. A capability can be thought of as connecting to an ERP system to learn the latest status of an order, or performing the line opening operation of a newly subscribed customer. It is envisioned that Gen-AI applications with these capabilities will automatically perform some tasks and RPA / Intelligent Automation technologies will radically evolve.

- Artificial General Intelligence (AGI): The goal of AGI is flexible, general-purpose intelligence that goes beyond systems specialized in narrow areas. In plain language, it can be thought of as a form of super-intelligence that does everything well. Research on the subject continues, largely behind closed doors, and developments are expected in the near future.
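A minimal sketch of the "capability" idea follows. The ERP lookup and the provisioning call are stubs, and in a real system the tool name and argument would come from the model's structured (function-calling) output rather than being passed in directly.

```python
# Sketch of the "capability" idea behind CoPilot/AutoPilot scenarios:
# a language model is given a registry of callable tools, and the
# application executes the tool the model selects. Both tools below
# are stubbed placeholders for real system integrations.
from typing import Callable, Dict

def erp_order_status(order_id: str) -> str:
    # Placeholder for a real ERP API call.
    return f"Order {order_id}: shipped, ETA 2 days"

def activate_subscriber_line(customer_id: str) -> str:
    # Placeholder for a provisioning workflow.
    return f"Line activated for customer {customer_id}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "erp_order_status": erp_order_status,
    "activate_subscriber_line": activate_subscriber_line,
}

def run_capability(tool_name: str, argument: str) -> str:
    # In a real system, tool_name and argument would come from the
    # model's structured output; here they are passed in directly.
    if tool_name not in TOOLS:
        return f"Unknown capability: {tool_name}"
    return TOOLS[tool_name](argument)

print(run_capability("erp_order_status", "SO-10042"))
```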

Integrating Customized Multimodal Language Models (CMLLM) with Microservice and Data-Mesh Architecture

The architecture, which integrates multimodal language models into a microservices framework supported by a Data-mesh, includes several key components:

- Microservices: Individual services such as 'Text Analysis', 'Image Processing' and 'Data Retrieval' host specific MLLMs and operate independently.

- Data Mesh: A decentralized data infrastructure that provides a standardized way for services to access and share data.

- API Gateway: Serves as the entry point for external requests and routes them to the appropriate microservices.

- Orchestrator / Unifier: Coordinates complex workflows between microservices and aggregates their output into a unified response.
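The sketch below illustrates the Orchestrator / Unifier role: it fans one request out to several model microservices concurrently and merges their answers. The service URLs are hypothetical, and the snippet assumes the httpx package is available.

```python
# Sketch of the Orchestrator/Unifier: fan a request out to several
# model microservices concurrently and merge their answers. Service
# URLs are hypothetical; requires the `httpx` package.
import asyncio
import httpx

SERVICES = {
    "text":  "http://text-analysis:8000/generate",
    "image": "http://image-processing:8000/generate",
}

async def call_service(client: httpx.AsyncClient, name: str, url: str,
                       prompt: str) -> tuple:
    resp = await client.post(url, json={"prompt": prompt}, timeout=30.0)
    resp.raise_for_status()
    return name, resp.json()["answer"]

async def orchestrate(prompt: str) -> dict:
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(
            *(call_service(client, n, u, prompt) for n, u in SERVICES.items())
        )
    # Unifier step: here a simple merge; a real system might rank,
    # filter, or pass the parts to an aggregating language model.
    return dict(results)

# asyncio.run(orchestrate("Summarize ticket 123 and its screenshot"))
```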

Deploying multimodal language models within a microservice and Data-mesh architecture poses several challenges that require careful consideration and strategic planning.

Technical Challenges:

- Integration Complexity: Ensuring seamless communication between microservices and Data-mesh can be technically challenging. Solutions include the implementation of robust API management and the use of service orchestration tools.

- Data Consistency: Maintaining data consistency across distributed services and databases is critical. Techniques such as event sourcing and Command Query Responsibility Segregation (CQRS) can help solve this problem.
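For readers unfamiliar with event sourcing, the following minimal sketch shows the core idea as it might apply here: state changes are appended as immutable events, and read models (the query side of CQRS) are rebuilt by replaying them. The event names and fields are invented for illustration.

```python
# Minimal event-sourcing sketch: state changes are appended as events,
# and the read side (the "Q" in CQRS) is rebuilt by replaying them.
# Event names and fields are illustrative only.
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Event:
    entity_id: str
    kind: str       # e.g. "DocumentIndexed", "DocumentRedacted"
    payload: dict

event_log: List[Event] = []          # the append-only write model

def append(event: Event) -> None:
    event_log.append(event)          # commands only ever append

def project_index_counts() -> Dict[str, int]:
    # A read model: how many documents each LM's corpus has indexed.
    counts: Dict[str, int] = {}
    for e in event_log:
        if e.kind == "DocumentIndexed":
            counts[e.entity_id] = counts.get(e.entity_id, 0) + 1
    return counts

append(Event("HR-LM", "DocumentIndexed", {"doc": "policy-1"}))
append(Event("HR-LM", "DocumentIndexed", {"doc": "policy-2"}))
print(project_index_counts())   # {'HR-LM': 2}
```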

Scalability Challenges:

- Dynamic Scaling: Microservices should be able to scale independently in response to changing loads. Container orchestration platforms such as Kubernetes can dynamically manage service scaling.

- Data Volume and Velocity: Data-mesh must handle high data volumes and velocities. Solutions include scalable storage options and real-time data processing frameworks.

Governance Challenges:

- Policy Implementation: It is difficult to enforce consistent governance policies across decentralized data domains. Data-mesh solves this problem by enabling federated governance where each domain adheres to a set of common governance standards.

- Security and Compliance: Security strategies must be both robust and flexible enough to adapt to the independent nature of microservices. Practices such as implementing API gateways with secure access tokens and automated compliance checks are crucial (a minimal sketch follows).
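As a toy illustration of token-based access checks at a gateway, the sketch below signs and verifies tokens with HMAC using only the Python standard library. A production gateway would rely on a standard such as OAuth2/OIDC with JWTs; this token format is invented purely to show the verification step.

```python
# Hedged sketch: an API-gateway-style check with HMAC-signed access
# tokens, standard library only. The token format is invented for
# illustration; production systems should use OAuth2/OIDC with JWTs.
import hashlib
import hmac

SECRET = b"rotate-me-regularly"

def issue_token(user: str, role: str) -> str:
    payload = f"{user}:{role}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify_token(token: str) -> tuple | None:
    try:
        user, role, sig = token.rsplit(":", 2)
    except ValueError:
        return None
    expected = hmac.new(SECRET, f"{user}:{role}".encode(),
                        hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing side channels.
    return (user, role) if hmac.compare_digest(sig, expected) else None

token = issue_token("alice", "hr_specialist")
print(verify_token(token))               # ('alice', 'hr_specialist')
print(verify_token(token + "tampered"))  # None: signature check fails
```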

Business Challenges:

- DevOps and MLOps Practices: Adopting DevOps and MLOps practices can facilitate the development and operation of microservices and MLLMs respectively by automating deployment, scaling and monitoring.

- Data Observability: Implementing data observability tools within Data-mesh enables proactive monitoring and management of data quality and anomalies.

By anticipating and addressing these challenges, enterprises can build flexible, efficient and governance-compliant AI systems that take full advantage of the potential of multimodal language models.

Wrap up

The aim of this paper is to present an architectural solution approach on how enterprises can use Gen-AI applications at scale.

Considering that on-prem open-source LLMs such as LLaMA and Mistral ship in 7-billion-parameter variants, that OpenAI's GPT-3, served over the cloud, has 175 billion parameters, and that GPT-4's parameter count is undisclosed, it is very difficult to use huge LLMs directly within organizations.

Such a Gen-AI application requires huge investments in GPUs and other hardware just to cope with performance demands such as variable incoming load and high numbers of concurrent clients. Beyond the challenging financial impact of such investments, their negative impact on the world in terms of water and energy consumption should also be weighed against businesses' sustainability goals.

As a solution, we suggest that instead of adapting off-the-shelf LLMs, enterprises develop open-source foundation models into micro-multimodal-language-models customized at the atomic level. While this avoids unnecessary investment cost, energy, and water consumption, it introduces a new problem: managing Gen-AI applications composed of perhaps hundreds of micro-language models. This is where we believe the evolution of operational application architectures can help address the challenge.

Gen-AI applications and services can be managed with the atomic services and self-contained mini data stores of the microservices architecture.

In addition, when we consider generative AI applications as digital assets that can perform tasks like company employees, data governance policies, authorization, and data classification become very difficult to regulate. With the arrival of Gen-AI, performing data governance by human effort alone, without automation, will no longer be feasible; data-mesh architecture and technologies make it possible. The data governance of the private data stores of the mini language models built on microservices becomes end-to-end manageable with data-mesh.

As KPMG Turkey, we stand at your side with our international AI governance framework, the Trusted-AI-Framework, and with strategy, use-case identification, roadmap, process, architecture, and implementation services to realize Gen-AI solutions in your business.
