Audit Security Unveils OWASP Top 10 Guidelines for Large Language Model Applications

The OWASP Top 10 for Large Language Model ApplicationsOWASP Top 10 for Large Language Model Applications project aims to inform developers, designers, architects, managers and organisations about potential security risks when deploying and managing large language models (LLMs). The project provides a list of the 10 most critical vulnerabilities commonly found in LLM applications, highlighting their potential impact, ease of use, and prevalence in real-world applications. Examples of vulnerabilities include rapid injection, data leakage, inadequate isolated programming environment and unauthorised code execution, among others. The goal is to raise awareness of these vulnerabilities, suggest remediation strategies, and ultimately improve the security posture of LLM applications.

Checklist

Adversarial Risk

Adversarial Risk includes competitors and attackers.

Scrutinize how competitors are investing in artiﬁcial intelligence. Although there are risks in AI adoption, there are also business beneﬁts that may impact future market positions.
Investigate the impact of current controls, such as password resets, which use voice recognition which may no longer provide the appropriate defensive security from new GenAI enhanced attacks.
Update the Incident Response Plan and playbooks for GenAI enhanced attacks and AIML speciﬁc incidents.

Threat Modeling

Threat modeling is highly recommended to identify threats and examine processes and security defenses. Threat modeling is a set of systematic, repeatable processes that enable making reasonable security decisions for applications, software, and systems. Threat modeling for GenAI accelerated attacks and before deploying LLMs is the most cost effective way to Identify and mitigate risks, protect data, protect privacy, and ensure a secure, compliant integration within the business.

□ How will attackers accelerate exploit attacks against the organization, employees, executives, or users? Organizations should anticipate “hyper-personalized” attacks at scale using Generative AI. LLM-assisted Spear Phishing attacks are now exponentially more effective, targeted, and weaponized for an attack.

□ How could GenAI be used for attacks on the business’s customers or clients through spooﬁng

or GenAI generated content?
□ Can the business detect and neutralize harmful or malicious inputs or queries to LLM solutions?

□ Can the business safeguard connections with existing systems and databases with secure

integrations at all LLM trust boundaries?
□ Does the business have insider threat mitigation to prevent misuse by authorized users?

□ Can the business prevent unauthorized access to proprietary models or data to protect

Intellectual Property?
□ Can the business prevent the generation of harmful or inappropriate content with automated

content ﬁltering?

AI Asset Inventory

An AI asset inventory should apply to both internally developed and external or third-party solutions.

□ Catalog existing AI services, tools, and owners. Designate a tag in asset management for

speciﬁc inventory.
□ Include AI components in the Software Bill of Material (SBOM), a comprehensive list of all the

software components, dependencies, and metadata associated with applications.
□ Catalog AI data sources and the sensitivity of the data (protected, conﬁdential, public)

□ Establish if pen testing or red teaming of deployed AI solutions is required to determine the

current attack surface risk.
□ Create an AI solution onboarding process.

□ Ensure skilled IT admin staff is available either internally or externally, following SBoM

requirements.

AI Security and Privacy Training

□ Actively engage with employees to understand and address concerns with planned LLM

initiatives.
□ Establish a culture of open, and transparent communication on the organization’s use of

predictive or generative AI within the organization process, systems, employee management and support, and customer engagements and how its use is governed, managed, and risks addressed.

□ Train all users on ethics, responsibility, and legal issues such as warranty, license, and copyright. □ Update security awareness training to include GenAI related threats. Voice cloning and image

cloning, as well as in anticipation of increased spear phishing attacks
□ Any adopted GenAI solutions should include training for both DevOps and cybersecurity for

the deployment pipeline to ensure AI safety and security assurances.

Establish Business Cases

Solid business cases are essential to determining the business value of any proposed AI solution, balancing risk and beneﬁts, and evaluating and testing return on investment. There are an enormous number of potential use cases; a few examples are provided.

□ Enhance customer experience
□ Better operational efﬁciency
□ Better knowledge management

□ Enhanced innovation
□ Market Research and Competitor Analysis
□ Document creation, translation, summarization, and analysis

Governance

Corporate governance in LLM is needed to provide organizations with transparency and accountability. Identifying AI platform or process owners who are potentially familiar with the technology or the selected use cases for the business is not only advised but also necessary to ensure adequate reaction speed that prevents collateral damages to well established enterprise digital processes.

□ Establish the organization’s AI RACI chart (who is responsible, who is accountable, who should

be consulted, and who should be informed)
□ Document and assign AI risk, risk assessments, and governance responsibility within the organization.
□ Establish data management policies, including technical enforcement, regarding data classiﬁcation and usage limitations. Models should only leverage data classiﬁed for the minimum access level of any user of the system. For example, update the data protection policy to emphasize not to input protected or conﬁdential data into nonbusiness-managed tools.

□ Create an AI Policy supported by established policy (e.g., standard of good conduct, data

protection, software use)
□ Publish an acceptable use matrix for various generative AI tools for employees to use.

□ Document the sources and management of any data that the organization uses from the

generative LLM models.

Legal

Many of the legal implications of AI are undeﬁned and potentially very costly. An IT, security, and legal partnership is critical to identifying gaps and addressing obscure decisions.

□ Conﬁrm product warranties are clear in the product development stream to assign who is

responsible for product warranties with AI.
□ Review and update existing terms and conditions for any GenAI considerations.

□ Review AI EULA agreements. End-user license agreements for GenAI platforms are very different in how they handle user prompts, output rights and ownership, data privacy, compliance, liability, privacy, and limits on how output can be used.

□ Organizations EULA for customers, Modify end-user agreements to prevent the organization from incurring liabilities related to plagiarism, bias propagation, or intellectual property infringement through AI-generated content.

□ Review existing AI-assisted tools used for code development. A chatbot’s ability to write code can threaten a company’s ownership rights to its product if a chatbot is used to generate code for the product. For example, it could call into question the status and protection of the generated content and who holds the right to use the generated content.

□ Review any risks to intellectual property. Intellectual property generated by a chatbot could be in jeopardy if improperly obtained data was used during the generative process, which is subject to copyright, trademark, or patent protection. If AI products use infringing material, it creates a risk for the outputs of the AI, which may result in intellectual property infringement.

□ Review any contracts with indemniﬁcation provisions. Indemniﬁcation clauses try to put the responsibility for an event that leads to liability on the person who was more at fault for it or who had the best chance of stopping it. Establish guardrails to determine whether the provider of the AI or its user caused the event, giving rise to liability.

□ Review liability for potential injury and property damage caused by AI systems.
□ Review insurance coverage. Traditional (D&O) liability and commercial general liability

insurance policies are likely insufﬁcient to fully protect AI use.
□ Identify any copyright issues. Human authorship is required for copyright. An organization

may also be liable for plagiarism, propagation of bias, or intellectual property infringement if LLM tools are misused.

□ Ensure agreements are in place for contractors and appropriate use of AI for any development

or provided services.
□ Restrict or prohibit the use of generative AI tools for employees or contractors where

enforceable rights may be an issue or where there are IP infringement concerns.
□ Assess and AI solutions used for employee management or hiring could result in disparate

treatment claims or disparate impact claims.
□ Make sure the AI solutions do not collect or share sensitive information without proper consent

or authorization.

Regulatory

The EU AI Act is anticipated to be the ﬁrst comprehensive AI law but will apply in 2025 at the earliest. The EUś General Data Protection Regulation (GDPR) does not speciﬁcally address AI but includes rules for data collection, data security, fairness and transparency, accuracy and reliability, and accountability, which can impact GenAI use. In the United States, AI regulation is included within broader consumer privacy laws. Ten US states have passed laws or have laws that will go into effect by the end of 2023.

Federal organizations such as the US Equal Employment Opportunity Commission (EEOC), the Consumer Financial Protection Bureau (CFPB), the Federal Trade Commission (FTC), and the US Department of Justiceś Civil Rights Division (DOJ) are closely monitoring hiring fairness.

□ Determine Country, State, or other Government speciﬁc AI compliance requirements.
□ Determine compliance requirements for restricting electronic monitoring of employees and

employment-related automated decision systems (Vermont, California, Maryland, New York, New Jersey)

□ Determine compliance requirements for consent for facial recognition and the AI video analysis

required (Illinois, Maryland, Washington, Vermont)
□ Review any AI tools in use or being considered for employee hiring or management.

□ Conﬁrm the vendorś compliance with applicable AI laws and best practices.
□ Ask and document any products using AI during the hiring process. Ask how the model was

trained, and how it is monitored, and track any corrections made to avoid discrimination and bias.

□ Ask and document what accommodation options are included.
□ Ask and document whether the vendor collects conﬁdential data.
□ Ask how the vendor or tool stores and deletes data and regulates the use of facial recognition

and video analysis tools during pre-employment.
□ Review other organization-speciﬁc regulatory requirements with AI that may raise compliance issues. The Employee Retirement Income Security Act of 1974, for instance, has ﬁduciary duty requirements for retirement plans that a chatbot might not be able to meet.

Using or Implementing Large Language Model Solutions

□ Threat Model LLM components and architecture trust boundaries.
□ Data Security, verify how data is classiﬁed and protected based on sensitivity, including

personal and proprietary business data. (How are user permissions managed, and what safeguards are in place?)

□ Access Control, implement least privilege access controls and implement defense-in-depth measures
□ Training Pipeline Security, require rigorous control around training data governance, pipelines, models, and algorithms.
□ Input and Output Security, evaluate input validation methods, as well as how outputs are ﬁltered, sanitized, and approved.
□ Monitoring and Response, map workﬂows, monitoring, and responses to understand

automation, logging, and auditing. Conﬁrm audit records are secure.
□ Include application testing, source code review, vulnerability assessments, and red teaming in

the production release process.
□ Check for existing vulnerabilities in the LLM model or supply chain.

□ Look into the effects of threats and attacks on LLM solutions, such as prompt injection, the

release of sensitive information, and process manipulation.
□ Investigate the impact of attacks and threats to LLM models, including model poisoning,

improper data handling, supply chain attacks, and model theft.
□ Supply Chain Security, request third-party audits, penetration testing, and code reviews for

third-party providers. (both initially and on an ongoing basis)
□ Infrastructure Security, ask how often a vendor performs resilience testing? What are their

SLAs in terms of availability, scalability, and performance?
□ Update incident response playbooks and include an LLM incident in tabletop exercises.

□ Identify or expand metrics to benchmark generative cybersecurity AI against other approaches

to measure expected productivity improvements.

Testing, Evaluation, Veriﬁcation, and Validation (TEVV)

NIST AI Framework recommends a continuous TEVV process throughout the AI lifecycle which includestheAIsystemoperators, domainexperts, AIdesigners, users, productdevelopers, evaluators, and auditors. TEVV includes a range of tasks such as system validation, integration, testing, recalibration, and ongoing monitoring for periodic updates to navigate the risks and changes of the AI system.

□ Establish continuous testing, evaluation, veriﬁcation, and validation throughout the AI model

lifecycle.
□ Provide regular executive metrics and updates on AI Model functionality, security, reliability,

and robustness.

Model Cards and Risk Cards

Model cards and risk cards are foundational elements for increasing the transparency, accountability, and ethical deployment of Large Language Models (LLMs). Model cards help users understand and trust AI systems by providing standardized documentation on their design, capabilities, and constraints, leading them to make educated and safe applications. Risk cards supplement this by openly addressing potential negative consequences, such as biases, privacy problems, and security vulnerabilities, which encourages a proactive approach to harm prevention. These documents are critical for developers, users, regulators, and ethicists equally since they establish a collaborative atmosphere in which AIś social implications are carefully addressed and handled. These cards, developed and maintained by the organizations that created the models, play an important role in ensuring that AI technologies fulﬁll ethical standards and legal requirements, allowing for responsible research and deployment in the AI ecosystem.

Model cards include key attributes associated with the ML model:
• Model details: Basic information about the model, i.e., name, version, and type (neural network,

decision tree, etc.), and the intended use case.
• Model architecture: Includes a description of the structure of the model, such as the number

and type of layers, activation functions, and other key architectural choices.
• Training data and methodology: Information about the data used to train the model, such

as the size of the dataset, the data sources, and any preprocessing or data augmentation techniques used. It also includes details about the training methodology, such as the optimizer used, the loss function, and any hyperparameters that were tuned.

Performance metrics: Information about the model’s performance on various metrics, such as accuracy, precision, recall, and F1 score. It may also include information about how the model performs on different subsets of the data.
Potential biases and limitations: Lists potential biases or limitations of the model, such as imbalanced training data, overﬁtting, or biases in the model’s predictions. It may also include information about the modelś limitations, such as its ability to generalize to new data or its suitability for certain use cases.
Responsible AI considerations: Any ethical or responsible AI considerations related to the model, such as privacy concerns, fairness, and transparency, or potential societal impacts of the model’s use. It may also include recommendations for further testing, validation, or monitoring of the model.

The precise features contained in a model card may differ based on the model’s context and intended usage, but the purpose is to give openness and accountability in the creation and deployment of machine learning models.

□ Review a models model card

□ Review risk card if available
□ Establish a process to track and maintain model cards for any deployed model including

models used through a third party.

RAG: Large Language Model Optimization

Fine tuning, the traditional method for optimizing a pre-trained model, involved retraining an existing model on new, and domain-speciﬁc data, modifying it for performance on a task or application. Fine-tuning is expensive but essential to improve performance.

Retrieval-Augmented Generation (RAG) has evolved as a more effective way of optimizing and augmenting the capabilities of large language models by retrieving pertinent data from up to date available knowledge sources. RAG can be customized for speciﬁc domains, optimizing the retrieval of domain-speciﬁc information and tailoring the generation process to the nuances of specialized ﬁelds. RAG is seen as a more efﬁcient and transparent method for LLM optimization, particularly for problems where labeled data is limited or expensive to collect. One of the primary advantages of RAG is its support for continuous learning since new information can be continually updated at the retrieval stage.

The RAG implementation involves several key steps starting from embedding model deployment, indexing the knowledge library, to retrieving the most relevant documents for query processing. Efﬁcient retrieval of the relevant context is made based on vector databases which are used for storage and querying of document embeddings.

RAG Reference

□ Retrieval Augmented Generation (RAG) & LLM: Examples
□ 12 RAG Pain Points and Proposed Solutions

AI Red Teaming

AI Red Teaming is an adversarial attack test simulation of the AI System to validate there arent´ any existing vulnerabilities which can be exploited by an attacker. It is a recommended practice by many regulatory and AI governing bodies including the Biden administration. Red-teaming alone is not a comprehensive solution to validate all real-world harms associated with AI systems and should be included with other forms of testing, evaluation, veriﬁcation, and validation such as algorithmic impact assessments and external audits.

□ Incorporate Red Team testing as a standard practice for AI Models and applications.

Some useful links – checklist and LLMs

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

OWASP Top 10 for Large Language Model Applications

Checklist

Related posts