Next named Market Leader and Outperformer in GigaOm DLP Market Radar Report Read the Report
Updated: Dec 6, 2023   |   Chris Denbigh-White

Questions for generative AI vendor security review

Go back

Identifying risk when using Generative AI Tools

Developments in Artificial Intelligence (AI) and Machine Learning (ML) are booming. While these technologies have been in use for several years, the launch of ChatGPT, Google Bard, and other Large Language Models (LLM) introduced many to their powers and efficiencies, making the tools more readily available to the general public.

AI promises improved operational efficiency and the ability to scale services. Better yet, organizations no longer need to build these technologies themselves. Instead, they can leverage AI-as-a-Service (AIaaS) offerings to accelerate time to value.


A list of 8 different categories of Generative AI tools
Categories of Generative AI tools

These can include:

  • Chatbots: an application that simulates human conversation. Chatbots use AI, natural language processing, and machine learning to understand questions, predict a user’s needs, and generate responses. Chatbots are used on websites to answer questions from visitors, guide them to the appropriate web pages, or gather information before directing them to a human for assistance. Predictive chatbots continue to learn and improve over time as the data set of queries and responses grows.
  • Synthetic data: AI can be used to generate artificial data sets that approximate the statistical properties of real-world data. Synthetic data sets can be used to train machine learning models or test algorithms without the need for large amounts of real-world data or compromising the privacy of humans.
  • AI-generated code: AI has the potential to automate many aspects of software development to make it faster and more efficient. Natural language processing can turn descriptions of a desired program into executable code. AI techniques like machine learning can be trained on large sets of existing code to identify patterns and generate new code similar to the training data.
  • Search: AI search tools simplify searches by providing natural language responses. However, models like ChatGPT and Google Bard are “black box” models with inner workings that are not easily understandable and are not great at logical reasoning, leading to what many call hallucinations. In reality, the system performs how it was designed: statistically choosing word combinations rather than compiling information and using critical thinking to draw conclusions. New search tools are emerging that are trained on specific data sets and designed to provide evidence-based responses.
  • Text and image content: Various tools convert natural language descriptions into images, slide presentations, video, and audio. These tools can generate concept art, visual prototypes, and other design materials or PowerPoint presentations from written or spoken descriptions. Other tools of this type can be trained to create marketing content, including emails, blog posts, 1:1 messaging, and long-form content.

Security Risks of AI as a Service


The top security challenges of generative AI tools by category
Security Challenges of Generative AI Tools

While these solutions have many benefits, organizations should also do their due diligence to understand the third-party risk they present.

Intellectual Property risk

Information entered into any AI system may become part of the system’s data set. While OpenAI states they do not use content submitted through their API to improve their service, they can use content from non-API sources for that purpose. Most services will use feedback from users to improve their responses. 

Depending on the specific system and how it is designed and implemented, this function, intended to improve the AI model, could put your IP at risk. For example, the system could use your images, text, or source code as examples or as the basis for answers provided to other users. Accidental inclusion in training data could expose trade secrets or inadvertently provide other users with similar images or product functionality. Similarly, a patent application or other IP uploaded to a language translation application may become part of that system’s training set.

This risk is why, after an employee uploaded proprietary source code to a system while trying to debug the code, Samsung banned employees from using public Chatbots. A single employee simply trying to get work done potentially exposed their IP and vulnerabilities in the semiconductors that an attacker could exploit. Another Samsung employee reportedly submitted confidential notes on an internal meeting to ChatGPT and asked it to create a presentation from the notes.

Emerging technologies bring both opportunities and challenges. As organizations adopt generative AI tools, it is essential to assess the potential risks involved. Conducting a thorough vendor assessment can help identify any security vulnerabilities and ensure appropriate security controls are in place. Risk assessments should be performed regularly to avoid emerging threats and protect sensitive data. Organizations can mitigate the risks of using generative AI tools by taking proactive measures and implementing robust security practices.

What you can do

To reduce these risks, it is essential to carefully evaluate any AI system before entering your IP into it. Evaluation may involve assessing the system's security measures, encryption standards, data handling policies, and ownership agreements. From a data protection standpoint, acknowledge that AI systems, including Chatbots, search engines, text/image converters, are potential data exfiltration channels.

Questions to ask vendors about IP risks

  • Does the service claim any rights to user inputs?
  • Do those rights vary based on the input method (i.e. through a web UI v. API)?
  • Does the service use user feedback? Is it used without restrictions?
  • How is feedback data protected? Is it attributable to users or licensees? Is it anonymized and encrypted?
  • Are there carve-outs in the confidentiality agreements for the use of feedback?
  • How long does the service provider save data submitted by users?
  • Are there opt-out provisions for the use of your data?

IP Ownership

An AI system’s only source of information is its training set. In a public system with input from multiple entities, the output can potentially be based on proprietary data from other users. For example, a training set may include copyrighted material from books, patent applications, web pages, and scientific research. Suppose this AI system is used to create a new product or invention. In that case, the output may be subject to patent protection or IP rights by the user who originally submitted that information. Also, consider the opposite situation. The US Copyright Office recently ruled that the output from AI is not eligible for copyright protection unless it includes “sufficient human authorship.” 

What you can do

While OpenAI and Google Bard explicitly state that users own both input and output and do not use user-inputted information to augment their training base, other services may differ. Data privacy is a critical consideration when using generative AI tools. Organizations must review the terms and conditions of the service to understand how data is stored and handled. Ensuring that any data submitted to the service is encrypted and protected from unauthorized access is vital. Questions to ask vendors about data privacy include whether they store data, how long it is stored, and whether it is anonymized and encrypted. Organizations can protect sensitive information and comply with privacy regulations by prioritizing data privacy. AI users should be aware of the potential IP rights associated with the output generated by AI systems and take appropriate measures to protect their IP rights and respect the IP rights of others. A system's legal terms and conditions may assign rights to its output to a user, but it is unclear whether these systems have the legal right to make those assignments. Include your legal team when evaluating any AI system.

Questions to ask vendors about IP ownership

  • Do you have full rights to outputs, including patent rights to novel algorithms or models produced by the service?
  • Are users required to attribute output to the service?

Attacks on AI Systems

Many tools have restrictions for their use. For example, most chatbots have been programmed not to provide examples of malicious code, such as ransomware and viruses. Most will not engage in hate speech or illegal activities. There may also be consequences for violating the guidelines, ranging from warnings to account suspension or termination, depending on the severity of the violation.

However, inputs to AI and ML systems can be viewed as similar to inputs to other applications. In a software application, developers implement input validation to prevent attacks like SQL injection and cross-site scripting. AI systems are just coming to terms with system hacks. For example, indirect prompt injection is when an LLM is asked to analyze some text on the web and instead starts to take instructions from that text. Researchers have shown that this could be used to trick users of website chatbots into providing sensitive information. Other “jailbreaking” techniques can cause systems to violate controls and produce hateful content.

What you can do

Include AI systems in your appropriate use policies to protect your organization against reputational damage. Have a reporting mechanism for unusual output and an incident response plan in case of an attack on the system.

Questions to ask vendors about malicious attacks

  • What steps has the vendor taken to mitigate risk from malicious input?
  • Who is responsible for activities/actions taken using your credentials?
  • Who is responsible for the output of the tool?
  • Who is legally liable if you use the service?
  • How is the service’s compliance with applicable laws measured and enforced?
  • Must the user comply with applicable laws when using the service?

Data Privacy

Data provided to AI and ML systems may be stored by their providers. Suppose said data includes personal information from users, such as names, email addresses, and phone numbers. In that case, this data is subject to privacy regulations such as the California Consumer Privacy Act (CCPA), the Virginia Consumer Data Protection Act, Europe’s General Data Protection Regulation (GDPR), and similar regulations.

Some AI service providers, such as those producing social media content and demand generation emails, may integrate with other services such as Facebook accounts, content management systems, and sales automation systems. Depending on the service provider, this could expose sensitive data or cause it to be uploaded to the provider’s servers.

What you can do

Review the terms and conditions of any service you use. If you use an AI system or service that stores data, you should ensure the data you submit is encrypted to protect it from unauthorized access. This review can help prevent data breaches and protect sensitive personal, financial, or other confidential information.

Questions to ask vendors about data privacy

  • Do you store data we submit to the service? For how long?
  • Is the data we submit anonymized and encrypted?
  • Do we have the ability to delete specific data?
  • Is the data attributable to a user or entity?
  • Is the data used for any internal or external purposes?

General Cyber-hygiene

You should train users to practice cyber hygiene like any other third-party service. Maintaining good cyber hygiene is essential when using generative AI tools. Users should be trained on security awareness and best practices, such as using strong passwords and not sharing account credentials. Multi-factor authentication should be implemented for systems handling sensitive information. Organizations should also ensure that they have valid licenses for the AI tools they use and comply with the service's license agreements. By practicing good cyber hygiene, organizations can reduce the risk of unauthorized access and protect their sensitive data.

What you can do

Train your users on security awareness and cyber hygiene. Institute strong password policies across all applications and use multi-factor authentication for systems handling sensitive information.

Questions to ask vendors about general cyber-hygeine

  • Is an account needed to access the platform?
  • Can the user access the tool instantly?
  • Is multi-factor authentication available?
  • Is accurate and complete information required to create an account?
  • Does the service provider verify user information? If so, how?
  • Can you use the service on behalf of another person?
  • Can users share login credentials outside of a licensee’s domain?
  • Is there an age restriction for using the service?

Protect Your Sensitive Data

Finally, organizations should consider leveraging Data Loss Prevention to monitor and control the information submitted to such services. It is essential to have the ability to recognize and classify sensitive data as it is accessed and used. Organizations can proactively identify and mitigate the risk of data loss or unauthorized disclosure by implementing DLP measures when using generative AI tools. Some of these services can serve as exfiltration channels. Your DLP should include recognizing and classifying sensitive data as it is accessed and used. For example, suppose a user attempts to copy sensitive data into a query or adds customer information to a service. In that case, your DLP platform should recognize that attempt, track the steps leading up to the event, and intervene before the data is leaked.

Watch an on-demand demo to learn more about how Reveal helps with insider risk management and data loss prevention.

Demo

See how Next protects your employees and prevents data loss