Is It Ever Okay for AI Providers to Train on Your Company’s Data? Yes!

Andrew (A.J.) Tibbetts and Kieran Dwyer

Access to quality data is the foundation for developing effective artificial intelligence (AI) models. As a result, one of the most common issues companies face when evaluating an AI-powered product is whether the model provider can or will train its AI on customers’ (such as your company’s) data. Customers often have good reasons to prevent a vendor’s models from training on their data, including the need to protect confidential information and to comply with privacy and other regulatory requirements. A hardline prohibition on training, however, can put some vendors in a difficult position because it limits their ability to improve their product for their customers, and it may narrow the company’s own options for which vendors it can work with. There are myriad instances in which both the customer and the vendor might benefit from allowing models to train on customer data. While companies are right to be careful with their data, it is important to understand that allowing a vendor access to data does not have to be an all-or-nothing decision. Instead, a fact-driven analysis may be appropriate, turning on factors such as the purpose to which the company/customer will put the vendor’s model, the categories of company data the model will be trained on, and the sensitivity of each data category to the company.

One specific industry example may be illustrative. Allowing AI models to train on data has historically benefited the security industry. While many companies have only recently started grappling with how to use and manage AI within their organizations, cybersecurity companies have long used AI to improve their products. In fact, anomaly detection for security purposes was one of the earliest use cases for machine learning. For example, security companies may use AI to analyze network traffic to identify and prevent possible threats at a scale beyond human or algorithmic capability alone.

Today, AI is successfully used in many different security technologies, and it often makes sense for customers to allow vendors to train their algorithms on the customer’s data. These AI technologies perform best when they have access to as much threat information as possible because access to more data improves the accuracy of the model. That’s why many AI security providers’ contract terms contain the right to train on data they collect while providing a service. Without this right, the security provider likely won’t be able to identify and prevent threats with the same efficacy. In some cases, it may not even be possible for the security vendor to agree not to train on threat data because the technical architecture of their product depends on correlating threat data across multiple data sets.

Companies should still take appropriate precautions for confidential, personal, and other sensitive information when working with security vendors, but understanding the legitimate use cases for training security products can help accelerate negotiations. The good news is that many security vendors are familiar with this issue and clearly delineate the allowed uses of threat data. A word of caution, though—this does not mean all AI tools offered by vendors are created equal. Even though vendors might have legitimate and necessary reasons to train on certain data for security purposes, many security vendors are adding supplemental features, built on separate large language models (LLMs), that improve the user experience but are not necessary for providing the security service itself. These AI features can be evaluated separately to help ensure confidential and sensitive data is protected.

Security, however, is not the only reason to allow AI training using a company’s data. Many new AI products are in their early stages as the technology continues to develop. In these cases, customers may benefit by allowing training on some types of data to help improve the product and get better results. A prime example is when corrections need to be made to the AI model. Corrections can improve the value of the product for the customer by identifying and addressing instances where models produce inaccurate results. Using corrections to improve a product is not a new concept in technology contracts. The right for the vendor to train on correction data is similar to the feedback clause common in technology agreements, which grants the vendor the right to use feedback to improve its service and address customer requests. For AI contracts, that same principle can apply if the scope of what is considered a correction is sufficiently defined.

Another area where it is reasonable to allow for training on customer data is when the data involved does not contain sensitive or proprietary information. For example, for an AI that manages standard invoice routing and payment, it might make sense to allow the AI to train on the workflow and metadata to improve the model. The workflow and invoice routing process are likely not proprietary, and their use in improving the AI model’s accuracy could be beneficial. However, if the invoices contain competitive pricing information, then a company may want to prohibit training on its invoice content to protect sensitive information. In this example, the key is to carefully delineate the type of data and the scope of training that will occur. Similarly, there is often little downside to allowing a model to train on content that is intended to be customer-facing in the first place, such as models used to develop public marketing content. Training could be limited to categories of data that will not offer confidential or competitive insights while still enabling the vendor to improve the model.

Companies need to protect their content when using AI to ensure confidentiality and meet regulatory requirements for AI tools, but it is important to remember this will be a fact-specific analysis. In many cases, this may mean customers still require a broad prohibition on AI providers using content to train their models, but with exceptions for legitimate instances where that training improves value for the customer. During contract negotiations, customers should evaluate each AI technology for how it might use the company’s data and whether the company could benefit from allowing such use. Companies can also anticipate new AI features and future use cases, as vendors will continue to expand the capabilities of their offerings and add new AI features. The key is finding the right balance—one that protects what matters most while allowing room for the AI improvements that will benefit the organization and encourage continued, responsible evolution.

LINKS 

Read “Is It Ever Okay for AI Providers to Train on Your Company’s Data? Yes!” authored by Andrew (A.J.) Tibbetts and Kieran Dwyer, published in Lawyers Weekly. (subscription)