
A few weeks ago, Arete’s Threat Intelligence team outlined the history of artificial intelligence (AI). Today, we continue that conversation by exploring the data privacy concerns associated with AI tools. AI use cases are often showcased to consumers with no warning about the potential dangers of their application. When a service is free, your data is often the cost of entry.
This post dives into three key questions about data privacy in AI:
- What information are you exposing publicly?
- What data are you putting into AI applications?
- How are you storing your data?
Operations Security (OPSEC): What information are you exposing publicly?
The public release of information can lead to both positive and negative outcomes. Classification by compilation, in which seemingly harmless pieces of open-source information are pieced together to expose proprietary or sensitive information, gives credence to the age-old saying, “Loose lips sink ships.”
You may be wondering what this has to do with AI. Any information posted publicly can be used by developers to train AI algorithms, meaning organizations could indirectly aid competitors who use the same AI platforms. Consider the 2023 lawsuit filed by artists against a number of companies behind AI image-generation tools. The artists argued that the companies had used their art to train algorithms without properly compensating them. The court ultimately ruled against the artists, illustrating how difficult it is to prove what data was used to train an AI algorithm.

What data are you putting into AI applications?
As the use of AI continues to expand, users should carefully consider what data they are exposing. When using popular public-facing AI platforms, such as those created by OpenAI, Microsoft, and Amazon, users must be aware of the type of data they input. Sensitive data, including client information, personally identifiable information (PII), and trade secrets, should never be fed into prompts for public-facing AI tools, because inputs to these tools are often used to further train and develop the underlying models.
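To make this concrete, below is a minimal sketch of one safeguard: screening prompts for obviously sensitive patterns before they ever leave the organization. The patterns and the redact_prompt helper are illustrative assumptions, not a complete data loss prevention (DLP) solution; real deployments typically rely on dedicated DLP tooling with far broader coverage.

```python
import re

# Illustrative patterns only; real DLP tooling covers far more
# (names, addresses, API keys, internal project names, etc.).
SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact_prompt(text: str) -> str:
    """Replace anything matching a sensitive pattern with a placeholder
    before the text is sent to a public AI service."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text

prompt = "Summarize this complaint from jane.doe@example.com, SSN 123-45-6789."
print(redact_prompt(prompt))
# Summarize this complaint from [REDACTED EMAIL], SSN [REDACTED SSN].
```

Even a simple filter like this, run as a gateway in front of any third-party AI service, shifts the decision about what leaves the organization from each individual employee to an enforceable policy.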
How are you storing your data?
When an organization decides to create or collaborate on a new AI model, large amounts of data are required to train it. Cloud storage is an attractive option for housing that data, but organizations should also weigh the risks of entrusting large volumes of training data to a third party.
One example of such risk is the May 2024 breach campaign against customers of the cloud-based data storage company Snowflake. The threat actor responsible, UNC5537, subsequently extorted the affected organizations, collecting at least $2.7 million in ransom payments for data suppression. The attacks were primarily driven by compromised credentials on accounts without multi-factor authentication (MFA), demonstrating the need for organizations to not only assess their third-party risk exposure but also continually implement security best practices.
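As a rough illustration of that last point, the sketch below audits a user inventory for interactive accounts that could be hijacked with a stolen password alone. The Account structure and its has_mfa field are hypothetical; in practice this data would be pulled from your identity provider's or cloud vendor's admin API.

```python
from dataclasses import dataclass

@dataclass
class Account:
    username: str
    has_mfa: bool                     # hypothetical field; map from your IdP
    is_service_account: bool = False  # service accounts need other controls

def accounts_missing_mfa(accounts: list[Account]) -> list[str]:
    """Return interactive accounts with no MFA, i.e., accounts exposed to
    the credential-stuffing technique behind the Snowflake campaign."""
    return [a.username for a in accounts
            if not a.has_mfa and not a.is_service_account]

inventory = [
    Account("alice", has_mfa=True),
    Account("bob", has_mfa=False),
    Account("etl_job", has_mfa=False, is_service_account=True),
]
print(accounts_missing_mfa(inventory))  # ['bob']
```

Running a report like this on a schedule, and treating any non-empty result as a finding, turns "enable MFA everywhere" from a one-time project into a continuously verified control.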
Conclusion
AI is a powerful tool for organizations looking to help employees play to their strengths and increase efficiency. However, the improper use of AI can have disastrous effects. It is important for organizations to develop policies and training on the implementation and use of AI to set employees up for success and ensure the security of their environments. Tune in next week for the final installment of Arete’s AI Deep Dive: Understanding Biases & How Threat Actors Use AI.