Data Security in the Era of AI

Artificial Intelligence (AI) is becoming increasingly pervasive in our everyday lives. From Siri on the iPhone or Google Assistant on the Android in your pocket, to the smart TV or smart speakers in your home, to the many applications you interact with digitally on a regular basis, it is fair to say AI is being embedded into our daily lives whether we like it or not. From a positive perspective, it is surely poised to make our lives better. However, AI also brings risks if not managed properly, such as spreading disinformation at scale, especially in the era of generative AI such as ChatGPT.
In this post, let’s take a closer look at four key areas where AI can impact data security from a security professional’s perspective: Data Aggregation, AI Frameworks, AI Architecture, and AI Development Tools.


Data Aggregation

Data aggregation is the process of compiling raw data on similar concepts from different sources and entering it into a single repository for further processing. Aggregating simple datasets can increase the sensitivity of the resulting collection. A typical example: a name by itself isn’t particularly sensitive, but a name combined with an address or phone number is considered Personally Identifiable Information (PII). Without proper controls over the data aggregation process, a public or unclassified dataset can quickly become filled with sensitive data with little effort. This is particularly problematic with PII, health information, and financial data.
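The name-plus-phone-number example above can be sketched in code. This is a minimal illustration, not a formal PII standard: the field names and the "two or more quasi-identifiers" rule are assumptions chosen for demonstration.

```python
# Illustrative sketch: classify a record's sensitivity based on which
# fields have been aggregated together. The field names and threshold
# are assumptions for demonstration, not a formal PII definition.

# Quasi-identifiers: fields that are low-risk alone but sensitive combined.
QUASI_IDENTIFIERS = {"name", "address", "phone", "date_of_birth"}

def classify_sensitivity(record: dict) -> str:
    """Return 'PII' when two or more quasi-identifiers appear together."""
    present = QUASI_IDENTIFIERS & {k for k, v in record.items() if v}
    return "PII" if len(present) >= 2 else "public"

# A name alone stays public...
print(classify_sensitivity({"name": "Alice"}))                       # public
# ...but aggregating a name with a phone number yields PII.
print(classify_sensitivity({"name": "Alice", "phone": "555-0100"}))  # PII
```

In a real pipeline, a check like this would run at the point of aggregation, so a dataset’s classification is upgraded automatically as fields are joined rather than discovered after the fact.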


AI Frameworks
AI frameworks provide data scientists, AI developers, and researchers the building blocks to architect, train, validate, and deploy models through a high-level programming interface. Quite a few AI frameworks are in use nowadays, such as TensorFlow, Amazon ML, Apache Mahout, and Microsoft CNTK. Generally speaking, each has its own user community, which is not necessarily a bad thing. From an AI security perspective, however, the key challenge is the lack of unified, strong data security standards across these frameworks, which poses intrinsic data security risks that go largely unnoticed while AI applications are being developed.
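One framework-agnostic control that teams can apply regardless of which framework they use is verifying a model artifact’s integrity before loading it. The sketch below is an assumption-laden illustration (the payload and trusted digest are invented for the example), but the technique, comparing a SHA-256 digest against a trusted value, applies to any serialized model format.

```python
# Framework-agnostic sketch: verify a model artifact's integrity before
# loading it into any framework. The payload bytes and trusted digest
# here are illustrative assumptions, not real model data.
import hashlib

def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Compare the artifact's SHA-256 digest against a trusted value."""
    return hashlib.sha256(data).hexdigest() == expected_sha256

payload = b"model-weights"
trusted_digest = hashlib.sha256(payload).hexdigest()  # published out of band

print(verify_artifact(payload, trusted_digest))   # True: artifact is intact
print(verify_artifact(b"tampered", trusted_digest))  # False: reject the load
```

A check like this does not replace framework-level security standards, but it gives teams a consistent baseline while those standards remain fragmented.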


AI Architecture
Different AI models call for different architectures. The following diagram shows a typical high-level, generic AI architecture.

While much attention has been paid to functionality, secure-by-design principles have rarely been applied across these architectures, especially from an end-to-end security perspective. In practice, many AI development teams focus more on functionality than on security. A cultural change is needed in this area.

AI Development Tools
The AI development environment is an often-overlooked vector for introducing vulnerabilities into the AI ecosystem. For example, unpatched Integrated Development Environments (IDEs) like Visual Studio, Jupyter Notebook, and Atom can pose security risks, and poorly secured code repositories on GitHub, GitLab, Bitbucket, and similar platforms can expose code that could be used to craft exploits. In addition, most development teams code for functionality first, and security is often not checked until systems are in operation. Code reviews and dynamic application scanning should be integrated into the software development life cycle from the beginning, following the “shift-left” best practice.
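A simple shift-left control is scanning source files for hard-coded secrets before they reach a repository. The sketch below is a minimal illustration with assumed regex patterns; real projects should rely on dedicated secret-scanning tools wired into pre-commit hooks or CI.

```python
# Minimal "shift-left" sketch: scan source text for hard-coded secrets
# before it is committed. The two patterns below are illustrative
# assumptions covering common cases, not an exhaustive ruleset.
import re

SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "hard-coded password": re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.I),
}

def scan_source(text: str) -> list[str]:
    """Return the names of any secret patterns found in the source text."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(text)]

print(scan_source('db_password = "hunter2"'))  # ['hard-coded password']
print(scan_source("x = 1"))                    # []
```

Run at commit time, a check like this blocks the most obvious credential leaks before the code ever leaves the developer’s machine, which is exactly the point of shifting security left.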
