An application that runs on an enterprise’s datacentre storage area network (SAN) and monitors SAN I/O activity in real time. It uses AI-based anomaly detection to block malware before it can corrupt enterprise data.
The challenge was to provide a generic solution that could protect against new and unknown threats. Malware detection accuracy had to be very high, while false positives had to be kept to a minimum and ideally eliminated altogether. Detection and protection had to work in real time across the whole SAN network without affecting SAN performance.
We used AI anomaly detection techniques that learn what “normal” traffic looks like and let it pass, while “suspicious” traffic is blocked. Models were trained and evaluated on large volumes of real data collected from production SAN logs. This data set was then processed and augmented to produce a larger synthetic but realistic (“real-like”) dataset. We also set up simulated SAN environments and released malware in them to capture malware footprints. All of this data was used to tune model parameters for high precision and recall.
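To make the “learn normal, block deviations” idea and the precision/recall check concrete, here is a minimal sketch. It is not the production model: the per-window features (write ratio, entropy of written data, operations per second), the z-score threshold fitted only on normal traffic, and all data values are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical per-window features; the production feature set is not described.
@dataclass
class IOWindow:
    write_ratio: float    # fraction of I/O operations that are writes
    write_entropy: float  # mean entropy of written blocks, in bits per byte
    ops_per_sec: float    # I/O operations per second

FEATURES = ("write_ratio", "write_entropy", "ops_per_sec")

def fit_baseline(normal_windows):
    """Learn the mean and standard deviation of each feature from normal traffic only."""
    stats = {}
    for name in FEATURES:
        values = [getattr(w, name) for w in normal_windows]
        # Guard against a zero spread so the z-score below never divides by zero.
        stats[name] = (mean(values), pstdev(values) or 1e-9)
    return stats

def is_suspicious(window, stats, z_max=3.0):
    """Flag a window if any feature lies more than z_max standard deviations from normal."""
    return any(abs(getattr(window, name) - mu) / sd > z_max
               for name, (mu, sd) in stats.items())

def precision_recall(predicted, actual):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy baseline: windows of normal traffic (hypothetical values).
normal = [
    IOWindow(0.30, 4.0, 1000), IOWindow(0.32, 4.2, 1100), IOWindow(0.28, 3.8, 900),
    IOWindow(0.31, 4.1, 1050), IOWindow(0.29, 3.9, 950),
]
stats = fit_baseline(normal)

# Labeled evaluation windows: True means the window came from a malware run.
labeled = [
    (IOWindow(0.31, 4.05, 1020), False),  # ordinary traffic
    (IOWindow(0.90, 7.90, 3000), True),   # ransomware-like mass encryption
    (IOWindow(0.60, 7.50, 1000), True),   # slower encryption
    (IOWindow(0.30, 4.00, 1400), False),  # legitimate burst of activity
    (IOWindow(0.32, 4.10, 1000), True),   # stealthy malware close to the baseline
]
predicted = [is_suspicious(w, stats) for w, _ in labeled]
actual = [label for _, label in labeled]
print(precision_recall(predicted, actual))
```

The toy run yields one false positive (the legitimate burst) and one false negative (the stealthy window), so precision and recall are both 2/3. Raising or lowering `z_max` trades one against the other, which is the kind of parameter tuning the evaluation data is used for.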
The system had a distributed architecture, with sensors on each SAN node and dedicated processing nodes running the detection model. The model was based on a home-grown decision tree variant that was both accurate and lightweight enough for the use case. Considerable effort went into hyperparameter tuning to minimize model size while maintaining accuracy.
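The home-grown decision tree variant is not described here, so the following sketch substitutes a plain CART-style tree (Gini impurity, binary threshold splits) in standard-library Python. It illustrates the tuning goal of minimizing the model while keeping accuracy: trees are grown at several maximum depths, and the shallowest one whose validation accuracy is within a tolerance of the best is kept. The two features (write ratio, written-data entropy), the function names, and all data points are hypothetical.

```python
from collections import Counter

# Stand-in for the undisclosed home-grown variant: a plain CART-style tree.
# A node is either a leaf label or a tuple (feature, threshold, left, right).

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, max_depth):
    """Grow a tree greedily, stopping at max_depth or when no split reduces impurity."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority label becomes a leaf
    f, t = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > t]
    return (f, t,
            build([r for r, _ in left], [lab for _, lab in left], max_depth - 1),
            build([r for r, _ in right], [lab for _, lab in right], max_depth - 1))

def predict(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def size(node):
    """Number of nodes (internal plus leaves): a simple proxy for model size."""
    return 1 + size(node[2]) + size(node[3]) if isinstance(node, tuple) else 1

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == lab for r, lab in zip(rows, labels)) / len(labels)

def smallest_good_tree(train, val, max_depths, tolerance=0.0):
    """Return (depth, tree, val_accuracy) for the shallowest tree within tolerance of the best."""
    scored = []
    for d in max_depths:
        tree = build(*train, d)
        scored.append((d, tree, accuracy(tree, *val)))
    best_acc = max(acc for _, _, acc in scored)
    return min((s for s in scored if s[2] >= best_acc - tolerance), key=lambda s: s[0])

# Hypothetical features per I/O window: (write_ratio, write_entropy).
train = (
    [(0.30, 4.0), (0.35, 4.5), (0.25, 3.9), (0.40, 5.0),  # normal traffic
     (0.80, 7.8), (0.70, 7.6), (0.90, 7.9), (0.60, 7.2),  # malware runs
     (0.85, 7.7)],                                         # legitimate encrypted backup
    ["normal"] * 4 + ["malware"] * 4 + ["normal"],
)
val = (
    [(0.32, 4.2), (0.38, 4.8), (0.82, 7.8), (0.65, 7.5), (0.95, 7.9)],
    ["normal", "normal", "malware", "malware", "malware"],
)

depth, tree, acc = smallest_good_tree(train, val, max_depths=[0, 1, 2, 3])
print(depth, size(tree), acc)
```

On this toy data the search keeps the depth-1 tree (3 nodes, perfect validation accuracy). The depth-3 tree grows to 7 nodes to memorise the encrypted-backup sample and then misclassifies the malware window at write ratio 0.82, so it is both larger and less accurate. A smaller tree also needs fewer comparisons per prediction, which matters when every I/O window must be scored in real time.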
- NetApp Clustered Data ONTAP
- REST endpoints
- Virtualization on AWS