Expecting the Unexpected!
It is a truism that if you set up a system designed to work in a certain way, someone, somewhere will find a way around it. Put another way, a system specifically designed to prevent a certain type of behavior will, almost by definition, attract those who seek to engage in that very behavior.
This highlights the importance of continually monitoring systems and implementing measures to stay ahead of potential security or other issues.
When looking to mitigate, or simply surface, irregularities occurring in a particular system, it is often difficult to obtain enough examples of the behavior in question to adequately train a machine learning model to find it.
This is where the science (or art) of anomaly detection comes in. There are a number of methods to ensure a sufficient number of anomalous instances are available to train a model. These include using generative models to synthesize anomalous data, applying data augmentation such as over-sampling or under-sampling, and choosing specialized modeling techniques that require relatively little labeled data to train. Whatever method is used, it needs to be relevant to the situation.
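To make the over-sampling idea concrete, here is a minimal sketch in Python of random over-sampling: duplicating rows of the rare anomalous class until the classes are balanced. The function name and the toy transaction data are illustrative only; production work would more likely reach for synthesis techniques such as SMOTE rather than plain duplication.

```python
import random

def oversample_minority(samples, labels, minority_label=1, seed=0):
    """Randomly duplicate minority-class rows until classes are balanced.

    A minimal sketch of random over-sampling, not a production recipe.
    """
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    majority = [s for s, y in zip(samples, labels) if y != minority_label]
    deficit = len(majority) - len(minority)
    # Draw extra copies of minority rows (with replacement) to close the gap.
    extra = [rng.choice(minority) for _ in range(deficit)]
    return samples + extra, labels + [minority_label] * deficit

# Toy data: 8 "normal" transaction amounts and 2 anomalous ones.
X = [[100.0], [102.5], [98.0], [101.0], [99.5], [103.0], [97.5], [100.5],
     [950.0], [1200.0]]
y = [0] * 8 + [1] * 2

X_bal, y_bal = oversample_minority(X, y)
print(y_bal.count(0), y_bal.count(1))  # prints: 8 8
```

The balanced set can then be fed to an ordinary classifier that would otherwise struggle with a 4:1 class imbalance.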
Examples of where anomaly detection can be deployed are numerous. Online fraud is often one of the first use cases that comes to mind in this space. Its proliferation has made it more important than ever to get on top of unexpected behavior among the web's digital denizens.
But how unexpected is fraud? Not very, according to a report by the FTC in the US, which stated that in 2021 it received 2.8 million fraud reports translating into $5.8 billion in losses – a 70% increase on 2020. The problem is only accelerating, with the Better Business Bureau's 2022 Scam Tracker Risk report finding that online purchase scams were the riskiest form of consumer fraud in the US, with a median loss of $100 per victim. However, within any particular system, fraud instances are in fact rare, comprising only a small percentage of all digital interactions, and this is where the challenge lies.
Fraud is of course not just online and manifests all over the place – for example, in B2B relationships where providers submit false or artificially inflated invoices for services they may or may not have rendered, or in the insurance industry, where exaggerated losses are occasionally reported in an attempt to extract increased claim payments. Here the challenge is not just the scarcity of anomalous examples but also determining when a figure is in fact an unwanted outlier.
Of course, fraud is not the only area where anomaly detection techniques can be deployed. They can, for example, be used in SaaS products to flag possible errors or unusual behavior in transactions; in manufacturing to highlight deficiencies in product quality or, via predictive maintenance, impending failures in industrial machinery; in retail to identify slow-moving inventory; in financial trading to spotlight unusually behaving instruments; in the waste handling industry to detect unusual concentrations of toxins or spikes in waste production, whether from the community or from business; and in transport and logistics, where unexpected bottlenecks can be caught early and addressed.
This list is not exhaustive, but in all these situations the focus is on the outliers, and further analytics can be run to determine their likely causes. Interestingly, deploying multiple model types – classifiers or clustering, for example – can augment and enrich the insights derived from the anomaly detection process. As such, anomaly detection modelling should not be seen as a stand-alone process, but as one employed within the context of a broader analytic approach.
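As a toy illustration of that broader approach, the sketch below first flags outliers with a simple z-score rule, then runs a small follow-up analysis that groups the flagged invoices by vendor to hint at a likely cause. All names and figures here are invented for illustration; a real pipeline would use a proper detector and far richer attribution.

```python
import statistics
from collections import Counter

def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose z-score exceeds a threshold.

    A minimal stand-in for a production anomaly detector.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return [i for i, v in enumerate(values)
            if stdev and abs(v - mean) / stdev > threshold]

# Invented (vendor, invoice amount) pairs, with one suspicious invoice.
invoices = [("acme", 120.0), ("acme", 118.0), ("globex", 125.0),
            ("globex", 119.0), ("acme", 121.0), ("globex", 980.0),
            ("acme", 122.0), ("globex", 117.0)]

amounts = [amt for _, amt in invoices]
flagged = zscore_outliers(amounts, threshold=2.0)

# Follow-up analytics: which vendors account for the flagged invoices?
by_vendor = Counter(invoices[i][0] for i in flagged)
print(by_vendor)  # e.g. Counter({'globex': 1})
```

The second step is the point: the detector only says "this is odd", while the downstream grouping starts to answer "why", which is where classifiers or clustering models can take over.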
Food for thought!
If any of these use cases speak to you, or you have a machine-learning challenge that needs a look, please feel free to reach out and I'll be happy to assist.