Detecting fraudulent online payments is a natural use case for machine learning: models thrive where data volume is high and fraudulent transactions cannot be identified from just a handful of features.
Nonetheless, many fraud prevention systems still rely on hard-coded, rule-based engines that consolidate the accumulated knowledge of fraud experts. In this piece, we will shed light on the main differences between the two approaches and on which use cases suit each of them better.
Fraud detection systems are a lot more than just serialized machine learning models or sets of rules expressed in code using many “if-else” statements. There are a lot of other engineering challenges in various areas, including infrastructure, backend, and front-end programming. Those challenges tend to differ a bit depending on the chosen decision engine and business sector, but they are not the main topic of this blog post. Here, we will focus on just one crucial piece, a single cog in the machine - the decision engine that determines whether the transaction is fraudulent or not.
As the name suggests, rule-based systems rely on hard-coded rules that flag transactions meeting certain criteria. Such rules can be developed by:
The rules are often expressed using “if-else” statements, which are present in almost all imperative programming languages, and are easily interpretable. They mirror the way a human would process a transaction: the engine checks whether a transaction matches any of the risky patterns expressed in the rules and, if it does, blocks it or sends it for manual review. This is one of the reasons rule-based systems are still so widespread: stakeholders trust them because they mimic how fraud analysts themselves tackle the task.
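To make the idea concrete, here is a minimal sketch of such an engine. The transaction fields, the rule names, and every threshold are hypothetical and chosen only for illustration; a production rule set would be far larger.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transaction:
    amount: float            # order value in the account currency
    card_country: str        # country where the card was issued
    ip_country: str          # country resolved from the customer's IP address
    attempts_last_hour: int  # payment attempts from this account in the last hour


# A rule is a (name, predicate, action) triple. All thresholds are made up.
Rule = Tuple[str, Callable[[Transaction], bool], str]

RULES: List[Rule] = [
    ("velocity", lambda t: t.attempts_last_hour > 5, "block"),
    ("country_mismatch_high_value",
     lambda t: t.card_country != t.ip_country and t.amount > 500, "review"),
]


def evaluate(tx: Transaction) -> Tuple[str, List[str]]:
    """Return the strictest action triggered and the names of the rules that fired."""
    fired = [(name, action) for name, predicate, action in RULES if predicate(tx)]
    names = [name for name, _ in fired]
    actions = {action for _, action in fired}
    if "block" in actions:
        return "block", names
    if "review" in actions:
        return "review", names
    return "accept", names


decision, reasons = evaluate(Transaction(900.0, "PL", "BR", 1))
print(decision, reasons)
```

Because the engine returns the names of the rules that fired, the reason for every decision is available for free, which is the interpretability advantage described above.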
ML models address the shortcomings of rule-based systems. They thrive in environments where the volume and dimensionality of data are high. Algorithms like decision trees, random forests, gradient boosting, or neural networks are designed to find complex, nonlinear patterns using hundreds of transaction features, where that many are available. Such an approach demands a shift in focus.
For one, deploying ML models requires high-quality, labeled historical data to serve as a training dataset. The more data you have, both in the number of transactions and in the number of features capturing their characteristics, the better the model will generally perform. In such a scenario, we keep a record of past transactions, each described in detail by a feature vector, rather than trying to understand the fraud phenomenon directly.
Most modern fraud prevention systems function as hybrid solutions that gather outputs from both rule-based engines and machine learning models and then propose a combined recommendation based on the client's specific business logic. Since rule-based systems mimic the human reasoning process, let’s dive deeper into how machine learning algorithms find fraudulent traits in online traffic.
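A hybrid setup of this kind can be sketched in a few lines. The function name, the action labels, and both thresholds below are assumptions made for illustration; in practice they would come from the client's business logic.

```python
def recommend(rule_action: str, model_score: float,
              review_threshold: float = 0.5, block_threshold: float = 0.9) -> str:
    """Combine a rule engine verdict with a model score into one recommendation.

    rule_action is assumed to be "accept", "review" or "block";
    model_score is assumed to be a fraud probability between 0 and 1.
    """
    # Either source alone is allowed to escalate the decision.
    if rule_action == "block" or model_score >= block_threshold:
        return "block"
    if rule_action == "review" or model_score >= review_threshold:
        return "review"
    return "accept"


print(recommend("accept", 0.95))  # the model alone escalates to a block
print(recommend("review", 0.10))  # a rule alone sends the payment to review
```

Letting either source escalate is only one possible policy. A business that is more sensitive to false positives might instead require both sources to agree before blocking.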
There has been much hype around machine learning for the past few years, but certain tasks, like fraud detection, remain difficult even for many novel methods and techniques. Extreme class imbalance, concept drift, and expectations of full explainability of models’ predictions from business stakeholders are just some examples of common difficulties.
Fraudulent transactions tend to make up a tiny fraction of traffic. This poses a few challenges.
Datasets need to be bigger than usual because fraudulent patterns can be observed only in a small fraction of the data. Since most of the traffic is legitimate, models need to be carefully calibrated so as not to 'suffocate' the business with frequent false positives. These data characteristics make a range of ML algorithms a poor fit.
Gradient boosting methods tend to excel in such environments due to the feedback loop embedded in the algorithm. During the iterative training process, the algorithm 'focuses' on the parts of the data where it was previously wrong, and this mechanism helps it cope with class imbalance.
Fraudsters play a constant “cops and robbers” game with companies working on fraud prevention software. Their toolset is growing, and when a new security measure becomes an industry standard, they quickly adapt and find new ways around it. This calls for frequent retraining of ML models: one trained a year ago may not address the fraud patterns found in newer data samples.
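A short calculation shows why false positives dominate when fraud is rare. The fraud rate, recall, and false positive rate used here are assumed figures, not measurements from any real system.

```python
def precision_at(fraud_rate: float, recall: float, false_positive_rate: float) -> float:
    """Share of flagged transactions that are actually fraudulent."""
    true_positives = fraud_rate * recall
    false_positives = (1.0 - fraud_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Assumed figures: 0.1% of traffic is fraud, the model catches 90% of it,
# and it wrongly flags 1% of legitimate payments.
precision = precision_at(fraud_rate=0.001, recall=0.90, false_positive_rate=0.01)
print(f"{precision:.1%} of flagged transactions are fraud")
```

Under these assumptions only about 8% of flagged transactions are fraudulent, so roughly eleven legitimate customers are stopped for every fraudster caught, even though the model looks accurate on paper.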
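The feedback loop can be illustrated with a stripped-down version of gradient boosting for squared error, where every round fits a one-split tree (a stump) to the errors left by the previous rounds. This is a teaching sketch on a made-up, single-feature dataset, not how production libraries are implemented.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value if x > threshold)


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Pick the single split on x that best fits the current residuals."""
    best_error, best_stump = float("inf"), (xs[0], 0.0, 0.0)
    for threshold in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if error < best_error:
            best_error, best_stump = error, (threshold, left_mean, right_mean)
    return best_stump


def boost(xs: List[float], ys: List[float], rounds: int = 20,
          learning_rate: float = 0.5) -> Tuple[List[float], List[float]]:
    """Return the final predictions and the squared error after each round."""
    preds = [sum(ys) / len(ys)] * len(xs)  # start from the base fraud rate
    errors = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # where the model is still wrong
        threshold, left_value, right_value = fit_stump(xs, residuals)
        preds = [p + learning_rate * (left_value if x <= threshold else right_value)
                 for x, p in zip(xs, preds)]
        errors.append(sum((y - p) ** 2 for y, p in zip(ys, preds)))
    return preds, errors


# Made-up data: transaction amounts, with the two largest labeled as fraud (1).
amounts = [10, 20, 30, 40, 50, 60, 70, 80, 900, 950]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predictions, errors = boost(amounts, labels)
print(errors[0], errors[-1])
```

The model starts by predicting the base fraud rate for every transaction, which is badly wrong for the two fraud cases. Each subsequent stump is fitted to those remaining errors, so the rare class receives the corrections it needs.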
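One common way to decide when retraining is due is to compare the distribution of a feature in recent traffic with its distribution in the training data. The sketch below uses the Population Stability Index, a technique this article does not prescribe; the bin edges, the sample values, and the 0.25 alert level (a frequently quoted rule of thumb) are all assumptions.

```python
import math
from typing import List, Sequence


def bin_shares(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values in each bin; a small floor avoids log(0) for empty bins."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[sum(1 for edge in edges if value > edge)] += 1
    return [max(count / len(values), 1e-6) for count in counts]


def psi(reference: Sequence[float], recent: Sequence[float],
        edges: Sequence[float]) -> float:
    """Population Stability Index between two samples of one feature."""
    ref_shares = bin_shares(reference, edges)
    new_shares = bin_shares(recent, edges)
    return sum((new - ref) * math.log(new / ref)
               for ref, new in zip(ref_shares, new_shares))


training_amounts = [10, 20, 30, 40, 50, 60, 70, 80]
recent_amounts = [60, 70, 80, 90, 100, 110, 120, 130]
drift = psi(training_amounts, recent_amounts, edges=[25, 50, 75])
if drift > 0.25:  # rule-of-thumb alert level, tune for your own data
    print("distribution shifted, schedule retraining")
```

A check like this can run on a schedule for every important feature, turning the decision to retrain into an automated alert rather than a manual investigation.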
All things considered, ML models are superior to rule-based systems for several reasons.
Maintaining a complex rule engine with hundreds of interdependent rules that express constantly changing fraud patterns isn’t easy, and it is definitely not scalable. In contrast, ML-based solutions scale automatically via cloud service providers: the only difference in cost between processing 1k and 100k transactions is the figure on the invoice from your cloud service provider. Data scientists or machine learning engineers do exactly the same job at either volume, provided they use proper tools and automate repetitive tasks like retraining models or data collection.
Concept drift is less troublesome for ML-based solutions. In rule-based engines, changes in fraud patterns call for manual recalibration of existing rules and the creation of new ones, each of which requires research. This is manual work that can’t be easily automated. In comparison, ML models require rerunning the training on new data samples and, where needed, engineering new features that capture the change in fraud patterns. Retraining can be easily automated, so, again, ML models prove to be more cost-effective.
Today, you can attend an online bootcamp that teaches you how to effectively commit fraud the same way one might attend an online course to learn programming. This means that obvious fraud patterns, expressed by rule-based engines that haven’t evolved as much as ML in recent years, will be swiftly bypassed by modern fraud methods. In light of this, the automatic fraud pattern detection that comes with ML models is a necessity rather than a luxury.
Many modern-day ML algorithms work as ensembles (e.g. random forest, gradient boosting). This means that, under the hood, the algorithm builds numerous separate classifiers, each learning slightly different things about fraud patterns. In a random forest they are trained independently on different data subsets; in gradient boosting they are trained in sequence, each one correcting the errors of its predecessors. When deployed, they jointly determine the score for every transaction, which reduces the impact of any single biased view. If a fraudster is coming from another part of the world and is half the age of the analyst who composes the rules, the bias transferred from analyst to code can create a gateway for fraudsters coming from different backgrounds. Ensembles partially alleviate this single point of failure.
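The voting step can be sketched as follows. The three members here are hand-written stand-ins for trained trees, and the feature names and thresholds are hypothetical; a real ensemble would learn its members from data.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]

# Each member looks at the transaction from a different angle and votes 0 or 1.
MEMBERS: List[Callable[[Features], int]] = [
    lambda f: int(f["amount"] > 500),
    lambda f: int(f["attempts_last_hour"] > 3),
    lambda f: int(f["account_age_days"] < 7),
]


def ensemble_score(features: Features) -> float:
    """Fraction of members voting 'fraud' for this transaction."""
    return sum(member(features) for member in MEMBERS) / len(MEMBERS)


# A low-value payment slips past the amount-based member but not the other two.
tx = {"amount": 50.0, "attempts_last_hour": 6.0, "account_age_days": 2.0}
print(ensemble_score(tx))
```

Here the member that only looks at the amount sees nothing suspicious, yet the transaction still receives a high score because the other two members outvote it. That is the sense in which an ensemble softens the blind spot of any single view.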
Rule-based systems hold a strong advantage over ML models in terms of explainability. In such systems, there is little ambiguity over why a certain transaction was blocked. Some ML algorithms work as black boxes: there is no easy way of saying why they returned a certain value for a certain input. Fortunately, most fraud detection datasets consist of imbalanced, structured data, which means that algorithms built on decision trees work really well.
Predictions of such models can be easily explained using packages like ELI5 (which stands for “Explain Like I'm 5”) that let us see which transaction traits contribute to its likelihood of being fraudulent, just like in rule-based systems. Even if the algorithm is not tree-based, there are many tools that try to demystify the internal workings of those black boxes. XAI, which stands for 'Explainable Artificial Intelligence', is a new field that has gained a lot of attention recently because many real-world applications of ML models demand explainability.
In this piece, we tried to outline the main differences between rule-based engines and machine learning models. As stated above, the best setup should contain both since they are not mutually exclusive. Each of the methods has its pros and cons, but it looks like the future belongs to machine learning, complemented by rule-based systems. One way of looking at this is treating the machine learning model as just another rule in a rule-based engine: it’s just a bit smarter, that’s all.
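The idea of per-feature contributions is easiest to see on a linear model, where each contribution is simply weight times value. The sketch below is not ELI5's API and not a tree-based model; the weights, bias, and feature names are invented to keep the arithmetic short.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical weights of an already trained logistic model.
WEIGHTS = {"amount_zscore": 0.8, "country_mismatch": 1.5, "account_age_years": -0.6}
BIAS = -3.0


def explain(features: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the fraud probability and contributions ranked by absolute size."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return probability, ranked


probability, ranked = explain(
    {"amount_zscore": 2.0, "country_mismatch": 1.0, "account_age_years": 0.5}
)
print(f"fraud probability: {probability:.2f}")
for name, contribution in ranked:
    print(f"{name:>20}: {contribution:+.2f}")
```

Positive contributions push the score toward fraud and negative ones away from it, so an analyst can read the output much like a list of fired rules. Explanation tools for tree ensembles produce a comparable breakdown, although they compute it differently.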