Mangopay Blog

Machine learning models vs. rule-based systems in fraud prevention

Written by Jakub Karczewski | Nov 30, 2023 5:13:00 PM

 

Detecting fraudulent online payments is a natural use case for machine learning models, which thrive where data volume is high and fraudulent transactions cannot be identified from only a handful of features.

Nonetheless, many fraud prevention systems still rely on hard-coded rule-based systems that encode the accumulated knowledge of fraud experts. In this piece, we shed light on the main differences between the two approaches and on which use cases fit each of them better.

A crucial cog in the machine - the decision engine

Fraud detection systems are a lot more than just serialized machine learning models or sets of rules expressed in code using many “if-else” statements. There are a lot of other engineering challenges in various areas, including infrastructure, backend, and front-end programming. Those challenges tend to differ a bit depending on the chosen decision engine and business sector, but they are not the main topic of this blog post. Here, we will focus on just one crucial piece, a single cog in the machine - the decision engine that determines whether the transaction is fraudulent or not.

Rule-based systems

As the name suggests, these systems rely on hard-coded rules that flag transactions meeting certain criteria. Such rules can be developed by:

  • following industry best practices - like blocking multiple transactions from a single account in a short period of time or the ones coming through VPNs or from risky areas,
  • analyzing caught/prevented fraudulent transactions and developing new rules to cover all of their suspicious characteristics.

The rules are often expressed using “if-else” statements, which are present in almost all imperative programming languages and are easy to interpret. They mirror the way a human would process a transaction - the engine checks whether a transaction matches any of the risky patterns expressed in the rules, and if it does, it blocks the transaction or sends it for manual review. This is one of the reasons why rule-based systems are still so widespread - stakeholders trust them because the rules mimic how the stakeholders themselves would tackle the task.
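To make the idea concrete, here is a minimal sketch of such an engine. The transaction fields, thresholds, country codes, and rule names are all hypothetical and chosen only for illustration; a real engine would have many more rules and a richer set of actions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transaction:
    amount: float
    country: str
    uses_vpn: bool
    tx_last_10_min: int  # transactions from the same account in the last 10 minutes


@dataclass
class Rule:
    name: str
    predicate: Callable[[Transaction], bool]
    action: str  # "block" or "review"


# Hypothetical placeholders, not real country codes or recommended thresholds.
RISKY_COUNTRIES = {"XX", "YY"}

RULES = [
    Rule("velocity", lambda t: t.tx_last_10_min > 5, "block"),
    Rule("vpn", lambda t: t.uses_vpn, "review"),
    Rule("risky_country", lambda t: t.country in RISKY_COUNTRIES, "review"),
]


def evaluate(tx: Transaction, rules: List[Rule] = RULES) -> Tuple[str, List[str]]:
    """Return the decision and the names of the rules that triggered it."""
    triggered = [r for r in rules if r.predicate(tx)]
    if any(r.action == "block" for r in triggered):
        decision = "block"
    elif triggered:
        decision = "review"
    else:
        decision = "accept"
    return decision, [r.name for r in triggered]


print(evaluate(Transaction(amount=20.0, country="FR", uses_vpn=True, tx_last_10_min=1)))
```

Because the function returns the names of the triggered rules alongside the decision, every alert comes with its own explanation.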

Advantages of rule-based systems

  • Full explainability out of the box - if a certain rule triggered an alert for a particular transaction, it’s 100% transparent why this happened.
  • There is no cold start problem - they are operational from day 1, so there’s no need to gather training datasets required for machine learning algorithms.
  • Low barrier to entry - you don’t need a team of data scientists and machine learning engineers. The first rules can be implemented by the backend team, which is already familiar with translating business logic into code.

Disadvantages of rule-based systems

  • Continuous need for reverse engineering fraudsters’ attacks - new rules have to be developed as new fraud patterns emerge.
  • Ever-growing number of rules - the cost of maintenance increases over time as rules are recalibrated and adjusted to new fraud patterns.
  • Only fraud of limited complexity can be detected - there is a practical limit to the number of rules and transaction features. Because rules are developed and maintained manually, rule-based systems are bounded by what humans can comprehend.

Machine learning models in fraud prevention

ML models address many of the shortcomings of rule-based systems. They thrive in environments where the volume and dimensionality of data are high. Algorithms like decision trees, random forests, gradient boosting, or neural networks are designed to find complex, nonlinear patterns using hundreds of transaction features, if that many are available. Such an approach demands a shift in focus.

For one, deploying ML models requires high-quality, labeled historical data to use as a training dataset. The more data you have, in terms of both the number of transactions and the features capturing their characteristics, the better the model tends to perform. In such a scenario, we are trying to keep a record of past transactions with a detailed description in the form of a feature vector rather than trying to directly understand the fraud phenomenon.
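As an illustration of what a feature vector can look like, the sketch below turns one transaction into a short list of numbers, using the account's earlier transactions for context. The field names and the five chosen features are assumptions made for this example, not a recommended feature set.

```python
from datetime import datetime, timedelta
from statistics import mean


def to_feature_vector(tx, history):
    """Describe one transaction numerically, using the account's earlier transactions."""
    past = [h for h in history
            if h["account"] == tx["account"] and h["time"] < tx["time"]]
    last_hour = [h for h in past if tx["time"] - h["time"] <= timedelta(hours=1)]
    avg_amount = mean(h["amount"] for h in past) if past else tx["amount"]
    return [
        tx["amount"],                                        # raw amount
        tx["time"].hour,                                     # hour of day
        len(last_hour),                                      # account activity in the last hour
        tx["amount"] / avg_amount if avg_amount else 1.0,    # amount relative to account average
        1.0 if tx["uses_vpn"] else 0.0,                      # VPN flag
    ]


# Hypothetical data for illustration only.
history = [
    {"account": "A", "time": datetime(2023, 11, 30, 12, 0), "amount": 10.0, "uses_vpn": False},
    {"account": "A", "time": datetime(2023, 11, 30, 12, 30), "amount": 30.0, "uses_vpn": False},
    {"account": "B", "time": datetime(2023, 11, 30, 12, 40), "amount": 99.0, "uses_vpn": False},
]
tx = {"account": "A", "time": datetime(2023, 11, 30, 12, 45), "amount": 100.0, "uses_vpn": True}

print(to_feature_vector(tx, history))
```

Each labeled historical transaction described this way becomes one row of the training dataset.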

Advantages of ML models

  • Automatic fraud pattern recognition - the algorithm handles the task of figuring out what makes an event fraudulent. Our task is to provide it with a detailed description in the form of a feature vector.
  • Handling of concept drift - a change in fraud characteristics over time (new fraud methods, new tools used by fraudsters) can often be addressed by retraining the models on new data, with no need to reverse engineer fraudsters’ methods.
  • Less manual work involved - many of the processes can be automated. Companies that have mature machine learning pipelines spend most of their time on researching new features and algorithms while keeping an eye on performance metrics of current models available through monitoring apps.
  • ML models’ economic efficiency grows along with data volume. The more data you have and the more complex it is, the harder it is to develop rule-based systems. Thus, the return on developing automated fraud detection using ML models increases as data volume increases.

Disadvantages of ML models

  • Cold start problem - to run ML models, you need a significant amount of historical data.
  • Lack of explainability out of the box - not all algorithms’ predictions can be easily explained. Some models are “black boxes” for which the relationship between inputs and outputs has no simple explanation.

ML models deep dive

Most modern fraud prevention systems function as hybrid solutions that gather outputs from both rule-based engines and machine learning models and then produce a combined recommendation based on the client's specific business logic. Since rule-based systems mimic humans' reasoning process, let’s dive deeper into how machine learning algorithms find fraudulent traits in online traffic.
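A hybrid setup of this kind can be sketched in a few lines. The function below is only an illustration: the decision labels, the thresholds, and the precedence given to the stricter signal are assumptions, and in practice this logic is dictated by each client's business rules.

```python
def hybrid_decision(rule_decision, ml_score, review_threshold=0.5, block_threshold=0.9):
    """Combine a rule engine's outcome with a model's fraud score.

    rule_decision: "accept", "review" or "block" (hypothetical labels)
    ml_score: estimated fraud probability between 0 and 1
    The stricter of the two signals wins.
    """
    if rule_decision == "block" or ml_score >= block_threshold:
        return "block"
    if rule_decision == "review" or ml_score >= review_threshold:
        return "review"
    return "accept"


print(hybrid_decision("accept", 0.62))  # the model alone escalates this one to review
```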

Challenges

There has been much hype around machine learning for the past few years, but certain tasks, like fraud detection, remain difficult even for many novel methods and techniques. Extreme class imbalance, concept drift, and expectations of full explainability of models’ predictions from business stakeholders are just some examples of common difficulties.

Class imbalance

Fraudulent transactions tend to make up a tiny fraction of traffic. This poses a few challenges.

Datasets need to be bigger than usual because fraudulent patterns appear in only a small fraction of the data. Since most of the traffic is legitimate, models need to be carefully calibrated so as not to “suffocate” the business with frequent false positives. These data characteristics make a range of ML algorithms, particularly those that implicitly assume roughly balanced classes, a poor fit.

Gradient boosting methods tend to do well in such environments thanks to the feedback loop embedded in the algorithm. During the iterative training process, each new step “focuses” on the parts of the data where the model was previously wrong, which helps it pick up the rare fraudulent cases.
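A bit of arithmetic shows why imbalance makes both evaluation and calibration tricky. The sketch below uses made-up numbers (100,000 transactions, a 0.1% fraud rate, and a model with 90% recall and a 1% false positive rate) purely for illustration.

```python
def confusion_summary(n_total, fraud_rate, recall, false_positive_rate):
    """Expected confusion-matrix metrics for a given traffic mix and model quality."""
    n_fraud = n_total * fraud_rate
    n_legit = n_total - n_fraud
    tp = n_fraud * recall                 # fraud correctly flagged
    fp = n_legit * false_positive_rate    # legitimate traffic flagged by mistake
    tn = n_legit - fp
    flagged = tp + fp
    return {
        "accuracy": (tp + tn) / n_total,
        "precision": tp / flagged if flagged else 0.0,
        "false_alerts": fp,
    }


# A "model" that never flags anything still looks excellent on accuracy.
print(confusion_summary(100_000, 0.001, recall=0.0, false_positive_rate=0.0))

# A seemingly strong model: 90% recall with a 1% false positive rate.
print(confusion_summary(100_000, 0.001, recall=0.9, false_positive_rate=0.01))
```

Under these assumptions, the do-nothing baseline reaches 99.9% accuracy while catching no fraud, and the second model raises roughly 999 false alerts to catch 90 fraudulent transactions, a precision of about 8%. This is why accuracy alone is a misleading metric here and why the false positive rate deserves careful attention.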
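The feedback loop can be illustrated with a deliberately simplified, pure-Python version of gradient boosting: least-squares boosting with one-split “stumps” on a single feature. Each round fits a new stump to the residuals, that is, to the errors the current model still makes. The toy data, learning rate, and number of rounds are assumptions for the example; production libraries implement far more elaborate variants.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (least squares).

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean label, then repeatedly fit a stump to the current errors."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # Residuals are largest where the current model is most wrong.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def mse(predict, xs, ys):
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data: one feature, two fraudulent cases (label 1) out of ten.
xs = list(range(1, 11))
ys = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]

for rounds in (0, 1, 20):
    print(rounds, mse(fit_boosting(xs, ys, n_rounds=rounds), xs, ys))
```

The training error shrinks as rounds are added, because every new stump targets whatever the previous ones got wrong. Note that this only shows error on the training data; it says nothing about performance on unseen transactions.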

Concept drift

Fraudsters play a constant “cops and robbers” game with companies working on fraud prevention software. Their toolset keeps growing, and when a new security measure becomes an industry standard, they quickly adapt and find new ways around it. This calls for frequent retraining of ML models - one trained a year ago may not address the fraud patterns found in newer data samples.

Machine learning models vs. rule-based systems

All things considered, ML models are superior to rule-based systems for several reasons.

Efficiency

Maintaining a complex rule engine with hundreds of interdependent rules that express constantly changing fraud patterns isn’t easy, and it's definitely not scalable. In contrast, ML-based solutions scale automatically via cloud service providers - the only difference in cost between processing 1k and 100k transactions is the figure on the invoice from your cloud service provider. Data scientists or machine learning engineers need to do exactly the same job provided they use proper tools and automate repetitive tasks like retraining models or data collection.

Automatic adaptation via retraining

Concept drift is less troublesome for ML-based solutions. In rule-based engines, changes in fraud patterns call for manual recalibration of existing rules and the creation of new ones, each of which is the outcome of research. This is manual work that can’t be easily automated. In comparison, ML models require rerunning the training on new data samples and, where needed, coming up with new features that capture the change in the detected phenomena. Retraining can be easily automated, so, again, ML models prove to be more cost-effective.
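One way to automate the decision of when to retrain is to monitor the model on recently labeled transactions and trigger retraining when its quality degrades. The sketch below tracks recall over a rolling window; the class name, window size, and recall threshold are hypothetical, and real pipelines usually watch several metrics at once.

```python
from collections import deque


class DriftMonitor:
    """Track recall on the most recent labeled transactions."""

    def __init__(self, window=500, min_recall=0.8):
        # Each entry is a pair (predicted_fraud, actual_fraud); old entries fall out.
        self.outcomes = deque(maxlen=window)
        self.min_recall = min_recall

    def record(self, predicted_fraud, actual_fraud):
        self.outcomes.append((predicted_fraud, actual_fraud))

    def recall(self):
        caught = sum(1 for predicted, actual in self.outcomes if actual and predicted)
        total = sum(1 for _, actual in self.outcomes if actual)
        return caught / total if total else None  # undefined without confirmed fraud

    def needs_retraining(self):
        current = self.recall()
        return current is not None and current < self.min_recall


monitor = DriftMonitor(window=10, min_recall=0.8)
for _ in range(5):
    monitor.record(predicted_fraud=True, actual_fraud=True)    # model still catches fraud
print(monitor.needs_retraining())
for _ in range(10):
    monitor.record(predicted_fraud=False, actual_fraud=True)   # a new pattern slips through
print(monitor.needs_retraining())
```

A scheduled job could call `needs_retraining()` and start the training pipeline on fresh data whenever it returns `True`.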

Automatic detection of fraud patterns

Today, you can attend an online bootcamp that teaches you how to commit fraud effectively, the same way one might attend an online course to learn programming. This means that obvious fraud patterns, the kind expressed in rule-based engines that haven’t evolved as much as ML in recent years, will be swiftly bypassed by modern fraud methods. In light of this, the automatic fraud pattern detection that comes with ML models is a necessity rather than a luxury.

Power of ensembles

Many modern-day ML algorithms work as ensembles (e.g. random forest, gradient boosting). This means that, under the hood, the algorithm builds numerous separate classifiers that each learn slightly different things about fraud patterns: a random forest trains them independently on different data subsets, while gradient boosting trains them one after another, each correcting the errors of its predecessors. When deployed, their outputs are combined into a score for every transaction, which reduces the bias any single classifier would carry. By contrast, if a fraudster comes from another part of the world and is half the age of the analyst who composes the rules, the bias transferred from analyst to code can create a gateway for fraudsters from different backgrounds. Ensembles partially alleviate this single point of failure.
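The combining step itself is simple. In the sketch below, three hand-written functions stand in for classifiers that a real ensemble would learn from data; their names and thresholds are hypothetical. The ensemble score is the share of members that vote “fraud”.

```python
def ensemble_score(classifiers, tx):
    """Average the members' votes (1 = fraud, 0 = legitimate) into a single score."""
    votes = [clf(tx) for clf in classifiers]
    return sum(votes) / len(votes)


# Hypothetical stand-ins for trained classifiers, each looking at different traits.
def amount_based(tx):
    return 1 if tx["amount"] > 1000 else 0


def velocity_based(tx):
    return 1 if tx["tx_last_10_min"] > 5 else 0


def account_based(tx):
    return 1 if tx["account_age_days"] < 7 else 0


members = [amount_based, velocity_based, account_based]
tx = {"amount": 1500.0, "tx_last_10_min": 1, "account_age_days": 2}
print(ensemble_score(members, tx))  # two of three members vote "fraud"
```

Because no single member decides alone, one member's blind spot does not automatically become the system's blind spot.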

Explainability

Rule-based systems hold a strong advantage over ML models in terms of explainability. In such systems, there is little ambiguity over why a certain transaction was blocked. Some ML algorithms work as black boxes - there is no easy way of saying why they returned a certain value for a certain input. Fortunately, most fraud detection datasets are imbalanced and made of structured data, which means that algorithms built on decision trees tend to work really well.

Predictions of such models can be explained using packages like ELI5 (which stands for “Explain Like I'm 5”) that let us see which transaction traits contribute to its likelihood of being fraudulent, much like in rule-based systems. Even if the algorithm is not tree-based, there are many tools that try to demystify the internal workings of those black boxes. XAI, which stands for “Explainable Artificial Intelligence”, is a young field that has gained a lot of attention recently because many real-world applications of ML models demand explainability.
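To see why tree-based predictions lend themselves to explanation, consider a single decision tree: the path a transaction takes from the root to a leaf is itself a readable list of conditions. The sketch below uses a tiny hand-written tree with hypothetical features, thresholds, and scores. It does not use ELI5's API; it only illustrates the underlying idea.

```python
# A hypothetical, hand-written tree; a real one would be learned from data.
TREE = {
    "feature": "amount", "threshold": 500.0,
    "left": {"score": 0.02},
    "right": {
        "feature": "account_age_days", "threshold": 7.0,
        "left": {"score": 0.85},
        "right": {"score": 0.10},
    },
}


def explain(tree, features):
    """Return the leaf score and the conditions met on the way to that leaf."""
    path = []
    node = tree
    while "score" not in node:
        name, threshold = node["feature"], node["threshold"]
        value = features[name]
        if value <= threshold:
            path.append(f"{name} = {value} <= {threshold}")
            node = node["left"]
        else:
            path.append(f"{name} = {value} > {threshold}")
            node = node["right"]
    return node["score"], path


score, reasons = explain(TREE, {"amount": 900.0, "account_age_days": 2.0})
print(score)
for reason in reasons:
    print(" -", reason)
```

For an ensemble of many trees, explanation tools aggregate this kind of information across all trees into per-feature contributions.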

Conclusions

In this piece, we tried to outline the main differences between rule-based engines and machine learning models. As stated above, the best setup should contain both since they are not mutually exclusive. Each of the methods has its pros and cons, but it looks like the future belongs to machine learning, complemented by rule-based systems. One way of looking at this is treating the machine learning model as just another rule in a rule-based engine - it’s just a bit smarter, that’s all.

We can help you fight fraud with high precision, thanks to our AI-powered fraud prevention solution. Contact us to learn more.