Estimating Transaction Processing Times with Bayesian Methods

At mx51, we are obsessed with the merchant experience. This doesn’t stop with our first-class payment applications; it also includes proactively supporting merchants to get the best out of their payment systems. I’m sure the experience of awkwardly standing in front of a payment terminal, waiting for the confirmation of a successful transaction, is familiar to a lot of people. The merchant invariably breaks the silence to complain about the consistently slow processing times they experience.

To identify merchants with slow terminals, I created a Bayesian linear model to estimate the processing time of payment terminals. In this post I will go through the process of creating this model and explain why I chose a Bayesian model over the more popular frequentist option. I won’t delve into Bayes’ theorem as there is already a treasure trove of content available on this subject, including this excellent 15-minute video from 3Blue1Brown.


Before looking at the data, I wanted to design a system that would allow us to identify a terminal that is processing transactions outside of a normal range. To keep the system simple, I initially fitted a linear regression before moving to Bayesian methods. Fitting a linear regression ignores most of the volatility in the transaction times but would provide two parameters of interest:

  • A constant term (a), and
  • A coefficient (b).

y = a + bx

The constant term represents the performance of the terminal at the start of the period (the y intercept). If this performance is outside a predefined band, it means the terminal is underperforming. The coefficient term tells us the direction and velocity of the change in transaction times; a high coefficient would mean the transaction times are degrading over time.
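As a sketch, the flagging logic could look like the function below. The threshold values are hypothetical placeholders; in practice they would come from historical fleet data.

```python
# hypothetical thresholds -- real values would be derived from historical data
MAX_INTERCEPT_MS = 5000   # flag terminals starting the period above 5 seconds
MAX_COEFFICIENT = 10      # flag times degrading by more than 10 ms per hour

def flag_terminal(intercept_ms, coefficient):
    """Return the reasons (if any) a terminal looks unhealthy."""
    reasons = []
    if intercept_ms > MAX_INTERCEPT_MS:
        reasons.append('slow')
    if coefficient > MAX_COEFFICIENT:
        reasons.append('degrading')
    return reasons
```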

The Data

The Python code below generates transaction time data for 100 terminals, each with up to 120 data points. Each data point represents the median transaction time per hour over a 120-hour (5-day) period. I have also simulated missing data points and outliers to make the problem more realistic.

import numpy as np

def generate_data(number_of_terminals):
    """
    Generate data representing terminal transaction times.
    Each data point is the transaction time in milliseconds.
    """
    data = {}
    for i in range(number_of_terminals):
        # create exponential transaction time data
        _scale = int(np.random.exponential(1)) + 1
        _offset = abs(np.random.normal(1000, 1000))
        terminal_data = np.random.exponential(_scale, 120) * 1000 + _offset
        # simulate missing data points
        _p = np.min([abs(np.random.normal(0.5, 0.2)), 1.0])
        missingness_filter = np.random.binomial(n = 1, p = _p, size = 120)
        # simulate outliers
        _p = abs(np.random.normal(0.01, 0.001))
        outliers_filter = np.random.binomial(n = 1, p = _p, size = 120) * 10 + 1
        # apply filters
        terminal_data = terminal_data * missingness_filter * outliers_filter
        # add data to dictionary
        data[f'terminal_{i}'] = terminal_data
    return data

# generate data for 100 terminals
terminal_data_dic = generate_data(100)

Randomly sampling from an exponential distribution produces a stationary time series. The only reasons a fitted model would suggest the series is non-stationary are:

  • The influence of outliers, or
  • A small sample size.

Ideally, a statistical system would ignore both outliers and small sample sizes.

The code above outputs a dictionary, and before modelling the data we need to turn this into a pandas data frame and add a new hour column.

import pandas as pd

# create data frame
df = pd.DataFrame(terminal_data_dic)
# create long dataset
df = df.melt()
# convert 0 values into NaN
df.loc[df['value'] == 0, 'value'] = np.nan
# rename columns
df = df.rename(columns={'variable': 'TERMINAL', 'value': 'TRANSACTION_SPEED'})
# add hour column 1 - 120
df['HOUR'] = df.groupby('TERMINAL').cumcount() + 1

The result of the data generation process is below.

Density plot of data generated

Figure 1: Shows the density of all the terminal data. It shows a typical exponential distribution for time series data. The majority of the terminals are clustered around ~2 seconds and there is a long tail. This is pretty close to what the real data looks like.

Sample of terminal data

Figure 2: Shows four terminals I have picked out to compare between the normal linear regression model and the Bayesian linear regression model. I have chosen these terminals to show some of the problems Bayesian methods overcome, particularly small sample sizes and outliers.

Linear Regression Model

It’s time to fit a model to the data. I’m starting with a simple linear regression.

The Python function below fits a simple linear regression to the data, where the transaction time is simply a function of the hour. The model learns the best intercept and coefficient to reduce the overall error.

# model: transaction_speed = intercept + coefficient * hour
import statsmodels.api as sm
import statsmodels.formula.api as smf

def glm_model(df):
    """
    Fit a linear GLM model to the data.
    Return the constant term and the coefficient of HOUR.
    """
    # formula
    formula = 'TRANSACTION_SPEED ~ HOUR'
    # fitting the model
    result = smf.glm(formula=formula, data=df).fit()
    # return the constant and coefficient point estimates
    return {
        'CONSTANT': result.params['Intercept'],
        'COEFFICIENT': result.params['HOUR'],
    }

# running the model for each terminal
terminals = df['TERMINAL'].unique()
glm_results = {}
for terminal in terminals:
    glm_df = df[df['TERMINAL'] == terminal]
    glm_model_df = glm_df.dropna()
    glm_results[terminal] = glm_model(glm_model_df)

Linear Regression Results

OLS Results

Figure 3: Results of the linear regression.

The results of the linear regression look reasonable, but I would like to look at terminals 28 and 68 more closely. These two terminals highlight two problems with the linear model.

  • The linear regression model needs a sufficient amount of data to make accurate estimates. The estimate for terminal 28 looks accurate, but it is based on fewer than 10 data points. Small sample sizes are likely to result in extreme parameter estimates, as we see here.
  • The linear regression model is heavily influenced by outliers; a single data point is enough to change the coefficient completely, as seen in terminal 68.
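The outlier problem is easy to demonstrate with plain least squares. In the sketch below (seed and scale are arbitrary), multiplying a single data point by 10 is enough to pull the fitted slope upwards.

```python
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(1, 121)
# stationary transaction-time series clustered around 2000 ms
times = rng.exponential(2000, size=120)
slope_clean = np.polyfit(hours, times, 1)[0]

# inject a single 10x outlier late in the period
times_outlier = times.copy()
times_outlier[115] *= 10
slope_outlier = np.polyfit(hours, times_outlier, 1)[0]
```

Because the outlier is a large positive value past the midpoint of the period, the slope estimate is guaranteed to increase.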

Both of these problems can be overcome with Bayesian statistics.

Bayesian Linear Regression Model

To solve the problems with the linear regression model, we can estimate its parameters using Bayesian statistics.

Bayesian methods use data we pass to them to update a prior belief that we provide to the model. For example, if we believe transaction times are clustered around 3 seconds we can give the model that information. If we then had a terminal that gave us a single transaction time of 20 seconds, this belief would be updated to a higher value, perhaps ~4 seconds. If that terminal continued recording 20 second transactions, after receiving enough evidence the Bayesian estimation would eventually converge to 20 seconds.
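The intuition above can be shown with a normal–normal conjugate update of the mean, a simplified stand-in for what the full model does. The prior and noise variances below are illustrative choices, picked so one 20-second observation moves the estimate from 3 seconds to roughly 4.

```python
import numpy as np

def posterior_mean(prior_mean, prior_var, obs, obs_var):
    """Normal-normal conjugate update of a mean with known noise variance."""
    obs = np.atleast_1d(obs)
    precision = 1 / prior_var + obs.size / obs_var
    weighted = prior_mean / prior_var + obs.sum() / obs_var
    return weighted / precision

# prior belief: transactions take ~3 seconds (3000 ms)
prior_mean_ms, prior_var = 3000, 500 ** 2
obs_var = 2000 ** 2

# a single 20-second transaction only nudges the estimate upwards
one_obs = posterior_mean(prior_mean_ms, prior_var, [20000], obs_var)
# a thousand 20-second transactions pull it almost all the way to 20 seconds
many_obs = posterior_mean(prior_mean_ms, prior_var, [20000] * 1000, obs_var)
```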

Simply put, with Bayesian statistics, our assumption that it takes ~3 seconds to process a transaction is unaltered until we receive enough evidence to say otherwise.

By starting with a prior belief about the data, this process is also resilient to outliers and small transaction volumes.

To fit the Bayesian model, I have used the bambi package, which is built on top of PyMC3 and provides the framework for the Bayesian multilevel linear regression out of the box.

import arviz as az
import pymc3 as pm
import theano.tensor as tt
import bambi

model = bambi.Model("TRANSACTION_SPEED ~ HOUR + (HOUR|TERMINAL)", df.dropna())
results = model.fit()

I have used the default values for this model, but it is possible to tune the model by changing the distribution and parameters for the priors.

Bayesian Linear Regression Results

Bayesian GLM Results

Figure 4: Results of the Bayesian linear regression

In figure 4, we can see that the problems with the standard linear model have been resolved by using Bayesian estimation for the two parameters. Additionally, due to the method the PyMC3 package uses to converge to an answer (Markov chain Monte Carlo) we get the full distribution of expected results, which are shown in green on the charts.


To show the benefits of Bayesian methods, I purposely created a stationary time series and then added outliers and removed data points to simulate real-world data. Despite these adjustments, the underlying time series is still stationary. The results in the table below show that the coefficient estimated by the normal linear regression implies a non-stationary series for two of the four terminals, whereas in all cases the coefficients estimated by the Bayesian linear regression are consistent with a stationary time series.

By using a Bayesian linear regression model we are able to identify, with confidence, payment terminals with slow or degrading transaction times and ultimately proactively support merchants.

Terminal  Parameter    Linear Regression  Bayesian Regression
28        Intercept    1967.48            3115.95
          Coefficient    22.35               0.63
76        Intercept    1127.02            1282.41
          Coefficient     1.33               0.21
68        Intercept    2032.23            3955.31
          Coefficient    35.28               1.31
96        Intercept    6790.88            5507.28
          Coefficient    -3.70               1.008