MQTT was originally designed to work with telemetry data over unreliable connections.
With telemetry data, some data loss is generally acceptable, although undesirable.
In this article I want to discuss message delivery in detail and look at various configurations to improve message reliability.
As stated previously, MQTT was designed for sending telemetry data over unreliable connections.
The data sending was triggered by a polling mechanism from a central server.
Because of the limited bandwidth available at that time, and the constraints of the end devices themselves, messages would be short.
With a polling mechanism, if the data sender doesn’t receive a poll it doesn’t send, and it must wait for the next poll before it does.
The data would not necessarily be lost but could be overwritten with fresh data.
A loss of network would result in gaps in data which was generally acceptable.
In any case the application would need to take this into consideration.
There are really only two categories of client:
- Simple Client
- Complex Client
For a simple client, data can be collected directly from the client. For example, an Arduino with connected temperature sensors.
In this case it would not be possible to buffer data in the event of a network or broker failure, and MQTT QOS 2 may not even be possible.
Alternatively, a complex client could be a Node-RED flow that retrieves data from one or more Modbus servers and sends it to the broker.
In this case it would be possible to buffer data in the event of a network or broker failure.
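To illustrate what buffering on a complex client might look like, here is a minimal sketch in Python. It is not tied to any MQTT library: `send` is a hypothetical function that you would supply, returning True when the message was handed to the broker and False when the connection is down. The class name and the `max_buffered` limit are also my own assumptions.

```python
from collections import deque
from typing import Callable, Deque, Tuple

Message = Tuple[str, bytes]  # (topic, payload)


class BufferingPublisher:
    """Queue messages while the broker is unreachable and replay them later."""

    def __init__(self, send: Callable[[str, bytes], bool], max_buffered: int = 1000):
        # `send` is a hypothetical callable: True = delivered, False = link down.
        self._send = send
        # A bounded deque silently drops the oldest entry once it is full.
        self._buffer: Deque[Message] = deque(maxlen=max_buffered)

    def publish(self, topic: str, payload: bytes) -> None:
        self.flush()  # drain older messages first to preserve ordering
        if self._buffer or not self._send(topic, payload):
            self._buffer.append((topic, payload))

    def flush(self) -> int:
        """Replay buffered messages in order; return how many were sent."""
        sent = 0
        while self._buffer:
            topic, payload = self._buffer[0]
            if not self._send(topic, payload):
                break  # still down, keep the rest for later
            self._buffer.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        return len(self._buffer)
```

The buffer is bounded on purpose. Once it is full the oldest readings are discarded, which is the same "overwritten with fresh data" behaviour described earlier for polled telemetry.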
QOS and Data Buffering
MQTT offers three QOS levels in order to improve message delivery. They are:
- QOS 0 – At Most Once (not guaranteed)
- QOS 1 – At Least Once (guaranteed)
- QOS 2 – Exactly Once (guaranteed)
Although at first glance setting the QOS to 1 or 2 should more or less guarantee message delivery, this is not always the case, and in many cases it may give unwanted results.
Network and Broker Failures
The diagram below shows MQTT clients connected to a broker. The broker can be local or cloud based.
On an MQTT connection between a client and a broker there are two points of failure:
- The network
- The broker
Generally the network for a local broker would be more reliable than that of a cloud based broker.
Detecting a Network or Broker Failure
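A network failure can be detected almost immediately in some circumstances, but detection usually relies on the keep alive mechanism, which is typically 60 secs by default.
Because the broker waits one and a half times the keep alive interval before closing the connection, a network failure should definitely be detected within 60*1.5 = 90 secs.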
If we want it shorter then we decrease the keep alive interval or use a different detection mechanism outside of MQTT.
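The effect of shortening the keep alive interval can be worked out with a few lines of Python. This is only a sketch of the arithmetic. The function name is my own, and the 1.5 factor is the grace period the MQTT specification gives the broker before it closes a silent connection.

```python
def detection_window(keepalive_s: float, grace_factor: float = 1.5) -> float:
    """Worst-case seconds before a silent connection is treated as failed."""
    if keepalive_s < 0:
        raise ValueError("keep alive interval cannot be negative")
    return keepalive_s * grace_factor


# Compare the default interval with two shorter ones.
for keepalive in (60, 30, 10):
    print(keepalive, detection_window(keepalive))
# 60 90.0
# 30 45.0
# 10 15.0
```

A shorter interval gives faster detection at the cost of more keep alive traffic, which may matter on a metered or low bandwidth link.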
Once we are aware of a network failure we then need to decide on a course of action.
There are three courses of action possible:
- Don’t Buffer
- Buffer
- Switch Broker
They can of course be combined e.g. buffer and switch.
Before you can decide on the best course of action you need to take into account the actual data and the frequency of the data (message rate).
- Is the data real time data?
- How is it being used?
- Are data gaps acceptable?
You also need to take into account the network/broker downtime, which falls into one of two categories:
- Short duration (seconds)
- Long duration (several minutes to hours)
Short Duration Outages
These are adequately catered for by using a QOS of 1 or 2.
However a very high message frequency could potentially cause a problem.
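To get a feel for why message frequency matters, you can estimate how many messages pile up while the connection is down. The sketch below is simple arithmetic. The function name, the rates and the 200 byte payload size are illustrative assumptions, not measurements.

```python
from typing import Tuple


def outage_backlog(rate_per_s: float, outage_s: float, payload_bytes: int) -> Tuple[int, int]:
    """Return (message_count, total_payload_bytes) queued during an outage."""
    messages = int(rate_per_s * outage_s)
    return messages, messages * payload_bytes


# A 30 second outage at two very different message rates.
print(outage_backlog(1, 30, 200))     # (30, 6000)
print(outage_backlog(1000, 30, 200))  # (30000, 6000000)
```

At one message per second the backlog is trivial. At 1000 messages per second the same outage leaves tens of thousands of messages to be held and then resent when the connection returns.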
Long Duration Outages
With a long duration outage, buffering messages isn’t usually feasible, and can be counterproductive.
Broker availability can be increased by using clustering, and all of the major broker suppliers provide this capability.
However a high degree of broker availability can be provided by having the client use multiple brokers.
Not only can this technique be used to overcome a broker outage, it can also overcome network outages if the brokers are reached over different networks (e.g. cloud based brokers).
Broker switching is a good solution for real time data that cannot be buffered and some data loss is acceptable.
However this technique requires that message consumers are also aware of it and use multiple brokers.
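On the sending side, broker switching can be sketched as trying each broker in turn until one accepts the message. The code below is a sketch under assumptions and not a complete client. `send` is a hypothetical function that you would write around your MQTT library, and the broker names in the usage example are placeholders.

```python
from typing import Callable, Optional, Sequence


def publish_with_failover(
    brokers: Sequence[str],
    send: Callable[[str, str, bytes], bool],
    topic: str,
    payload: bytes,
) -> Optional[str]:
    """Try each broker in order; return the one that accepted the message."""
    for broker in brokers:
        try:
            if send(broker, topic, payload):
                return broker
        except OSError:
            # Treat a connection error like a refused publish and move on.
            continue
    # Every broker failed; the caller decides whether to drop or buffer.
    return None
```

A call might look like `publish_with_failover(["local-broker", "cloud-broker"], send, "sensors/temp", b"21.5")`. The returned name tells you which broker took the message, which is useful for logging how often the switch actually happens.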
Multiple Simultaneous Brokers
As with broker switching the idea is for the sender and receiver to use multiple brokers.
However this time messages are sent to both (or all) brokers and the receiver will receive from all brokers.
This means that the receiver will receive duplicate messages and so the receiver needs to take this into account.
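One way for the receiver to handle duplicates is to remember which messages it has already seen. The sketch below assumes that the sender puts a unique ID in every message, for example a sequence number or UUID in the payload. That ID is my assumption and is something the application has to add, since MQTT itself does not provide an identifier that is shared across brokers.

```python
from collections import OrderedDict


class Deduplicator:
    """Remember recently seen message IDs so duplicates can be discarded."""

    def __init__(self, max_ids: int = 10000):
        self._seen = OrderedDict()  # insertion order doubles as age order
        self._max_ids = max_ids

    def is_new(self, message_id: str) -> bool:
        """Return True the first time an ID is seen, False for a repeat."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = None
        if len(self._seen) > self._max_ids:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True


dedup = Deduplicator()
print(dedup.is_new("sensor1-0001"))  # True, first copy from broker A
print(dedup.is_new("sensor1-0001"))  # False, same message from broker B
```

The list of remembered IDs is bounded so that memory use stays fixed. The limit needs to be large enough to cover the longest delay you expect between the copies arriving.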
High Availability Broker Clusters
This facility is provided by almost all broker vendors.
Mosquitto broker clustering is available from Cedalo, the company behind Mosquitto.
This can be used for both edge brokers and cloud based brokers.
You should note that broker clusters offer no protection against network failure.
Although MQTT provides various QOS levels to provide reliable message delivery, it is not a simple matter of enabling a QOS of 1 or 2 and the problem is solved.
In fact enabling QOS of 1 or 2 may cause more problems than it solves.
Having a knowledge of the data you are sending and the consumers receiving that data is crucial for designing an MQTT network for reliable message delivery.
This tutorial has attempted to give an overview of the potential problems arising from broker and network failure.
As such I would welcome feedback based on your own experience to use as scenarios.
I will not publish the comment but will extract the scenario from it.
Related Tutorials and Resources
- Monitoring MQTT Brokers
- MQTT- Which QOS should you Use? 0,1,2
- MQTT Keep Alive Interval Explained With Examples