MATLAB is commonly seen as a scripting language for scientific computing rather than a ‘proper’ programming language for software engineering. Experienced MATLAB programmers know, however, that its capabilities reach far beyond matrix computation and graph plotting, and that it can be used to create fairly sophisticated software. One area where MATLAB falls short is its provision for parallel programming (not to be confused with its provision for multithreaded computation). This post discusses a method for implementing multithreaded, parallel-executing software using existing MATLAB functionality.
What parallel support does MATLAB already have?
Some common functions, listed here, automatically benefit from multithreaded computation even without the Parallel Computing Toolbox. This is why those who have generated C++ code from MATLAB and run it in compiled form sometimes find that the compiled code is slower.
For users with the Parallel Computing Toolbox, a further list of functions has automatic parallel support, allowing them to run in parallel without extra programming. These are mostly toolbox functions for tasks such as training neural networks.
Common tasks such as repeating a calculation while sweeping a parameter space are easily handled with parfor. In many cases, it is enough to replace a for loop with a parfor loop.
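As a minimal sketch of such a sweep, the snippet below evaluates the peak of a damped oscillation for several damping values. The variable names and the quantity being computed are made up purely for illustration; the only point is that each iteration is independent of the others, which is what allows a for loop to become a parfor loop.

```matlab
% Hypothetical parameter sweep: peak of a damped oscillation for several
% damping values. All names and numbers are illustrative only.
zeta = linspace(0.1, 0.9, 9);      % parameter values to sweep
t = linspace(0, 10, 1001);         % time axis shared by all iterations
peak = zeros(size(zeta));          % one result per parameter value

parfor k = 1:numel(zeta)
    % Iterations do not depend on each other, so they may run on different workers.
    y = exp(-zeta(k) * t) .* sin(t);
    peak(k) = max(y);
end

disp(peak)
```

If no parallel pool is available, the same loop should simply run serially, so the code does not need to change between the two cases.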
MATLAB provides a table on choosing an existing parallel computing solution here.
So what is lacking?
MATLAB’s existing parallel computing support is probably sufficient for the vast majority of users. As readers may have noticed, however, that support focuses mainly on processing existing data. It does not cater for software that must run several processes with different functions in parallel, often indefinitely. Examples include software that runs closed-loop control cycles for several pieces of hardware while letting the user adjust the set points interactively, or data acquisition software that must communicate with many instruments simultaneously, over a variety of protocols and at different frequencies, to stream data while also handling computation, display, and storage. Writing such software with only one thread is extremely challenging, if not impossible, even when the processing is entirely interrupt-driven, mainly for three reasons:
- The performance is not as good as that of parallel software. Suppose the software must run 10 PID control loops simultaneously, and the dominant loop latency lies in the software rather than in, for example, the sensors or actuators. Parallel software can then potentially reduce the closed-loop time delay by up to a factor of 10. As another example, suppose data acquisition software needs to communicate with two instruments, A and B. A responds within 1 ms of receiving a command, while B, being an older instrument, responds in around 300 ms with an uncertainty of 100 ms. A cycle that acquires data from A and then B is limited by the latency of B, and the performance of A is largely wasted.
- For data acquisition applications, it may be necessary to obtain a continuous and evenly spaced data trace. In the second example above, even if the single-threaded software is optimised to keep acquiring data from A while waiting for B to respond, the data trace of A will not be uniform, as there will be pauses to collect data from B. For instruments without an internal or external clock supply, accurately timed software triggers from the controlling computer may be essential.
- Single-threaded software may be less robust than parallel software, especially when interfacing with hardware that can behave unexpectedly. In the above examples, a malfunctioning piece of hardware or communication channel (e.g. a WiFi glitch) may cause single-threaded software to crash, hang, or lose its rhythm. In safety-critical applications requiring a high degree of robustness, the software must include extensive error and timeout checking, which comes at the price of overall performance. Parallel software, on the other hand, tends to keep running when one process crashes or hangs, provided the processes are programmed to function independently.
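The two-instrument example in the first point can be put into rough numbers. The sketch below uses the response times quoted above and ignores all software overhead; reading the 100 ms uncertainty as plus or minus 100 ms is an assumption made here for illustration.

```matlab
% Response times taken from the example in the text; everything else is illustrative.
tA = 1e-3;               % response time of instrument A, in seconds
tB = 0.300;              % typical response time of instrument B, in seconds
tBJitter = 0.100;        % assumed +/- uncertainty in B's response time, in seconds

% Sequential polling: one sample from each instrument per cycle.
cycleTypical = tA + tB;
rateSequential = 1 / cycleTypical;              % samples per second obtained from A
cycleRange = tA + tB + [-1 1] * tBJitter;       % shortest and longest cycle

% Independent polling: A is limited only by its own response time.
rateIndependentA = 1 / tA;

fprintf('A polled after B: about %.1f samples/s (cycle %.0f to %.0f ms)\n', ...
    rateSequential, cycleRange(1) * 1e3, cycleRange(2) * 1e3);
fprintf('A polled independently: up to %.0f samples/s\n', rateIndependentA);
```

Under these assumptions, A delivers roughly 3.3 samples per second when it has to wait for B, compared with up to 1000 samples per second when it is serviced on its own, and the sequential sampling interval varies from cycle to cycle.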
The remainder of this post series offers an example method to implement parallel software in MATLAB, with each process programmed individually and executed independently, and offers solutions to reliably handle the transferral of (even large amounts of) data.
The parallel software architecture
To implement the parallel software, we use MATLAB’s parfeval function (parallel function evaluation) which schedules a function to run in a parallel pool.
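For readers unfamiliar with parfeval, the basic call pattern looks roughly like the sketch below, which uses the built-in magic function as a stand-in for real work. It is only an illustration of scheduling a function and fetching its result later, not the code used in this series, and it assumes a MATLAB release in which parfeval can be called without an explicit pool argument.

```matlab
% Schedule a built-in function asynchronously and collect its result later.
% With no pool argument, MATLAB uses a parallel pool if one is available.
f = parfeval(@magic, 1, 4);     % request 1 output from magic(4)

% The main session is free to do other work here while magic(4) runs.

M = fetchOutputs(f);            % blocks until the result is ready
disp(M)
```

The object returned by parfeval is a 'future': a handle to the scheduled function that the main session can later query, wait on, or cancel.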
Given that a process is essentially a running function, we only need to encapsulate the code of each process in a MATLAB function that can run indefinitely, in the form of a ‘while true’ loop. This is very similar to programming microcontrollers such as the Arduino: it is straightforward, for example, to program an Arduino to run a PID loop or to take measurements from a sensor continuously. We break the overall task into small, manageable elements and write a ‘while true’ loop to handle each one.
We break the software into two parts. The first part consists of the ‘worker’ processes: functions written individually to perform the tasks that need to be done in parallel, each containing a ‘while true’ loop. These functions are sent to the MATLAB parallel workers (separate instances of the MATLAB interpreter) to run in parallel by invoking the parfeval function. The second part is a ‘housekeeper’ process, which runs in the main MATLAB session (the one to which we have command line access). This process has access to all the worker processes and handles tasks such as communication and worker lifecycle management. We do not put any regular workload on the housekeeper, because the main MATLAB session needs to stay idle so that the user can interact with the housekeeper, which in turn controls the worker processes.
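A bare-bones sketch of this split might look as follows. It assumes the Parallel Computing Toolbox is installed and that the pool has at least as many workers as there are endless loops, since each loop occupies one worker for as long as it runs. The names workerLoop, nWorkers, and futures, and the tick message, are hypothetical placeholders rather than the code developed in this series.

```matlab
% Housekeeper side (main MATLAB session). All names here are hypothetical.
pool = gcp();                                  % start or reuse a parallel pool
nWorkers = 2;                                  % illustrative number of worker processes
futures(1:nWorkers) = parallel.FevalFuture;    % preallocate the array of futures
for k = 1:nWorkers
    % Request 0 outputs: the loop never returns on its own.
    futures(k) = parfeval(pool, @workerLoop, 0, k, 0.1);
end

pause(1);                 % the main session stays free for the user meanwhile
cancel(futures);          % stop the otherwise endless loops
disp({futures.State})

function workerLoop(id, period)
% One 'process': repeat a task until the housekeeper cancels its future.
    while true
        % Placeholder for the real work, e.g. one PID step or one instrument read.
        fprintf('worker %d tick\n', id);
        pause(period);
    end
end
```

The block is written as a single script with the worker function defined locally at the end. If a given release has trouble resolving the local function on the workers, saving workerLoop in its own file on the MATLAB path is a simple alternative. The sketch only covers launching and stopping the loops; identifying, communicating with, and exchanging data with them are separate problems.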
Three challenges need to be solved:
- We need a method to identify which function is which, in the large pool of functions scheduled to run. The functions may be identical but servicing different hardware or processes.
- We need a method to establish two-way communication with the functions.
- We need a method to transfer or share large volumes of data, often in real-time.
In the second part of this post, we will be addressing the three challenges in turn using example code. We will also discuss how to integrate this parallel programming with the object-oriented programming capabilities of MATLAB.