Parallel software execution is usually considered an advanced programming topic. It is the broad concept behind a more familiar methodology called multi-threaded programming, and software built this way is often called a multi-threaded application (MTA). The idea of a program performing several tasks in parallel comes from the design and architecture of operating systems (OS). The core of the OS, the kernel, is responsible (among many other tasks) for ensuring that all applications running on the computer get to do their computing without obstructing each other. This is necessary partly because the applications we run are written by different software companies and usually know nothing about each other. The way to coordinate them in using limited processor power, hard-drive capacity and network bandwidth is to make sure each application’s compiled code regularly gets a chance to use the computer’s resources. This usage “quota” (a computing time interval) is managed by the kernel, so each application can work on its own without harmfully interfering with the other applications running simultaneously on the same machine.

Parallel programming and OS multi-tasking

To solve this problem, two general approaches have been developed over the years: non-preemptive (cooperative) multitasking and the more advanced preemptive multitasking. With the non-preemptive approach, the OS kernel relies on each application’s code being written with the “awareness” that it must share limited resources with other software. The code has to give control back to the OS regularly (usually after very short intervals, on the order of milliseconds) so that the OS can hand the resources to other applications. This approach is simpler, but it breaks down when an application is written poorly: if its code never gives control back, it keeps using the computer’s resources aggressively. The problem was evident in early Windows versions (2.0–3.11), where, when the user switched focus between applications, some background applications almost stopped working.

The preemptive approach, on the other hand, gives the OS kernel the authority to interrupt (“cut”) any program’s execution at any moment when the computer’s resources are needed elsewhere. This guarantees that vital components such as the screen and keyboard drivers and the core OS processes keep working. End users feel the improvement directly: they can keep many applications open and use each one at will simply by switching between windows. Windows 95 and later versions (for 32-bit applications), Linux and iOS all use the preemptive model.
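To make the non-preemptive model concrete, here is a minimal sketch in Python (the language is an assumption; the article itself uses none). Each task is a generator that must voluntarily `yield` to hand control back to a toy round-robin scheduler, which plays the role of the kernel. The names `task` and `run_cooperative` are hypothetical. A task whose loop never reached a `yield` would keep the scheduler stuck inside `next()` forever, which is exactly the weakness of cooperative multitasking.

```python
from collections import deque

def task(name, steps, log):
    """Hypothetical cooperative task: does one step of work, then yields."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily hand control back to the scheduler

def run_cooperative(tasks):
    """Toy round-robin scheduler: relies on every task yielding regularly."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task until its next yield
        except StopIteration:
            continue               # task finished: drop it from the queue
        ready.append(current)      # task yielded: put it at the back

log = []
run_cooperative([task("A", 3, log), task("B", 2, log)])
print(log)  # the two tasks take turns
```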

Multi-threading in the operating system

To let software developers create multi-threaded applications, operating systems define the concept of a “thread”. A thread is a micro-program inside your program with its own execution lifecycle. Programmers can place resource-intensive tasks (complex calculations, hard-drive reads and writes) in a separate thread so that the rest of the application keeps running. The benefit is most evident in programs with a rich UI/UX, such as photo editors, browsers, and music and video editing software. If we click to apply a filter in a photo editor, the operation takes time, but the program can still display panels, menus and the status of active tasks. Otherwise, we would see a “frozen” user interface until the filter finished. A similar problem arises in server applications: processes that run hidden (with no UI) on our computer, or programs that receive network requests (web servers, mail servers, etc.). These programs need to handle incoming network traffic while performing I/O operations on the storage drives. By allowing applications to create threads, the OS makes sure all of our applications run smoothly at once. From an engineering point of view this is partly an illusion, because on a single-core processor (CPU) the OS simply switches the processor between threads for very short periods, creating the impression that everything happens at once. A major improvement in recent years has been multi-core processors (Intel’s dual-core chips, etc.), which allow more than one thread to run at the same moment on a single CPU, achieving true parallelism.
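The photo-filter scenario can be sketched with Python’s standard `threading` module. This is an illustrative assumption, not code from any real editor: `apply_photo_filter` is a hypothetical stand-in that sleeps to simulate a slow operation, while the main thread keeps “refreshing the UI” until a `threading.Event` signals completion. In CPython the global interpreter lock lets only one thread execute Python bytecode at a time, so this demonstrates responsiveness rather than multi-core parallelism. Blocking calls such as `time.sleep` release that lock.

```python
import threading
import time

def apply_photo_filter(pixels, result, done):
    """Hypothetical slow 'filter': brightens every pixel value, capped at 255."""
    time.sleep(0.2)  # stand-in for a long computation or disk I/O
    result.extend(min(p + 50, 255) for p in pixels)
    done.set()  # signal the main (UI) thread that the work is finished

pixels = [10, 100, 230]
result = []
done = threading.Event()
worker = threading.Thread(target=apply_photo_filter, args=(pixels, result, done))
worker.start()

ui_refreshes = 0
while not done.is_set():
    ui_refreshes += 1         # the "UI" keeps redrawing menus and status
    done.wait(timeout=0.02)   # wait briefly, then refresh again
worker.join()
print(result, ui_refreshes)
```

The main thread never blocks for the full duration of the filter; it wakes up every 20 ms, which is where a real program would process clicks and repaint.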

Threads in applications

From a programmer’s perspective, threading is a refined way to handle parallel tasks in a more confined and organised manner. When we use threads, we isolate part of our code to run in parallel as a small program or function inside the whole application. That piece of code has its own “context”: the state of its current execution point (including the next statement to be run on the CPU). The context lets the OS kernel stop and resume the thread very quickly, and it gives the software developer the illusion that the code is executed in parallel. The alternative is a single thread responsible for all functionality (UI, I/O, network), but then the programmer must design the architecture so that only small chunks of code run for a limited time each, to keep the application responsive. This is the widely discussed “parallel vs concurrent” dilemma: a single thread with complex code, or many (sometimes too many) threads with simpler code but with the risk of overloading the operating system until it grinds to a halt. Good multi-threaded design requires a fine balance between these extremes.
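The single-thread alternative can be sketched as follows, again in Python and with hypothetical names. A long computation (`sum_in_chunks`) is split into small chunks, and after each chunk the loop gets a chance to handle a pending “UI event” before resuming the work. Everything runs in one thread; the price is that the long task has to be restructured by hand.

```python
from collections import deque

def sum_in_chunks(numbers, chunk_size):
    """Hypothetical long task, split into chunks so one thread can interleave other work."""
    total = 0
    for start in range(0, len(numbers), chunk_size):
        total += sum(numbers[start:start + chunk_size])
        yield  # chunk finished: let the main loop handle UI events
    return total  # delivered to the caller via StopIteration.value

def main_loop(job, ui_events):
    """Single-threaded loop: run one chunk, then service at most one UI event."""
    handled = []
    while True:
        try:
            next(job)
        except StopIteration as finished:
            return finished.value, handled
        if ui_events:
            handled.append(ui_events.popleft())

total, handled = main_loop(sum_in_chunks(list(range(1, 101)), 25),
                           deque(["click", "scroll"]))
print(total, handled)
```

A smaller `chunk_size` makes the loop more responsive but adds more switching overhead, which is the same balance the paragraph above describes.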

Challenges with synchronisation

Another aspect of multi-threaded programming (the one that sometimes makes it look complex and “bad”) is that developers need to know which objects or allocated memory a thread will access while it runs. If other threads access, and especially modify, the same resources, problems can arise. To solve this, the OS provides synchronisation objects (for example mutexes, events and semaphores). We can use them in our code so that while one thread works with a shared resource, other threads wait for it to finish before accessing that resource. When implemented properly, synchronisation between threads works in perfect order. When it is not, we get the infamous “deadlock”, in which the application stops responding because, for example, the thread handling the UI is waiting for another thread that is itself waiting for input from the UI. Another issue is the “race condition”, where the result depends on the timing of the threads. For example, two threads must finish their tasks in parallel before a third thread grabs the results and continues; if they are not synchronised, the third thread may use the results before they are valid.
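A minimal Python sketch of a mutex protecting shared data follows (the names are hypothetical). `counter += 1` is a read-modify-write sequence, so two threads updating it without a lock can lose increments; holding `counter_lock` around it makes each update atomic with respect to the other thread. The main thread then plays the role of the “third thread” from the race-condition example: it calls `join()` on both workers before reading the result, so it never sees a half-finished value.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # mutex guarding the shared counter

def add_many(times):
    global counter
    for _ in range(times):
        with counter_lock:   # only one thread may update the counter at a time
            counter += 1     # read-modify-write: unsafe without the lock

workers = [threading.Thread(target=add_many, args=(100_000,)) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()                 # the "third" (main) thread waits for both results
print(counter)  # 200000
```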

“Tricks” to handle multi-threading complexity

One handy design for tackling threading complexity is a thread pool. As the name suggests, the application creates a limited number of threads up front (usually 4 to 10) and hands small units of work to whichever pooled thread is free, reusing the same threads over and over. Because only a small, fixed number of threads are busy at any time, a multi-core CPU can run them in truly parallel fashion, giving close to the maximum possible execution speed. It is also cheaper from the OS perspective to create the threads once, at application startup, because creating and destroying a thread is a relatively expensive operation for the OS kernel. This approach is used extensively in the .NET and Java managed execution platforms.
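Python’s standard library ships a thread pool in `concurrent.futures`, so the idea can be sketched without writing the pool by hand. `checksum` is a hypothetical small unit of work; the pool creates at most four worker threads and reuses them for all ten tasks, and `pool.map` returns the results in input order. One caveat: in CPython the global interpreter lock prevents threads from running Python bytecode in parallel, so a thread pool mainly speeds up I/O-bound work; `ProcessPoolExecutor` from the same module is the usual swap for CPU-bound tasks.

```python
from concurrent.futures import ThreadPoolExecutor

def checksum(chunk):
    """Small, hypothetical unit of work: a simple checksum of a list of bytes."""
    return sum(chunk) % 256

# Ten independent chunks of work: [0..9], [10..19], ..., [90..99]
chunks = [list(range(i, i + 10)) for i in range(0, 100, 10)]

# The pool creates its worker threads once and reuses them for every task;
# leaving the "with" block waits for all tasks and shuts the threads down.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(checksum, chunks))
print(results)
```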

When designing cloud-based server software, the multi-threading model is an essential decision that usually shapes how the application works and how its code or scripts are written. There are two general approaches. The first is a truly multi-threaded but risky design, where each incoming request gets its own thread and the script handling it does not interfere with the scripts handling other requests. This is roughly how typical PHP deployments work (each request is served by its own worker process or thread), and up to a point this model is preferable. The problem appears when most incoming requests try to access the same server resources, usually the database: the threads then have to wait for each other to use that limited resource, which seriously diminishes the benefit of the multi-threaded design. Node.js offers a good middle ground: all of the script runs in a single thread, while I/O and database operations are carried out by the runtime underneath (using the OS’s asynchronous I/O facilities and a pool of worker threads), so those time-consuming operations benefit from parallel execution.
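Python’s `asyncio` offers a model comparable to the Node.js design, so it is used here as a stand-in (an analogy, not Node.js itself). Request handlers run on one event-loop thread, and the hypothetical blocking `blocking_db_query` is handed to a worker thread with `asyncio.to_thread` (available since Python 3.9). Each response records whether the query ran on a thread other than the event-loop thread. Because the three simulated 0.2-second queries overlap, the whole batch should take roughly 0.2 seconds rather than 0.6.

```python
import asyncio
import threading
import time

def blocking_db_query(request_id):
    """Hypothetical blocking database call; runs on a worker thread."""
    time.sleep(0.2)  # stand-in for waiting on the database
    return f"row-{request_id}", threading.get_ident()

async def handle_request(request_id):
    loop_thread = threading.get_ident()  # request logic runs on the event-loop thread
    # Hand the blocking call to a worker thread so the loop stays free for other requests.
    row, worker_thread = await asyncio.to_thread(blocking_db_query, request_id)
    return row, worker_thread != loop_thread

async def serve(request_ids):
    # All handlers are interleaved on the single event-loop thread.
    return await asyncio.gather(*(handle_request(r) for r in request_ids))

start = time.perf_counter()
responses = asyncio.run(serve([1, 2, 3]))
elapsed = time.perf_counter() - start
print(responses, f"{elapsed:.2f}s")
```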