Yes, here is one more thing you need to worry about as an engineer. Software developers have long been familiar with concurrency, but the concept has started to creep into automation and infrastructure management. As infrastructure engineers, in the cloud and on-prem, we need to understand how it can benefit us and make life easier.
What is concurrency? The Wikipedia definition is: "In computer science, concurrency is the decomposability property of a program, algorithm, or problem into order-independent or partially-ordered components or units. This means that even if the concurrent units of the program, algorithm, or problem are executed out-of-order or in partial order, the final outcome will remain the same. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems."
That does not help much, correct? So let's think about it this way: say you are writing some code and you see an important email come in. You stop coding and start answering the email, but then an important phone call comes in that needs your full attention, so you focus on the call.
After you are done with the call, you go back to working on the program, and by the end of the day you remember to finish replying to the email. So concurrency is a way to keep track of all the tasks at hand and make progress on each of them, without being able to do two tasks at the same instant.
This scenario usually happens in computer land when only one CPU core is available. When a computer has multiple cores available, parallelism comes into play: combined with concurrency, it lets a program execute multiple tasks at the same time.
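To make the distinction concrete, here is a minimal Go sketch (not part of the original experiment). The task names and the `handle` and `runConcurrently` functions are made up for illustration. Each task runs in its own goroutine; whether the goroutines actually run in parallel depends on how many cores the Go runtime can use, which the program prints.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// handle simulates one unit of work (the code, the email, the call).
func handle(task string) string {
	return "done: " + task
}

// runConcurrently starts one goroutine per task and waits for all of them.
// Each goroutine writes only to its own slot, so no lock is needed.
func runConcurrently(tasks []string) []string {
	results := make([]string, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task string) {
			defer wg.Done()
			results[i] = handle(task)
		}(i, task)
	}
	wg.Wait()
	return results
}

func main() {
	// GOMAXPROCS(0) only reports the current setting; it does not change it.
	fmt.Println("cores:", runtime.NumCPU(), "GOMAXPROCS:", runtime.GOMAXPROCS(0))
	for _, r := range runConcurrently([]string{"code", "email", "call"}) {
		fmt.Println(r)
	}
}
```

With a single core the goroutines take turns (concurrency only); with several cores the runtime can schedule them at the same time (concurrency plus parallelism). The program itself does not change.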
Enough talk. Let's look at this in practice and see why languages like Golang (https://golang.org/) are worth learning and understanding.
Scenario: you need to download ten 1 GB files onto a machine for later computation.
We are going to test how long it takes to download 10 files using Bash.
We are going to test how long it takes to download 10 files using Python.
Finally, we are going to use Golang to download the same 10 files and see if it is really faster.
Test machine:
- 4 CPUs, Intel Xeon E5-2676 v3, 2.4 GHz
- 16 GB of RAM
- AWS m4.xlarge instance
Bash: total time 5 minutes 2 seconds
Python: total time 5 minutes 5 seconds
Go without concurrency: total time 4 minutes 49 seconds
As we can see, with serial execution it takes around 5 minutes to download ten 1 GB files, whatever the programming language.
Go with concurrency: total time 2 minutes 13 seconds
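The original listings are not reproduced here, so the following is only a sketch of how the two Go variants might look, not the exact code behind the timings above. All function names are hypothetical, and a local `httptest` server with an artificial 50 ms delay stands in for the real file host so the sketch runs without network access.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"sync"
	"time"
)

// download fetches url and writes the response body to dest.
func download(url, dest string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", url, resp.Status)
	}
	f, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

// downloadSerial fetches the URLs one after another.
func downloadSerial(urls []string, dir string) error {
	for i, u := range urls {
		if err := download(u, filepath.Join(dir, fmt.Sprintf("serial-%02d.bin", i))); err != nil {
			return err
		}
	}
	return nil
}

// downloadConcurrent starts one goroutine per URL and waits for all of them.
func downloadConcurrent(urls []string, dir string) error {
	errs := make([]error, len(urls))
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		go func(i int, u string) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no mutex is needed.
			errs[i] = download(u, filepath.Join(dir, fmt.Sprintf("concurrent-%02d.bin", i)))
		}(i, u)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

// compare times both strategies against a stand-in server and counts the files written.
func compare(n, size int) (int, error) {
	payload := make([]byte, size)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(50 * time.Millisecond) // simulated network latency
		w.Write(payload)
	}))
	defer srv.Close()
	dir, err := os.MkdirTemp("", "downloads")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	urls := make([]string, n)
	for i := range urls {
		urls[i] = fmt.Sprintf("%s/file-%d", srv.URL, i)
	}
	start := time.Now()
	if err := downloadSerial(urls, dir); err != nil {
		return 0, err
	}
	fmt.Println("serial:    ", time.Since(start))
	start = time.Now()
	if err := downloadConcurrent(urls, dir); err != nil {
		return 0, err
	}
	fmt.Println("concurrent:", time.Since(start))
	entries, err := os.ReadDir(dir)
	return len(entries), err
}

func main() {
	if _, err := compare(10, 1<<20); err != nil {
		fmt.Println("error:", err)
	}
}
```

The only difference between the two variants is the `go` keyword plus a `sync.WaitGroup`: the serial loop waits for each file before requesting the next, while the concurrent version has all requests in flight at once. To try it against real files, swap the stand-in URLs for your own.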
The total execution time was less than half that of the other programs. Let's see what happens when we increase the number of files to 20 and 30.
| Language | 10 files | 20 files | 30 files |
|---|---|---|---|
| Bash | 5m 2s | 9m 58s | 21m 3s |
| Python | 5m 5s | 9m 30s | 19m 45s |
| Go, not concurrent | 4m 49s | 9m 37s | 19m 15s |
| Go, concurrent | 2m 13s | 4m 36s | 6m 12s |
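For a rough sense of scale, the speedup of the concurrent Go run over the serial Go run can be computed directly from the table. This is a small sketch: the `speedup` helper is a made-up name, and the durations are the table values rewritten in Go's duration syntax.

```go
package main

import (
	"fmt"
	"time"
)

// speedup returns how many times faster the concurrent run was than the serial run.
func speedup(serial, concurrent string) (float64, error) {
	s, err := time.ParseDuration(serial)
	if err != nil {
		return 0, err
	}
	c, err := time.ParseDuration(concurrent)
	if err != nil {
		return 0, err
	}
	return s.Seconds() / c.Seconds(), nil
}

func main() {
	// Timings copied from the table: Go without concurrency vs Go with concurrency.
	runs := []struct {
		files              int
		serial, concurrent string
	}{
		{10, "4m49s", "2m13s"},
		{20, "9m37s", "4m36s"},
		{30, "19m15s", "6m12s"},
	}
	for _, r := range runs {
		x, err := speedup(r.serial, r.concurrent)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d files: %.2fx faster\n", r.files, x)
	}
}
```

Run as is, this reports roughly 2.17x for 10 files, 2.09x for 20 files, and 3.10x for 30 files.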
[Figure: CPU usage, serial code]
[Figure: CPU usage, Go with concurrency]
Now that you have seen why you should start looking into concurrency, here are a couple of ideas for where you can start using it in automation:
- File processing, when you need to move files from point A to point B
- Collect information from systems
- Process end-of-day files from systems
- AWS SDK for Go support
- Works on Windows, macOS, and Linux
- Talk to APIs
- Create CLI tools
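As one illustration of the ideas above, collecting information from many systems maps naturally onto a small worker pool. This is a sketch under assumptions: the host names, the `collect` function, and the stand-in probe are all hypothetical, and a real probe might use SSH or an API call instead.

```go
package main

import (
	"fmt"
	"sync"
)

// result pairs a host with the information gathered from it.
type result struct {
	host string
	info string
}

// collect runs probe against every host using a fixed number of worker goroutines.
func collect(hosts []string, workers int, probe func(host string) string) map[string]string {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise nothing would drain the jobs
	}
	jobs := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for h := range jobs {
				results <- result{host: h, info: probe(h)}
			}
		}()
	}

	// Feed the jobs, then close results once every worker has finished.
	go func() {
		for _, h := range hosts {
			jobs <- h
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	out := make(map[string]string, len(hosts))
	for r := range results {
		out[r.host] = r.info
	}
	return out
}

func main() {
	// Hypothetical host names and a stand-in probe.
	hosts := []string{"web-01", "web-02", "db-01"}
	info := collect(hosts, 2, func(host string) string {
		return host + ": probed"
	})
	for _, h := range hosts {
		fmt.Println(info[h])
	}
}
```

Capping the number of workers keeps the tool from opening hundreds of connections at once, which matters more for real systems than it did for the ten downloads in the experiment.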
Lastly, I only spent $0.08 on this experiment, using AWS spot instances.