Generate, Monitor and Throttle CPU Load in Go

The Rise of Nowhere Man

I originally wanted to write a Go vs. Rust vs. Python piece and shouted it out on LinkedIn. But then I saw in the comments, and rightfully so, that there was already plenty of prose about the topic on the internet. I thought my post was still relevant, since I was going to focus on the "Systems" programming abilities of these languages. But then I realized I had a fresher, more personal, and presumably more vibrant story to share with the world.

I am what we can modestly call an "architect" (not the J2EE kind) in the IT world. I don't mean that statement as a testimony of merit of any kind; it is simply the closest corporate job title for the engineer who has to make it all work together. It is no badge of honor; it's a nightmarish situation. But quite an enjoyable one, as long as you consider yourself a free individual, always willing to learn. A "Nowhere Man."

By the very essence of this personal and professional trait, you can imagine my life as a succession of discoveries, akin to the Sinbad tales: at every stage of the journey, you get caught in a storm that throws you onto the shore of an unknown land. You know the story: you fight your way through an abundance of monsters and a scarcity of resources, only to set sail for the next storm. The "Chronicles of Nowhere Man" follows the same pattern: a stranded engineer (whom I prefer not to name) learning from, and enjoying, the shipwreck.

The first installment of the "Chronicles of Nowhere Man" series narrates our stay in "Systems" Land. Let's explore it.

What is a System?

"System" is an umbrella term that means a lot of different things to many people. But I think we agree a decent definition for it would be "a concept, a design by which we deliberately make many smaller components interact to achieve a greater purpose." In things ranging from the "Immune System" to the "Solar System" - passing by the "Operating System," you can see that definition in work.

Keeping this definition in mind (and the keywords "design," "components," "interact," "greater purpose"), you can more or less guess what I mean by "systems programming."

"System programming" goes beyond proudly affirming you can program in Go or Rust (or C for that matter), and it certainly involves threads inter-communication, asynchronous job execution, and global state management. It can even include architectural skills, especially in today's cloud-natively orchestrated world.

Systems Land

I was stranded in "Systems Land" while sailing with a US-based cybersecurity solutions provider. Our solutions shone with the promise of bringing AI to the cybersecurity world: detecting security holes in files (this part was NLP/regexp) and identifying cyber attackers attempting to sneak into your network (this part wasn't).

As the architect on board, I was assigned the task of creating a "honeypot" mechanism. That is, I had to provide lures that would look appealing to attackers, so that when they tried to break in, we would recognize and neutralize them. That is when I saw that there are many levels at which "systems designing" or "systems programming" can be approached.

The Architectural Systems Level

At a global, or architectural, level, we had to make a "System" of communicating honeypots reporting malicious activity, so that upstream firewalls could keep the attackers out. An orchestrator managed the pots and collected their findings. Locally, each honeypot had a local state store and was empowered to react on its own if it could not reach the orchestrator. We designed efficient, asynchronous message passing between the honeypots and the orchestrator, which made the global "Architectural System" as a whole tolerant to network failures.
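
To give a feel for that pattern, here is a minimal sketch of a reporting loop, not our production code: the names (Finding, orchestratorURL, storeLocally) are hypothetical, and the honeypot falls back to its local store whenever the orchestrator cannot be reached.

package main

import (
    "bytes"
    "encoding/json"
    "log"
    "net/http"
    "time"
)

// Finding is a hypothetical stand-in for whatever a honeypot reports upstream.
type Finding struct {
    Source string    `json:"source"`
    Detail string    `json:"detail"`
    Seen   time.Time `json:"seen"`
}

// reportLoop forwards findings asynchronously; when the orchestrator cannot
// be reached, it falls back to the local store so nothing is lost.
func reportLoop(findings <-chan Finding, orchestratorURL string, storeLocally func(Finding)) {
    for f := range findings {
        body, err := json.Marshal(f)
        if err != nil {
            continue
        }
        resp, err := http.Post(orchestratorURL, "application/json", bytes.NewReader(body))
        if err != nil {
            log.Printf("orchestrator unreachable, keeping finding locally: %v", err)
            storeLocally(f) // a retry job can replay these later
            continue
        }
        resp.Body.Close()
    }
}

func main() {
    findings := make(chan Finding, 100) // buffered: detection never blocks on reporting
    go reportLoop(findings, "http://orchestrator.local/report", func(f Finding) {
        log.Printf("stored locally: %+v", f)
    })
    findings <- Finding{Source: "ssh-pot", Detail: "login attempt on a fake sshd", Seen: time.Now()}
    time.Sleep(time.Second) // give the reporter a moment before the demo exits
}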

The Systems Programming Level

At a local level, these "honeypots" needed to look real. They had to act like actual software, showing actual "systems usage," that is, a decently sized amount of memory, disk, and CPU consumption, all parametrized by the domain experts. This is the kind of reflection we had at the Systems Programming level.

The honeypots were written in Go. Generating the disk and memory usage was straightforward; a minimal sketch of the idea follows, and then we will focus on the more interesting CPU part.
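
This is roughly what "straightforward" looks like (a sketch with made-up sizes, not the product code): occupying memory comes down to allocating and touching a slice, and occupying disk to writing a file of the requested size.

package main

import (
    "log"
    "os"
)

// holdMemory allocates and touches memoryMB megabytes so the pages are
// actually committed, then hands the slice back so the caller keeps it alive.
func holdMemory(memoryMB int) []byte {
    buf := make([]byte, memoryMB*1024*1024)
    for i := range buf {
        buf[i] = 1 // touch every byte so the usage shows up for real
    }
    return buf
}

// fillDisk writes diskMB megabytes of zeroes to path.
func fillDisk(path string, diskMB int) error {
    f, err := os.Create(path)
    if err != nil {
        return err
    }
    defer f.Close()
    chunk := make([]byte, 1024*1024)
    for i := 0; i < diskMB; i++ {
        if _, err := f.Write(chunk); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    mem := holdMemory(64) // 64 MB resident, for example
    if err := fillDisk("decoy.dat", 128); err != nil {
        log.Fatal(err)
    }
    log.Printf("holding %d MB in memory, wrote decoy.dat", len(mem)/1024/1024)
}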

We generate the CPU load by running a tight infinite loop. At each iteration of the loop, we check whether we have reached our target CPU load. When we hit the threshold, we "relieve" the loop by putting the goroutine to sleep:

// Generate load on each of the cores (the built-in min needs Go 1.21+)
for i := 0; i < min(cores, runtime.NumCPU()); i++ {
    go func() {
        for { // The tight loop generates the load like crazy!
            cpuLoad := CpuUsagePercent(samplingRate, debug)
            if cpuLoad >= targetCPU { // Load threshold reached
                waitTime := time.Second / time.Duration(WaitFactor)
                time.Sleep(waitTime) // We hit the brake pedal
            }
        }
    }() // the trailing () actually starts the goroutine
}
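
For completeness, here is how that loop might be wired into a runnable program. The flag names are hypothetical, and CpuUsagePercent and WaitFactor come from the platform-specific files shown next.

package main

import (
    "flag"
    "runtime"
    "time"
)

func main() {
    cores := flag.Int("cores", runtime.NumCPU(), "number of cores to load")
    targetCPU := flag.Float64("target", 50, "target CPU load in percent")
    samplingRate := flag.Float64("sampling", 5, "seconds between measurement resets")
    debug := flag.Bool("debug", false, "log the measurements")
    flag.Parse()

    for i := 0; i < min(*cores, runtime.NumCPU()); i++ { // built-in min needs Go 1.21+
        go func() {
            for {
                if CpuUsagePercent(*samplingRate, *debug) >= *targetCPU {
                    time.Sleep(time.Second / time.Duration(WaitFactor))
                }
            }
        }()
    }
    select {} // park main forever while the workers generate load
}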

How does CpuUsagePercent work? In a nutshell, we use the time.h interface on Unices, or the psapi.h interface on Windows, to compute the number of clock seconds spent in the current process (cpu_time), which we divide by the wall-clock time the process has been running (real_time). The gotcha here is that time.h compiles on Windows, but the results are not relevant, so watch out! I share a non-copyrighted version here; first, the gist of it for *nix/macOS:

package main

// #include <time.h>
import "C"

import (
    "log"
    "time"
)

var startTime = time.Now()
var startTicks = C.clock()
var WaitFactor = 1

// if samplingRate = 0 then we continue with initial startTime and startTicks
func CpuUsagePercent(samplingRate float64, debug bool) float64 {

    // CPU time consumed by this process since the last reset, in seconds
    clockSeconds := float64(C.clock()-startTicks) / float64(C.CLOCKS_PER_SEC)
    // wall-clock time elapsed over the same window
    realSeconds := time.Since(startTime).Seconds()

    if debug {
        log.Printf(" current clock  : %v, real seconds: %v, startTicks: %v", clockSeconds, realSeconds, startTicks)
    }

    if samplingRate > 0 && realSeconds >= samplingRate {
        startTime = time.Now()
        startTicks = C.clock()
        if debug {
            log.Printf("Resetting starts !! startTime : %v, startTicks: %v", startTime, startTicks)
        }
    }
    return clockSeconds / realSeconds * 100

}

And on Windows:

package main

/*
#cgo LDFLAGS: -lpsapi
#include <windows.h>
#include <psapi.h>
#include <time.h>
double get_cpu_time(){
    FILETIME a,b,c,d;
    if (GetProcessTimes(GetCurrentProcess(),&a,&b,&c,&d) != 0){
        //  Returns total user time.
        //  Can be tweaked to include kernel times as well (c).
        return
        (
            (double)(d.dwLowDateTime |
            ((unsigned long long)d.dwHighDateTime << 32)) // user time
        ) * 0.0000001;
    }else{
        //  Handle error
        return 0;
    }
}
*/
import "C"

import (
    "log"
    "time"
)

var startTime = time.Now()
var startTicksTime = C.get_cpu_time()
var WaitFactor = time.Duration(C.CLOCKS_PER_SEC) // typically 1000 on Windows, giving a much shorter sleep slice than on *nix

// if samplingRate = 0 then we continue with initial startTime and startTicks
func CpuUsagePercent(samplingRate float64, debug bool) float64 {

    clockSeconds := float64(C.get_cpu_time() - startTicksTime)
    realSeconds := time.Since(startTime).Seconds()

    if debug {
        log.Printf(" current clock  : %v, real seconds: %v, startTicks: %v", clockSeconds, realSeconds, startTicksTime)
    }

    if samplingRate > 0 && realSeconds >= samplingRate {
        startTime = time.Now()
        startTicksTime = C.get_cpu_time()
        if debug {
            log.Printf("Resetting starts !! startTime : %v, startTicks: %v", startTime, startTicksTime)
        }
    }
    return clockSeconds / realSeconds * 100

}

You probably noticed the WaitFactor parameter. It was the best solution I found to keep the program as close as possible to the target CPU threshold! I let the external honeypot orchestrator set this parameter after a manual or automatic trial-and-error feedback loop (mostly manual, to be honest).
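
If you wanted to automate that feedback loop locally instead, a sketch could look like the function below. It is meant to live alongside CpuUsagePercent and WaitFactor from the snippets above; the 5% dead band and the step of 1 are made-up values, and real code should guard WaitFactor with an atomic or a mutex since the load goroutines read it concurrently.

// tuneWaitFactor nudges WaitFactor at every tick: if we overshoot the target
// we make the sleeps longer (smaller WaitFactor), if we undershoot we make
// them shorter (larger WaitFactor).
func tuneWaitFactor(targetCPU, samplingRate float64, interval time.Duration) {
    for range time.Tick(interval) {
        load := CpuUsagePercent(samplingRate, false)
        switch {
        case load > targetCPU+5 && WaitFactor > 1:
            WaitFactor-- // longer sleeps, less load
        case load < targetCPU-5:
            WaitFactor++ // shorter sleeps, more load
        }
        log.Printf("load %.1f%%, target %.1f%%, WaitFactor now %v", load, targetCPU, WaitFactor)
    }
}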

And in the End

We've covered a lot of ground in this first adventure. First, we've seen what a "System" means in terms of components, interaction, and a greater purpose to achieve. Then, we've seen that "Systems" spans two levels: a global one, which we called the "Architectural Systems Level," making it possible for the whole to collaborate and synchronize, providing resiliency and scalability; and a local one, which we called the "Systems Programming Level," where we may need to interact in code with low-level components, as we did with the CPU piece.

In the end, being an architect might require you to know, beyond architectural design patterns, how to interact with low-level system components. This adventure taught me more about counting CPU cycles, and interacting with them in code surely added to my infrastructure and monitoring skills. Where will we be wandering next?