I have a habit of writing excessively long bash one-liners, well past the point where it would make more sense to write a script. Chaining commands and transformations in the shell feels like a natural transcription of my “Yes, and” thought process. Last week, I wrote a one-liner that I have strong and mixed feelings about. The words “abhorrent” and “elegant” pop into my mind every time I look at it.

Why?

Last Friday, I was asked to slap together a quick monitoring system to compare the CPU utilization of individual YARN containers on two servers, one with CGroups and one without. I wanted to set it up quickly, to collect metrics over the weekend.

How?

I started with finding a way to sample the CPU utilization of all processes on a server.

top -b -n 1

That was easy, but it only shows the name of the binary. Almost all YARN applications will have their application ID as part of their command line. Maybe I can get top to print the command line.

top -b -n 1 -c

Warmer… top truncates the output to match the TTY’s width. There should be some way to override it.

top -b -n 1 -c -w 10000

It looks like top has a hard cap of 512 on the width of output. Increasing the -w parameter beyond that does nothing. I might have to do something a little more drastic here. Time to get awk involved. I should start by removing the header in the output.

top -b -n 1 | awk 'NR > 7 {print $0}'

Now I can pull just the fields I want.

top -b -n 1 | awk 'NR > 7 {print $9, $1}'
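To make the field numbers concrete, here is a small sketch that runs the same awk against canned output instead of a live top. The sample lines and the helper names sample_top and cpu_and_pid are made up for illustration; the layout assumes the default procps header, where the first process row is line 8, $1 is the PID and $9 is %CPU.

```shell
# Canned stand-in for `top -b -n 1` output (all values are invented).
sample_top() {
cat <<'EOF'
top - 10:00:00 up 1 day,  1:00,  1 user,  load average: 0.10, 0.20, 0.30
Tasks:   2 total,   1 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  0.5 sy,  0.0 ni, 98.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  16000.0 total,   8000.0 free,   4000.0 used,   4000.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  12000.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   4242 yarn      20   0 4000000 500000  20000 S  37.5   3.1   1:23.45 java
      1 root      20   0  160000  10000   8000 S   0.0   0.1   0:01.00 systemd
EOF
}

# Skip the 7 header lines, then print %CPU (field 9) and PID (field 1).
cpu_and_pid() {
    awk 'NR > 7 {print $9, $1}'
}

sample_top | cpu_and_pid
# 37.5 4242
# 0.0 1
```

If a top build prints a different number of header lines, the NR threshold would need to change with it.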

Yes, and maybe I can fetch the cmdline of the process using the PID inside awk.

top -b -n 1 | awk 'NR > 7 {print $9, $1, system("cat /proc/"$1"/cmdline")}'

Looks like /proc/PID/cmdline has null-terminated tokens. I could replace \0 with space. This is turning into an escape sequence nightmare.

top -b -n 1 | awk 'NR > 7 {print $9, $1, system("cat /proc/"$1"/cmdline | tr \'\0\' \' \' ")}'

Nope, doesn’t work. Maybe…

top -b -n 1 | awk 'NR > 7 {print $9, $1, system("cat /proc/"$1"/cmdline | tr \"\0\" \" \" ")}'

After consulting the elder scrolls, I think I should just turn everything into hex codes.

top -b -n 1 | awk 'NR > 7 {print $9, $1, system("cat /proc/"$1"/cmdline | tr \x27\x5c\x30\x27 \x27 \x27 ")}'
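A note on why the escapes help, with a sketch. Inside a single-quoted shell string a backslash does not escape a single quote, so \' ends the awk program early. Escapes that awk itself decodes sidestep that, because the shell never sees a quote character. As far as I know \x is an extension that POSIX awk does not guarantee, while octal \ddd escapes are specified, so the sketch below uses \047 for the quote and \134 for the backslash. The helper name cmdline_via_awk and the fake cmdline file are hypothetical, and the path is assumed to contain no spaces or shell metacharacters.

```shell
# Hypothetical helper: print a NUL-separated cmdline file with spaces
# instead of NULs, running tr through awk's system() as in the one-liner.
cmdline_via_awk() {
    # awk decodes the string to:  tr '\0' ' ' < FILE
    # \047 is a single quote, \134 is a backslash (followed by a literal 0).
    awk -v f="$1" 'BEGIN { system("tr \047\1340\047 \047 \047 < " f) }'
}

# Fake cmdline file standing in for /proc/PID/cmdline (contents invented).
fake=$(mktemp)
printf 'java\0-Xmx1g\0' > "$fake"
cmdline_via_awk "$fake"   # java -Xmx1g (plus a trailing space, no newline)
echo
rm -f "$fake"
```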

The output looks a lot like what I want, though line endings are mangled. I should replace the print with a printf, and add a newline after the system() call. I can pull the app ID out of this, but piping the output of that grotesque tr to grep isn’t a good idea. I should point grep straight at the cmdline file instead.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline");print ""}'

Looks like the appname can show up multiple times in the cmdline of a single process. Some processes also exit before grep can get to their cmdline. I should clean those up.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null
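The grep part is easier to see in isolation. This is a minimal sketch against a fake cmdline file; the helper name first_app_id and the application ID are invented. It assumes a grep that supports --text and -o, as the one-liner does.

```shell
# Hypothetical helper: print the first YARN application ID found in a
# NUL-separated cmdline file, or nothing if the file is gone or has no ID.
first_app_id() {
    grep --text -o 'application_[0-9]*_[0-9]*' "$1" 2>/dev/null | head -n1
}

# Fake cmdline in which the same (invented) ID appears twice.
fake=$(mktemp)
printf 'java\0-Dapp.id=application_1700000000000_0042\0/logs/application_1700000000000_0042/c1\0' > "$fake"

first_app_id "$fake"            # application_1700000000000_0042
first_app_id /nonexistent/pid   # prints nothing: the process already exited
rm -f "$fake"
```

Here --text makes grep treat the NUL bytes as ordinary characters, -o prints each match on its own line, and head -n1 keeps only the first one.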

At this point, I was starting to get impatient and decided to prioritize getting it done fast over optimizing it. For example, some lines are empty, so I drop them using grep.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]"

Many PIDs don’t have an app ID, either because they’re not application processes, or because they belong to the small subset of processes that don’t have the application ID in their cmdline. To make tabulation easier, maybe I should populate that column for these rows with something.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk '{if(NF != 3){$3 = "NA"}} {print $0}'

If I’m using awk there, might as well drop processes with no CPU utilization.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk '{if(NF != 3){$3 = "NA"}; if($1!=0){print $0}}'
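Here is what that last awk stage does on its own, as a sketch fed with invented "cpu pid [app]" rows. The helper name fill_and_filter is hypothetical; the awk program is the one from the pipeline.

```shell
# Hypothetical helper: pad rows that lack an app ID, then drop idle processes.
fill_and_filter() {
    awk '{if(NF != 3){$3 = "NA"}; if($1!=0){print $0}}'
}

# Three invented rows: one complete, one without an app ID, one at 0% CPU.
printf '%s\n' \
    '12 100 application_1700000000000_0042' \
    '5 200 ' \
    '0 300 application_1700000000000_0043' | fill_and_filter
# 12 100 application_1700000000000_0042
# 5 200 NA
```

Assigning to $3 makes awk rebuild the line with single spaces between fields, which is why the trailing space on the second row disappears.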

If I’m using awk there, might as well format it to match the output of a prom exporter.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk '{if(NF != 3){$3 = "NA"}; if($1!=0){print "adhoc_yarn_cpu{app=\""$3"\",pid=\""$2"\"} "$1}}'
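The target is the Prometheus text format, where each sample is a line of the form name{label="value",...} value. A minimal sketch of just the formatting step, with a hypothetical helper name to_prom and an invented input row:

```shell
# Hypothetical helper: turn "cpu pid app" rows into Prometheus sample lines.
to_prom() {
    awk '{print "adhoc_yarn_cpu{app=\""$3"\",pid=\""$2"\"} "$1}'
}

printf '12 100 application_1700000000000_0042\n' | to_prom
# adhoc_yarn_cpu{app="application_1700000000000_0042",pid="100"} 12
```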

Oh god…

At this point, I had a terrible thought… What if I just write the output of this command to index.html and start a Python SimpleHTTPServer? I was quickly put off the idea by the need to use Python. But wait… HTTP is practically TCP, with some bits and pieces added… What if…

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk 'BEGIN {printf("HTTP/1.1 200 OK\n\n")} {if(NF != 3){$3 = "NA"}; if($1!=0){print "adhoc_yarn_cpu{app=\""$3"\",pid=\""$2"\"} "$1}}'

That looks like a proper HTTP response… I wonder if I can pipe that to netcat and curl it…

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk 'BEGIN {printf("HTTP/1.1 200 OK\n\n")} {if(NF != 3){$3 = "NA"}; if($1!=0){print "adhoc_yarn_cpu{app=\""$3"\",pid=\""$2"\"} "$1}}' | nc -lp 1301

curl 127.0.0.1:1301
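A side note on the header. HTTP/1.1 specifies CRLF line endings, though clients are allowed to accept a bare LF, which is presumably why curl is content with the \n\n version. For a pickier client, a slightly stricter wrapper might look like the sketch below. The helper name http_wrap and the Content-Type and Connection headers are additions for illustration, not part of the original one-liner.

```shell
# Hypothetical helper: wrap stdin in a minimal HTTP/1.1 response
# with CRLF line endings in the header section.
http_wrap() {
    awk 'BEGIN { printf "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n" } { print }'
}

# Invented sample line standing in for the exporter output.
printf 'adhoc_yarn_cpu{app="NA",pid="100"} 12\n' | http_wrap
```

The body still goes out with plain newlines, which is what the metric lines want anyway.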

That works… I could put this in a loop.

while true; do top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk 'BEGIN {printf("HTTP/1.1 200 OK\n\n")} {if(NF != 3){$3 = "NA"}; if($1!=0){print "adhoc_yarn_cpu{app=\""$3"\",pid=\""$2"\"} "$1}}' | nc -lp 1301; done

I’ll just start this in a tmux session and tell prom to scrape it.

Confession

Going over the whole thing again as I wrote this, I’ve decided that I’m okay with this. It was an ad-hoc thing that I needed to set up quickly. It took 10 minutes to write and got the job done.

TODO: add more rationalizations