My worst one-liner yet

I have a habit of writing excessively long bash one-liners, well past the point where it would make more sense to write a script. Chaining commands and transformations in the shell feels like a natural transcription of my “Yes, and” thought process. Last week, I wrote a one-liner that I had strong and mixed feelings about. The words “abhorrent” and “elegant” pop into my mind every time I look at it.

Why?

Last Friday, I was asked to slap together a quick monitoring system to compare the CPU utilization of individual YARN containers on two servers, one with CGroups and one without. I wanted to set it up quickly, to collect metrics over the weekend.

How?

I started by finding a way to sample the CPU utilization of all processes on a server.

top -b -n 1

That was easy, but it only shows the name of the binary. Almost all YARN applications will have their application ID as part of their command line. Maybe I can get top to print the command line.

top -b -n 1 -c

Warmer… top truncates the output to match the TTY’s width. There should be some way to override it.

top -b -n 1 -c -w 10000

It looks like top has a hard cap of 512 on the width of output. Increasing the -w parameter beyond that does nothing. I might have to do something a little more drastic here. Time to get awk involved. I should start by removing the header in the output.

top -b -n 1 | awk 'NR > 7 {print $0}'

Now I can pull just the fields I want.

top -b -n 1 | awk 'NR > 7 {print $9, $1}'
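To see what that filter does without a live top, here is a small sketch run against a made-up snapshot. The sample lines are invented for illustration, the `sample_top` name is hypothetical, and the sketch assumes the default procps top layout, where the first seven lines are the summary plus column header and %CPU is the ninth field.

```shell
# Made-up snapshot in the default procps `top -b -n 1` layout (an assumption;
# a customized top config can reorder or drop columns).
sample_top() {
cat <<'EOF'
top - 10:00:00 up 1 day,  1:00,  1 user,  load average: 0.10, 0.20, 0.30
Tasks: 100 total,   1 running,  99 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  0.5 sy,  0.0 ni, 98.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  16000.0 total,   8000.0 free,   4000.0 used,   4000.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  11000.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   4321 yarn      20   0 4567890 123456  12345 S  12.5   1.5   0:42.10 java
     99 root      20   0   10000   2000   1000 S   0.0   0.1   0:00.50 sshd
EOF
}

# Skip the 7 header lines, then print %CPU (field 9) and PID (field 1).
sample_top | awk 'NR > 7 {print $9, $1}'
# prints:
# 12.5 4321
# 0.0 99
```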

Yes, and maybe I can fetch the cmdline of the process using the PID inside awk.

top -b -n 1 | awk 'NR > 7 {print $9, $1, system("cat /proc/"$1"/cmdline")}'

Looks like /proc/PID/cmdline has null-terminated tokens. I could replace \0 with space. This is turning into an escape sequence nightmare.

top -b -n 1 | awk 'NR > 7 {print $9, $1, system("cat /proc/"$1"/cmdline | tr \'\0\' \' \' ")}'

Nope, doesn’t work. Maybe…

top -b -n 1 | awk 'NR > 7 {print $9, $1, system("cat /proc/"$1"/cmdline | tr \"\0\" \" \" ")}'

After consulting the elder scrolls, I think I should just turn everything into hex codes.

top -b -n 1 | awk 'NR > 7 {print $9, $1, system("cat /proc/"$1"/cmdline | tr \x27\x5c\x30\x27 \x27 \x27 ")}'
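The hex codes work because awk expands escape sequences inside its string literals before system() hands the string to a shell, so no literal single quote ever has to appear inside the single-quoted awk program. Below is a minimal sketch of the same trick run against a temporary file standing in for /proc/PID/cmdline. The file contents and variable names are made up for illustration, and I have swapped in octal escapes (\047 for the quote, \\ for the backslash) because, as far as I know, \x escapes are an extension that not every awk treats identically.

```shell
# Stand-in for /proc/PID/cmdline: arguments separated by NUL bytes.
fake_cmdline=$(mktemp)
printf 'java\000-Dapp=demo\000Main\000' > "$fake_cmdline"

# Inside the awk string, \047 becomes a single quote and \\ becomes a
# backslash, so the shell spawned by system() sees:  tr '\0' ' ' < FILE
flat=$(awk -v f="$fake_cmdline" 'BEGIN { system("tr \047\\0\047 \047 \047 < " f) }')

printf '[%s]\n' "$flat"   # prints: [java -Dapp=demo Main ]
rm -f "$fake_cmdline"
```

The trailing space is the last NUL terminator after it has been translated.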

The output looks a lot like what I want, though line endings are mangled: system() writes the command's output straight to stdout, and print then appends its return value. I should replace the print with a printf, and add a newline after the system() call. I can pull the app ID out of this, but piping the output of that grotesque tr to grep isn’t a good idea. I should just run grep directly on the cmdline file.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline");print ""}'

Looks like the app ID can show up multiple times in the cmdline of a single process, so I should keep only the first match. Some processes also exit before grep can get to their cmdline, so I should throw away the resulting errors.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null
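Here is the grep stage on its own, as a sketch against a temporary file that imitates a NUL-separated cmdline. The file contents, the application ID, and the variable names are invented for illustration.

```shell
# Stand-in for /proc/PID/cmdline (contents invented for illustration).
fake_cmdline=$(mktemp)
printf 'java\000-Dapp.id=application_1600000000000_0042\000/logs/application_1600000000000_0042/stdout\000' > "$fake_cmdline"

# --text makes grep treat the NUL-separated data as text, -o prints each
# match on its own line, and head -n1 keeps only the first one.
app_id=$(grep --text -o 'application_[0-9]*_[0-9]*' "$fake_cmdline" | head -n1)
echo "$app_id"    # prints: application_1600000000000_0042

# A process that has already exited has no cmdline file: grep complains on
# stderr (discarded here) and the captured result is empty.
gone=$(grep --text -o 'application_[0-9]*_[0-9]*' "$fake_cmdline.gone" 2>/dev/null | head -n1)
echo "[$gone]"    # prints: []

rm -f "$fake_cmdline"
```

In the one-liner the `2>/dev/null` sits on awk rather than on grep; that still works because the commands started by system() inherit awk's stderr.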

At this point, I was starting to get impatient and decided to prioritize getting it done fast over optimizing it. For example, some lines are empty, so drop them using grep.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]"

Many PIDs don’t have an app ID, either because they’re not application processes, or because they’re the small subset of processes that don’t have the application ID in cmdline. For the sake of making tabulation easier, maybe I should populate that column for these rows with something.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk '{if(NF != 3){$3 = "NA"}} {print $0}'

If I’m using awk there, might as well drop processes with no CPU utilization.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk '{if(NF != 3){$3 = "NA"}; if($1!=0){print $0}}'

If I’m using awk there, might as well format it to match the output of a prom exporter.

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk '{if(NF != 3){$3 = "NA"}; if($1!=0){print "adhoc_yarn_cpu{app=\""$3"\",pid=\""$2"\"} "$1}}'
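To check the formatting stage in isolation, the second awk can be fed a few hand-written rows. This is only a sketch: the rows are made up, and `format_metrics` is a hypothetical wrapper name around the exact awk program from the pipeline.

```shell
# Wraps the second awk stage: fill in a missing app ID with "NA", drop rows
# with zero CPU, and print one Prometheus-style sample per remaining row.
format_metrics() {
    awk '{if(NF != 3){$3 = "NA"}; if($1!=0){print "adhoc_yarn_cpu{app=\""$3"\",pid=\""$2"\"} "$1}}'
}

# Made-up "CPU PID [APP_ID]" rows, the shape the first stage emits.
printf '%s\n' \
    '12 4321 application_1600000000000_0042' \
    '5 99' \
    '0 77 application_1600000000000_0043' | format_metrics
# prints:
# adhoc_yarn_cpu{app="application_1600000000000_0042",pid="4321"} 12
# adhoc_yarn_cpu{app="NA",pid="99"} 5
```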

Oh god…

At this point, I had a terrible thought… What if I just write the output of this command to index.html and start a Python simple HTTP server? I was quickly put off the idea by the need to use Python. But wait… HTTP is practically TCP, with some bits and pieces added… What if…

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk 'BEGIN {printf("HTTP/1.1 200 OK\n\n")} {if(NF != 3){$3 = "NA"}; if($1!=0){print "adhoc_yarn_cpu{app=\""$3"\",pid=\""$2"\"} "$1}}'

That looks like a proper HTTP response… I wonder if I can pipe that to netcat and curl it…

top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk 'BEGIN {printf("HTTP/1.1 200 OK\n\n")} {if(NF != 3){$3 = "NA"}; if($1!=0){print "adhoc_yarn_cpu{app=\""$3"\",pid=\""$2"\"} "$1}}' | nc -lp 1301

curl 127.0.0.1:1301

That works… I could put this in a loop.

while true; do top -b -n 1 | awk 'NR>7 {printf("%d %d ",$9,$1);system("grep --text -o \x27application_[0-9]*_[0-9]*\x27 /proc/"$1"/cmdline | head -n1");print ""}' 2>/dev/null | grep "^[0-9]" | awk 'BEGIN {printf("HTTP/1.1 200 OK\n\n")} {if(NF != 3){$3 = "NA"}; if($1!=0){print "adhoc_yarn_cpu{app=\""$3"\",pid=\""$2"\"} "$1}}' | nc -lp 1301; done

I’ll just start this in a tmux session and tell prom to scrape it.
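The hand-rolled response is about the least that a lenient client will accept. HTTP/1.1 formally ends header lines with CRLF rather than a bare newline, and a Content-Length header tells the client where the body stops. A slightly stricter sketch follows, assuming the body is plain ASCII; the `http_response` name and the extra headers are my additions for illustration, not something the one-liner used.

```shell
# Read a body on stdin and wrap it in a minimal HTTP/1.1 response.
http_response() {
    # Append a sentinel so command substitution keeps the trailing newline,
    # then strip the sentinel again.
    body=$(cat; printf x)
    body=${body%x}
    # ${#body} counts characters, which equals bytes only for ASCII (assumed).
    printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: %d\r\nConnection: close\r\n\r\n%s' \
        "${#body}" "$body"
}

# Example with a single made-up sample line as the body.
printf 'adhoc_yarn_cpu{app="NA",pid="99"} 5\n' | http_response
```

Something like this would sit between the last awk and `nc -lp 1301`, replacing the BEGIN block's printf. Listen flags differ between netcat variants, so the `nc` invocation itself is left exactly as it was.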

Confession

Going over the whole thing again as I wrote this, I’ve decided that I’m okay with this. It was an ad-hoc thing that I needed to set up quickly. It took 10 minutes to write and got the job done.

TODO: add more rationalizations
