
SO – how safe is it to parse the output of a CLI command?

The original question was actually “Detect wireless devices connected to Raspberry Pi in python on Linux”, from fragon, here.

In full, the question was this:

“In my python code I need to get the list of “physical” WiFi network devices connected to Raspberry Pi

I’ve been doing this by calling:

raw_output = check_output('iw dev', shell=True)

and then extracting all the data I need from raw_output

It works ok, but in iw help it says that Do NOT screenscrape this tool, we don’t consider its output stable. Is it really unsafe to get this data the way I did it? If yes, what is the correct way to do this?”

That is an instance of a classic, broader question that most of us have asked ourselves at one time or another: “how safe/OK is it to parse the output of a CLI command?”

And my answer was this (see here for the full answer, with the pros and cons of the alternatives):

What is meant by “Do NOT screenscrape this tool, we don’t consider its output stable” is that the output formatting may change as new releases of iw come out. So the developers of iw are warning you that if you write software that depends on parsing its output, it may break with future releases of iw.

Take the example of the venerable ifconfig command. For many, many years, its output was formatted like so:

 eth0 Link encap:Ethernet HWaddr 00:80:C8:F8:4A:51
 inet addr: Bcast: Mask:
 RX packets:190312 errors:0 dropped:0 overruns:0 frame:0
 TX packets:86955 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:100
 RX bytes:30701229 (29.2 Mb) TX bytes:7878951 (7.5 Mb)
 Interrupt:9 Base address:0x5000

And though it was considered stable (even deprecated and unmaintained by some), it changed a couple of years ago and now looks like this:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet netmask broadcast
inet6 fe80::8e89:a5ff:fe57:103c prefixlen 64 scopeid 0x20<link>
ether 8c:89:a5:57:10:3c txqueuelen 1000 (Ethernet)
RX packets 2219946 bytes 3178868967 (2.9 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1241676 bytes 102998523 (98.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

…so let’s say I had written some software that finds the MAC address by looking for the string following “HWaddr”. Today it would be broken, because it would have to look for the string following “ether” instead.
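
To make this concrete, here is a minimal sketch of a parser that accepts both labels. The names `MAC_RE` and `extract_mac` are mine, made up for illustration, and the two sample strings are lines taken from the outputs shown above. It only knows about the two formats listed here; a third format would still break it, which is the whole point.

```python
import re

# Accept a MAC address introduced by either the old "HWaddr" label
# or the newer "ether" label (the two layouts shown above).
MAC_RE = re.compile(
    r"\b(?:HWaddr|ether)\s+((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})\b"
)


def extract_mac(ifconfig_output):
    """Return the first MAC address found (lower-cased), or None."""
    match = MAC_RE.search(ifconfig_output)
    return match.group(1).lower() if match else None


# Sample lines copied from the two ifconfig outputs above.
OLD_STYLE = "eth0 Link encap:Ethernet HWaddr 00:80:C8:F8:4A:51"
NEW_STYLE = "ether 8c:89:a5:57:10:3c txqueuelen 1000 (Ethernet)"

print(extract_mac(OLD_STYLE))
print(extract_mac(NEW_STYLE))
```

Returning None when nothing matches leaves it to the caller to decide whether a missing MAC address is an error or not.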

But as long as you either don’t update iw or test your work regularly, you should not encounter any problem.
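
One way to make that regular testing cheap is to have the parser fail loudly when the text no longer looks like what it was written for. The following is only a sketch: `SAMPLE_IW_DEV` is an assumed, shortened sample of what `iw dev` prints (the real layout depends on the installed version), and `parse_interfaces` and `UnexpectedOutputError` are hypothetical names of mine.

```python
import re

# Assumed, shortened sample of `iw dev` output, for illustration only.
# The real layout depends on the installed iw version.
SAMPLE_IW_DEV = """\
phy#0
\tInterface wlan0
\t\tifindex 3
\t\ttype managed
phy#1
\tInterface wlan1
\t\tifindex 4
\t\ttype managed
"""


class UnexpectedOutputError(ValueError):
    """The tool's output no longer matches the format we rely on."""


def parse_interfaces(iw_dev_output):
    """Return the interface names, or raise if the text looks unfamiliar."""
    names = re.findall(
        r"^\s*Interface\s+(\S+)\s*$", iw_dev_output, flags=re.MULTILINE
    )
    # Non-empty output in which nothing is recognised most likely means
    # the format changed, so stop here instead of returning an empty list.
    if iw_dev_output.strip() and not names:
        raise UnexpectedOutputError(
            "no 'Interface' lines found; did the iw output format change?"
        )
    return names


print(parse_interfaces(SAMPLE_IW_DEV))
```

Keeping a captured sample like this in the test suite, and refreshing it after each iw upgrade, turns a silent breakage into a failing test.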

In any case, parsing the output of a third-party tool is always inherently a bit fragile; you just have to be aware of it. For instance, the output may depend on the locale set up by the user. A real-life example: some scripting I did on the output of ifconfig failed in some users’ environments. Root cause: here is what the output looks like in a French locale:

eth0 Lien encap:Ethernet HWaddr 00:FF:F2:58:32:A1
Packets reçus:0 erreurs:0 :0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 lg file transmission:1000
Octets reçus:0 (0.0 b) Octets transmis:0 (0.0 b)
Interruption:23 Adresse de base:0x2000

Notice the French “Packets reçus”, “erreurs”, and “Octets reçus” instead of “RX packets”, “errors”, and “RX bytes”.
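
A common mitigation for the locale problem is to run the tool with the locale forced to "C", so that messages are not translated. Here is a minimal sketch; the helper names `c_locale_env` and `run_in_c_locale` are mine, and the demonstration spawns a child Python process only so that the example runs on any machine. This guards against translations, not against format changes between versions.

```python
import os
import subprocess
import sys


def c_locale_env(base_env=None):
    """Return a copy of the environment with the locale forced to "C"."""
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = "C"  # LC_ALL takes precedence over LANG and other LC_* variables
    env["LANG"] = "C"
    return env


def run_in_c_locale(argv):
    """Run a command given as an argument list (no shell), return its text output."""
    return subprocess.check_output(argv, env=c_locale_env(), text=True)


# Demonstration with a child Python process so the sketch runs anywhere;
# in practice argv would be something like ["ifconfig"] or ["iw", "dev"].
child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
print(run_in_c_locale(child).strip())
```

Passing the command as a list also avoids `shell=True`, which is not needed for a fixed command like this one.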



> Is it really unsafe to get this data the way I did it?

Not really. You just have to keep in mind that your software depends on the output strings of some third-party software that is largely out of your control and may change in the future. That means regular testing and maintenance work for you; nothing tragic, that’s software life.

> If yes, what is the correct way to do this?

Again, “no”, but if you want to be bulletproof against that: do not depend on the textual output of third-party software. This usually involves writing your own code to replace these tools, which can be quite a task. And if, to do so, you use some third-party libraries, well, library APIs change over time too… :-)
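
As an illustration of what “your own code” might look like for the OP’s case, here is a sketch that lists wireless interfaces by looking at sysfs instead of parsing text. It rests on an assumption of mine, not something taken from the iw documentation: that on Linux a wireless interface shows a "wireless" directory and/or a "phy80211" link under /sys/class/net/<name>. The function name and its `sysfs_net` parameter are made up for the example, and the sysfs layout is of course yet another interface that may change over time.

```python
import os


def wireless_interfaces(sysfs_net="/sys/class/net"):
    """List network interfaces that expose a wireless marker in sysfs."""
    found = []
    for name in sorted(os.listdir(sysfs_net)):
        iface_dir = os.path.join(sysfs_net, name)
        # Assumption: wireless interfaces carry a "wireless" directory
        # and/or a "phy80211" link; wired and loopback ones do not.
        if any(
            os.path.exists(os.path.join(iface_dir, marker))
            for marker in ("wireless", "phy80211")
        ):
            found.append(name)
    return found


# Only meaningful on Linux, so skip quietly elsewhere.
if os.path.isdir("/sys/class/net"):
    print(wireless_interfaces())
```

Taking the directory as a parameter makes the function easy to test against a fake tree, without needing a real WiFi adapter.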

Now, I added an addendum to this, related to the OP’s context and to the costs vs. benefits of the alternatives; again, see my original post about this. But you get the substance of my answer here.