diff --git a/sql/hpr.sql b/sql/hpr.sql index 004f906..3519370 100644 --- a/sql/hpr.sql +++ b/sql/hpr.sql @@ -20475,7 +20475,8 @@ INSERT INTO `eps` (`id`, `date`, `title`, `duration`, `summary`, `notes`, `hosti (3984,'2023-11-09','Whoppers. How Archer72 and I made moonshine. Volume one.',1730,'Sgoti assists Archer72 with his crazy plan to make moonshine.','

What is a whopper?
\nan\nextravagant or monstrous lie
\na\nbig lie
\n

\n

A work of Fiction\nis any creative work, chiefly any narrative work, portraying\nindividuals, events, or places that are imaginary or in ways that are\nimaginary.
\n

\n\n

The Bureau of Alcohol, Tobacco, Firearms and Explosives (BATFE),\ncommonly referred to as the ATF, is a domestic law enforcement agency\nwithin the United States Department of Justice.
\n

\n\n',391,0,1,'CC-BY-SA','Whoppers, Moonshine, Archer72',0,0,1), (3985,'2023-11-10','Bash snippet - be careful when feeding data to loops',1644,'A loop in a pipeline runs in a subshell','
\n

Overview

\n

Recently Ken Fallon did a show on HPR, number\n3962, in which he used a Bash\npipeline of multiple commands feeding their output into a\nwhile loop. In the loop he processed the lines produced by\nthe pipeline and used what he found to download audio files belonging to\na series with wget.

\n

This was a great show and contained some excellent advice, but the\nuse of the format:

\n
pipeline | while read variable; do ...
\n

reminded me of the \"gotcha\" I mentioned in my own show\n2699.

\n

I thought it might be a good time to revisit this subject.

\n

So, what\'s the problem?

\n

The problem can be summarised as a side effect of pipelines.

\n

What are pipelines?

\n

Pipelines are an amazingly useful feature of Bash (and other shells).\nThe general format is:

\n
command1 | command2 ...
\n

Here command1 runs in a subshell and writes to its\nstandard output, which is connected via the pipe symbol\n(|) to command2, where it becomes that command\'s\nstandard input. Many commands can be linked together in this\nway to achieve some powerful combined effects.

\n

A very simple example of a pipeline might be:

\n
$ printf 'World\nHello\n' | sort\nHello\nWorld
\n

The printf command (≡\'command1\') writes two\nlines (separated by newlines) on standard output and this is\npassed to the sort command\'s standard input\n(≡\'command2\') which then sorts these lines\nalphabetically.

\n

Commands in the pipeline can be more complex than this, and in the\ncase we are discussing we can include a loop command such as\nwhile.

\n

For example:

\n
$ printf 'World\nHello\n' | sort | while read line; do echo "($line)"; done\n(Hello)\n(World)
\n

Here, each line output by the sort command is read into\nthe variable line in the while loop and is\nwritten out enclosed in parentheses.

\n

Note that the loop is written on one line. The semi-colons are used\ninstead of the equivalent newlines.

\n

Variables and subshells

\n

What if the lines output by the loop need to be numbered?

\n
$ i=0; printf 'World\nHello\n' | sort | while read line; do ((i++)); echo "$i) $line"; done\n1) Hello\n2) World
\n

Here the variable \'i\' is set to zero before the\npipeline. It could have been done on the line before of course. In the\nwhile loop the variable is incremented on each iteration\nand included in the output.

\n

You might expect \'i\' to be 2 once the loop exits but it\nis not. It will be zero in fact.

\n

The reason is that there are two \'i\' variables. One is\ncreated when it\'s set to zero at the start, before the pipeline. The\nother is a copy (a \"clone\") which the subshell running the loop\nreceives when it starts. The expression:

\n
((i++))
\n

increments only the copy in the subshell; the variable in the\nparent shell is not changed.

\n

When the subshell in which the loop runs completes, it will delete\nthis version of \'i\' and the original one will simply\ncontain the zero that it was originally set to.

\n

You can see what happens in this slightly different example:

\n
$ i=1; printf 'World\nHello\n' | sort | while read line; do ((i++)); echo "$i) $line"; done\n2) Hello\n3) World\n$ echo $i\n1
\n

These examples are fine, assuming the contents of variable\n\'i\' incremented in the loop are not needed outside it.

\n

The thing to remember is that the same variable name used in a\nsubshell is a different variable; it is initialised with the value of\nthe \"parent\" variable but any changes are not passed back.

\n
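
The same rule applies to any kind of variable, not just counters. The following is a minimal sketch, assuming Bash with its default settings; the array name lines is only an illustration. The loop appends each line it reads to the array, but the appends are made to the copy in the subshell:

```shell
#!/bin/bash
# Illustrative sketch: 'lines' is a hypothetical array name.
lines=()

# The loop is the last element of a pipeline, so it runs in a subshell
printf 'World\nHello\n' | sort | while read -r line; do
    lines+=("$line")
    echo "inside the loop: ${#lines[@]} element(s)"
done

# Back in the parent shell the array is still empty
echo "after the loop: ${#lines[@]} element(s)"
```

The last line reports 0 elements, because the parent copy of the array was never touched.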

How to avoid the\nloss of changes in the loop

\n

To solve this the loop needs to be run in the original shell, not a\nsubshell. The pipeline which is being read needs to be attached to the\nloop in a different way:

\n
$ i=0; while read line; do ((i++)); echo "$i) $line"; done < <(printf 'World\nHello\n' | sort)\n1) Hello\n2) World\n$ echo $i\n2
\n

What is being used here is process\nsubstitution. A list of commands or pipelines is enclosed in\nparentheses, and a \'less than\' sign is prepended to the list\n(with no intervening spaces). Bash replaces the whole construct with a\nfile name from which the output of the list can be read, so it is\nfunctionally equivalent to a (temporary) file of data.

\n

Redirection allows a loop to read its data from a\nfile. The general format of the command is:

\n
while read variable\n    do\n       # Use the variable\n    done < file
\n

Using process substitution instead of a file will achieve what is\nrequired if computations are being done in the loop and the results are\nwanted after it has finished.

\n
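
Written as a script rather than a one-liner, the technique might look like the sketch below. It assumes Bash, and the variable names count and lines are illustrative only. Both are changed inside the loop and both are still available after it, because the loop runs in the current shell:

```shell
#!/bin/bash
# Illustrative sketch: 'count' and 'lines' are hypothetical names.
lines=()
count=0

# The loop runs in the current shell; only the commands inside
# <( ... ) run in a separate process
while read -r line; do
    lines+=("$line")
    ((count += 1))
done < <(printf 'World\nHello\n' | sort)

echo "Read $count lines; the first is ${lines[0]}"
```

This prints: Read 2 lines; the first is Hello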

Beware of this type of\nconstruct

\n

The following one-line command sequence looks similar to the version\nusing process substitution, but is just another form of pipeline:

\n
$ i=0; while read line; do echo $line; ((i++)); done < /etc/passwd | head -n 5; echo $i\nroot:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\nbin:x:2:2:bin:/bin:/usr/sbin/nologin\nsys:x:3:3:sys:/dev:/usr/sbin/nologin\nsync:x:4:65534:sync:/bin:/bin/sync\n0
\n

This will display the first 5 lines of the file, but it does so by\nhaving the loop read and write the entire file, with head showing only\nthe first 5 lines of what the loop writes.

\n

What is more, because the while loop is in a subshell in a\npipeline, changes to the variable \'i\' will be lost.

\n
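
One possible rearrangement, offered as a sketch rather than the only fix, is to move head inside a process substitution. The loop then runs in the current shell and is given only the lines it needs. A temporary file of made-up data stands in for /etc/passwd here; the name datafile is hypothetical:

```shell
#!/bin/bash
# Hypothetical sample data standing in for /etc/passwd
datafile=$(mktemp)
printf 'line %d\n' 1 2 3 4 5 6 7 8 > "$datafile"

i=0
# head limits the input; the loop itself is not part of a pipeline
while read -r line; do
    echo "$line"
    ((i += 1))
done < <(head -n 5 "$datafile")

# The count made in the loop is still available here
echo "$i"
rm -f "$datafile"
```

This writes the first five lines followed by 5.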

Advice

\n\n

Tracing pipelines (advanced)

\n

I have always wondered about processes in Unix. The process you log\nin to, normally called a shell, runs a command language\ninterpreter that executes commands read from the standard input or\nfrom a file. There are several such interpreters available, but we\'re\ndealing with bash here.

\n

Processes are fairly lightweight entities in Unix/Linux. They can be\ncreated and destroyed quickly, with minimal overhead. I used to work\nwith Digital Equipment Corporation\'s OpenVMS operating system\nwhich also uses processes - but these are much more expensive to create\nand destroy, and therefore slow and less readily used!

\n

Bash pipelines, as discussed, use subshells. The description\nin the Bash man page says:

\n
\n

Each command in a multi-command pipeline, where pipes are created, is\nexecuted in a subshell, which is a separate process.

\n
\n

So a subshell in this context is basically another child\nprocess of the main login process (or other parent process), running\nBash.

\n

Processes (subshells) can be created in other ways. One is to place a\ncollection of commands in parentheses. These can be simple Bash\ncommands, separated by semi-colons, or pipelines. For example:

\n
$ (echo "World"; echo "Hello") | sort\nHello\nWorld
\n

Here the strings \"World\" and \"Hello\", each\nfollowed by a newline, are created in a subshell and written to standard\noutput. These strings are piped to sort and the end result\nis as shown.

\n

Note that this is different from this example:

\n
$ echo "World"; echo "Hello" | sort\nWorld\nHello
\n

In this case \"World\" is written in a separate command,\nthen \"Hello\" is written to a pipeline. All\nsort sees is the output from the second echo,\nwhich explains the output.

\n

Each process has a unique numeric id value (the process id\nor PID). These can be seen with tools like ps or\nhtop. Each Bash process holds its own PID in a variable\ncalled BASHPID.

\n
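
A short sketch of how BASHPID behaves in a subshell follows. It also uses the special parameter $$, which is not mentioned above: in Bash it expands to the PID of the invoking shell even inside a subshell, so comparing the two makes the new process visible. The names sub_dollar and sub_bashpid are illustrative only:

```shell
#!/bin/bash
echo "Main shell: \$\$ = $$, BASHPID = $BASHPID"

# Parentheses create a subshell: $$ stays the same, BASHPID changes
( echo "Subshell:   \$\$ = $$, BASHPID = $BASHPID" )

# Capture the subshell values so they can be compared afterwards
# (command substitution also runs its command in a subshell)
read -r sub_dollar sub_bashpid <<< "$( echo "$$ $BASHPID" )"

[ "$sub_dollar" -eq "$$" ] && echo "\$\$ is unchanged in the subshell"
[ "$sub_bashpid" -ne "$BASHPID" ] && echo "BASHPID differs in the subshell"
```

The actual PID numbers will differ from run to run.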

Knowing all of this I decided to modify Ken\'s script from show\n3962 to show the processes being created - mainly for my interest,\nto get a better understanding of how Bash works. I am including it here\nin case it may be of interest to others.

\n
#!/bin/bash\n\nseries_url="https://hackerpublicradio.org/hpr_mp3_rss.php?series=42&full=1&gomax=1"\ndownload_dir="./"\n\npidfile="/tmp/hpr3962.sh.out"\ncount=0\n\necho "Starting PID is $BASHPID" > $pidfile\n\n(echo "[1] $BASHPID" >> "$pidfile"; wget -q "${series_url}" -O -) |\\n    (echo "[2] $BASHPID" >> "$pidfile"; xmlstarlet sel -T -t -m 'rss/channel/item' -v 'concat(enclosure/@url, "→", title)' -n -) |\\n    (echo "[3] $BASHPID" >> "$pidfile"; sort) |\\n    while read -r episode; do\n\n        [ $count -le 1 ] && echo "[4] $BASHPID" >> "$pidfile"\n        ((count++))\n\n        url="$( echo "${episode}" | awk -F '→' '{print $1}' )"\n        ext="$( basename "${url}" )"\n        title="$( echo "${episode}" | awk -F '→' '{print $2}' | sed -e 's/[^A-Za-z0-9]/_/g' )"\n        #wget "${url}" -O "${download_dir}/${title}.${ext}"\n    done\n\necho "Final value of \$count = $count"\necho "Run 'cat $pidfile' to see the PID numbers"
\n

The point of doing this is to get information about the pipeline\nwhich feeds data into the while loop. I kept the rest\nintact but commented out the wget command.

\n

For each component of the pipeline I added an echo\ncommand and enclosed it and the original command in parentheses, thus\nmaking a multi-command process. Each echo command writes a\nfixed number, so you can tell which one is being executed, and also\nwrites the contents of BASHPID.

\n

The whole thing writes to a temporary file\n/tmp/hpr3962.sh.out which can be examined once the script\nhas finished.

\n

When the script is run it writes the following:

\n
$ ./hpr3962.sh\nFinal value of $count = 0\nRun 'cat /tmp/hpr3962.sh.out' to see the PID numbers
\n

The file mentioned contains:

\n
Starting PID is 80255\n[1] 80256\n[2] 80257\n[3] 80258\n[4] 80259\n[4] 80259
\n

Note that the PID values are consecutive in this run. There is no guarantee that\nthis will be so; it depends on whatever else the machine is\ndoing.

\n

Message number 4 is the same for every loop iteration, so I stopped\nit being written after two instances.

\n

The initial PID is the process running the script, not the login\n(parent) PID. You can see that each command in the pipeline runs in a\nseparate process (subshell), including the loop.

\n

Given that a standard pipeline generates a process per command, and\nthat parentheses also create a subshell, I was\nslightly surprised that the PID numbers were consecutive, with no extra\nprocess for each parenthesised list. It seems that\nBash optimises things so that only one process is run for each element\nof the pipe. I expect that it would be possible for more processes to be\ncreated by having pipelines within these parenthesised lists, but I\nhaven\'t tried it!

\n
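
One way to try the idea of a pipeline inside a parenthesised list is sketched below. This is not part of the script above; it assumes Bash with default settings, and the scratch file name pidfile and the labels in square brackets are illustrative. Each element records its BASHPID and the distinct values are counted at the end:

```shell
#!/bin/bash
# Hypothetical scratch file for the recorded PIDs
pidfile=$(mktemp)

echo "[0] $BASHPID" > "$pidfile"
(
    echo "[1] $BASHPID" >> "$pidfile"
    # A pipeline nested inside the parenthesised list
    (echo "[1a] $BASHPID" >> "$pidfile"; printf 'World\nHello\n') |
        (echo "[1b] $BASHPID" >> "$pidfile"; sort)
) | (echo "[2] $BASHPID" >> "$pidfile"; cat)

# Count the distinct PIDs in the second column
distinct=$(awk '!seen[$2]++ { n++ } END { print n }' "$pidfile")
cat "$pidfile"
echo "Distinct PIDs: $distinct"
rm -f "$pidfile"
```

With the main shell, the two elements of the outer pipeline and the two elements of the nested one, this should report five distinct PIDs, so the nested pipeline does add processes.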

I found this test script quite revealing. I hope you find it useful\ntoo.

\n

Links

\n\n\n\n\n
\n',225,42,1,'CC-BY-SA','Bash,loop,process,shell',0,0,1), (3992,'2023-11-21','Test recording on a wireless mic',223,'Archer72 tests out a wireless mic with a USB C receiver','

LEKATO 2\nPack Wireless Microphone with Charging Case

\n

https://www.amazon.com/gp/product/B0C4SNT6QK

\n\n

Claims

\n\n

Axet Audio recorder on\nF-Droid

\n

https://f-droid.org/packages/com.github.axet.audiorecorder

\n\n',318,0,0,'CC-BY-SA','Recording, Microphone, Wireless, USB \'C\', F-droid, Android App',0,0,1), -(4221,'2024-10-07','HPR Community News for September 2024',0,'HPR Volunteers talk about shows released and comments posted in September 2024','',159,47,1,'CC-BY-SA','Community News',0,0,1); +(4221,'2024-10-07','HPR Community News for September 2024',0,'HPR Volunteers talk about shows released and comments posted in September 2024','',159,47,1,'CC-BY-SA','Community News',0,0,1), +(4241,'2024-11-04','HPR Community News for October 2024',0,'HPR Volunteers talk about shows released and comments posted in October 2024','',159,47,1,'CC-BY-SA','Community News',0,0,1); /*!40000 ALTER TABLE `eps` ENABLE KEYS */; UNLOCK TABLES; @@ -21399,4 +21400,4 @@ UNLOCK TABLES; /*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */; /*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */; --- Dump completed on 2023-11-03 8:39:21 +-- Dump completed on 2023-11-04 10:15:51