hpr_website/www/eps/hpr3413/hpr3413_full_shownotes.html

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="generator" content="pandoc">
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
  <meta name="author" content="Dave Morriss">
  <title>Bash snippet - using coproc with SQLite (HPR Show 3413)</title>
  <style type="text/css">code{white-space: pre;}</style>
  <!--[if lt IE 9]>
    <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
  <![endif]-->
  <link rel="stylesheet" href="http://hackerpublicradio.org/css/hpr.css">
</head>

<body id="home">
<div id="container" class="shadow">
<header>
<h1 class="title">Bash snippet - using coproc with SQLite (HPR Show 3413)</h1>
<h2 class="subtitle">Sending multiple queries to a running instance of sqlite3</h2>
<h2 class="author">Dave Morriss</h2>
<hr/>
</header>

<main id="maincontent">
<article>
<header>
<h1>Table of Contents</h1>
<nav id="TOC">
<ul>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#details">Details</a>
<ul>
<li><a href="#my-pending-queue">My ‘pending’ queue</a></li>
<li><a href="#more-about-coproc">More about <code>coproc</code></a>
<ul>
<li><a href="#the-coproc-command">The <code>coproc</code> command</a></li>
<li><a href="#simple-usage">Simple usage</a></li>
<li><a href="#gory-details">Gory details</a></li>
<li><a href="#coprocesses-in-gawk">Coprocesses in <code>gawk</code></a></li>
</ul></li>
<li><a href="#my-eventual-script-using-coproc">My eventual script using <code>coproc</code></a></li>
</ul></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#links">Links</a></li>
</ul>
</nav>
</header>
<h2 id="introduction">Introduction</h2>
<p>I am in the process of rewriting some scripts I use to manage <a href="http://magnatune.com/" title="Magnatune">Magnatune</a> albums. I’m a lifetime Magnatune member and have access to the whole music collection. I wrote a script for downloading albums and placing them in my <code>~/Music</code> directory which I talked about in 2013 (show <a href="http://hackerpublicradio.org/eps/hpr1204">1204</a>). The original scripts are still available on <a href="https://gitlab.com/davmo/magnatune-downloader">GitLab</a> and I know of one other person who made use of them!</p>
<p>Since 2013 I have written a few other support scripts, for example one to manage a queue of albums I want to buy and download, and one which summarises the state of this queue.</p>
<p>It’s this <code>'show_queue'</code> script I am currently updating (called <a href="hpr3413_show_queue_orig"><code>show_queue_orig</code></a>, and available in the resources to this show). The original version of this script took Magnatune album URLs from a file (acting as a queue of stuff I wanted to buy), parsed out a piece of the URL and used it to <code>grep</code> a pre-prepared summary in another file. This file of summaries had been made from a master XML file provided by Magnatune (see <code>update_albums</code> on GitLab).</p>
<p>Magnatune has moved away from this master XML file to a SQLite database in recent years, so I want to perform a database lookup for each URL to list its details.</p>
<p>The first version of the new script wasn’t difficult to write: just extract the search data as before and run a query on the database using this data. I have included this script which I call <a href="hpr3413_show_queue_db_1"><code>show_queue_db_1</code></a> amongst the resources for this episode, so you can see what I’m talking about – and what I want to improve on. It felt bad to be performing multiple calls on the <code>sqlite3</code> command in a loop, so I looked around for an alternative way.</p>
<p>In April 2019 <a href="http://hackerpublicradio.org/correspondents.php?hostid=311">clacke</a> did a show (number <a href="http://hackerpublicradio.org/eps/hpr2793">2793</a>) about the Bash <code>coproc</code> command.</p>
<p>This command creates a <em>subshell</em> running a command or group of commands which is connected to the calling (<em>parent</em>) process through two file descriptors (<em>FD</em>s). It’s possible for the calling shell to write to the input descriptor and read from the output one and thereby communicate with whatever is running in the subshell.</p>
<p>I was vaguely aware of <code>coproc</code> at the time of clacke’s show but hadn’t looked into it. I found the show fascinating but didn’t have a use for the feature at the time.</p>
<p>To solve my need to show my Magnatune queue of future purchases, it looked as if a <code>sqlite3</code> instance running in a subshell could be given queries one after the other and return the answers I needed. My journey to a Bash script using <code>coproc</code> then followed.</p>
<h2 id="details">Details</h2>
<h3 id="my-pending-queue">My ‘pending’ queue</h3>
<p>As I said, I have a file containing Magnatune URLs of the form:</p>
<pre><code>http://magnatune.com/artists/albums/antiqcool-acousticguitar</code></pre>
<p>The final component is the <a href="https://en.wikipedia.org/wiki/Stock_keeping_unit" title="Stock Keeping Unit"><code>'SKU'</code></a>, which is the key to the album in the system. In the original XML-based system I see the following example information when I run my current script:</p>
<pre><code>Artist: Antiqcool
Album:  Original Instrumental Acoustic Guitar Songs Vol 1
Genres: Alt Rock,Folk-Rock,Instrumental Rock
Code:   antiqcool-acousticguitar
----</code></pre>
<p>I store the URLs because that’s the form Magnatune uses to send album details in their periodic email messages about new albums. It’s easier to cut and paste them.</p>
<p>The original <code>show_queue</code> script just reads this queue file in a loop and looks up the SKU in a file of reformatted XML information. As mentioned, I have included this script for reference as one of the resources accompanying this show (<a href="hpr3413_show_queue_orig"><code>show_queue_orig</code></a>).</p>
<h3 id="more-about-coproc">More about <code>coproc</code></h3>
<p>In clacke’s HPR show 2793 he described the <code>coproc</code> command in some detail, with examples of how it behaves.</p>
<p>It is documented in a fairly terse fashion in the Bash Reference Manual (see link below).</p>
<h4 id="the-coproc-command">The <code>coproc</code> command</h4>
<p>In essence <code>coproc</code> runs a command as a subshell (or <em>coprocess</em>) with a two-way pipe connected to it. This allows the shell which generated it to write to and read from the coprocess.</p>
<p>The command format is:</p>
<pre><code>coproc [NAME] command [redirections]</code></pre>
<p>The syntax is a little strange to my way of thinking. The use of the name depends on the type of command. If it’s a <em>simple</em> command then no name can be provided, and the default of <code>COPROC</code> is used so you can only run one coprocess at a time. The alternative to a simple command is a command group enclosed in braces or parentheses, and the user-supplied name is used in this case, otherwise, if there’s no name <code>COPROC</code> is used as before.</p>
<p>The relevance of the name is that it is used to create variables relating to the coprocess. There’s a variable ‘<em>name</em><code>_PID</code>’ which holds the process ID number of the subshell (coprocess), and an array.</p>
<p>The two-way pipe for communicating with the coprocess is connected to two <em>file descriptors</em> in the executing shell. These are stored in an indexed array called <em>name</em>. Element zero contains the descriptor for the standard output of the coprocess, and element 1 the descriptor for standard input.</p>
<p>Note: I haven’t talked about Bash file descriptors in my Bash scripting shows but plan to do so before long.</p>
<h4 id="simple-usage">Simple usage</h4>
<p>Here’s an example of running the <code>date</code> command in a coprocess. Not a lot of point in doing this of course, but it might help explain things reasonably well:</p>
<pre><code>$ coproc date; cat &lt;&amp;&quot;${COPROC[0]}&quot;
[1] 40206
Thu 29 Jul 10:32:15 BST 2021
[1]+  Done                    coproc COPROC date</code></pre>
<ul>
<li>The <code>coproc</code> command is followed by the command to run - <code>'date'</code>. Then, after a semicolon the output from the coprocess is displayed with <code>cat</code>. We do this on the same line because the coprocess will finish very quickly and will delete the <code>COPROC</code> array, making it impossible to see the output.</li>
<li>The first line beginning <code>'[1]'</code> shows the PID of the coprocess (an examination of the process hierarchy should show this to be a subprocess of the calling process)</li>
<li>The next line shows the date returned from the coprocess via <code>cat</code></li>
<li>The second line beginning <code>'[1]'</code> shows that the coprocess has finished.</li>
</ul>
<p>If for any reason you have a coprocess that continues to run unexpectedly you can look for it with the Bash <code>'jobs'</code> command – this is job 1 in the above case, as shown by the <code>'[1]'</code> followed by the PID. You can kill the job with the command <code>'kill %1'</code>.</p>
<h4 id="gory-details">Gory details</h4>
<p>I have tried to learn the intricacies of <code>coproc</code> since deciding to use it, but can’t say that I fully understand all the details yet!</p>
<p>Using <code>coproc</code> for single line stuff is not too difficult, but seems quite pointless. However, things get a lot more complex when dealing with coprocesses that receive and send multiple lines of data.</p>
<p>The issues which complicate matters are:</p>
<ol type="1">
<li>It’s your coprocess so you know whether it expects input or not. You assume it’s receiving what you send by how it responds. There may be buffering problems that complicate this.</li>
<li>When you’ve finished sending stuff to the coprocess and you want to tell it it’s all done you must close the input file descriptor. You do this with the rather arcane command <code>'exec {NAME[1]}&gt;&amp;-'</code> where <code>'NAME'</code> is the name you assigned or was assigned for you (<code>'COPROC'</code>). This is an example of the operator described in the Bash Reference Manual under <a href="https://www.gnu.org/software/bash/manual/bash.html#Duplicating-File-Descriptors" title="Bash Reference Manual, 3.6.8 Duplicating File Descriptors">Duplicating File Descriptors</a>. Though the manual doesn’t explain how you use that operator with a FD held in an array! This subject is on the list for the <em>Bash Tips</em> series in due course.</li>
<li>You can’t really tell how much output to read from the coprocess, and you may be dealing with something that performs I/O buffering so it will hold on to its output unexpectedly. This can cause a <em>deadlock</em> if you get it wrong! A deadlock is where the parent and child processes are both waiting for the other process to do something.</li>
</ol>
<p>I have written a <code>coproc</code> example in the form of a Bash script. It’s called <code>coproc_test.sh</code>:</p>
<pre><code>#!/bin/bash -

#
# Use bash in the coprocess but turn off buffering
#
process=&#39;stdbuf -i0 -o0 -e0 bash&#39;

#
# Indexed array of bash commands
#
declare -a com=(&#39;date +%F&#39; &#39;whoami&#39; &#39;id&#39; &#39;echo &quot;$BASH_VERSION&quot;&#39;
    &#39;printf &quot;Hello\nWorld\n&quot;&#39;)

#
# Count commands in the array
#
n=&quot;${#com[@]}&quot;

#
# Start the coprocess
#
coproc child { $process; }

#
# Loop though the commands
#
i=0
while [[ $i -lt $n ]]; do
    # Show the command
    echo &quot;\$ ${com[$i]}&quot;

    # Send to coprocess
    echo &quot;${com[$i]}&quot; &gt;&amp;&quot;${child[1]}&quot;

    # Read a line from the coprocess
    read -u &quot;${child[0]}&quot; -r results

    # Show the line received
    echo &quot;$results&quot;

    ((i++))
done

#
# Send an EOF to the coprocess (if needed)
#
if [[ -v child_PID ]]; then
    echo &quot;-- End --&quot;
    exec {child[1]}&gt;&amp;-

    # Flush any remaining results
    cat &lt;&amp;&quot;${child[0]}&quot;
fi</code></pre>
<p>The key points are:</p>
<ul>
<li>The coprocess is running an instance of <code>bash</code> which expects commands and returns results</li>
<li>The command to be run is preceded by <code>'stdbuf -i0 -o0 -e0'</code> which turns off <b>all</b> buffering</li>
<li>Commands to be sent to it are in an array and are fed to it one at a time</li>
<li>I use a counter and a <code>while</code> loop to do this, but could just as well have used <code>for (( i = 1; i &lt; n; i++ ))</code>.</li>
<li>Each time a command is sent <strong>one line</strong> is read back (using a <code>read</code> command on the FD in array <code>child[0]</code>) and displayed</li>
<li>An <em>end of file</em> is sent to the coprocess by closing the input channel, but only if it’s still running</li>
<li>I use a <code>cat</code> command to flush any remaining output after closing the FD.</li>
<li>Most of the commands generate one line of output, but the last one: <code>'printf'</code> creates two. Only one is read in the loop, but the <code>cat</code> returns it after the input FD has been closed.</li>
<li>What would happen if a command was sent which produced no output? Try adding ‘:’ to the list of commands (this being a “null” command in Bash). The script will hang waiting for output that will never come. Adding a timeout to the <code>read</code> could be a way to avoid this problem.</li>
</ul>
<p>Running this script gives the following output:</p>
<pre><code>$ date +%F
2021-08-21
$ whoami
cendjm
$ id
uid=1000(cendjm) gid=1000(cendjm) groups=1000(cendjm),4(adm),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),108(netdev),110(lpadmin),114(bluetooth),115(scanner)
$ echo &quot;$BASH_VERSION&quot;
5.1.4(1)-release
$ printf &quot;Hello\nWorld\n&quot;
Hello
-- End --
World
</code></pre>
<p>Note that the two-line <code>printf</code> at the end gets its first line displayed in the loop, then the loop ends and the script detects that the coprocess is still running, writes <code>'-- End --'</code> and then flushes the remaining line.</p>
<p>Also note that the <em>job control</em> messages we saw in the simple example above are disabled by Bash when running <code>coproc</code> out of a script.</p>
<p>I’m not sure that this example shows anything useful, however. It seems more of a novelty than anything else!</p>
<h4 id="coprocesses-in-gawk">Coprocesses in <code>gawk</code></h4>
<p>In the spirit of enquiry I wrote a brief <code>gawk</code> script to do largely the same as the previous example. The coprocess features are not available in plain <code>awk</code>, they are a GNU extension. This script, called <code>coproc_test.awk</code> has been included in the resources for this show, where it can be downloaded.</p>
<p>I will not cover it any further in this show.</p>
<h3 id="my-eventual-script-using-coproc">My eventual script using <code>coproc</code></h3>
<p>I found out how to run <code>coproc</code> to do what I wanted, but I spent a long time working out how to do it. Looking back, I got an HPR show out of it (though I doubt whether anyone will thank me for it!), and answered my question, but it probably wasn’t worth it.</p>
<p>The eventual script is presented here and is a resource available for download (<code>show_queue_db_2</code>).</p>
<pre><code>#!/bin/bash -
#===============================================================================
#
#         FILE: show_queue
#
#        USAGE: ./show_queue
#
#  DESCRIPTION: Show the pending queue, expanding each album&#39;s details from
#               the database.
#
#               / This version calls sqlite3 once and feeds it queries in a loop /
#
#      OPTIONS: ---
# REQUIREMENTS: ---
#         BUGS: ---
#        NOTES: ---
#       AUTHOR: Dave Morriss (djm), Dave.Morriss@gmail.com
#      VERSION: 0.2.1
#      CREATED: 2020-09-15 12:38:03
#     REVISION: 2021-08-05 16:39:05
#
#===============================================================================

set -o nounset                              # Treat unset variables as an error

SCRIPT=${0##*/}
#DIR=${0%/*}

VERSION=&#39;0.2.1&#39;

#
# Files and directories
#
BASEDIR=&quot;$HOME/MusicDownloads&quot;
DATADIR=&quot;$BASEDIR/Magnatune_Data&quot;
SCRIPTDIR=&quot;$BASEDIR/magnatune-downloader&quot;

QUEUE=&quot;$SCRIPTDIR/pending&quot;
DB=&quot;$DATADIR/sqlite_normalized.db&quot;

#
# Sanity checks
#
[ -e &quot;$QUEUE&quot; ] || { echo &quot;$QUEUE not found&quot;; exit 1; }

#
# Check the queue contains data
#
if [[ ! -s $QUEUE ]]; then
    echo &quot;$SCRIPT($VERSION): there is nothing in the queue&quot;
    exit
fi

RE=&#39;^http://magnatune.com/artists/albums/([A-Za-z0-9-]+)/?$&#39;

#
# Template SQL for printf
#
SQLtemplate=$(cat &lt;&lt;ENDSQL1
SELECT
    ar.name AS Artist,
    al.name AS Album,
    group_concat(distinct ge.name) AS &#39;Genre&#39;,
    group_concat(distinct sg.name) AS &#39;Subgenre&#39;,
    al.sku as Code
FROM albums al
JOIN artists ar ON al.artist_id = ar.artists_id
JOIN genres_albums ga on al.album_id = ga.album_id
JOIN genres ge ON ge.genre_id = ga.genre_id
JOIN subgenres_albums sa on al.album_id = sa.album_id
JOIN subgenres sg ON sg.subgenre_id = sa.subgenre_id
GROUP BY al.album_id
HAVING sku = &#39;%s&#39;;
.print &#39;--------&#39;
ENDSQL1
)

#
# Start the coprocess
#
coproc dbproc { stdbuf -i0 -o0 sqlite3 -line &quot;$DB&quot;; }

#
# Read and report the queue elements.
#
n=0
while read -r URL; do
    ((n++))

    if [[ $URL =~ $RE ]]; then
        SKU=&quot;${BASH_REMATCH[1]}&quot;
    else
        echo &quot;Problem parsing URL in queue (line $n): $URL&quot;
        continue
    fi

    #
    # Build the query and write it to the coprocess
    #
    # shellcheck disable=SC2059
    printf &quot;$SQLtemplate\n&quot; &quot;$SKU&quot; &gt;&amp;&quot;${dbproc[1]}&quot;

done &lt; &quot;$QUEUE&quot;

#
# Close the input pipe (a file descriptor move documented as &#39;[n]&gt;&amp;digit-&#39;
# which &quot;moves the file descriptor digit to file descriptor n, or the standard
# output (file descriptor 1) if n is not specified&quot;. There is no digit here,
# so presumably /nothing/ is being moved to the file descriptor in dbproc[1].
#
exec {dbproc[1]}&gt;&amp;-

#
# Collect everything from the coprocess
#
cat &lt;&amp;&quot;${dbproc[0]}&quot;

# vim: syntax=sh:ts=8:sw=4:ai:et:tw=78:fo=tcrqn21</code></pre>
<p>Salient points are:</p>
<ol type="1">
<li>The query is stored in the variable <code>SQLtemplate</code> in the form of a format string for <code>printf</code>. This lets me substitute a <code>SKU</code> value each time it’s used. The string consists of a SQL query and a SQLite ‘dot’ command (<code>.print</code>) which I use to print a line of hyphens between each album summary.</li>
<li>The coprocess consists of a <code>sqlite3</code> command preceded by a <code>stdbuff</code> call which turns off all buffering.</li>
<li>In the loop which is reading the queue we generate a new query on each iteration and use <code>printf</code> to produce it and write it to the coprocess. We do not read back from the coprocess in the loop.</li>
<li>Once the loop is finished we close the input pipe then use <code>cat</code> to collect all that’s available from the coprocess and display it.</li>
</ol>
<p>Running this script gives the following output (truncated after the first 12 lines):</p>
<pre><code>  Artist = Antiqcool
   Album = Original Instrumental Acoustic Guitar Songs Vol 1
   Genre = Alt Rock
Subgenre = Folk,Instrumental New Age
    Code = antiqcool-acousticguitar
--------
  Artist = Mokhov
   Album = Jupiter Melodies
   Genre = Electronica
Subgenre = Electro Rock,Electronica,Instrumental Classical
    Code = mokhov-jupitermelodies
--------
</code></pre>
<p>I suspect that the strategy of feeding data to the coprocess in a loop but not reading from it until the loop has ended might be dangerous. I think this relies on the fact that the pipe will store data, but it’s not clear what limit there is on such storage. It’s possible that it’s fairly small, and that this script could fail if the queue was long.</p>
<p>It might be possible to avoid this problem by reading data from the coprocess on each loop iteration using <code>read</code> with a timeout. Detecting the timeout could possibly be used to determine that the output pipe is empty and it’s time to write to the input pipe again.</p>
<p>I have not tried this idea though - it feels very <em>clunky</em>!</p>
<h2 id="conclusion">Conclusion</h2>
<p>An interesting voyage, but:</p>
<ul>
<li>I still don’t fully understand what <code>'exec {NAME[1]}&gt;&amp;-'</code> means. I know what it does, but don’t get the syntax!</li>
<li>The general conclusion I find from various sources are:
<ul>
<li>named pipes are better</li>
<li>if you want to interact with a command use something like <a href="https://en.wikipedia.org/wiki/Expect"><code>expect</code></a> (I spent several years using <code>expect</code> and <code>expectk</code> in my job)</li>
<li>see the Stack Exchange reference below for more details</li>
</ul></li>
<li>For this application I can write a much simpler Perl script that connects to the SQLite database, prepares a query with a substitution point and repeats the call with different values without a coprocess (available as a resource with this episode with the name <code>show_queue.pl.zip</code>). Many other programming solutions are also available. I do not believe that this is a task for Bash.</li>
</ul>
<p>I’m in general agreement with clacke that Bash <code>coproc</code> is a feature looking for a use!</p>
<p>Feel free to show me the error of my ways!</p>
<h2 id="links">Links</h2>
<ul>
<li>Bash Reference Manual:
<ul>
<li><a href="https://www.gnu.org/software/bash/manual/bash.html#Coprocesses">3.2.6 Coprocesses</a></li>
<li><a href="https://www.gnu.org/software/bash/manual/bash.html#Duplicating-File-Descriptors">3.6.8 Duplicating File Descriptors</a></li>
</ul></li>
</ul>
<ul>
<li>Stack Exchange:
<ul>
<li><a href="https://unix.stackexchange.com/questions/86270/how-do-you-use-the-command-coproc-in-various-shells">How do you use the command coproc in various shells?</a></li>
</ul></li>
</ul>
<ul>
<li>Wikipedia:
<ul>
<li><a href="https://en.wikipedia.org/wiki/Expect">Wikipedia page on <code>expect</code></a></li>
</ul></li>
</ul>
<ul>
<li>Other links:
<ul>
<li><a href="http://hackerpublicradio.org/eps/hpr1204">hpr1204 :: My Magnatune Downloader</a></li>
<li><a href="https://gitlab.com/davmo/magnatune-downloader">GitLab repository: original magnatune-downloader</a></li>
<li><a href="http://hackerpublicradio.org/eps/hpr2793">hpr2793 :: bash coproc: the future (2009) is here</a></li>
</ul></li>
</ul>
<ul>
<li>Resources:
<ul>
<li>Example files:
<ul>
<li><a href="hpr3413_coproc_test.sh">coproc_test.sh</a></li>
<li><a href="hpr3413_coproc_test.awk">coproc_test.awk</a></li>
<li><a href="hpr3413_show_queue_orig">show_queue_orig</a></li>
<li><a href="hpr3413_show_queue_db_1">show_queue_db_1</a></li>
<li><a href="hpr3413_show_queue_db_2">show_queue_db_2</a></li>
<li><a href="hpr3413_show_queue.pl.zip">show_queue.pl.zip</a></li>
</ul></li>
</ul></li>
</ul>
</article>
</main>
</div>
</body>
</html>