Better Podcasting Through Linux

Note: This post is adapted from my Reddit post. If this post doesn't help, you might find the comments there useful.

I recently noticed that a podcast I listen to regularly has sound issues: one of the hosts tends to speak more softly than the other. This leaves me constantly fiddling with the volume, either to hear what's being said or to avoid being blown out by other sounds and music. I figured there had to be a solution, and I found one. Actually, I made one. And so can you! I wanted to share what I did with all of you. I'll post links to the full scripts so you can customize them as needed, but I'll also go over the main points here.

High Level Overview

So I want to fix my podcasts. Let's break the idea down into its general parts. The basic process would sound something like this:

Download new episodes of my podcasts.
Normalize the audio.
Create an RSS feed that points to the new audio files.
Create a script to automatically pull this all together for us so we can put our feet up and relax to a good podcast.

Luckily, there are open-source, self-hosted solutions for each of these steps:

podfox: Linux CLI podcasting client.
ffmpeg: audio/video Swiss army knife.
dir2cast: podcast feed generator.
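Before wiring anything together, it can help to confirm that the command-line pieces are actually installed. This is a small sketch; missing_commands is a hypothetical helper name, and dir2cast is left out because it is a PHP script served by the web server rather than a command you run.

```shell
#!/usr/bin/env bash
# Sketch: print each command that isn't on PATH; return non-zero if any are missing.
# missing_commands is a hypothetical helper, not part of podfox, ffmpeg, or dir2cast.
missing_commands() {
    local cmd status=0
    for cmd in "$@"; do
        # command -v succeeds only if the shell can find the command
        if ! command -v "$cmd" > /dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return "$status"
}
```

For example: `missing_commands podfox ffmpeg || echo "install the commands listed above first" >&2`.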

Scripting -- ffmpeg

This is where the magic really happens. Let's start with ffmpeg, because normalizing audio is a complicated feat in and of itself. We'll use the loudnorm filter to even out the audio volume. You can run loudnorm in a single command, but for the best normalization a two-pass process is recommended: the first pass measures the file, and the second pass applies those measurements. I put this in its own script called normalize.sh, which takes the input file as its first argument and the output file as its second. For the first pass we want a few specific values from loudnorm:

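input="$1"    # file to normalize (normalize.sh's first argument)
tempFile=$(mktemp)

# First pass: analysis only. loudnorm prints its summary to stderr, so capture stderr in the temp file.
ffmpeg -i "$input" -af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=summary -f null - 2> "$tempFile"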

This command writes ffmpeg's diagnostic output, including the measurements gathered by the loudnorm filter, to a temporary file. The loudnorm part looks something like this:

[Parsed_loudnorm_0 @ 0x7fffb8cef180]
Input Integrated:    -15.1 LUFS
Input True Peak:      -2.7 dBTP
Input LRA:            16.5 LU
Input Threshold:     -28.2 LUFS
    
Output Integrated:   -16.8 LUFS
Output True Peak:     -5.7 dBTP
Output LRA:           12.7 LU
Output Threshold:    -29.5 LUFS
    
Normalization Type:   Dynamic
Target Offset:        +0.8 LU

I used a temporary file to make things easier on myself. We need to extract four values from this output: Input Integrated, Input True Peak, Input LRA, and Input Threshold. I use grep four times. Is there a better way to do this? Probably. But I'm only so-so with bash, and this works. Those four values then feed the second pass, which writes the normalized file:

output="$2"   # where to write the normalized file (normalize.sh's second argument)
# Pull the number out of each "Input ..." line; the dot is escaped so it only matches a decimal point
integrated=$(grep 'Input Integrated:' "$tempFile" | grep -oE '[-+]?[0-9]+\.[0-9]+')
truepeak=$(grep 'Input True Peak:' "$tempFile" | grep -oE '[-+]?[0-9]+\.[0-9]+')
lra=$(grep 'Input LRA:' "$tempFile" | grep -oE '[-+]?[0-9]+\.[0-9]+')
threshold=$(grep 'Input Threshold:' "$tempFile" | grep -oE '[-+]?[0-9]+\.[0-9]+')
rm -f "$tempFile"

# Second pass: apply the measured values
ffmpeg -i "$input" -loglevel panic -af loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=$integrated:measured_TP=$truepeak:measured_LRA=$lra:measured_thresh=$threshold:offset=-0.3:linear=true:print_format=summary "$output"
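On the "better way" question, one option is to let awk pick all four values out of the summary in a single pass. Below is a sketch of normalize.sh rebuilt that way. It is an alternative, not the script described above: loudnorm_measurements and normalize are hypothetical names, the targets and offset=-0.3 are copied from the commands above, and the parsing assumes the summary lines start with "Input ..." as in the sample output.

```shell
#!/usr/bin/env bash
# Alternative sketch of normalize.sh: same two-pass loudnorm settings as above,
# but awk extracts the four "Input ..." measurements in one pass.
# loudnorm_measurements and normalize are hypothetical helper names.

LOUDNORM_TARGETS="I=-16:TP=-1.5:LRA=11"

# Turn a loudnorm print_format=summary log into the second pass's measured_* options,
# e.g. measured_I=-15.1:measured_TP=-2.7:measured_LRA=16.5:measured_thresh=-28.2
loudnorm_measurements() {
    awk -F': *' '
        # $2 is e.g. "-15.1 LUFS"; split() keeps just the number
        /^[ \t]*Input Integrated:/ { split($2, v, " "); i   = v[1] }
        /^[ \t]*Input True Peak:/  { split($2, v, " "); tp  = v[1] }
        /^[ \t]*Input LRA:/        { split($2, v, " "); lra = v[1] }
        /^[ \t]*Input Threshold:/  { split($2, v, " "); th  = v[1] }
        END { printf "measured_I=%s:measured_TP=%s:measured_LRA=%s:measured_thresh=%s\n", i, tp, lra, th }
    ' "$1"
}

normalize() {
    local input="$1" output="$2" log measured
    log=$(mktemp) || return 1

    # Pass 1: analysis only (-f null -); loudnorm's summary goes to stderr
    if ! ffmpeg -nostdin -hide_banner -i "$input" \
            -af "loudnorm=$LOUDNORM_TARGETS:print_format=summary" -f null - 2> "$log"; then
        rm -f "$log"
        return 1
    fi
    measured=$(loudnorm_measurements "$log")
    rm -f "$log"

    # Pass 2: feed the measurements back in; -nostdin keeps ffmpeg from
    # reading the calling script's standard input
    ffmpeg -nostdin -loglevel error -i "$input" \
        -af "loudnorm=$LOUDNORM_TARGETS:$measured:offset=-0.3:linear=true" "$output"
}

# Do the work only when executed with arguments (normalize.sh INPUT OUTPUT);
# sourcing the file just defines the functions.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
    if [[ $# -ne 2 ]]; then
        echo "usage: $0 INPUT OUTPUT" >&2
        exit 2
    fi
    normalize "$1" "$2"
fi
```

It is called the same way as before, e.g. `normalize.sh episode.mp3 episode.normalized.mp3`. Building the measured_* string in one place also means a missing value shows up as a single malformed option rather than four separately failing greps.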

Great. Normalization complete.

Next, I tackled the podcasting setup.

Scripting -- podfox

Following its instructions, I installed and set up podfox and added some feeds. As an example, we'll use the podcast I was fixing, Opening Arguments.

podfox import https://openargs.com/feed/podcast OA

This creates a directory named after the short name you give the podcast, inside the podcast directory set in podfox's configuration. In our case, that's /home/username/podcasts/OA.

Our automated process, which will go into podcast.sh, will want to do a few things:

Update the podcast feeds.
Download any episodes of each podcast that haven't been downloaded yet.
Normalize the newly downloaded files.
Move the files to a target directory that our web server can find.

Note that we'll run the entire script as root so it can move the final mp3 files into /var/www/html/podcasts, but we'll run some commands (like podfox) as our own user with sudo -u.

declare -a arr=("OA" "And" "other" "podcast" "shortnames")
username=[your username here]
source=/home/$username/podcasts
target=/var/www/html/podcasts
tempdir=$(mktemp -d)

Make sure to change into your temporary directory: it's the easiest way to move the files once they're done processing, since everything in it can be moved at once.

cd "$tempdir" || exit 1

Next we update the podcast feeds.

sudo -u $username podfox update

Now loop through your array of podcast short names.

for podcast in "${arr[@]}"; do
[work per podcast]
done

For each podcast, we need to check the list of episodes, determine whether any of the latest episodes haven't been downloaded yet, download them if needed, and then normalize the files.

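# Count how many of the three newest episodes podfox still lists as "Not Downloaded"
undownloaded=$(sudo -u "$username" podfox episodes "$podcast" | head -n 3 | grep -c "Not Downloaded")
if [ "$undownloaded" -gt 0 ]; then
    sudo -u "$username" podfox download "$podcast" --how-many="$undownloaded"
fi

# Read find's NUL-separated results on fd 3, so file names with spaces survive
# and ffmpeg (which reads stdin) can't swallow the file list
while IFS= read -r -d '' -u 3 file; do
[work per file]
done 3< <(find "$source/$podcast" -iname "*.mp3" -print0)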

For each file, we extract the file name and extension and check whether the file has already been normalized. If not, we normalize it into our temporary directory and move the result into the web directory.

filename=$(basename -- "$file")
extension="${filename##*.}"    # not used below
filename="${filename%.*}"
normalized="$filename.normalized.mp3"
foundfile=$(find "$target/$podcast" -iname "$normalized" 2> /dev/null)
if [ -z "$foundfile" ]; then
    # Not normalized yet (an already-normalized file needs no work)
    /home/$username/normalize.sh "$file" "$tempdir/$normalized"
    mkdir -p "$target/$podcast"
    mv -- * "$target/$podcast"
fi

Other than some cleaning up, we're done with the podcasting automation.
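For the cleaning up, one approach (an assumption, not necessarily what podcast.sh does) is to let an EXIT trap delete the temporary directory however the script ends, and to fail early if the script isn't running as root. make_scratch and require_root are hypothetical helper names; make_scratch stands in for `tempdir=$(mktemp -d)`.

```shell
#!/usr/bin/env bash
# Sketch of podcast.sh's clean-up (assumed approach, not the post's script).
# make_scratch and require_root are hypothetical helpers.

make_scratch() {
    tempdir=$(mktemp -d) || return 1
    # The EXIT trap runs however the script ends: normally, via `exit`,
    # or after a failing command under `set -e`.
    trap 'rm -rf -- "$tempdir"' EXIT
}

# podcast.sh needs root to write into /var/www/html/podcasts.
require_root() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "run podcast.sh as root" >&2
        return 1
    fi
}
```

Near the top of podcast.sh, `require_root || exit 1` followed by `make_scratch || exit 1` would replace the `tempdir=$(mktemp -d)` line, and `cd "$tempdir"` works as before. Temporary directories then don't pile up in /tmp between cron runs.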

dir2cast

dir2cast is a nice little PHP script that takes a directory (or directories) of mp3 files and generates a valid RSS feed. It has excellent documentation on its GitHub page, and honestly there isn't much to configure. Assuming you have Apache (or nginx) with PHP up and running, it's mostly drag-and-drop.

Inside the directory holding dir2cast.php (/var/www/html/podcasts), I created a subdirectory for each podcast feed. These are the $target/$podcast directories referenced in our podcast automation script.

Example Directory Structure

/var/www/html/podcasts/
/var/www/html/podcasts/dir2cast.php
/var/www/html/podcasts/OA/
/var/www/html/podcasts/OA/episode01.mp3
/var/www/html/podcasts/OA/episode02.mp3

I suggest taking the time to copy dir2cast.ini into each podcast subdirectory and editing each copy with the information from the original podcast's feed. This makes the feeds look a little nicer in your podcast listening apps.
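Copying the file by hand gets tedious as feeds are added. A minimal sketch, assuming the layout shown above with dir2cast.ini sitting next to dir2cast.php; seed_ini_files is a hypothetical helper, and it deliberately skips subdirectories that already have their own (possibly edited) copy.

```shell
#!/usr/bin/env bash
# Sketch: give each podcast subdirectory its own dir2cast.ini, copied from the
# one next to dir2cast.php, without touching copies you've already edited.
# seed_ini_files is a hypothetical helper; the default root matches the layout above.
seed_ini_files() {
    local root="${1:-/var/www/html/podcasts}" dir
    for dir in "$root"/*/; do
        # With no subdirectories the pattern stays unexpanded; skip it
        [ -d "$dir" ] || continue
        # Only copy when this subdirectory has no dir2cast.ini yet
        if [ ! -e "${dir}dir2cast.ini" ]; then
            cp -- "$root/dir2cast.ini" "${dir}dir2cast.ini"
        fi
    done
}
```

Running `seed_ini_files` after adding a new podfox short name leaves you with a fresh copy to edit for that feed only.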

cron Automation

Of course, we don't want to run this ourselves every time a new episode comes out; that defeats the automatic nature of podcast feeds. So I run this script from cron every 15 minutes, which means the updated files show up in my feed fairly soon after an episode is released. The catch is that on a slower server, or when downloading several podcasts, the script may still be running 15 minutes later, when cron starts it again. To account for this, I created yet another script, cronpod.sh, which runs as a cron job with the following configuration:

*/15 * * * * /home/[your username]/cronpod.sh
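The contents of cronpod.sh aren't shown here. One common way to keep runs from overlapping on Linux is flock(1) from util-linux; the sketch below uses that approach and is not necessarily what cronpod.sh does. run_exclusive is a hypothetical helper, and the lock path you pass it is your choice.

```shell
#!/usr/bin/env bash
# Sketch of a cronpod.sh-style guard (assumed approach; not the post's cronpod.sh).
# run_exclusive is a hypothetical helper built on flock(1) from util-linux.
run_exclusive() {
    local lockfile="$1"
    shift
    (
        # Open (or create) the lock file on fd 9 inside this subshell only
        exec 9> "$lockfile"
        # -n: give up immediately instead of waiting if a previous run holds the lock
        if ! flock -n 9; then
            echo "previous run still in progress; skipping this one" >&2
            exit 1
        fi
        # The lock stays held while the command runs and is released when the subshell exits
        "$@"
    )
}
```

cronpod.sh would then end with something like `run_exclusive /run/lock/podcast.lock /home/[your username]/podcast.sh` (/run/lock is writable by root on most systemd-based distributions). flock can also do this on its own, straight from the crontab: `*/15 * * * * flock -n /run/lock/podcast.lock /home/[your username]/podcast.sh`.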

Listening to Your Podcasts

All that's left is to listen to your podcasts in your favorite podcasting app. And maybe to add a reverse proxy with a Let's Encrypt SSL cert, but that's outside the scope of this post. :)

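Per the dir2cast documentation, you should be able to access your podcast feeds either combined or as individual subfeeds. (With dir2cast.php in /var/www/html/podcasts and the usual /var/www/html web root, the path will also include /podcasts/.)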

http://[yourserver]/dir2cast.php
http://[yourserver]/dir2cast.php?dir=OA
http://[yourserver]/dir2cast.php?dir=[any podfox short name]
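To avoid typing each subfeed URL into your podcast app by hand, you could list them from the directory layout. A minimal sketch: feed_urls is a hypothetical helper, the base URL is whatever reaches the directory holding dir2cast.php on your server (an assumption you supply), and short names are assumed to need no URL encoding.

```shell
#!/usr/bin/env bash
# Sketch: list the combined feed plus one ?dir= subfeed per podcast directory.
# feed_urls is a hypothetical helper; BASE (first argument) is an assumed URL
# that reaches the directory holding dir2cast.php, e.g. http://[yourserver]
feed_urls() {
    local base="${1%/}" root="${2:-/var/www/html/podcasts}" dir
    echo "$base/dir2cast.php"                       # combined feed
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue                   # no subdirectories at all
        dir="${dir%/}"                              # drop the trailing slash
        echo "$base/dir2cast.php?dir=${dir##*/}"    # one subfeed per short name
    done
}
```

For example, `feed_urls http://[yourserver]` prints the combined feed followed by one `?dir=` line per podcast subdirectory.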

Although I didn't see anything about it in the documentation, I have successfully renamed dir2cast.php to index.php, which makes your URLs a little prettier. If this breaks something for you, just change the name back.

Summary or TL;DR

Use podfox, ffmpeg, dir2cast, normalize.sh, podcast.sh, cronpod.sh, and a little elbow grease to improve your podcast listening experience.