Some podcast aggregators show ccdn.php as file name #321

Open
opened 2025-12-06 13:54:13 +00:00 by archer72 · 3 comments
Member

Some podcast aggregators show ccdn.php as the file name instead of a name with a file extension like .mp3 or .ogg

The following were tested:
Newsboat
Clementine
Strawberry

This test was in response to comment #6 on HPR4521
https://hackerpublicradio.org/eps/hpr4424/index.html#comment_4521

Owner

I looked at this issue because it interested me, and I had fallen into it years ago. I thought I'd write about my experience in case it might be of interest. There may be much better solutions to this problem of course :-)

I tripped over this because I modified the venerable Bashpodder script many years ago to act as my personal podcatcher. It suffered from what archer72 is referencing in this issue. It used the approach of assuming the enclosure URL contained the target filename (which it very often did in the early days of RSS and podcasting). It fell over badly when it encountered issues like this.

This implies that the parsing of the enclosure URL to find the target filename is no longer a reliable algorithm (and hasn't been for some time).
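To illustrate why that approach breaks, here is a minimal Python sketch of the "take the last path segment" logic. The function name and the first URL are made up for illustration; this is not Bashpodder's actual code, only the general technique it relied on.

```python
import posixpath
from urllib.parse import urlsplit


def filename_from_url(url: str) -> str:
    """Return the last path segment of the URL, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


# Hypothetical "classic" enclosure URL where the path ends in the audio file.
old_style = "https://example.org/eps/hpr4525.mp3"
# Enclosure URL of the kind discussed in this issue.
new_style = "https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4525/hpr4525.mp3"

print(filename_from_url(old_style))  # hpr4525.mp3
print(filename_from_url(new_style))  # ccdn.php
```

The real file name only appears in the query string, which this kind of parsing never looks at.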

A better way of doing this would seem to be to use the HTTP Content-Disposition header (if there is one of course). I tried a simple experiment as follows, using the --content-disposition option of wget:

$ wget --content-disposition 'https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4525/hpr4525.mp3'
--2025-12-07 13:32:43--  https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4525/hpr4525.mp3
Resolving hub.hackerpublicradio.org (hub.hackerpublicradio.org)... 204.13.239.180
Connecting to hub.hackerpublicradio.org (hub.hackerpublicradio.org)|204.13.239.180|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://archive.org/download/hpr4525/hpr4525.mp3 [following]
--2025-12-07 13:32:43--  https://archive.org/download/hpr4525/hpr4525.mp3
Resolving archive.org (archive.org)... 207.241.224.2
Connecting to archive.org (archive.org)|207.241.224.2|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://dn721709.ca.archive.org/0/items/hpr4525/hpr4525.mp3 [following]
--2025-12-07 13:32:44--  https://dn721709.ca.archive.org/0/items/hpr4525/hpr4525.mp3
Resolving dn721709.ca.archive.org (dn721709.ca.archive.org)... 204.62.247.49
Connecting to dn721709.ca.archive.org (dn721709.ca.archive.org)|204.62.247.49|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4596445 (4.4M) [audio/mpeg]
Saving to: ‘hpr4525.mp3’

hpr4525.mp3                                                100%[=======================================================================================================================================>]   4.38M  4.19MB/s    in 1.0s    

2025-12-07 13:32:46 (4.19 MB/s) - ‘hpr4525.mp3’ saved [4596445/4596445]

The request is redirected twice before the file is delivered (first to archive.org, then to one of its download servers), and apparently one of the servers in the chain has provided a Content-Disposition header that has allowed wget to generate the correct filename.

Without the --content-disposition option I received the file:

‘ccdn.php?filename=%2Feps%2Fhpr4525%2Fhpr4525.mp3’

So it seems that some step has given the necessary clue, rather than the final URL being parsed to get the filename.
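A podcatcher could apply the same idea: prefer the name given in Content-Disposition and only fall back to the URL when the header is missing. The sketch below is an assumption about how that might look in Python. The function name is hypothetical, and the header value in the usage lines is invented, since the thread does not show the header that was actually sent.

```python
import posixpath
from email.message import Message
from urllib.parse import urlsplit


def pick_filename(url, content_disposition=None):
    """Prefer the Content-Disposition filename; fall back to the URL path."""
    if content_disposition:
        # The standard library's header parser handles quoting for us.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # Strip any directory part a server might have included.
            return posixpath.basename(name)
    return posixpath.basename(urlsplit(url).path)


url = "https://hub.hackerpublicradio.org/ccdn.php?filename=/eps/hpr4525/hpr4525.mp3"
# Hypothetical header value, for illustration only.
print(pick_filename(url, 'attachment; filename="hpr4525.mp3"'))
# No header available: same result as plain URL parsing.
print(pick_filename(url))
```

In a real client the header value would come from the final response after following redirects.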

Obviously the failing podcatchers are unlikely to be able to use this type of approach without a redesign. Although I have no expertise in this area I wonder if an alternative URL format in the feed would help.

Could a REST-like interface be used with URLs such as:

https://hub.hackerpublicradio.org/eps/hpr4525/hpr4525.mp3

The code behind this URL could determine from the eps part that an episode download is being requested, then work out that it's episode 4525 and that the audio file is hpr4525.mp3. The request could then be redirected to a URL of the format that's currently being used, and the chain of content delivery could follow from there. The point of doing this would be to give the podcatcher a simplified URL which it is able to parse to get the filename.
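As a rough sketch of that mapping, the Python below turns the simplified path into the current ccdn.php URL. Everything here is assumed for illustration: the function name, the pattern, and the limit to .mp3 and .ogg (the only extensions mentioned in this issue). It says nothing about how the HPR server is actually implemented.

```python
import re
from urllib.parse import urlencode

# Assumed layout: /eps/hpr<number>/hpr<number>.<ext>
EPISODE_PATH = re.compile(r"^/eps/(hpr(\d+))/(hpr\d+\.(?:mp3|ogg))$")


def redirect_target(path):
    """Map a simplified episode path to the current ccdn.php URL, or None."""
    m = EPISODE_PATH.match(path)
    # Reject paths where the directory and file name disagree on the episode.
    if m is None or not m.group(3).startswith(m.group(1) + "."):
        return None
    query = urlencode({"filename": path}, safe="/")
    return "https://hub.hackerpublicradio.org/ccdn.php?" + query


print(redirect_target("/eps/hpr4525/hpr4525.mp3"))
```

A server using something like this would answer the simplified URL with a redirect to the returned address, so the podcatcher only ever sees a URL ending in the audio file name.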

It's just a suggestion, and, if it even worked, would require feeds to be remodelled, which might lead to chaos.

Owner

I think we can fix this on the Apache side. Thanks to both @archer72 and @davmo for the help in debugging this.

In theory the podcatchers should not download the episodes again as the guid did not change, but most likely they will. Not a big deal for the regular 10-day feed but it might be an issue for the full feed.
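For reference, the guid-based behaviour described above can be sketched like this in Python. The function name, item layout and guid value are invented for illustration, and real podcatchers may well track downloads differently (for example by URL or file name), which is why re-downloads are likely.

```python
def new_episodes(feed_items, seen_guids):
    """Return items whose guid has not been seen yet, and record them."""
    fresh = []
    for item in feed_items:
        if item["guid"] not in seen_guids:
            fresh.append(item)
            seen_guids.add(item["guid"])
    return fresh


seen = set()
# Hypothetical feed entry before the change.
before = [{"guid": "example-guid-4525", "url": "https://example.org/old/hpr4525.mp3"}]
# Same guid, different enclosure URL after the change.
after = [{"guid": "example-guid-4525", "url": "https://example.org/new/hpr4525.mp3"}]

print(len(new_episodes(before, seen)))  # first run: one new episode
print(len(new_episodes(after, seen)))   # same guid, so nothing new
```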

An episode Zero on the full feed with a warning may be the way to go to alert people to the issue.

I'll do some investigation.

Owner

The solution was to add a Content-Disposition header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Disposition) in ccdn.php.
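The actual change to ccdn.php is not shown in this thread. As a hedged illustration of the idea only, here is a Python sketch that builds such a header value from the filename query parameter. The function name and the choice of attachment as the disposition type are assumptions.

```python
import posixpath


def content_disposition_for(filename_param):
    """Build a Content-Disposition value from a ccdn.php-style filename parameter."""
    # Keep only the final path component, e.g. "hpr4525.mp3".
    name = posixpath.basename(filename_param)
    # Escape backslashes and double quotes for the quoted-string form.
    safe = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'attachment; filename="{safe}"'


print(content_disposition_for("/eps/hpr4525/hpr4525.mp3"))
```

With a header like this on the response, clients that honour Content-Disposition can name the file correctly even though the URL path ends in ccdn.php.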

Reference: HPR/hpr_generator#321