This week I got a bit sidetracked making a new podcast/playlist player! We’ll take a pause on the machine learning / LLM / AI stuff since this has taken up most of my week and given me a source of focused motivation.
I’ve been listening to the Security Now podcast since December 2019. I started with the newest episodes and worked backward, which wasn’t too difficult at the time using the Podcasts app on iOS, since it would go to the previous episode when the current one finished. However, I wanted to start listening from the beginning (Episode 1). I tried, but for technical reasons it wasn’t practical, easy, or even possible to get the Podcasts app to recognize the old episodes. I tried a workaround: making my own XML file that the app would recognize and let me play from. For whatever reason, I couldn’t make it work robustly and eventually gave up.
Then I tried manually keeping track of what episode I was on, and the time within the episode, but my imperfect memory and record-keeping for this particular use case resulted in entirely forgetting where I was, especially if I hadn’t listened for a week. I'm still not certain how far I had listened when I started from episode 1 in 2021.
This week a work project came up where I needed to come up with a new coding problem to solve. I decided to start working on a little podcast/playlist listener app. At first my ambitions were too high, as I tried to build something complex on day one, but this pushed me to ask, “What’s the most impactful thing I can do right now to get something working?” I had already (within the week prior) created a script that converts Security Now episode metadata from Steve Gibson’s website into an m3u8 file that I could play with VLC. (m3u is a playlist file format, a collection of pointers to other media; the 8 signifies UTF-8, a standard text encoding that can represent letters, numbers, and symbols from practically any language, which improves compatibility.) However, VLC had no easy/friendly way of remembering my playback position.
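To make the playlist format concrete, here is a minimal sketch of writing an extended-M3U playlist in Python. It is not the actual conversion script (which isn't published yet); the `Episode` fields, the `build_playlist` helper, and the example.com URLs are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical fields; the real metadata may look different.
    number: int
    title: str
    duration_seconds: int  # extended M3U uses -1 for "unknown length"
    url: str


def build_playlist(episodes: list[Episode]) -> str:
    """Return extended-M3U text: one #EXTINF line plus one URL per episode."""
    lines = ["#EXTM3U"]
    for ep in sorted(episodes, key=lambda e: e.number):
        lines.append(f"#EXTINF:{ep.duration_seconds},Episode {ep.number}: {ep.title}")
        lines.append(ep.url)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder data only; these are not real episode locations.
    demo = [
        Episode(2, "Second Episode", 1500, "https://example.com/audio/ep-0002.mp3"),
        Episode(1, "First Episode", 1200, "https://example.com/audio/ep-0001.mp3"),
    ]
    print(build_playlist(demo), end="")
```

Saving that text with UTF-8 encoding and an .m3u8 extension should be enough for a player like VLC to open it as a playlist.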
I searched for an app that could solve this for me, but I wasn’t happy with any of the solutions I found and felt it wouldn’t be too difficult to make something myself.
The primary/first goals for something useful were:
- Keep track of the current episode
- Keep track of the current position within each episode, not just the current episode
- Play/pause and previous/next controls directly in the player, so I wouldn't need to manually transcribe this data between the app and whatever is playing the episodes
I managed to reach that in just a couple of days. At the moment it’s only a remote player: I can visit and control it from my mobile device, even away from home (thanks to Tailscale/WireGuard VPN), but it doesn’t stream the audio to the mobile device. The audio plays on the machine that’s running the server.
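For the position-tracking goals, one simple approach is a small JSON file that stores the current episode plus a saved position for every episode. The sketch below is only an illustration of that idea, not the app's actual code; the `PlaybackState` class, its method names, and the file layout are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path


class PlaybackState:
    """Remembers the current episode and a position (seconds) per episode."""

    def __init__(self, path):
        self.path = Path(path)
        self.current_episode = None
        self.positions = {}  # episode id (as a string) -> seconds
        if self.path.exists():
            data = json.loads(self.path.read_text(encoding="utf-8"))
            self.current_episode = data.get("current_episode")
            self.positions = data.get("positions", {})

    def update(self, episode_id, position_seconds):
        """Record where playback is right now and save it to disk."""
        self.current_episode = str(episode_id)
        self.positions[self.current_episode] = float(position_seconds)
        self._save()

    def position_for(self, episode_id):
        """Saved position for an episode, or 0.0 if it was never played."""
        return self.positions.get(str(episode_id), 0.0)

    def _save(self):
        # Write to a temp file, then swap it in, so a crash mid-write
        # cannot leave a half-written state file behind.
        payload = json.dumps(
            {"current_episode": self.current_episode, "positions": self.positions},
            indent=2,
        )
        fd, tmp_name = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(payload)
        os.replace(tmp_name, self.path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        state = PlaybackState(Path(folder) / "state.json")
        state.update(3, 754.2)
        reloaded = PlaybackState(Path(folder) / "state.json")
        print(reloaded.current_episode, reloaded.position_for(3))
```

Calling `update` every few seconds during playback keeps the saved position close to the real one even if the server stops unexpectedly.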
The next goals are:
- Stream the audio to mobile
- Friendly to bandwidth-constrained environments (e.g. mobile)
- Ability to control the media using media controls (headphone/keyboard/remote control/OS/phone volume/playpause/forward/backward buttons)
I already have self-contained code examples for each of those goals, but they need to be integrated together into the app.
To achieve the first two goals, we’ll be using HLS, which is basically just a fancy way to split a media file up into small chunks and give the client (phone etc) metadata so it knows about all the chunks. The client can then request the right chunks at the right times, so the audio/video plays seamlessly as if it were one media file on the client. (This is what you see when you play a YouTube video and the little gray bar loads only a little bit ahead of the current playback position, while the majority of the video is not downloaded yet.) I tried this on my phone and it was supported natively (iOS); on Firefox desktop I installed an add-on that wraps a JavaScript library that does this; on Tera’s phone it didn’t seem to work. However, I’ll be using the JavaScript library directly within the app so that it works seamlessly across clients without the client/user needing to install anything extra.
For the third goal, the ability to control the media using phone/headphone/keyboard/… controls, we hook into the web browser’s media controls API. This worked directly in Firefox and on my phone, although browsers/clients tend not to allow this control capability unless it’s tied to media playing on the device itself. Since I originally set this up as a remote control so I could get something working more quickly, there’s no media playing on the client yet, so clients ignored the request for media controls and the buttons simply didn't trigger the commands. We have a working proof of concept that fully worked in both Firefox and iOS when it serves an actual media file, so it shouldn’t be difficult to get this working once the previous goal (streaming the audio data to the client) is done.
For HLS/streaming, originally I got it working by first using an ffmpeg command to split the audio file into an m3u8 playlist file (metadata with information about all the chunks) and lots of .ts media segment files, and then simply served the entire folder up statically, which the client was able to fully interact with to stream the audio. However, this splits the original media file into many small media files and requires not only more space on the drive where the media is stored, but also managing all these extra files and making sure folders are set up. So we got a prototype that dynamically generates an m3u8 listing filenames of files that don’t actually exist, but as the client requests those chunks, the server splits the mp3 file up in-memory and serves the chunks as though they were coming from .ts files on disk. It’s like magic!!!
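As a rough illustration of the "files that don't actually exist" trick, the sketch below builds an HLS media playlist from nothing but the track duration, and maps a requested segment index back to a millisecond range in the source audio. It is not the prototype's actual code; the function names, the 10-second segment length, and the `segment_{index}.ts` naming are assumptions. Turning each slice into bytes the client will accept is left out.

```python
def _total_ms(duration_seconds: float) -> int:
    return round(duration_seconds * 1000)


def segment_count(duration_seconds: float, segment_seconds: int = 10) -> int:
    """Number of segments needed to cover the whole track."""
    segment_ms = segment_seconds * 1000
    return -(-_total_ms(duration_seconds) // segment_ms)  # integer ceiling division


def segment_range_ms(index: int, duration_seconds: float, segment_seconds: int = 10):
    """(start_ms, end_ms) of the source audio covered by segment `index`."""
    total = _total_ms(duration_seconds)
    start = index * segment_seconds * 1000
    if index < 0 or start >= total:
        raise IndexError(f"segment {index} is out of range")
    return start, min(start + segment_seconds * 1000, total)


def build_media_playlist(duration_seconds: float, segment_seconds: int = 10,
                         name_template: str = "segment_{index}.ts") -> str:
    """Playlist text naming segments that are produced only when requested."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{segment_seconds}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for index in range(segment_count(duration_seconds, segment_seconds)):
        start, end = segment_range_ms(index, duration_seconds, segment_seconds)
        lines.append(f"#EXTINF:{(end - start) / 1000:.3f},")
        lines.append(name_template.format(index=index))
    lines.append("#EXT-X-ENDLIST")  # tells the client the stream is complete
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_media_playlist(25), end="")
    # When the client asks for segment_2.ts, look up which part of the
    # source audio to slice. pydub slices by milliseconds, so the range
    # can be used as audio[start_ms:end_ms].
    print(segment_range_ms(2, 25))
```

The playlist costs almost nothing to generate, and only the segments a client actually requests ever get produced, so nothing extra needs to live on disk.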
Within just this past week, I’ve already managed to listen to Security Now episodes 1-25. Note, I listen on 2x speed most of the time, and although episodes are now upwards of 2 hours, they used to be 20-40 minutes back in 2005. It seamlessly keeps track of my current playback episode and position within the episode!!!
We could take this in lots of different directions. I envision an episode viewer which lists all the episodes along with their titles and descriptions, allowing the user to click on one and start listening to it right away. Right now it’s also pointing directly to my Security Now podcast playlist m3u8 file, which I generated using a script that isn’t in the repository just yet (but it will be). This means that while it might be technically possible for someone else to use it, it’s not friendly to end-users or even experienced developers, yet, so I think focusing on usability for a general audience will probably be most prudent since I’m revealing this little project to the world (my tiny world at the moment, but still).
If you have any interest in this app, or any ideas on the direction it should take, feel free to comment here, open an issue on GitHub, follow the instructions on my contact page, or use any other means you have to get ahold of me.
Thank you for joining us for another Dev Week!
P.S. when I get around to it, I'll be renaming this to Dev Week. As much as I want to make a game, and have some amount of vision about how it should look, my attention is a bit too split. There are so many things I want to do!
P.P.S. another feature of HLS is adaptive bitrate, so that if the client has poor network connectivity, the server can send lower-quality but lower-size chunks of the media. The current setup doesn't support this yet, but we can use pydub (which we're already using to split the media into chunks) to downsample the audio before we send it to the client. However, the server needs to know how much to downsample or if downsampling is even required, and for that the client may need to be able to communicate to the server about its current network speed. This will be another research topic and potential feature that I hope to get in, especially if I'm listening to episodes in a grocery store or other large store where network bandwidth seems to mysteriously thin or drop entirely.
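Here is a small sketch of two ways this could go; neither is implemented in the app, and the function names, bitrate ladder, and playlist paths are all hypothetical. The first function follows the idea above: the client reports a measured speed and the server picks a bitrate that fits. The second reflects how HLS itself usually handles this: a master playlist lists several variants with a BANDWIDTH attribute, and the client switches between them on its own, which might remove the need for the client to report anything.

```python
def pick_bitrate_kbps(measured_kbps: float, ladder=(32, 64, 128),
                      headroom: float = 0.7) -> int:
    """Highest bitrate in `ladder` that fits within a fraction of the link speed.

    `headroom` leaves slack so a small dip in speed does not stall playback.
    Falls back to the lowest bitrate when nothing fits.
    """
    budget = measured_kbps * headroom
    fitting = [rate for rate in sorted(ladder) if rate <= budget]
    return fitting[-1] if fitting else min(ladder)


def build_master_playlist(variants) -> str:
    """Master playlist text; `variants` is a list of (bits_per_second, uri)."""
    lines = ["#EXTM3U"]
    for bandwidth, uri in sorted(variants):
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth}")
        lines.append(uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pick_bitrate_kbps(100))
    # Hypothetical per-quality playlist paths.
    print(build_master_playlist([
        (128000, "high/playlist.m3u8"),
        (32000, "low/playlist.m3u8"),
        (64000, "medium/playlist.m3u8"),
    ]), end="")
```

Either way, the chosen bitrate still has to be applied when each chunk is encoded; pydub's `export` takes a `bitrate` argument such as `"64k"` that could serve for that.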