Substack RSS Feed Fix — Specification

GitHub Issue: #1 (Pulling from Substack is broken)
Date: 2026-02-09


1. The Problem

The GitHub Actions workflow fetch-substack.yml runs hourly to fetch posts from https://johndamask.substack.com/feed and update _data/substack-posts.yml. Since approximately 2025-11-25 (the date of the last successful auto-update commit), Substack has been returning HTTP 403 Forbidden to every request made from GitHub Actions runners.

The workflow still reports success because the Python script (scripts/fetch_substack.py) catches the 403, prints an error to stdout, and exits with code 0. The “Commit and push” step then sees no changes and skips. The site continues to display the 20 stale posts that were last fetched on 2025-11-25.

Evidence from workflow logs:

Fetching from https://johndamask.substack.com/feed...
Error fetching Substack feed: 403
No posts fetched. Keeping existing file if it exists.

2. Root Cause

Substack blocks HTTP requests originating from GitHub Actions runner IP ranges. This is a known issue affecting CI/CD pipelines across multiple platforms (GitHub Actions, Netlify build runners, etc.).

Key findings:

The requests.get() call in fetch_substack.py (line 31) sends a bare request with no special headers, but even adding a browser-like User-Agent would not help since the block is IP-based.
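One way to check the IP-based claim is to probe the feed twice from the same machine, once with default headers and once with a browser-like User-Agent, and compare the status codes. The sketch below is a diagnostic aid, not part of fetch_substack.py. It uses the standard library's urllib instead of requests to stay dependency-free, and the names probe, compare_headers, and BROWSER_UA are hypothetical.

```python
import urllib.error
import urllib.request

FEED_URL = "https://johndamask.substack.com/feed"
# Hypothetical browser-like header value, used only for this comparison.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def probe(url, headers=None, opener=urllib.request.urlopen):
    """Return the HTTP status code for a GET to url, including error statuses."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with opener(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still the answer we want.
        return err.code


def compare_headers(url=FEED_URL, opener=urllib.request.urlopen):
    """Probe once with default headers and once with a browser-like User-Agent."""
    return {
        "default": probe(url, opener=opener),
        "browser_ua": probe(url, {"User-Agent": BROWSER_UA}, opener=opener),
    }
```

Calling compare_headers() inside a workflow step and again on a local machine gives a direct comparison. If both probes return 403 on the runner while the same calls succeed locally, that is consistent with an IP-based block rather than header filtering.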

3. The Fix

The codebase already uses a Cloudflare Worker pattern for the audio transcriber tool (openai-proxy.jbdamask.workers.dev). A similar lightweight worker can proxy the Substack RSS feed request, since Cloudflare Worker egress IPs are not blocked by Substack.

Changes required:

A. Create a Cloudflare Worker (new)

A minimal worker that proxies GET requests for the Substack RSS feed and returns the upstream response to the caller.

B. Update scripts/fetch_substack.py
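The spec does not spell out this change. One plausible reading is that the script should request the feed through the worker instead of calling Substack directly. The sketch below shows one way to select the URL under that assumption. The worker hostname is a placeholder because the real one is not given here, and the SUBSTACK_FEED_URL override and the resolve_feed_url helper are hypothetical.

```python
import os

# Direct feed URL from the spec; blocked when requested from GitHub Actions runners.
DIRECT_FEED_URL = "https://johndamask.substack.com/feed"
# Placeholder only: the real worker hostname is not given in the spec.
DEFAULT_PROXY_URL = "https://substack-feed-proxy.example.workers.dev/"


def resolve_feed_url(environ=os.environ):
    """Pick the URL the fetch script should request.

    SUBSTACK_FEED_URL is a hypothetical override, useful for running the
    script locally against the direct feed. Without it, the proxy is used.
    """
    override = environ.get("SUBSTACK_FEED_URL", "").strip()
    return override or DEFAULT_PROXY_URL
```

Because the proxy URL is the default, the workflow file would not need a new environment variable, which fits item D.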

C. Update scripts/fetch_substack.py error handling
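Section 1 notes that the script currently catches the 403 and exits with code 0, which hides the failure. A minimal sketch of the opposite behavior follows, assuming the intended fix is to exit non-zero when the fetch fails. It uses urllib in place of requests, and the names fetch_feed and main are hypothetical, not the script's actual functions.

```python
import sys
import urllib.error
import urllib.request


def fetch_feed(url, opener=urllib.request.urlopen):
    """Return the feed body as bytes; HTTP errors such as 403 raise HTTPError."""
    with opener(url, timeout=30) as response:
        return response.read()


def main(url, fetch=fetch_feed):
    """Return a process exit code: 0 on success, 1 if the feed could not be fetched."""
    try:
        body = fetch(url)
    except urllib.error.HTTPError as err:
        # HTTPError subclasses URLError, so it has to be handled first.
        print(f"Error fetching Substack feed: {err.code}", file=sys.stderr)
        return 1
    except urllib.error.URLError as err:
        print(f"Error fetching Substack feed: {err.reason}", file=sys.stderr)
        return 1
    if not body:
        print("Feed response was empty.", file=sys.stderr)
        return 1
    print(f"Fetched {len(body)} bytes from {url}")
    return 0
```

The script's entry point would then end with sys.exit(main(feed_url)). GitHub Actions marks a step as failed when its command exits non-zero, so a blocked fetch would show up as a failed run instead of a silent success, and the existing data file would be left untouched.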

D. No workflow changes needed

Alternative approaches considered

| Approach | Pros | Cons |
| Cloudflare Worker proxy (recommended) | Already a pattern in this codebase; free tier; reliable; fast | Requires Cloudflare account setup (already exists) |
| Run fetch locally on a schedule | No proxy needed | Requires local machine to be running; not automated |
| Use a generic CORS/RSS proxy service | No infrastructure to manage | Third-party dependency; rate limits; reliability concerns |
| Self-hosted proxy on another cloud | Full control | Over-engineered for this use case |
| Substack API with auth token | Direct access | Substack has no official public API; fragile |

Sources: