Building a Custom Bluesky Feed, Part 1: Starting Simple
When I joined Bluesky, Jim suggested that I create a custom feed as part of my onboarding. I was excited about it because custom feeds are one of the coolest things in the atmosphere (as we call things atproto-related). During my application process, I had set up a debug feed in my account to test some tweaks to Discover before shipping them, so I was already a little familiar with how feeds fit into the picture. Long story short, my reaction to Jim's suggestion was: hell yeah, why not.
I also had a triathlon feed I made in 2023 with SkyFeed, a no-code solution. It was a simple catch-all: posts containing "triathlon" or "swimbikerun." It was too broad and noisy, and it still missed plenty of relevant content. Building a proper feed generator felt like the right opportunity to fix that.
How a feed generator works
The AT Protocol lets apps define lexicons (think of them as API specs) for everything that happens on the network. Feed generators need to implement two:
app.bsky.feed.describeFeedGenerator: tells the network this server is a feed generator and what feeds it offers
app.bsky.feed.getFeedSkeleton: returns the list of posts to show, in order
That's it. Your feed generator doesn't render anything. It returns a list of AT URIs — post identifiers — and Bluesky fetches the actual content. Your server is only responsible for saying what to show and in what order.
A real AT URI looks like at://did:plc:3272gdrjsuikiff7qsgokgas/app.bsky.feed.post/3mgq6iky6dc22, which is the AT URI for this post. It identifies a specific record whose identifier is 3mgq6iky6dc22, which happens to be a post (app.bsky.feed.post). It also tells us that I authored it, since did:plc:3272gdrjsuikiff7qsgokgas is my unique identifier in atproto. If you are interested, you can learn more about it on the protocol's official website or in amazing community posts like this one.
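To make that anatomy concrete, here is a minimal sketch that splits a record-level AT URI into its three parts. The ATURI type and ParseATURI function are names made up for this illustration, and the parser deliberately skips real validation of DIDs and collection names.

```go
package main

import (
	"fmt"
	"strings"
)

// ATURI holds the three parts of a record URI: at://<authority>/<collection>/<rkey>.
// The type name is hypothetical, chosen for this sketch.
type ATURI struct {
	Authority  string // usually the author's DID
	Collection string // record type, e.g. app.bsky.feed.post
	RKey       string // record key within that collection
}

// ParseATURI splits a record-level AT URI into its parts.
// It only checks the overall shape; it does not validate the DID or the collection.
func ParseATURI(s string) (ATURI, error) {
	if !strings.HasPrefix(s, "at://") {
		return ATURI{}, fmt.Errorf("not an AT URI: %q", s)
	}
	parts := strings.Split(strings.TrimPrefix(s, "at://"), "/")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return ATURI{}, fmt.Errorf("expected at://authority/collection/rkey, got %q", s)
	}
	return ATURI{Authority: parts[0], Collection: parts[1], RKey: parts[2]}, nil
}

func main() {
	u, err := ParseATURI("at://did:plc:3272gdrjsuikiff7qsgokgas/app.bsky.feed.post/3mgq6iky6dc22")
	if err != nil {
		panic(err)
	}
	fmt.Printf("author: %s\ncollection: %s\nrecord key: %s\n", u.Authority, u.Collection, u.RKey)
}
```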
The response for getFeedSkeleton looks like this:
type FeedSkeletonResponse struct {
	Cursor string             `json:"cursor,omitempty"`
	Feed   []FeedSkeletonItem `json:"feed"`
}

type FeedSkeletonItem struct {
	Post string `json:"post"` // just an AT URI
}
Everything else — filtering, storage, ranking — is up to you.
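As a rough illustration of how little the endpoint needs to do, the sketch below serves getFeedSkeleton from an in-memory slice of AT URIs. It is not the feed's actual code. The offset-based cursor, the buildSkeleton and skeletonHandler names, and the choice to ignore the feed query parameter are all assumptions made to keep the example short; limit is clamped to the range 1 to 100 with a default of 50.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strconv"
)

type FeedSkeletonResponse struct {
	Cursor string             `json:"cursor,omitempty"`
	Feed   []FeedSkeletonItem `json:"feed"`
}

type FeedSkeletonItem struct {
	Post string `json:"post"`
}

// buildSkeleton returns one page of uris, which are assumed to be sorted newest first.
// The cursor is simply the offset of the next page, encoded as a string (an assumption
// of this sketch; any opaque string the server can interpret later would do).
func buildSkeleton(uris []string, limit int, cursor string) FeedSkeletonResponse {
	if limit <= 0 || limit > 100 {
		limit = 50
	}
	start, err := strconv.Atoi(cursor)
	if err != nil || start < 0 {
		start = 0 // missing or malformed cursor: start from the top
	}
	if start > len(uris) {
		start = len(uris)
	}
	end := start + limit
	if end > len(uris) {
		end = len(uris)
	}
	resp := FeedSkeletonResponse{Feed: make([]FeedSkeletonItem, 0, end-start)}
	for _, u := range uris[start:end] {
		resp.Feed = append(resp.Feed, FeedSkeletonItem{Post: u})
	}
	if end < len(uris) {
		resp.Cursor = strconv.Itoa(end) // more posts remain
	}
	return resp
}

// skeletonHandler answers GET /xrpc/app.bsky.feed.getFeedSkeleton.
// It serves a single feed, so the "feed" query parameter is ignored here.
func skeletonHandler(uris []string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		q := r.URL.Query()
		limit, _ := strconv.Atoi(q.Get("limit")) // 0 on error, which falls back to the default
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(buildSkeleton(uris, limit, q.Get("cursor"))); err != nil {
			log.Printf("encode skeleton: %v", err)
		}
	}
}

func main() {
	uris := []string{"at://did:plc:3272gdrjsuikiff7qsgokgas/app.bsky.feed.post/3mgq6iky6dc22"}
	// Exercise the handler in-process. A real server would instead mount it on
	// the XRPC path and call http.ListenAndServe.
	req := httptest.NewRequest(http.MethodGet, "/xrpc/app.bsky.feed.getFeedSkeleton?limit=1", nil)
	rec := httptest.NewRecorder()
	skeletonHandler(uris).ServeHTTP(rec, req)
	fmt.Print(rec.Body.String())
}
```

Keeping the paging logic in a plain function, separate from the HTTP handler, makes it easy to swap the slice for a real store later.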
Starting simple
To have anything to return, the server needs to collect posts. For that, it subscribes to the firehose, a stream of every single record published on the network. Every Bluesky post passes through it.
For each incoming post, I check it against a set of filter rules. The initial version hardcoded these directly in Go: a map of DIDs (the decentralized identifiers atproto uses for accounts) whose posts are always triathlon-related, and a list of regular expressions:
import "regexp"

var (
	// Accounts whose posts are always treated as triathlon-related.
	dids = map[string]struct{}{
		"did:plc:leidqgx3be72rmeiwvdzvnes": {}, // world triathlon
		"did:plc:bdg6sni7k7gq7hrgck6h3aky": {}, // triathlete
		"did:plc:qcbkud2rb5mp3petgcof47ps": {}, // challenge family
	}

	// Case-insensitive patterns matched against the post text.
	targets = []*regexp.Regexp{
		regexp.MustCompile(`(?i)\btriathlon\b`),
		regexp.MustCompile(`(?i)\btriathlete\b`),
		regexp.MustCompile(`(?i)\b70\.3\b`),
		regexp.MustCompile(`(?i)\bhalf.?iron\b`),
		regexp.MustCompile(`(?i)\bsprint.{0,13}tri\b`),
		regexp.MustCompile(`(?i)\bolympic.{0,13}tri\b`),
		regexp.MustCompile(`(?i)\bswim.{0,13}bike.{0,13}run\b`),
		regexp.MustCompile(`(?i)\biron ?man\b`),
		regexp.MustCompile(`(?i)\bkona\b`),
		regexp.MustCompile(`(?i)\bchallenge.?roth\b`),
		regexp.MustCompile(`(?i)\bxterra\b`),
		regexp.MustCompile(`(?i)\bbrick.{0,13}workout\b`),
	}
)
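The check itself can be very small. The sketch below shows one plausible way to apply the rules to an incoming post: accept it if the author is on the trusted list, otherwise accept it if any pattern matches the text. The matches function is a hypothetical name, and the rule set is trimmed to a few entries to keep the example short.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// A trimmed-down copy of the rules, just enough for the example.
	dids = map[string]struct{}{
		"did:plc:leidqgx3be72rmeiwvdzvnes": {}, // world triathlon
	}
	targets = []*regexp.Regexp{
		regexp.MustCompile(`(?i)\btriathlon\b`),
		regexp.MustCompile(`(?i)\biron ?man\b`),
		regexp.MustCompile(`(?i)\bswim.{0,13}bike.{0,13}run\b`),
	}
)

// matches reports whether a post belongs in the feed: either its author is a
// trusted account, or its text matches at least one of the patterns.
func matches(did, text string) bool {
	if _, ok := dids[did]; ok {
		return true
	}
	for _, re := range targets {
		if re.MatchString(text) {
			return true
		}
	}
	return false
}

func main() {
	// Made-up sample posts; the second DID is a placeholder.
	fmt.Println(matches("did:plc:leidqgx3be72rmeiwvdzvnes", "Good morning!"))
	fmt.Println(matches("did:plc:someoneelse", "Swim, bike, run, repeat"))
	fmt.Println(matches("did:plc:someoneelse", "Pizza night"))
}
```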
Posts that matched got saved to disk — filesystem storage, one JSON file per post. Crude? Absolutely.
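For illustration, one-file-per-post storage could look something like the sketch below. The StoredPost shape, the fileName scheme, and the savePost helper are assumptions made for this example, not the layout the feed actually used.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// StoredPost is a hypothetical on-disk shape for a matched post.
type StoredPost struct {
	URI  string `json:"uri"`
	Text string `json:"text"`
}

// fileName derives a flat file name from an AT URI by dropping the scheme
// and replacing "/" and ":" with "_".
func fileName(uri string) string {
	name := strings.TrimPrefix(uri, "at://")
	name = strings.NewReplacer("/", "_", ":", "_").Replace(name)
	return name + ".json"
}

// savePost writes the post as JSON into dir and returns the path it wrote to.
func savePost(dir string, p StoredPost) (string, error) {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	data, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	path := filepath.Join(dir, fileName(p.URI))
	return path, os.WriteFile(path, data, 0o644)
}

func main() {
	// Use a throwaway directory so the example leaves nothing behind.
	dir, err := os.MkdirTemp("", "feed-posts")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path, err := savePost(dir, StoredPost{
		URI:  "at://did:plc:3272gdrjsuikiff7qsgokgas/app.bsky.feed.post/3mgq6iky6dc22",
		Text: "made-up example text",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("saved", filepath.Base(path))
}
```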
But that was how far I got in two or three hours on a Sunday in January. I was having fun coding and exploring atproto, so I read, I thought, I typed. The result was humble, and it ran on an old PC I have at home.
The Iron Man or Ironman problem
The thing is: this works, but it doesn't discriminate. For example, people write ironman and iron man interchangeably, so the pattern \biron ?man\b matches posts about the Ironman, the swim-bike-run thing, and posts about Iron Man, the Marvel superhero, equally. I found out that this expression could also mean a wrestling thing, a heavy metal song, a bikecross race, an NBA strike, among other things.
Other rules had the same issue. For instance, \b70\.3\b could be a race distance or just a number. \bkona\b matches a village in Hawaii, a coffee variety, and also the most famous triathlon race in the world. And so on. The feed was technically alive. The content, though, was a mess.
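A quick way to see the problem is to run the ambiguous patterns against a few sentences. The sample texts below are invented for this illustration, and firstMatch is a hypothetical helper; the point is only that the regexes accept the on-topic and off-topic sentences alike.

```go
package main

import (
	"fmt"
	"regexp"
)

// The three ambiguous patterns discussed above.
var ambiguous = []*regexp.Regexp{
	regexp.MustCompile(`(?i)\biron ?man\b`),
	regexp.MustCompile(`(?i)\b70\.3\b`),
	regexp.MustCompile(`(?i)\bkona\b`),
}

// firstMatch returns the source of the first pattern that matches text,
// or an empty string if none does.
func firstMatch(text string) string {
	for _, re := range ambiguous {
		if re.MatchString(text) {
			return re.String()
		}
	}
	return ""
}

func main() {
	// Made-up sample posts: one on-topic, three off-topic.
	samples := []string{
		"Signed up for my first Ironman!",
		"Rewatching Iron Man tonight",
		"Battery is at 70.3 percent",
		"This Kona coffee is excellent",
	}
	for _, s := range samples {
		fmt.Printf("%q matched by %s\n", s, firstMatch(s))
	}
}
```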
A regex filter is a great starting point — it bootstraps a database without any labeled data, and the hardcoded trusted accounts add some quality signal. But it's a blunt tool. To build a feed worth actually using, I needed to know which posts were genuinely triathlon-related and which weren't. And for that, I needed to label some data.
But that is a different story. Stay tuned for part 2 — if we're talking triathlon, no surprise this series has three parts, right?
You can follow everything triathlon-related happening on Bluesky at the Triathlon feed. And, of course, the code is open source. Way to go!