  • Speeding up the firehose

    As the final phase of my database experiments, I’ve been playing recently with optimizing the speed of processing in the firehose process.

    The Bluesky firehose is a streaming API that sends you every single thing happening on the network in real time – every post, like, follow, everything (encoded in a binary format called CBOR). And my firehose process is a client connecting to that stream, pulling data from it and saving the relevant parts (the posts).

    At the moment, the firehose flow ranges between ~300 and 700 events per second during the day (an “event” being one “action”, like a created or deleted record of any kind), with the bottom around 7-9:00 UTC and the top around 17-22:00 UTC. But the highest bandwidth we’ve seen so far was around 1.8-2.0k events/s in mid-November, when a massive wave of new users joined in the aftermath of the US elections – and we’ve seen traffic go up by a few hundred percent within days before too, e.g. in early September, when the whole of Brazil joined Bluesky. The firehose protocol also includes a lot of extra data, needed for cryptographic verification and full replication of data repositories but not used in a simple case like this – so the total data bandwidth to stream and process averages ~20-30 Mbps right now (with the smaller traffic than before).

    My server was struggling back then in November to keep up with the 1.8k events per second, which is a big part of the reason why I started looking for a database better suited than SQLite – to be able to handle the next wave. I had a few improvements lined up, which I’ve now started applying one by one (trying to do one at a time, so I can measure what difference each one makes), each time pausing the reading for a day or so, and then catching up for several hours at full speed, so I can check the maximum speed it can sustain.

    I was also doing some very simple profiling along the way, measuring how much time each part takes and where I could still save some extra microseconds:
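    As an illustration, here’s a minimal sketch of the kind of ad-hoc timing I mean – the section names below are made up, not the actual ones from the firehose process:

```ruby
# Minimal ad-hoc profiler: accumulate wall-clock time per labelled section.
# (Illustrative sketch - the section names are hypothetical.)
TIMINGS = Hash.new(0.0)

def timed(label)
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  TIMINGS[label] += Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
  result
end

# simulated pipeline steps:
timed(:decode) { sleep 0.01 }
timed(:insert) { sleep 0.02 }

TIMINGS.each { |label, secs| puts format("%-8s %.1f ms", label, secs * 1000) }
```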

    Overall, I managed to bring the number from around 2k evt/s to around 4k over a period of about a week (similarly in the MySQL and Postgres versions) – so not bad at all.

    These are the changes I’ve made:


    Partial indexes

    I’ve switched two indexes in Postgres to partial indexes (MySQL doesn’t support those). I have two indexes on the posts table, on quote_id and thread_id, which store references to the quoted post and the root of the thread respectively, and which are NULL for many of the posts (thread_id for about half of them, quote_id for the vast majority) – so I added a WHERE ... IS NOT NULL condition to those two.
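    For reference, a sketch of what those two index definitions might look like in SQL (index names assumed):

```sql
-- Rows where the column is NULL are simply left out of a partial index:
CREATE INDEX index_posts_on_quote_id ON posts (quote_id)
  WHERE quote_id IS NOT NULL;

CREATE INDEX index_posts_on_thread_id ON posts (thread_id)
  WHERE thread_id IS NOT NULL;
```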

    Effect: no noticeable change in performance, but at least the indexes take up less space now.


    ActiveRecord

    I updated ActiveRecord – from v. 6.1, which I had before, to 7.2 (Rails 8 was released in November, but it feels a bit too fresh right now). Ironically, AR 8.0 seems to finally fix my long-standing problem with concurrency errors in SQLite, just when I’m getting rid of it…

    Effect: around +10% processing speed.


    Ruby update + YJIT

    I updated Ruby from 3.2 to 3.4 and enabled YJIT, the (relatively) new optional JIT compiler written in Rust. I had tested YJIT before on 3.2 and there was a noticeable improvement, but I noticed some kind of memory leak back then, which they apparently only fixed around 3.3.1. Interestingly, my code seems to run slightly slower on 3.3 and 3.4 without YJIT, but enabling it more than makes up for it.

    Effect: around +15%.


    Batch inserts

    I was already previously grouping posts into a queue, waiting until I got 10, and then saving them all in one transaction, using code like this:

    ActiveRecord::Base.transaction do
      @post_queue.each do |p|
        p.save!
      end
    end
    

    This was already faster than the initial approach of saving each post separately, directly in the block where it’s built from the processed event. But it was still multiple INSERTs, and I wasn’t sure I could do it in just one while using the ORM.

    Apparently I can, with the use of the insert_all API. The catch is that it takes an array of raw attribute hashes, skips validations, and doesn’t handle child records (my Post records have linked FeedPost entries if they’re included in a feed). So I validate the records separately earlier (I was actually already validating them first before, which means the validations were running twice), and those that have FeedPosts, I save separately afterwards.

    So the whole thing looks like this:

    def process_post(msg)
      post = Post.new(...)
    
      if !post.valid?
        log "Error: post is invalid: ..."
        return
      end
    
      @post_queue << post
    
      if @post_queue.length >= POSTS_BATCH_SIZE
        save_queued_posts
      end
    end
    
    def save_queued_posts
      matched, unmatched = @post_queue.partition { |x| !x.feed_posts.empty? }
    
      if unmatched.length > 0
        values = unmatched.map { |p| p.attributes.except('id') }
        Post.insert_all(values)
      end
    
      @post_queue = matched
      return if @post_queue.empty?
    
      ActiveRecord::Base.transaction do
        @post_queue.each do |p|
          p.save!(validate: false)
        end
      end
    
      @post_queue = []
    rescue StandardError => e
      # ... try to save valid posts one by one
    end
    

    Effect: up to around +30% in the Postgres version, less in MySQL one.


    Rust native module for regex matching

    Last year I wrote a little native module in Rust (a language I didn’t know at all before) to speed up the matching of post contents against a large number of regexps for the feeds:

    I played with a benchmark of some regexp matching code in several languages last week (I wanted to replace that part of Ruby code with something faster, maybe Crystal or Swift). ChatGPT helped me write some versions :)

    The results were… unexpected. Swift & Crystal didn’t do well, but JS & PHP did 🤔


    — Kuba Suder 🇵🇱🇺🇦 (@mackuba.eu) April 16, 2024 at 7:34 PM

    I hadn’t enabled it in production yet though, because there were some edge cases I needed to get fixed, and so far I had only used it for manual feed rebuilding. So as the next step, I enabled that Rust module in the firehose process. (I’d love to eventually make it available as a gem you can easily add to your app.)
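    For context, the matching task itself looks roughly like this in plain Ruby (the feeds and patterns below are made up for illustration – this is the part the Rust module takes over):

```ruby
# Each feed has a list of regexps; a post belongs to a feed if its text
# matches any of them. Regexp.union combines a feed's patterns into one
# regexp, so the whole list can be tested in a single pass.
FEED_PATTERNS = {
  linux:   [/\blinux\b/i, /\bfedora\b/i],
  atproto: [/\batproto\b/i, /\blexicon\b/i]
}

COMPILED = FEED_PATTERNS.transform_values { |patterns| Regexp.union(patterns) }

def matching_feeds(text)
  COMPILED.select { |_feed, regexp| regexp.match?(text) }.keys
end

puts matching_feeds("New Fedora release is out").inspect  # → [:linux]
```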

    Effect: +15%.


    Asynchronous transaction commits

    Next, I turned on two options: synchronous_commit = off in Postgres, and innodb_flush_log_at_trx_commit = 0 in MySQL. These basically make it so that when you commit a transaction, you don’t wait until it’s synced to disk – the request returns with the data still only in memory, and it’s synced to disk a moment later (but like, up to a second later). So if the whole instance hard-reboots for some reason, you can lose maybe one second of data (but it won’t corrupt the database, since each transaction is either saved completely or not at all).
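    In config-file terms, these are (the Postgres one can also be set per session or per transaction):

```
# postgresql.conf
synchronous_commit = off

# my.cnf, [mysqld] section
innodb_flush_log_at_trx_commit = 0
```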

    This could still be dangerous if you’re saving users’ private data from a webapp, but in this case, I’m streaming public data that I can scroll back through a bit if needed. So in the highly unlikely case that I lose a second of data somehow, I just replay from a very slightly older cursor.

    Effect: no noticeable improvement… probably because with the batched inserts, it was already doing just a few inserts per second, so there wasn’t any more time to save here.


    Optimizing JSON handling and Time parsing

    I had a place in the processing function where it encodes a record parsed from CBOR into JSON, to be saved into a data column and used for logging – but for complicated reasons, JSON.generate was being called twice with slightly different versions of the record. So I refactored it to skip one of the encodes.

    There was also some time spent parsing the record time and the event time from ISO8601 strings into Time objects using Time.parse; I had a feeling that this could probably be sped up with some native code, especially if it only accepted ISO8601 formats, and only the variants that actually appear in the data (since Time.parse handles many, many possible formats). With the help of Mr. GPT I made a little C module with a single function to replace Time.parse, and it indeed ended up being faster. (I’ll also try to turn that into a micro-gem.)
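    A related stdlib option worth mentioning: Time.iso8601 accepts only that one format and skips Time.parse’s format guessing, so it’s already cheaper per call even before reaching for native code. A quick comparison sketch (the timestamp is an example value, not taken from the real data):

```ruby
require 'benchmark'
require 'time'

timestamp = "2024-11-15T17:03:22.123Z"

# Time.parse tries to guess the format; Time.iso8601 parses one fixed
# format, which makes it noticeably cheaper per call.
n = 20_000
Benchmark.bm(14) do |bm|
  bm.report("Time.parse")   { n.times { Time.parse(timestamp) } }
  bm.report("Time.iso8601") { n.times { Time.iso8601(timestamp) } }
end
```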

    Effect: +8% (this finally pushed it over the 4k level)


    Jetstream

    Finally, just to check for now, I’ve switched from the full CBOR firehose to Jetstream, which is a kind of pre-processed stream that strips all the signatures and Merkle tree stuff and outputs a simplified JSON stream. I’ve already added support for it to my firehose library Skyfall in October, but I’m not using it everywhere yet.

    Effect: +30-40%, although the speed was suspiciously capped at almost precisely ~4989 evt/s on both servers… so I’m guessing I might have hit some kind of rate limit of the Jetstream server. But I could also host it myself if needed.

    (Edit: yes, apparently 5k is the default rate limit configured in Jetstream server settings.)


    Potential ceiling

    I also did a test with all processing completely skipped, so the app would just read the data packets from the websocket and discard them, not even decode the CBOR. This did around 6k evt/s with the full firehose (possibly more for Jetstream, but I couldn’t test this) – so it looks like this is the max I can do without switching to something completely different. The library I’m using for websocket connection uses the very old eventmachine library underneath, and I was told that the newer async-websocket that uses the Async framework could be faster, but in brief tests I wasn’t able to see any improvement so far.

    Finally, just to see what the limit would be if I were to abandon Ruby (at least for that part), I ran a quick test in Rust, specifically using the example from the Atrium library. It did… 25k evt/s 😱 So yeah, there are still more options ;)

    That said, my current VPS will max out the allowed traffic at a sustained 300 Mbps, which if I’m counting correctly would be something like 7.8k evt/s from the full firehose, so above that level this would be the main problem.

    → 6:26 PM, Mar 18
  • MacBook Air vs. Pro

    I’m a very picky person when it comes to buying things… especially Apple hardware. I can’t count the number of Apple devices I tried out and sent back, because something wasn’t right. Because there are always tradeoffs, and I don’t like tradeoffs, I want it to be just perfect 😅

    So since I really needed a new MacBook (I’ve been using an M1, which would be fine if only I had bought it with more than 8 GB RAM… 🤦🏻‍♂️), last autumn I decided to test MacBooks this way. I wasn’t sure if I wanted an Air or a Pro, since they’re quite different right now. There’s this holiday promotion that Apple runs each year around Black Friday, where if you buy something in November-December, you can safely return it until the 2nd week of January (!) instead of the usual 2-week period. So I decided to buy both an M3 13" Air and an M4 14" Pro, try them out, and pick one to keep in January 🙃

    Spoiler: I kept the Air in the end. The Pro does some things better than the Air, but it’s noticeably thicker and heavier, and using the classic MacBook Airs for ~5 years in the last decade has spoiled me forever – I just refuse to go back to heavier, bulkier laptops.

    There are a few more differences that I wrote down, so here’s a short subjective comparison if you’re looking at those shiny new Sky Blue Airs and wondering whether you should get one of those or the Pro…

    Note: I haven’t really used the Pro that much, certainly not all the time for 2 months – more like a few times for a few hours. I’m writing on the Air right now.

    Size / exterior

    As mentioned, there is definitely a difference. For comparison:

    • current MacBook Air 13.6" (2025): 1.24 kg
    • Retina MacBook Air 13.3" (2020): 1.29 kg
    • classic MacBook Air 13.3" (2015): 1.35 kg
    • older MacBook Pro 13.3" (2020): 1.40 kg
    • current MacBook Pro 14.2" (2024): 1.55 kg (M4) to 1.62 kg (M4 Max)

    So the lightest Pro is 150 g heavier than my previous M1, 200 g heavier than the classic MacBook Airs, and 300+ g heavier than the current Air (!). That difference is about two thirds of an iPad’s weight, and it’s absolutely noticeable. It might not be a problem if you use it as a desktop computer, but it is a problem if you need to carry it around all the time, or if you have the “on the lap, lying on the couch” kind of workplace like me…

    The Pro is also a lot thicker (and at least it feels thicker than the old 13.3" Pro). It’s not that much of a problem, but I definitely like how super thin the Air is.

    I remember the Pro also had those kinda annoying slots with slightly sharp edges at the bottom, probably for airflow – the Air doesn’t have those.

    Screen

    The Pro has a slightly bigger screen, though not by that much.

    ProMotion (120 Hz) on the Pro is immediately noticeable, but I can’t say if it’s important or even clearly better – it’s a bit weird at first (feels kinda… iOS-y?), though probably something I could get used to rather quickly. I can’t say I’m missing it on the Air at all, but then I’m not that used to this mode, since the only devices I’ve owned with it are my previous and current iPad.

    The Pro has a better DPI – the screen has a “full 2x Retina” resolution (1512 x 982 “point” size), while the Air uses a scaled resolution (1280 x 832 “point” size, while the standard display resolution is 1470 x 956), so the Pro’s screen is sharper. Again, the past two MacBooks I’ve used also used the scaled resolution, so it’s not something that’s bothering me.

    The Pro has an HDR mode with higher brightness – I tested it by watching “Silo” on Apple TV, but… I had somewhat mixed feelings about it. I’m not sure it looked better on the Pro, with the higher contrast and real blacks, than on the Air with its less-black blacks. And the white interface elements that show up on hover were kinda blinding.

    The Pro is technically capable of higher peak SDR brightness, but it’s hard to say how this works exactly – the specs say “up to 1000 nits (outdoor)” (vs. the Air’s 500 nits). Note the “outdoor” though – it’s something that uses the light sensor and only enables the higher limit in a bright place. So if you took both of them outside on a bright day, the Pro would probably be much more usable, but in a rather dark room, they looked pretty much the same at max brightness. There might be some hacks to unlock the limit, though (?). Anyway, as someone who usually keeps brightness at max, and who has sent back one Air in the past because it was too dark (400 nits), the current Air is good enough for me.

    I was a bit worried about the screen color balance – Apple has this annoying thing where every device has a slightly different white tone: sometimes it’s more yellowish, sometimes more blueish, sometimes more towards violet… and it’s not always possible or easy to calibrate that away. Some of the past MacBook Airs and the new OLED iPads are too yellowish for me, for example.

    So these two… are both kinda yellowish, the Pro even more so I think. But they can be calibrated a bit – the Pro has this more advanced color calibration thingie that I’ve never seen before, that’s missing on the Air, which lets you change color balance on a hexagonal grid. The Air has a standard color profile wizard with a white point slider. But I’ve moved it slightly towards the blue and it’s ok. If you’re used to Apple’s current OLED screens (I’m not), you won’t notice this.

    Notch

    I was horrified when they first announced the MacBook with a notch in the screen… I was expecting this to be a big problem. It’s not really a problem in practice, and I say this as a notch-hater. I know there’s some hack that lets you run a smaller resolution, pushing the menu bar down and making the screen fully rectangular, at the cost of lost vertical space, but I don’t think I’d want that – those pixels definitely count. The only thing I don’t like is that the notch swallows overflowing icons from the menu bar, they just disappear if you have too many. I was recommended several tools that work around this in that thread, but I haven’t tried any yet.

    Keyboard

    I think both have fairly similar keyboards – the good one that Apple switched back to from the terrible butterfly one. They both felt ok to write on.

    However, they do look different – the Air has the classic black keys on silver background design, while the Pro has black keys on… black. I’m not a fan of this, tbh – the keys are much harder to see. They do have the backlight of course, but you kinda need to rely only on that.

    Ports

    The Pro has an HDMI port and an SD card slot. I couldn’t care less tbh – I use these very rarely, and I got adapters. I prefer a thinner laptop instead.

    The bigger difference is the extra USB-C port on the right, while the Air only has ports on the left – that would be kinda useful, but oh well.

    The Pro also has the headphone jack on the left instead of the right, but I’m not sure that matters. It might actually be better this way tbh – I think they used to be on the left long ago, and I liked that more?…

    Temperature

    The Air definitely runs a bit warmer at idle than the M1 MBP did (it doesn’t really fall below 40°C while running), and it gets warm quicker when doing something serious, and I don’t really like this… Even if the CPU temperature is actually the same, it feels warmer on the bottom, which matters when you’re mostly holding it on your lap. I guess it might be because of the fan-less internal design, or things are just arranged differently inside somehow. Although I think the Pro was also slightly warmer than my M1. (I’m curious how the new M4 Air will turn out in complete reviews…)

    Sound

    The Pro has better sound from its speakers – not that the Air sounds bad, but the Pro definitely sounded better somehow (note: I’m not an audiophile at all).

    And surprisingly, the Air has a different-sounding touchpad. I think the click sound is the same here as on the 2020 MacBook Air, and different from the current and last few generations of Pros – the Pros have a kinda more “metallic” click sound, and the Airs' touchpad click is… I don’t know how to describe it, a slightly lower, more “dull” tone. I think I like the Air version more.

    → 7:05 PM, Mar 6
  • Year 2024 review

    2024 was a pretty intense year for me in terms of coding… and a pretty not-intense one for anything else 🫣 I was so completely absorbed with building my Bluesky-related projects. It gives me a lot of satisfaction to be able to work on things which are based on my own ideas and needs, which give me fun problems to solve, let me learn a lot, and which seem to be helping a lot of other people.

    I figured I could look back at this past year and make a summary of what I managed to build and do, since it’s been a while since I’ve done one of those. Yes, I know we’re well into February already, I’ve been procrastinating, sue me. (Also, we’ve just had an anniversary of Bluesky public launch a few days ago, so that seems like a good excuse too.)


    Bluefeeds

    Bluefeeds is my internal codename for the Ruby project which runs my main server at blue.mackuba.eu, which includes my feed service, the Handle Directory, statistics and so on.

    • New feeds: ATProto feed, ATProto projects feed, Kit feed 🐱, News From Mastodon, Bluesky Team Links feed
    • The two Replies feeds: Only Replies and Follows & Replies, which have gained a lot of popularity after the Following feed changes in the summer, when the option to show all replies in Following was removed, and which are now my two feeds with the biggest traffic
    • A work-in-progress Hashtags feed (for following hashtags, currently requires running a Ruby script to add tags)
    • Some continuous updates to existing feeds, like the Linux feed, where I’ve recently had to fix the name “RedHat” appearing in completely unrelated contexts 🫠
    • The Handle Directory, listing handles grouped by top-level domain, a global list of most popular accounts with custom handles, and a ranking of independent PDSes
    • The statistics page, showing daily and weekly post stats, later also separately for bridged posts from Mastodon/Nostr and posts on independent PDSes
      • I also periodically post stats of most popular languages on Bluesky – what % of posts are in which language, and how many users are there that regularly post in each
    • The very unstyled labellers list page, which was even briefly featured in a CNBC video for a few seconds 🤯
    • An automatically updating list of official .gov and .edu accounts with verified handles, plus an open source repo with the script that updates them
    • I also managed to learn some basics of Rust 🦀 and to write – with major help from ChatGPT – a tiny native module for Ruby in Rust, which I knew absolutely nothing about before, to speed up the matching of post contents to regexps (I’ll try to write a bit more about this at some point and share the code)
    • … and I’ve spent literally months working on migrating this whole thing from SQLite to either MySQL or Postgres, and I’m still not done 😩

    Skythread

    • Added a hashtag feed view, listing latest posts with the given hashtag, back when this wasn’t working in the Bluesky app yet
    • Similarly, added a post quotes counter and a view listing the post quotes, before that was added to the app (you can still use it now to peek at the quotes which are hidden/detached in the official app 😈)
    • Implemented link highlighting and link cards
    • Added special handling for bridged Mastodon posts, which are showing author’s formatted Mastodon handle and the expanded full-length post contents (since Bridgy stashes the full original text in a custom field in the post records)
    • Added an “infohazard” feature, which lets you load hidden comments which the official app hides because of the “nuclear block” between two commenters

    Some of these, like the quotes list and the hidden comments, use data loaded from the Bluefeeds server APIs – since I’m saving all incoming posts to one big-ass table, I can index them by quote source and by thread root, and can use that to look up all quotes of a post or all replies which are supposed to be there in the thread.

    Blog

    • I wrote a super long post “Complete guide to Bluesky” with a lot of explanations of how things work and various tips and tricks (and updated it a few times, as new features were added)

    Skyfall

    • 7 releases, adding support for new features like labellers or account state events
    • added support for streaming from Jetstream, parsing events from JSON instead of CBOR

    Minisky

    • 2 releases with small improvements (OAuth support is still waiting for a better day)

    New projects & tools

    • Label Scanner – a simple web tool that shows all labels assigned to an account/post by any of the known labellers
    • sdk.blue – list of libraries and SDKs for Bluesky/ATProto, grouped by language
    • tootify – a simple tool for selective cross-posting from Bluesky to Mastodon, which I use to mirror some of my Bluesky posts to my Mastodon account
    • didkit – a Ruby library for resolving handles to DIDs, loading DID info like assigned PDS host etc.
    • a bookmarklet to easily mute notifications in the Bluesky web app
    • two Ruby scripts to scan your home feed or some selected accounts and print the stats of who is posting how much daily

    Servers

    • I’ve migrated my projects from my old Swiss hosting Coin.host to German Netcup, which has VPSes with really fast CPUs and SSDs and a ton of bandwidth included for a really good price
    • I’ve set up my own self-hosted PDS at lab.martianbase.net, and migrated my main account there

    Oof… that’s a lot of stuff 🥲 What’s next, in 2025? I… I really think I need to work a bit more on work-life balance (sideproject-life balance?) this year. I feel like I’m doing way too many things. But I have no idea if that’s going to happen. I think at least I should try to do a bit more project finishing and a bit less project starting.

    First of all, I can’t wait to get out of this transitional state with that main service running on three different databases in parallel. I think I’m really close now (though I’ve been saying this for weeks).

    Then, I have all of the above to maintain, OAuth to add to Minisky & Skythread, things like better documentation and tests to add to the open source projects. I have a few ideas for longer blog posts to write, e.g. some kind of architectural introduction to ATProto.

    And also… I’d really love to write an AppView one day (but yes, I know, less project starting 😛).

    → 7:42 PM, Feb 10
  • Postgress progress

    I think I’ve finally made some good progress with my Postgres database. Things are working pretty well now, and I still have a few fixes in the queue. It’s not all finished yet, but I feel like I’m coming out of the woods now.

    For those out of the loop: I have a server which streams and saves all posts from the Bluesky firehose, and uses this data to run several feeds on Bluesky used by 1k+ people, plus generates daily post statistics, the directory of custom handles and things like that.

    This has all been running on SQLite so far (a 200+ GB file now), and it has served me well for far longer than I would ever have expected, but I’m finally hitting its limits. The main problem is the lack of support for concurrent writes, which makes it unrealistic to e.g. split the work across multiple workers, which I’m gonna have to do sooner or later.

    I’ve only really used MySQL on servers before, but everyone was telling me I need to use Postgres, so I did what any normal person would have done (right?): I set up two test servers, one on MySQL and one on Postgres, running from two modified branches of the code, and made the SQLite-based production server proxy some requests to one or the other, in order to test them on real traffic. A sort of database A/B testing 🤪

    It has been working like this for a few months now, and I’ve been slowly tweaking things, mostly on the Postgres side. Ever since I first got them to work, they’ve both been working OK, but MySQL has generally been doing more write I/O for the same workload, and Postgres comparably more read I/O.

    The key part is the Replies feeds (Follows & Replies and Only Replies – it’s basically the same code). These two are variants of the built-in Following feed, showing the connecting user their own chronological timeline, generated live and filtered in some way. They work by fetching the requesting user’s follows list, either from the AppView or from a local cache, and then making a query that’s basically: “give me the most recent 100 posts from any of these 50/200/1000/5000 users”.

    There are two relevant indexes in that table: one on (repo, time) (repo = user’s DID), and one on just (time). Roughly speaking, for those users who follow e.g. 80 or 200 people, it makes more sense to scan the (repo, time) index those 80-200 times and collect the 100 most recent posts from all of those found, and for those who follow e.g. 9000 (yes, that happens 😛), it’s faster to scan the single (time) index until you find 100 relevant posts. But I’ve been struggling to make Postgres always use the right index.
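    In SQL terms, the query is roughly this shape (simplified – the actual column names and parameters may differ a bit, and the DID values are made up):

```sql
-- "give me the most recent 100 posts from any of these users"
SELECT * FROM posts
WHERE repo IN ('did:plc:aaa', 'did:plc:bbb' /* , ... */)
ORDER BY time DESC
LIMIT 100;

-- and the two indexes the planner can choose between:
CREATE INDEX index_posts_on_repo_time ON posts (repo, time);
CREATE INDEX index_posts_on_time ON posts (time);
```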

    At first it tried to use the (time) index way too often. I tried to find a way to force it to use a specific index, but I was told it’s not possible (it might be, though!). With the help of some people on Bluesky, I managed to rewrite the query to use the other index, but then it performed badly for people with a lot of follows. I interpreted the EXPLAINs as the query planner not having good enough information about the structure of the data (it estimated n_distinct, the total number of distinct values in a column, as much lower than the actual number of users in the table), so I started messing with the “statistics” settings. I ended up bumping the statistics target up to the maximum of 10,000 (making ANALYZE take really long) and hardcoding n_distinct to 4.5M, which was closer to the actual count. I also increased shared_buffers to 1 GB. This improved things somewhat, to the “not great, not terrible” level, but I was still not happy with the performance.

    But eventually, I was able to improve things a lot over the last few weeks:

    • First, I was re-reading one of the blog posts about the differences in how MySQL and Postgres organize data internally, and I had a bit of an epiphany: unlike MySQL, Postgres doesn’t store the primary key (id) in secondary indexes, since they reference the physical location of the data instead. In MySQL I had a version of the query which first did a subquery that collected just post IDs, and then used the final 100 IDs to fetch posts, but this couldn’t work here if the indexes don’t include id! This was something unexpected for me. So I replaced the larger index with (repo, time DESC, id) and rewrote the query again with a subquery, and the EXPLAIN showed “Index Only Scan” now, meaning it could use just the index for the subquery.

    • Next, the Index Only Scan was showing something like: “Heap Fetches: 31362”. This apparently means that even though it tries to only use the index in that step, it still has to go fetch 30k post rows from the actual table. This is because as records are updated or deleted (in my case only really deleted), Postgres just marks them as deleted, but doesn’t free up the space or update the index until a VACUUM is done. A separate structure called “visibility map” tracks which table pages have some outdated records in them, and if an index entry points to such page, it has to go check if that record is still there. So I started tracking the state of this visibility map, and set up a cron job to do the VACUUM twice a day to keep it relatively “clean”. (I wish it was as easy to do with my apartment…)

    • Now, it was working great for people with short lists, but it still tried to use the (repo, time, id) index for most very long lists. The EXPLAIN showed that it was vastly overestimating the cost of using (time). I came to the conclusion that… the hardcoded n_distinct was actually working against it now, even though it was technically more precise. In my practical use cases, what actually matters is not how many different users have at least one post anywhere in the table, but how many are actively posting – whose posts you can find in the last few hours or a day. And that’s much less than 4.5M. So I cleared the hardcoded n_distinct and let it find its own estimate, and reset the statistics target to the default 100 – and this improved things quite a bit! It was now almost perfectly choosing index_posts_on_repo_time_id up to around ~1200 followed accounts, and index_posts_on_time above that.

    • Not related to the query/Postgres directly, but I also added a “prefetching” task to the follows list cache: it runs every 15 minutes and checks which follows lists are approaching the point where the cached data becomes stale (but was accessed recently) and reloads the list from the AppView in advance. This saves a couple to several seconds of time every time the list had to be refreshed before returning the feed, so it made quite a lot of difference.

    • One last thing I need to fix is that for long lists, significant time is spent on query planning (e.g. 1724 ms planning time and then 12 ms execution time). I’m gonna try to restructure the query to pass the list of user IDs differently somehow and see if that helps, or perhaps I could move the cache to a new database table instead of keeping it in .json files.

    • And finally, I have some more config tweaking to do - e.g. it seems shared_buffers could be bumped up to at least 2 GB here (with 8 GB total RAM) - but I wanted to first make sure it’s doing the right things before I give it more space to cache it.
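    The covering-index rewrite from the first point above can be sketched like this (a sketch with made-up DID values, not the exact production query):

```sql
-- id added as the last column, so the subquery can run as an Index Only Scan:
CREATE INDEX index_posts_on_repo_time_id ON posts (repo, time DESC, id);

-- the inner query touches only the index; the outer one fetches the 100 rows:
SELECT p.* FROM posts p
WHERE p.id IN (
  SELECT id FROM posts
  WHERE repo = ANY (ARRAY['did:plc:aaa', 'did:plc:bbb'])
  ORDER BY time DESC
  LIMIT 100
)
ORDER BY p.time DESC;
```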

    I’m really looking forward to when I can finally set up a production instance with Postgres and move everything there and drop the other branches - but I feel like this moment is getting close 😅

    → 10:39 PM, Jan 16
  • new domain

    who dis

    → 8:49 PM, Jan 8
  • test post

    test 123

    → 5:52 PM, Jan 8
  • Hello world

    just setting up my mcroblog

    → 7:55 PM, Jan 7