Docker development revisited

In which I try to clone forem's repo, run docker compose, and use it for development for a while

I started here in March, and it came up in an internal meeting as a pain point yesterday.

Today I'm going to try docker (docker-compose) based development and get a feeling for the existing pain points. Forem in production (outside of a few sites on heroku) is entirely driven by containers, so running in docker shouldn't be wildly different from production, but users consistently have a hard time.

Check out the code

git clone https://github.com/forem/forem

This part at least went well.

cd forem
docker-compose up

Creating network "forem_default" with the default driver
Creating volume "forem_db_data" with default driver
Pulling bundle (quay.io/forem/forem:development)...
development: Pulling from forem/forem
Creating forem_yarn       ... done
Creating forem_postgresql ... done
Creating forem_bundle     ... done
Creating forem_redis      ... done
Creating forem_rails      ... done
Creating forem_webpacker  ... done
Creating forem_sidekiq    ... done
Creating forem_seed       ... done
Attaching to forem_postgresql, forem_yarn, forem_redis, forem_bundle, forem_rails, forem_webpacker, forem_sidekiq, forem_seed

This worked fine (or looked like it did): first, bundle runs (in the forem_bundle container, as scripts/bundle.sh); periodically, containers log that they're waiting for other processes to complete (rails waits for redis, postgres, and the sentinel files from the other processes, and many other services wait on rails); yarn can run while bundle is running, so the two are interleaved in the log output.

I ran into an issue with the rails container running migrations the first time. I'm leaving the output messy here to illustrate the interleaving; only the rails_1 and forem_postgresql lines are relevant. When running the primary key migration on the notifications table there's a postgresql timeout (and consequently an AR error is raised).

This is fixable: because I know how the rails migrations work, it's just a case of stopping the docker processes and restarting. The db is persisted to storage, so the schema migrations which succeeded will be skipped, and the bundle/yarn/webpack steps should be a little faster since everything is already in place. This would be a seriously aggravating case for someone else: the only logged events after the rails container dies are the seed and sidekiq containers saying they cannot connect to port 3000, after which they sleep and wait longer.
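Since the db lives on a named volume, the retry is just a stop/start cycle. A minimal sketch, using the volume name from the startup log above (the commands are guarded so they only execute where docker-compose is installed):

```shell
VOLUME="forem_db_data"   # named volume from the compose startup log

# Stop the stack; the named volume survives "down" (it's only removed with
# "down -v"), so completed migrations stay recorded in schema_migrations.
command -v docker-compose >/dev/null && docker-compose down

# Bring everything back up: bundle/yarn find their caches in place, and the
# migration run resumes at the first migration not yet recorded.
command -v docker-compose >/dev/null && docker-compose up

echo "db volume preserved: $VOLUME"
```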

On the second pass the migrations completed, the rails listener started successfully, and sidekiq started up. The seed step, however, raised an error related to an unset APP_PROTOCOL in the environment.

The code in url.rb where this occurred was the call to image_url with the image name and host (which should be set to the asset host if present).

Ah, asset_host is in the environment, but I (as I always do) forgot to copy the environment file from the sample. Do that now and restart.
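The copy itself is one command; the sample filename below is an assumption (check the repo root for the actual name):

```shell
SAMPLE=".env_sample"   # assumed name of the shipped sample env file
TARGET=".env"

# Copy only when .env is missing, so an existing local config is never clobbered.
if [ -f "$SAMPLE" ] && [ ! -f "$TARGET" ]; then
  cp "$SAMPLE" "$TARGET"
fi
echo "env file: $TARGET"
```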

It failed again with another problem: limit was nil.

The seeder code running when this happened was here.

This suggests that users_in_random_order is nil. That's because the seed needs to run all at once: if there are Users already in the system, the create-if-none guard causes a skip.

There's a second definition of users_in_random_order later, but it's inside a block, and only the value from the user seed is available afterwards. Seeds are not restartable once they have failed.

In a command-line world, what I might do is just call db:reset or db:schema:load to clear the data. Here there are some options (you could call the rake tasks using docker run on the forem_rails container); unfortunately, just restarting gets the seed to detect that badges exist and skip the block completely.
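A sketch of the rake-task route, assuming the compose service is named rails as in the startup log (the task names are standard Rails; verify the service name against this repo's docker-compose.yml):

```shell
SERVICE="rails"   # assumed compose service name (see docker-compose.yml)

# db:reset drops, recreates, loads the schema, and re-runs seeds end to end,
# which avoids the "badges already exist, skip the block" partial-seed state.
RESET="docker-compose run --rm $SERVICE bundle exec rails db:reset"

# db:schema:load alone just reloads the schema; seeds would need a second run.
LOAD="docker-compose run --rm $SERVICE bundle exec rails db:schema:load"

# Only execute where docker-compose actually exists:
command -v docker-compose >/dev/null && $RESET
echo "reset command: $RESET"
```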

This is probably non-critical; I think the upshot is just that users will not be awarded badges by the seed.

Now, since the db:schema:load code ran, the seed should run end to end.

I made one local change to the badges block to ensure the badges are assigned to users in random order rather than failing when the user creation is skipped.

This wasn't strictly necessary: resetting the db was sufficient, since there's no problem when all the seeds run end to end.

There are a lot of callbacks on the model creations that enqueue a pile of sidekiq jobs, but once that queue goes quiet (podcast episode creation is the main item taking time) you're just looking at a dockerized rails container.

The server is even bound to 0.0.0.0, so if you're running docker on a remote host (as I am, in this case) you'll still be able to use it; I can view the site at http://192.168.0.12:3000 from my desk (not the real address).
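A quick way to confirm the 0.0.0.0 binding is reachable from another machine is a curl against the host's LAN address (the IP below is the illustrative one from the text, not a real deployment):

```shell
HOST="192.168.0.12"   # illustrative address from the text
PORT="3000"

# -s silences progress, -o /dev/null discards the body, -w prints the status
# code; a 200 or 302 here means rails is listening beyond localhost.
command -v curl >/dev/null && \
  curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 "http://$HOST:$PORT/" \
  || true
echo "checked http://$HOST:$PORT/"
```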

Well, I thought I could. I got caught out by the byebug allowed networks.

Also, after letting things run for a while, the push notification worker was raising errors about being unable to connect to redis (all redis services are configured in the .env file as localhost:6379).
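That localhost value is suspect on its own: inside a compose network, each container's localhost is itself, and other services are reached by service name. A hedged sketch of the .env value (the exact variable name Forem reads is an assumption):

```shell
# Inside the compose network, "redis" (the service name from the startup log)
# resolves to the redis container; localhost points back at the worker's own
# container, which runs no redis server.
REDIS_URL="redis://redis:6379"   # assumed variable name; check the sample env
echo "$REDIS_URL"
```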

The redis status says there were blocked clients. I wonder if the sidekiq process was seen as too chatty and blocked, or if only the redis rpush url was a problem.

I'll keep an eye out for these problems as things go by. I restarted docker-compose to clear the blocks (I assume) and observed the bust cache path worker running successfully after startup.

I moved over to the host IP (this is set up in the "docker" initializer by calling ip route and doing some shell scrubbing to get the local network) and opened a browser. I'm able to log in, but most of the webpack resources are giving 404s, even after a restart, and it's unclear why. That basically makes the site unworkable, since all the dynamic loading (like articles on the stories controller?) isn't happening without the JS being available.

I've removed the manifest.json from public/packs/ to hint to webpacker that maybe it's time to do something extra. There might be a solution where the answer is to precompile assets (though that would substantially change the developer experience for front-end changes). Yes, removing the manifest and restarting seems to have helped: I still get routing errors for some of the components, but articles are loading now.
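The manifest removal is a one-liner. The path below is webpacker's default output location, and the service name is taken from the compose startup log; both are assumptions to verify against the repo:

```shell
MANIFEST="public/packs/manifest.json"   # webpacker's default manifest path

# Removing the manifest forces webpacker to recompile and rewrite it on the
# next boot, instead of serving stale pack digests that 404.
[ -f "$MANIFEST" ] && rm "$MANIFEST"

# Restart just the webpacker service so the packs regenerate.
command -v docker-compose >/dev/null && docker-compose restart webpacker

echo "cleared: $MANIFEST"
```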
