Traditional CI container setup
Molly tried a few months ago to run CI in Buildkite, and reported that everything except front-end timeouts seemed to be working. There's also a forem CI pipeline in Buildkite (which keeps asking for an irreversible YAML transition) that's probably tied to these branches:
first-buildkite-pipeline https://github.com/forem/forem/tree/mstruve/first-buildkite-pipeline
try-containers-again https://github.com/forem/forem/tree/mstruve/try-containers-again
buildkite-ci https://github.com/forem/forem/tree/mstruve/buildkite-ci
Let's get one of these working (I think I can use a local Buildkite agent via the Buildkite CLI to execute them; otherwise the CI config should have an equivalent Docker configuration).
Copy https://raw.githubusercontent.com/forem/forem/f44bfccb82e33e0266100c21f6de0183442d051e/.buildkite/Dockerfile into a local branch and run `docker build .` to see what happens (the only change I made was basing the image off the ruby:2.7.2 build). That worked (using the `-f` file option, building from `.`). It looks like the coordination for this container requires PostgreSQL to be up and running (which docker-compose would have done for you).
I'm still getting the CarrierWave issue, and still not hitting this problem locally. Since Molly wasn't crying about this 5 months ago, did something change with the CarrierWave setup since then?
Oh, I guess I had a typo. We're good! Wow, whoa. This was stupid (and I still don't know what happened to make it work correctly).
With https://github.com/forem/forem/commit/2dd87a52b1235bcdea78caefd6a680fc637c59e7, running all of spec/ shows a few failures; these might be system tests. I'll let it finish before deciding what to do.
Okay, a summary of issues (full output in https://gist.github.com/djuber/607f96332f7dc3ff7de8cf05301a1708):
Can't find Chrome (all system/front-end tests require it for Selenium).
The Redis URL seems to be localhost:6379 and should not be (a few tests use rpush or put feature flags in Redis). This might be a missing RPUSH_URL env var, which was set to redis://redis:6379 in a subsequent commit; see the sketch after this list.
The admin "should show last commit" spec expects "some date" (might be tied to entrypoint.sh not finding git?). This looks like a one-off build-process or execution-order issue, rather than a problem with the container orchestration.
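For the Redis item, the fix is presumably just pointing test code at the compose service name instead of localhost. A minimal sketch, assuming the RPUSH_URL var mentioned above and a default I made up (this is not forem code):

```ruby
require "redis"

# Inside the compose network the Redis host is the service name "redis",
# not localhost. RPUSH_URL is the env var named above; the fallback default
# here is my assumption.
redis = Redis.new(url: ENV.fetch("RPUSH_URL", "redis://redis:6379"))
redis.rpush("example_list", "payload") # the failing specs make calls like this
```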
The next step was "add Chrome to the build", since that seemed like the biggest improvement if it worked correctly. Not sure if my build process was off (at least the first time, I confused the image tag and built under a bogus name nothing referenced); I only shaved off one failing spec.
https://github.com/forem/forem/compare/master...djuber:djuber/spike-running-tests-in-docker is the current progress as I get closer to a clean run.
Note to self: adding `rspec` to entrypoint.sh for testing makes running this easier, but it makes the output arrive all at once (less useful for watching impatiently). `--format=documentation` might help. The issue is that the console width the pretty rspec binary tries to track, to know how to fill the screen with green during progress formatting, is nil for me in the container when writing to logs, so nothing is sent to output until a WARNING gets triggered (then all pending results are flushed out).
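A possible workaround, sketched below; the tty check and the sync flag are my additions, not anything already in the repo:

```ruby
# spec/spec_helper.rb (sketch): pick the line-oriented "doc" formatter when
# stdout is not a terminal (e.g. docker logs), and flush output immediately.
$stdout.sync = true

RSpec.configure do |config|
  config.default_formatter = "doc" unless $stdout.tty?
end
```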
Okay, Chrome is found (chrome-headless was not enough), but the same tests fail because VCR/WebMock doesn't know to let POSTs through to the Selenium host.
I'm 90% sure I saw an "allowed hosts" commit Molly made in another branch that I need to bring in.
Indeed, there is an allow list in spec/rails_helper.rb; I added selenium:4444 to it, which appears to have fixed the bulk of the system test failures. I'm still getting some issues with file uploads (remote Chrome complains that Selenium is < 3.14; might need to update the container in the docker-compose file).
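The change looks roughly like this. A sketch only: WebMock as the enforcement layer is my assumption, and forem's actual allow list has other entries:

```ruby
# spec/rails_helper.rb (sketch): permit real HTTP to the selenium grid while
# everything else stays stubbed.
WebMock.disable_net_connect!(
  allow_localhost: true,
  allow: ["selenium:4444"]
)
```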
Not sure I understand the error: the container appears to be running 3.141.59, which should satisfy "> 3.14". Maybe the webdriver code is stale?
Gemfile.lock shows selenium-webdriver at 3.142.7, which also seems to be "> 3.14", but I might be missing something...
Time to re-read that Avdi Grimm article on "end to end tests" https://avdi.codes/rails-6-system-tests-from-top-to-bottom/ since I think I'm missing some piece of the action. Toss in https://avdi.codes/run-rails-6-system-tests-in-docker-using-a-host-browser/ for extra credit and come back in 30 minutes after thinking for a bit.
Slog continues
So I have a few questions hanging in the air:
Why is Selenium saying it's using an older version than the one in the container I've used? Is that just inconsistent error reporting, based on the weird "let's make digits of pi because we're having so much fun" versioning scheme picked by the Selenium maintainers?
I did check to make sure I wasn't accidentally running Chrome locally (which seems especially weird, since I had to install Chrome to get system tests to run). Is the current setup "Chromium browser in the Ruby container, ChromeDriver in the selenium container, webdriver sends traffic over port 4444 to ask remote Selenium to drive local Chromium"? Does that make sense (and why would we go through all of that setup pain)?
Okay, reading assignment part two was https://avdi.codes/run-rails-6-system-tests-in-docker-using-a-host-browser/, which discusses the issue I had earlier (can't find Chrome) and suggests that disabling the webdrivers gem (which manages the driver, but not the download of the browser) was part of the issue.
One thing in our code that popped out is a guard on a HEADLESS=false env variable (suggesting Chrome should be attempting to run headless by default, further frustrating my need to have installed Chrome for a GUI, plus the required GUI libraries to support that).
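Putting the pieces together, the wiring presumably looks something like this. A reconstruction, not the exact forem code: the driver name, file path, and window size are assumptions; the HEADLESS guard and the selenium:4444 URL are the ones discussed above:

```ruby
# spec/support/capybara.rb (sketch): remote-driver registration with the
# HEADLESS=false guard described above (selenium-webdriver 3.x style API).
Capybara.register_driver :remote_chrome do |app|
  args = ["--window-size=1400,1400"]
  args << "--headless" unless ENV["HEADLESS"] == "false"

  capabilities = Selenium::WebDriver::Remote::Capabilities.chrome(
    "goog:chromeOptions" => { "args" => args }
  )

  Capybara::Selenium::Driver.new(
    app,
    browser: :remote,
    url: "http://selenium:4444/wd/hub", # the selenium container, not localhost
    desired_capabilities: capabilities
  )
end
```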
Just gotta say: using containerized Selenium, Docker, Chrome, and RSpec all together seems like a pretty confusing configuration. There are a lot of parts, they all have to know a little about at least one of the others, and they may or may not work the way you expected.
One thing I did as an experiment, to squelch a warning I thought I saw, was to send the option to disable sandboxing, thinking Chrome was running as root in the Rails container. However, it's running as user 1200 inside the selenium/standalone-chrome container (`docker container top` showed this).
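For reference, the experiment amounted to one more Chrome argument in the args list from the sketch above:

```ruby
# Probably unnecessary: Chrome runs as an unprivileged user in the
# selenium/standalone-chrome image, so the sandbox warning wasn't about root.
args << "--no-sandbox"
```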
My morning of experimentation took me from "under 20 failures" yesterday to 86 failures now, mostly tied to invalid session ids. I'm tempted to reverse direction at this point and go back to "close but not yet right" for the time being. In fact, all of the tests now seem to be giving "invalid session", and I think something changed without my wanting it to.
Although, I wonder if the problem is that I'm running the system tests in isolation (why should that matter)? One continuing pain point with this setup is that the code is `COPY`ed into the container rather than mounted as a volume, meaning a container rebuild is needed between test runs if anything in the specs or code changes.
Session handling
VCR raises an error after the end of the test suite, where it looks like we're trying to send `DELETE session/:session_id` to the webdriver port. Since VCR doesn't recognize this request, it fails. This has a knock-on effect: running the test suite twice without stopping and restarting the selenium container (I just `docker-compose down` and back up) causes connection failures ("can't bind on interface" is the error Selenium gives, but it feels like the competing sessions cause the issues).
This `Cannot assign requested address (99)` error basically signs a death sentence for headless Chrome Capybara tests.
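The `DELETE session/:session_id` call is Capybara/Selenium quitting the browser session at process exit, so one mitigation might be exempting the selenium host from VCR entirely. A sketch, assuming VCR's ignore_hosts option and our compose service name:

```ruby
# spec/rails_helper.rb (sketch): keep VCR out of webdriver traffic entirely,
# so session teardown requests to the grid are never intercepted.
VCR.configure do |config|
  config.ignore_hosts "selenium" # the docker-compose service running the grid
end
```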
I might have missed one or two orphaned containers; it's useful to add `--remove-orphans` to `docker-compose down`. This is getting to the point where a makefile would be my next step under normal conditions.
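Since this is a Ruby project, a Rake task is a natural stand-in for the makefile idea. A sketch only; the task name and compose invocations are my assumptions, not existing forem tasks:

```ruby
# Rakefile (sketch): one target for the tear-down/rebuild cycle.
namespace :docker do
  desc "Reset the compose stack, removing orphaned containers"
  task :reset do
    sh "docker-compose down --remove-orphans"
    sh "docker-compose up --build -d"
  end
end
```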