Reapproach quay containers
Part 2 in an N-part series where I try to get our forem:testing containers to run the test suite.
Jotting down a few notes before I get started. Talked earlier this week with Joe about how they overcame volume mount and user permission issues for the forem self-host instances (a DigitalOcean droplet running podman running the quay.io/forem/forem:latest image plus supporting services).
One thing that needed/needs to happen is removing the root user and using the forem user (sounds like a docker-compose change; the container file already builds a forem user, we just weren't using it). I snapped this commit together during the discussion: https://github.com/djuber/forem/commit/36dc3c56a1f18351983bd038f28e49c60ff016be
My last series of tests copied the tree into the image during build, so any local code changes required a rebuild of the image. That wasn't a problem while just running the container was the task at hand, but it could be a big hassle for live use/reuse.
My problems with ports vs. expose are still valid. I think I want to change to expose for basically everything in testing: unless there's value in seeing port 3000 on the host, I shouldn't need to stop the local dev environment to interact with Docker.
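For reference, the compose-level difference is roughly this (service name assumed):

```yaml
services:
  rails:
    # expose: makes the port reachable from other services on the compose
    # network without publishing it on the host
    expose:
      - "3000"
    # ports: would also bind the host's port 3000, colliding with a local dev server
    # ports:
    #   - "3000:3000"
```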
How do I isolate the "testing" stage in the container file?
I believe `docker build --target testing .` is correct; the --target flag stops a multi-stage build at the named stage.
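For compose-driven builds, the equivalent would be something like this (a sketch; service name assumed):

```yaml
services:
  rails:
    build:
      context: .
      target: testing
```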
Change 1:
Make the entrypoint call rspec instead of rails server. I remember this being a weird decision back when I was using docker compose to run the tests, because of how the output was handled.
-CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0", "-p", "3000"]
+CMD ["bundle", "exec", "rspec"]
Did some minor surgery on the test compose file, and I'm now able to run the tests and get errors successfully.
Commented out the profile image in the users factory, and I'm down to a few hundred failures (out of ~6,000 examples), so (1) CarrierWave is still an impediment, mostly tied to the user factory, and (2) it's the main problem to solve here.
My next step will be to add a single CarrierWave test case, "it saves profile images", assert that I can do that, and use it as the indicator for testing.
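A minimal sketch of that canary spec (the fixture path is a placeholder, not forem's actual fixture):

```ruby
require "rails_helper"

RSpec.describe User do
  it "saves profile images" do
    user = create(:user) # works while the factory's profile_image is commented out
    user.profile_image = Rack::Test::UploadedFile.new(
      Rails.root.join("spec/fixtures/files/image.jpg"), "image/jpeg"
    )
    expect(user.save).to be(true)
  end
end
```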

In the meantime, with just the profile image attribute commented out in the users factory, I can get the specs to run more or less (some other attached images fail, a few tests actually act on the profile image, and there are other problems), but we're looking at a few hundred failures rather than four thousand (when create(:user) fails, lots of scenarios go belly up).
Apart from CarrierWave, the same issue as earlier presents itself (the Chrome binary is not present in the image); we may want to filter out the feature tests and run them separately anyway.
This fails for the reason you would expect: the profile_image is #<ProfileImageUploader:0x00007f076ab798f0 @model=#<User id: 13362, apple_created_at: nil, apple_usern...=nil, @cache_id=nil, @identifier=nil, @versions=nil, @versions_to_cache=nil, @versions_to_store=nil>, and that's not valid (the image is nil, so the image type is nil).
Suddenly started seeing issues with module autoloading around Notifications (which makes far less sense as a problem).
Part 2
I had some time to reapproach this today. As an experiment, I first tried changing the short volume: syntax to a long-form bind mount in the docker compose file. This seems like it was a dead end (a worthwhile experiment, but probably unimportant).
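What I tried looked roughly like this (service name and paths assumed):

```yaml
services:
  rails:
    volumes:
      # short form was: - .:/opt/apps/forem
      - type: bind
        source: .
        target: /opt/apps/forem
```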
The second experiment was to change the Rack::Test::UploadedFile.new calls in the factories to File.open() calls. This appears to have worked (notably, I don't see a /tmp file created by rack, and I do see the file in public/uploads/users/profile_image/userid), but it feels like I've sidestepped the issue rather than understood it.
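The factory edit, roughly (the fixture path is illustrative, not forem's actual path):

```ruby
# before: Rack::Test::UploadedFile copies the fixture into a tempfile under /tmp
profile_image { Rack::Test::UploadedFile.new("spec/fixtures/files/image.jpg", "image/jpeg") }

# after: hand CarrierWave a plain File object, no intermediate tempfile
profile_image { File.open("spec/fixtures/files/image.jpg") }
```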
Part 3
I'm going to just throw this week at the problem.
I removed the edit to the users factory (restoring the original Rack::Test::UploadedFile profile image).
User.new(profile_image: Rack::Test::UploadedFile) creates a new tempfile per user. That is, I pass in an uploaded file that already has a tmpfile; during processing, before anything reaches the upload directory, another tempfile is created, and the user model fails validation. The initial error was a wrong MIME type, but removing the MiniMagick include from the base uploader changes the error to "must be at least 1 byte", and the profile_image has no associated file.
It works fine when I pass in an open File object.
This also worked fine on Debian (when I built the app container from rails instead of the forem image), so I suspect there's something with either /tmp or /opt/apps/forem/ tied to SELinux or some other Fedora CoreOS-specific item.
Let's try to see what the profile image uploader is doing:
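My instrumentation was along these lines (a sketch of the method_missing hook the next paragraph refers to):

```ruby
# Reopen the uploader and log any calls that fall through to method_missing.
class ProfileImageUploader
  def method_missing(name, *args, &block)
    puts "ProfileImageUploader##{name}(#{args.map(&:inspect).join(', ')})"
    super
  end
end
```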
This doesn't hit the method_missing block (nothing gets printed), and trying to use TracePoint didn't log anything until I exited the breakpoint and rspec continued.
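For completeness, the TracePoint attempt was shaped like this (a sketch, not the exact filter I ran):

```ruby
# Log every carrierwave method call while the upload is exercised.
trace = TracePoint.new(:call) do |tp|
  puts "#{tp.defined_class}##{tp.method_id} (#{tp.path}:#{tp.lineno})" if tp.path.include?("carrierwave")
end
trace.enable
# ...assign the profile image here...
trace.disable
```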
I put a breakpoint in check_size (in CarrierWave::Uploader::FileSize, since that's one of the validations that's failing) and wanted to look at new_file and see what was happening.
In the never-ending shuffle of content from /tmp to /tmp to public/tmp/ to public/uploads/user/profile_image, it looks like there's also a step through tmp/.
Somehow this file is being created and assigned to the upload, but it is empty. Having no content in the created file sure sounds like it could also cause issues with validating that the content type is allowed.
Copying the content of the jpeg from /tmp into the jpeg in tmp/ does cause the test to pass:
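Schematically, the manual fix from the breakpoint (both paths are illustrative):

```ruby
# Push the bytes from the rack tempfile into the empty cached copy,
# then let the spec continue.
IO.copy_stream("/tmp/RackMultipart-example.jpg",
               "tmp/1599078994-300-0001-2677/image.jpg")
```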
Since something is going wrong before this happens, let's capture the context and figure out where to look upstream.
That looks meaningful: I called profile_image= from a factory, and CarrierWave started doing things via a Mounter.
So what methods look interesting in this call stack?
We have ActiveRecord's profile_image= calling into the mount's profile_image=, which calls cache at mount.rb:46 (def cache is on line 43 in that file).
Line 63 is uploader.cache!(new_file), which is where we leave this file (having allocated a blank uploader?), suggesting that no, the upload was not a string; it was a File/IO/Stream-type object. This might actually be either too late (we already have new_file, which might be the wrong object) or too early (cache! might be the method creating the file in tmp/). That makes it worth putting a breakpoint at line 62 (the uploader has been created but not called, and new_file is still in scope). In the meantime, I'll leave the breakpoint in check_size in place as well.

Okay, great: I'm on top of the problem (I have the input and the object I'm giving the input to, and I'm able to repeatably create the error condition):
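The reproduction, as a sketch (fixture path assumed):

```ruby
user = build(:user)
user.profile_image = Rack::Test::UploadedFile.new(
  Rails.root.join("spec/fixtures/files/image.jpg"), "image/jpeg"
)
user.valid?                  # => false
user.errors[:profile_image]  # => ["must be at least 1 byte"]
```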
So let's take a look inside cache!, which raises the error.
I am going to pretend it's okay to just reuse the current context (inside the mounter's cache method) and not set another breakpoint for the moment.
I think the place this might be going wrong is in copy_to. Since I have the context handy in the debugger, let's look there. So this is aggravating: FileUtils.cp is getting a valid source and a valid destination, and it creates a new entry (file), but nothing gets put into it.
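Schematically, what the debugger showed (paths and sizes illustrative):

```ruby
require "fileutils"

src  = "/tmp/RackMultipart-example.jpg"          # rack's tempfile, has real bytes
dest = "tmp/1599078994-300-0001-2677/image.jpg"  # CarrierWave's cache path

File.size(src)          # => 52_433 (nonzero)
FileUtils.cp(src, dest) # completes without raising
File.size(dest)         # => 0 (!)
```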
So why is the stdlib's FileUtils.cp successfully creating an empty file but not copying the data? I do see Entry_ in there, and it has a copy_file method (which I assume does something like moving the file content).
I think my next step is to put a breakpoint in FileUtils.copy_file (I'll probably reopen the class and redefine the method to add the breakpoint as part of a unit test setup). That's for tomorrow.
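The planned patch, sketched (assumes pry or another debugger is available in the test environment):

```ruby
require "fileutils"

# Reopen the internal Entry_ class and wrap copy_file with a breakpoint so
# src and dest can be inspected mid-copy.
module FileUtils
  class Entry_
    alias_method :copy_file_without_breakpoint, :copy_file

    def copy_file(dest)
      binding.pry # or `debugger`
      copy_file_without_breakpoint(dest)
    end
  end
end
```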
tomorrow
Is it tomorrow already? Well, time to get to work.
We'll hit this twice: once for the downloaded file being copied to /tmp, and a second time for the /tmp file being copied to /opt/apps/forem/tmp/. The first pass is normal.
We resume and wait for the second pass (it's basically immediate).
This is where things go wrong. We have what looks like the same setup: a src file and a dest path, which we open for writing, and a call to IO.copy_stream.
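Paraphrasing from memory (check your Ruby version's fileutils.rb), Entry_#copy_file is approximately:

```ruby
def copy_file(dest)
  File.open(path()) do |s|
    # open dest for writing, passing the source's mode as the create mode
    File.open(dest, 'wb', s.stat.mode) do |f|
      IO.copy_stream(s, f)
    end
  end
end
```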
So Entry_ objects expose stat, which basically reflects stat(2) from the Unix filesystem.
Both entries are owned by forem (the current user) and both are mode 0600 (read-write for the user only).
stat.mode appears to be the input to creat; see creat(2) or open(2).
The primary distinction between this call and the preceding one is that /opt/apps/forem/tmp is on the mounted volume, while /tmp is inside the container. The dest file is created by File.open but copy_stream does not move any bytes to the target.
Let's open a new file at /tmp/whatever.jpg and observe whether the issue is cross-filesystem/cross-device copying (there was a note about that in https://bugs.ruby-lang.org/issues/13867 that might come into play in a minute).
"Huh." it looks like maybe the mode is the issue? In any case, it doesn't look like the issue is crossing into a mounted volume if writing to tmp also fails the same way (weird).
https://ruby-doc.org/core-2.7.2/IO.html#method-c-copy_stream is where we are. It's a C function; good luck in there!
I still don't understand how we get into this situation. We have an entry and a src; are we passing an open (and already-read) file during the call, but only when we copy into the mounted volume? That seems like a different way of getting 0 bytes copied.
Rewinding and retrying does not fix it here (the way we actually failed during execution).
Adding insult to injury: if I open the original file and copy from it (rather than from the temp file), it just works.
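Schematically (paths illustrative):

```ruby
dest = "tmp/cache/image.jpg"

IO.copy_stream("/tmp/RackMultipart-example.jpg", dest)  # => 0 bytes copied
IO.copy_stream("spec/fixtures/files/image.jpg", dest)   # copies the bytes fine
```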
Summary of where I think I am
- this worked fine in a Debian container (where the source was copied rather than mounted)
- copying a file into tmp works
- copying a file out of tmp fails
- the read position does not advance (we basically didn't read from the open file at all)
- copying the file to the ultimate target works
- copying to another file in tmp works, but copying from that file also fails to copy into the dest
- the only "visible" difference is that the original file was 0100644 and the tmp file is 0100600, but we're the correct user and should be able to read (in fact, we can read, just not copy_stream)
fix?
I ended up changing the docker-compose file I'm working with to mount /tmp as tmpfs rather than using the /tmp from the overlay filesystem. This appears to have fixed the issues for me (I now see everything passing except the system tests that require Chrome).
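The compose change, roughly (service name assumed):

```yaml
services:
  rails:
    image: quay.io/forem/forem:testing
    tmpfs:
      - /tmp
```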
So that's good enough for a level-0 approach. Next is to figure out selenium + chromedriver + the webdrivers gem + rspec with Docker (Molly had a branch with this early on that I might revisit; I remember a lot of quirks around allowed sites, and webmock/vcr interacting differently at the end so sessions weren't getting removed). I suspect we can run the selenium/system specs in a separate container (using --tag=js to filter them).
The second thing I'm not doing is using Knapsack (but the whole spec suite takes about 12 minutes in Docker locally, so I'm not sure how much that matters).