How do I use CarrierWave and a custom processor to upload an unzipped folder to S3?

Thomas Bush September 2, 2016 4:15pm

Goal

Accept a zip file containing some config files as well as 2 folders filled with images
Unzip the file
Run a method to standardize the subfolder and image names and remove unnecessary config files
Push the end result to S3

Attempt

I am using CarrierWave and a custom processor. I take in a .zip file, unzip it, and push the end result to S3. To break the larger problem into smaller steps, I am intentionally skipping the part where I rename subfolders/files and remove extra config files. I already have this renaming/cleanup method written as a rake task, so I don't expect the conversion to be that hard.

Problem

CarrierWave seems to be uploading the zip file itself, even though I unzip it in the processor and the temporary cache folder ends up containing the unzipped contents.

Carrierwave uploader

# encoding: utf-8
class RotatorUploader < CarrierWave::Uploader::Base
  include CarrierWave::RotatorConversion
  storage :fog

  def store_dir
    "#{model.class.to_s.underscore.pluralize}/#{model.id}/rotator"
  end

  def extension_white_list
    %w(zip)
  end

  process :unzip_folder

end

Custom Processor

module CarrierWave
  module RotatorConversion
    extend ActiveSupport::Concern
    module ClassMethods
      def unzip_folder
        process :unzip_folder
      end
    end

    def unzip_folder
      # move upload to local cache
      cache_stored_file! if !cached?

      directory = File.dirname( current_path )

      Zip::File.open(current_path) do |zip_file|
        zip_file.each do |entry|
          # skip macOS metadata and anything that is not a regular file
          next if entry.name =~ /__MACOSX/ || entry.name =~ /\.DS_Store/ || !entry.file?
          entry_full_path = File.join( directory, entry.name )
          unless File.exist?(entry_full_path)
            FileUtils::mkdir_p(File.dirname(entry_full_path))
            zip_file.extract(entry, entry_full_path)
          end
        end
      end

    end

    def standardize_file_names(current_path)
      # ... not yet included
    end

    private
      def prepare!
        cache_stored_file! if !cached?
      end
  end
end

I would really appreciate if anyone had any insight here, thanks!

Chris Oliver September 7, 2016 1:59am

I believe you're going to run into a few issues:

Carrierwave usually expects you to only have one file stored per mount. You can create cropped versions and such, but you have to mostly define those ahead of time.
How are you planning on referencing the expanded files in S3?

It sounds like you might not really care about saving the original zip file, but instead references to all the individual files that get created. The trouble is you won't know those files or folders until after the unzip. You may need to build your own sync to S3 for those extracted files and create another table to reference those as I'm not sure Carrierwave can handle that.

Now, something I've been using recently is Refile. It does allow you to create your own arbitrary files and then store them. I'm not sure that it will preserve the folder structure of the zip file though, so it might not matter. They have an example of uploading a video and storing both the video and a screenshot of it in the same attribute. It's closer to what you want, but not quite.

Are you wanting to preserve the file and folder structure?

Thomas Bush September 7, 2016 12:39pm

Thanks for the response Chris! I don't actually need a reference to all the individual files, but I do need to preserve file structure.

The zip file contains two folders of product images 'normal' and 'large' -- these names remain unchanged in all of my processing. Each of these subfolders contains 36 images -- essentially a 360 degree view of the product. The main zip folder will always be renamed to the product number, and all images in the 'normal' and 'large' subfolders are renamed 1-36.jpg.
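The renaming described above could be sketched roughly as follows. This is a minimal sketch using only the Ruby standard library, not the actual rake task. The name `standardize_rotator` is hypothetical, and it assumes that sorting the original filenames gives the intended frame order (for example, zero-padded names). Removing the extra config files is left out because the thread does not say which files those are.

```ruby
require "fileutils"

# Hypothetical helper: renames the images in the 'normal' and 'large'
# subfolders to 1.jpg, 2.jpg, ... and renames the root folder to the
# product number. Returns the path of the renamed root folder.
# Assumption: sorted filename order is the intended frame order.
def standardize_rotator(root, product_number)
  %w[normal large].each do |sub|
    dir = File.join(root, sub)
    next unless File.directory?(dir)

    images = Dir.children(dir).select { |f| f =~ /\.jpe?g\z/i }.sort

    # First pass: move every image to a temporary name so that an
    # existing "2.jpg" is never overwritten by another file.
    staged = images.each_with_index.map do |name, i|
      tmp = File.join(dir, "__tmp_#{i + 1}.jpg")
      File.rename(File.join(dir, name), tmp)
      tmp
    end

    # Second pass: give each staged file its final numbered name.
    staged.each_with_index do |tmp, i|
      File.rename(tmp, File.join(dir, "#{i + 1}.jpg"))
    end
  end

  target = File.join(File.dirname(root), product_number.to_s)
  File.rename(root, target) unless File.expand_path(root) == File.expand_path(target)
  target
end
```

Calling `standardize_rotator("/tmp/upload", "12345")` would leave `/tmp/12345/normal/1.jpg`, `/tmp/12345/large/1.jpg`, and so on, with the paths here being placeholders.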

I can currently get the zip file up to S3 with CarrierWave, but it sounds like CarrierWave may not be the right tool for the processing portion of this problem -- running the task that unzips and standardizes the subfolder and file names. So I need to find some other way to hook into S3 and run the task? Does that make sense? Any idea how I would do that? I have most of the task completed, I just don't know how to hook into S3 and run it.

Chris Oliver September 7, 2016 2:14pm

Okay cool, that's a bit easier! In that case I would recommend a background job for uploading the individual files. You can do the original upload quickly, Carrierwave can complete, and then you can fire off a job to extract the zip and upload those files.

There's a gem called https://github.com/chrishein/s3_uploader that takes a source directory and it recursively uploads all the files and folders to Amazon S3 for you. It basically does exactly what you want it sounds like. You could toss this in a background job and keep a status on the record to know when it completes. Basically start this job as soon as the Carrierwave upload completes and you should be good. You could probably fire it off as part of a processing method like you've got above, just passing in the zip to the background job. The only thing might be that if Carrierwave deletes the temp file when it completes, you may need to extract before starting the job and then pass in the directory of the extracted zip instead of the file directly.
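To illustrate what "recursively uploads all the files and folders" means for the resulting S3 keys, here is a rough standard-library sketch. It is not the s3_uploader API. The helper name `s3_upload_plan` and the key prefix are hypothetical, and the actual upload call is deliberately left to whatever gem or client you use.

```ruby
require "find"

# Hypothetical helper: walks source_dir recursively and returns
# [local_path, s3_key] pairs. Each key is key_prefix plus the file's
# path relative to source_dir, so the folder structure is preserved.
def s3_upload_plan(source_dir, key_prefix)
  base = File.expand_path(source_dir)
  plan = []
  Find.find(base) do |path|
    next unless File.file?(path) # skip directories
    relative = path.delete_prefix("#{base}/")
    plan << [path, File.join(key_prefix, relative)]
  end
  plan.sort
end

# Usage sketch (the upload step is an assumption, not a real API call):
# s3_upload_plan("/tmp/12345", "products/12345/rotator").each do |local, key|
#   upload_to_s3(local, key) # replace with your S3 client or gem
# end
```

Building the plan separately from the upload makes it easy to check the resulting keys in a test, or to log them from the background job, before anything is sent to S3.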

It would be nice if you could upload the zip and have S3 do the extract, but that only seems possible with an AWS Lambda function, so I'd probably recommend s3_uploader for simplicity.

Thomas Bush November 9, 2016 1:59pm

Got sidetracked with other projects, and finally got back to this with help. Thank you Chris -- processing locally and uploading with s3_uploader solved it, no need for Carrierwave in the end.
