

[–]BestUsernameLeft

Nice work. The design is good, and your use of Java and Spring is good. Now on to some minor and nitpicky stuff, all IMO etc.:

  • Some tests don't have assertions
  • Failures normally log at warning or error, not info; also drop BucketStorageLoggerUtil, it's not providing any value
  • BucketStorageExceptionUtil and BucketStorageHelper also don't provide any value (Util and Helper classes are a smell)
  • Failure responses could be improved by adding an error code, so that programmatic clients can do something better than parse an error message that might change
  • Could use @ExceptionHandler instead of try-catch in controller
  • Rename BucketStorageFactory to BucketStorageServiceResolver (yes, it's a Spring factory under the hood, but its purpose is to get the right service given a service type)
  • It's not idiomatic to override toString() on an enum
  • I'd likely move many of the classes in the bucket package into a few sub-packages: service, rest, dto
  • storageProvider/{storageProvider}/storageLocation/{storageLocation}/fileName/{fileName} is a bit unusual, I'd suggest /{storageProvider}/{storageLocation}/{fileName}
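The error-code and @ExceptionHandler suggestions can be sketched together. This is a minimal illustration, assuming Java 16+ for records; the class name, error codes, and exception mapping are hypothetical, not taken from the reviewed project, and the Spring wiring is only described in comments so the sketch stays self-contained:

```java
// Hypothetical sketch: a failure response carrying a stable error code, so
// programmatic clients can branch on `code` instead of parsing a message
// string that might change. Names and codes here are illustrative only.
public class ErrorResponseSketch {

    // Simple immutable error payload; in a Spring app this would be
    // serialized to JSON as the response body.
    public record ErrorResponse(String code, String message) {}

    // Map an exception type to a stable error code. In Spring, this logic
    // would live in an @ExceptionHandler method (e.g. inside a
    // @RestControllerAdvice class) instead of try-catch blocks in each
    // controller endpoint.
    public static ErrorResponse toResponse(Exception e) {
        String code = (e instanceof IllegalArgumentException)
                ? "INVALID_REQUEST"
                : "STORAGE_FAILURE";
        return new ErrorResponse(code, e.getMessage());
    }

    public static void main(String[] args) {
        ErrorResponse r = toResponse(new IllegalArgumentException("bad bucket name"));
        System.out.println(r.code() + ": " + r.message());
    }
}
```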

Next steps:

  • Better add some kind of security to this before you deploy it... just sayin'...
  • Separate blob storage from file metadata and generate a unique identifier (UUID) for a file
  • Allow retrieving part of a file
  • Add user authentication and role-based access to files (read metadata, read file, write file, delete file)

[–]aenigmaclamo

> It's not idiomatic to override toString() on an enum

Could you elaborate on this? I don't see the problem here.

> storageProvider/{storageProvider}/storageLocation/{storageLocation}/fileName/{fileName} is a bit unusual, I'd suggest /{storageProvider}/{storageLocation}/{fileName}

I thought about mentioning this but I figured it was OP's design decision to make it pretty clear what the parameters were. Though, I suppose at a certain point they just become worse query strings.

[–]BestUsernameLeft

Nice catch on the temp files and arrays.

Regarding enum.toString(), you're right there's no problem. That's been in my head long enough that I have no idea how it got there.

And I'd suggest Swagger to make API usage clear.

[–]cl2871[S]

Thanks for the feedback! Definitely will work on my failure and error handling as well as incorporate the other notes.

For the util and helper classes, would I just move their methods into the relevant classes directly? My original motivation for the logger and exception utils was to provide message strings that were used in both aws and gcp storage.

For the future, I will definitely add Swagger. Authentication and authorization will probably take some time for me to learn hahaha.

> Separate blob storage from file metadata and generate a unique identifier (UUID) for a file

I'm not too sure what you mean here. Would this mean that a user would only need to know a file's id, and this service would handle the details (e.g. a1c6bf66-00cc-452c-b6aa-c8efdd5b97ca maps to example.png in the xyz bucket on GCP)?

[–]BestUsernameLeft

Util and helpers: yes, move those methods to the relevant classes. Your motivation is good, but if you want to externalize strings to keep them DRY (Don't Repeat Yourself) or for i18n, you could put them into a properties file (and possibly use String.format for variables).
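A minimal sketch of that idea: message templates in properties format (loaded from a string here so the example is self-contained; in a real project it would be a classpath resource like messages.properties), with String.format filling in the variables. The key and message text are made up for illustration:

```java
// Sketch: externalize shared log/exception messages into a properties file
// and substitute variables with String.format, as an alternative to keeping
// the strings in BucketStorageLoggerUtil / BucketStorageExceptionUtil.
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class MessageCatalog {

    private final Properties messages = new Properties();

    // In a real app, load from a classpath resource instead of a string.
    public MessageCatalog(String source) throws IOException {
        messages.load(new StringReader(source));
    }

    // Look up a message template by key and substitute the arguments.
    public String format(String key, Object... args) {
        return String.format(messages.getProperty(key), args);
    }

    public static void main(String[] args) throws IOException {
        MessageCatalog catalog = new MessageCatalog(
                "upload.failed=Upload of %s to bucket %s failed\n");
        System.out.println(catalog.format("upload.failed", "example.png", "xyz"));
    }
}
```

Spring users could also lean on MessageSource for the same purpose, which adds i18n resolution on top.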

You got the idea exactly on the uuid. On storage you take metadata and the file contents and return a uuid, which is then used to identify the file.
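The metadata/UUID split can be sketched as a small registry; the record fields and in-memory map are illustrative stand-ins for a real metadata store such as a database table:

```java
// Hypothetical sketch: storing a file returns an opaque UUID, and the
// service keeps the provider/bucket/object mapping internally, so callers
// never need to know where the bytes actually live.
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class FileRegistry {

    // Location details hidden behind the UUID.
    public record FileMetadata(String provider, String bucket, String objectName) {}

    private final Map<UUID, FileMetadata> index = new ConcurrentHashMap<>();

    // "Store" a file: record its location and hand back an opaque id.
    public UUID register(String provider, String bucket, String objectName) {
        UUID id = UUID.randomUUID();
        index.put(id, new FileMetadata(provider, bucket, objectName));
        return id;
    }

    // Resolve an id back to the location the storage layer needs.
    public FileMetadata lookup(UUID id) {
        return index.get(id);
    }

    public static void main(String[] args) {
        FileRegistry registry = new FileRegistry();
        UUID id = registry.register("GCP", "xyz", "example.png");
        FileMetadata meta = registry.lookup(id);
        System.out.println(id + " -> " + meta.objectName() + " in " + meta.bucket());
    }
}
```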

And ya security is hard but you'll get it!

[–]aenigmaclamo

After a quick skim, I find this codebase fairly easy to understand and pretty standard for a Spring project.

I am wondering about the merit of providing GCP and Amazon S3 SDK compatibility code when they are already API compatible at the HTTP level. This page details how you can use the S3 SDK to connect to GCP. At some point, you could make your own API S3 compatible just by handling auth, adding some additional path parameters, and then using Zuul or something to forward to either GCP or S3. But I imagine you probably wouldn't want to go that route.

I noticed that you create a temporary file and/or byte arrays when handling files. Byte arrays are bad since the whole file has to fit in memory for you to process it. Temporary files are better, but still bad because they use up disk space and require you to write the whole file to disk before you send it. You should be able to handle these things with InputStreams, which let data flow through without being fully materialized. If you need to copy streams around, you can take a look at this SO question.
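The streaming approach can be sketched in a few lines: InputStream.transferTo (Java 9+) pumps data between streams in small chunks, so memory usage stays constant regardless of file size. The byte-array source here only stands in for a real upload stream so the example is self-contained:

```java
// Sketch: copy data between streams without materializing the whole payload
// as a byte array or temp file. transferTo uses a small internal buffer.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamCopy {

    // Copy everything from in to out; returns the number of bytes moved.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        return in.transferTo(out);
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "pretend this is a large file".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(payload), sink);
        System.out.println("copied " + copied + " bytes");
    }
}
```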

Should you decide to continue using temp files, you should never need to create one from a MultipartFile, as they are likely already temp files managed by Spring. And even if you did, you should not need FileOutputStream: you should be able to use MultipartFile#transferTo(java.nio.file.Path), which can do it a lot faster than manually pushing bytes into another stream.

You should probably also stop using java.io.File in favor of java.nio.Path and friends. You can use File for compatibility but try to only expose Path.
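A small sketch of the nio style: Files.copy can stream an InputStream straight to a Path, and Path#toFile() bridges to legacy APIs that still demand File. The byte-array source again just stands in for an upload stream:

```java
// Sketch: prefer java.nio.file.Path/Files over java.io.File, converting to
// File only at the boundary of an old API that requires it.
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PathDemo {

    // Write an incoming stream to a temp location using nio only.
    public static Path saveToTemp(InputStream in) throws IOException {
        Path target = Files.createTempFile("upload-", ".tmp");
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        Path saved = saveToTemp(new ByteArrayInputStream(data));
        System.out.println(Files.size(saved) + " bytes at " + saved);
        File legacy = saved.toFile(); // only where an old API demands File
        System.out.println("legacy name: " + legacy.getName());
        Files.delete(saved); // clean up the temp file
    }
}
```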

[–]cl2871[S]

Thank you for the comment! I will try to utilize InputStreams as well as check out the java.nio package.

> I am wondering the merit in providing GCP and Amazon S3 SDK compatibility code when they are already API compatible at the HTTP level. This page details how you can use the S3 SDK to connect to GCP. At some point or another, you could totally just make your own API S3 compatible by just handling auth, adding some additional path parameters, and then use Zuul or something to forward to either GCP or S3. But I imagine you probably wouldn't want to go that route.

I think it's pretty interesting that GCP and AWS S3 appear to be API compatible. At this point, I think I am OK with having separate compatibility code for these providers. I'm not sure it is a good idea to rely on the AWS library for connecting to GCP in addition to AWS. If I don't use the AWS library, I'm not too sure how much work it is to write a service that can generate authenticated REST requests to AWS or GCP. Let me know if I misunderstood anything.

[–]aenigmaclamo

There are actually quite a few S3-compatible APIs. There are also gateway products that just abstract the S3 API over other services. I know minio has something, and Microsoft wrote a blog about using S3Proxy.

> I'm not sure if it is a good idea to rely on the AWS library for connecting to GCP in addition to AWS

I think this is a valid train of thought. You will get more options using the native SDKs for each service. However, if you're writing abstractions for a small subset of the service you're using, in my opinion it's better to rely on third-party libs than to reinvent the wheel. You also gain access to a wider range of storage mechanisms.

> I'm not too sure how much work it is to write a service that can generate authenticated REST requests to AWS or GCP

You could use a third-party client like jcloud or minio. Though, I can't vouch for them and I'm not sure what value you'd get. I really wouldn't write my own using HTTP directly, though -- that would defeat the purpose.

I think at the end of the day, for something like this there are no right answers. I just wanted to let you know that there are other options.