I've been looking for a good URL validator for a while, but all the solutions I found are kinda weird, so I'm not even sure if I wanna use any.
Here is my use case: a user can create a product where one of the fields (attrs) is the product's website_url, which is displayed on the product page. I'm not sure whether I should just let them save the string regardless of its format, or validate it with a regex or URI.
What would you do? Leave it as it is or validate it somehow? If the latter, how do you do it?
Maybe this can help you:
I'd validate it. It's user data entering your database, and you want to validate all user input. All of it.
Enrique, thanks! I found all of these on my own, that's why I wrote this question :). There are 4 different solutions and none of them looks that good. I thought there was a "best" solution all the experienced guys are using for production apps.
I realized my way of doing it is broken, thanks to RSpec. I just wrote a new question on Stack Overflow: http://stackoverflow.com/questions/36566056/rspec-with-website-format-validation-fails. Could you take a look at it? I was using this version (but I think there must be a better approach) since I think users should be able to type 'example.com' and 'www.example.com' in addition to the full URI versions.
You'll definitely want to do some parsing and validation. For example, leaving out the protocol happens all the time: someone might type just gorails.com into the field. The protocol is important; otherwise you can print out a link that ends up being relative by accident, like https://gorails.com/example.com. You see this stuff all the time when validations haven't been written properly.
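If you want to accept input like 'example.com' and still store something with a protocol, one option is to normalize before saving. This is only a sketch: with_default_scheme is a hypothetical helper name, and defaulting to https is an assumption you may not want for every site.

```ruby
# Hypothetical helper: prepend a default scheme when the user left it out.
# Defaulting to "https" is an assumption, not something the thread prescribes.
def with_default_scheme(raw, scheme: "https")
  value = raw.to_s.strip
  return value if value.empty?
  # Leave the value alone if it already starts with something like "http://".
  return value if value.match?(%r{\A[a-z][a-z0-9+.\-]*://}i)

  "#{scheme}://#{value}"
end

with_default_scheme("gorails.com")         # => "https://gorails.com"
with_default_scheme("http://gorails.com")  # => "http://gorails.com"
```

In a Rails model you could call something like this from a before_validation callback, so the stored value never renders as a relative link.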
You can use the url input field type to have the browser do a little verification. This helps with frontend validation, since it should require users to type in a protocol like http:// at the beginning. Here's how to use that url field: http://apidock.com/rails/ActionView/Helpers/FormHelper/url_field
Server side, you could simply require all the URLs to have https:// at the beginning. That's nice and simple, and you can do it with a regex or just plain Ruby. Easy to maintain. The Coderwall link Enrique posted is a great example of an implementation like that (although it looks like it has a couple of small issues). It basically uses Ruby to parse the URL and verify the protocol. That's important because simple regex checks can't easily verify the TLD at the end of the domain. There's a huge and growing number of TLDs now, so it's best to rely on the Ruby standard library for the parsing: it gets updated behind the scenes and you have no external dependencies.
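For the "plain Ruby" route, a minimal sketch using the standard library's URI might look like this. valid_web_url? is a hypothetical name, and accepting both http and https (rather than https only) is my assumption. It only checks that the string parses as an absolute web URL with a host; it doesn't confirm the domain or TLD actually exists.

```ruby
require 'uri'

# Hypothetical helper: true only for absolute http(s) URLs that have a host.
def valid_web_url?(value)
  uri = URI.parse(value.to_s)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  # URI.parse raises on strings it can't parse at all, e.g. ones with spaces.
  false
end

valid_web_url?("https://gorails.com")  # => true
valid_web_url?("gorails.com")          # => false (parses as a relative path)
valid_web_url?("ftp://gorails.com")    # => false (wrong scheme)
```

In a Rails model, you could call this from a custom validate method and add an error on website_url when it returns false.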
Now, all these methods (regex, URI, gems, etc.) are going to be approximations. If you truly want to make sure the URL is 100% valid, the best way is to make a request server side any time the URL changes and check the status code that comes back.
require 'open-uri'
# On Ruby 3.0+ use URI.open; plain open no longer accepts URLs.
URI.open("https://gorails.com").status
# => ["200", "OK"]
As long as you get a 200 status, you know the URL is truly valid. You can't do anything more accurate than that, but requesting the page is also going to slow down your server response somewhat. It's worth considering how important it is to be 100% correct, because you're trading server time for catching some errors in user input. It probably depends on how often you see mistyped URLs. You could also make an AJAX endpoint that does this check while the user is typing, so it improves client side validation without significantly delaying the server side validations.
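Here's a rough sketch of that request-based check with timeouts, so a slow site can't hang your request for long. It uses Net::HTTP instead of open-uri, which is my substitution. url_responds_ok? and DEFAULT_STATUS_FETCHER are hypothetical names, and the 3 second timeouts and the HEAD request are assumptions. Unlike open-uri, this doesn't follow redirects, and some servers reject HEAD, so treat it as a starting point.

```ruby
require 'net/http'
require 'uri'

# Hypothetical default fetcher: sends a HEAD request and returns the
# status code as an Integer. The 3 second timeouts are an assumption.
DEFAULT_STATUS_FETCHER = lambda do |uri|
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: 3, read_timeout: 3) do |http|
    http.head(uri.request_uri).code.to_i
  end
end

# Hypothetical helper: true when the URL answers with a 200.
# The fetcher is injectable so the check can be stubbed in tests.
def url_responds_ok?(url, fetcher: DEFAULT_STATUS_FETCHER)
  uri = URI.parse(url.to_s)
  return false unless uri.is_a?(URI::HTTP) && uri.host

  fetcher.call(uri) == 200
rescue URI::InvalidURIError, SocketError, SystemCallError,
       Timeout::Error, OpenSSL::SSL::SSLError
  # Unparseable input, DNS failures, refused connections, timeouts and
  # TLS errors all count as "not reachable".
  false
end
```

Calling url_responds_ok?("https://gorails.com") makes a real network request. In specs you can pass a stub instead, e.g. fetcher: ->(_uri) { 200 }.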