Match URLs with regex
A regex that captures http and https URLs in free text. Covers query strings, fragments, ports, and unicode-safe alternatives.
# pattern
/https?:\/\/[\w.\/?=&%#:-]+/gi
# how it works
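Matches `http` or `https` (the `s?` makes the s optional), then `://`, then a run of one or more URL-safe characters. The character class accepts word characters (`\w`: letters, digits, and underscores), dots, slashes, question marks, equals signs, ampersands, percent signs, hashes, colons, and hyphens, which covers paths, query strings, fragments, and port numbers. The match stops at the first character outside the class, such as a space, comma, or parenthesis. The `g` flag finds every URL in the text, and `i` lets the scheme match in any case.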
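As a rough illustration of the description above, the same pattern can be assembled from its three parts with the `RegExp` constructor. The variable names and the test string are made up for this sketch and are not part of the original snippet.

```javascript
// Illustrative names only; each piece mirrors one part of the explanation.
const scheme = "https?";             // "http" with an optional "s"
const separator = "://";             // literal "://" (no escaping needed in a string source)
const urlChars = "[\\w./?=&%#:-]+";  // one or more characters from the class
const assembled = new RegExp(scheme + separator + urlChars, "gi");

// Hypothetical input: one URL wrapped in parentheses, one with an uppercase scheme.
const text = "Dev server (http://localhost:3000/a?b=c#d), mirror at HTTPS://Example.com/x";
const pieces = text.match(assembled) ?? [];
console.log(pieces);
// [ 'http://localhost:3000/a?b=c#d', 'HTTPS://Example.com/x' ]
```

The closing parenthesis is outside the character class, so the first match ends just before it. The second match only succeeds because of the `i` flag.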
# sample input
See https://bytefork.tools and http://example.com/path?q=1#section or check https://docs.python.org:8080/3/library/re.html for details.
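A minimal sketch of running the pattern against the sample input above, assuming a JavaScript runtime with `String.prototype.matchAll` (the variable names are illustrative):

```javascript
// The pattern from this page, written as a JavaScript literal (slashes escaped).
const urlPattern = /https?:\/\/[\w.\/?=&%#:-]+/gi;

const sample =
  "See https://bytefork.tools and http://example.com/path?q=1#section " +
  "or check https://docs.python.org:8080/3/library/re.html for details.";

// matchAll requires the g flag; m[0] is the full matched text.
const matches = [...sample.matchAll(urlPattern)].map((m) => m[0]);
console.log(matches);
// [
//   'https://bytefork.tools',
//   'http://example.com/path?q=1#section',
//   'https://docs.python.org:8080/3/library/re.html'
// ]
```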
# pitfalls
- Will match a trailing period if the URL ends a sentence, because `.` is in the character class. The same applies to a trailing `?` or `:`. Use a lookbehind or trim after the match.
- Does not validate that the URL is well-formed. For that, parse with `new URL(match)` and catch the `TypeError` it throws on invalid input.
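- IDN domains match only in their punycode form (xn--...). Raw Unicode hostnames are cut off at the first non-ASCII character, because `\w` in JavaScript means `[A-Za-z0-9_]`. Hostnames are case-insensitive and `\w` already accepts both cases; the /i flag is what lets the scheme match when written as `HTTP://`, so keep /i on.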
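The first two pitfalls can be handled together in a small post-processing step. The sketch below is one possible approach, not a definitive one: `extractUrls` is a hypothetical helper, and the choice to trim only `.`, `?`, and `:` is an assumption about which trailing characters count as sentence punctuation.

```javascript
// Hypothetical helper: match, trim trailing punctuation, then validate.
function extractUrls(text) {
  const urlPattern = /https?:\/\/[\w.\/?=&%#:-]+/gi;
  const results = [];
  for (const [raw] of text.matchAll(urlPattern)) {
    // Assumption: a trailing ".", "?" or ":" belongs to the sentence, not the URL.
    const candidate = raw.replace(/[.?:]+$/, "");
    try {
      new URL(candidate); // throws a TypeError when the string cannot be parsed
      results.push(candidate);
    } catch {
      // Not a parseable URL: skip it.
    }
  }
  return results;
}

// Alternative: do the trimming inside the regex with a negative lookbehind
// (needs an engine with lookbehind support, ES2018 or later).
const trimmedPattern = /https?:\/\/[\w.\/?=&%#:-]+(?<![.?:])/gi;

console.log(extractUrls("Docs live at https://example.com/guide. Also http://:80 is broken."));
// [ 'https://example.com/guide' ]
console.log("Read https://example.com/guide.".match(trimmedPattern));
// [ 'https://example.com/guide' ]
```

Trimming after the match works in every engine. The lookbehind variant keeps everything in one pattern but still needs the `new URL` check for validation.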