
[Java] Regular expression for detecting URLs in a sentence


Regular expressions are very useful: they make it easy to filter strings for exactly the patterns we expect. In this post, I want to show one use case for regex: detecting URLs in a sentence.

To detect URLs in a sentence that start with “www.”, “http://” or “https://” (the pattern below also accepts ftp://, gopher://, telnet:// and file://), you can use the following regular expression:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlExtractor
{
    // Alternative, simpler pattern that only matches URLs with a scheme prefix:
    // String urlRegex = "((https?|ftp|gopher|telnet|file):((//)|(\\\\))+[\\w\\d:#@%/;$()~_?\\+\\-=\\\\\\.&]*)";

    public List<String> extractUrls(String text)
    {
        List<String> containedUrls = new ArrayList<String>();

        String urlRegex =
            "("
            + "("
            + "((https?|ftp|gopher|telnet|file):((//)|(\\\\)))" // URLs starting with http://, https://, ftp://, etc.
            + "|"
            + "(^|[^\\/])(www\\.)" // URLs starting with "www." (not preceded by "/", so the URLs above are not matched again)
            + ")"
            + "+[\\w\\d:#@%/;$()~_?\\+\\-=\\\\\\.&]*" // "-" is escaped so it is a literal, not a range from '+' to '='
            + ")";
        Pattern pattern = Pattern.compile(urlRegex, Pattern.CASE_INSENSITIVE);

        Matcher urlMatcher = pattern.matcher(text);
        while (urlMatcher.find())
        {
            // Whole match (group 0), same as text.substring(start(0), end(0))
            containedUrls.add(urlMatcher.group());
        }

        return containedUrls;
    }
}

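The regular expression above only finds the stretch of text that contains a URL; the match can also include neighboring characters that are not part of the URL. For example, in “<www.google.com<>>>aaa” the match begins with the “<” in front of “www.”, because (^|[^\/]) consumes the character before “www.”. To keep only the URL itself, you can post-process each match.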
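Here is a minimal sketch of one possible post-processing step, assuming you run it on each string returned by extractUrls. The class UrlCleanup, the methods cleanMatch and cleanAll, and the set of trailing characters treated as junk are hypothetical choices for illustration, not part of the original code or of my_demo. The idea: cut everything before the first scheme or “www.”, then trim trailing punctuation that usually belongs to the sentence rather than the URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlCleanup {

    // Where a URL actually begins inside a raw match: a scheme followed by
    // "//" or a backslash, or "www.". Assumption: same schemes as extractUrls.
    private static final Pattern URL_START = Pattern.compile(
            "(https?|ftp|gopher|telnet|file):(//|\\\\)|www\\.",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical choice of characters treated as junk when they end a match.
    private static final String TRAILING_JUNK = ".,;:!?()<>\"'";

    /** Trims a raw match down to the URL it contains, or "" if there is none. */
    public static String cleanMatch(String match) {
        Matcher start = URL_START.matcher(match);
        if (!start.find()) {
            return ""; // no recognizable URL start
        }
        // Drop characters before the URL, e.g. the "<" consumed by (^|[^\/]).
        String url = match.substring(start.start());
        // Drop trailing punctuation that most likely belongs to the sentence.
        int end = url.length();
        while (end > 0 && TRAILING_JUNK.indexOf(url.charAt(end - 1)) >= 0) {
            end--;
        }
        return url.substring(0, end);
    }

    /** Cleans every match (e.g. the list from extractUrls), skipping empty results. */
    public static List<String> cleanAll(List<String> matches) {
        List<String> cleaned = new ArrayList<>();
        for (String m : matches) {
            String url = cleanMatch(m);
            if (!url.isEmpty()) {
                cleaned.add(url);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        System.out.println(cleanMatch("<www.google.com<"));            // www.google.com
        System.out.println(cleanMatch(" http://example.com/a?b=1).")); // http://example.com/a?b=1
    }
}
```

Stripping a trailing “)” is a trade-off: it cleans up URLs written inside parentheses, but it also cuts the closing parenthesis off URLs that legitimately end with one (some Wikipedia links do).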

Please refer to my_demo to see how to remove the unwanted words after filtering with the regular expression.


Hope it helps 🙂
