"Phishing" is the term coined by hackers for attempting to lure personal information out of people by pursuading them to visit web sites that look like genuine bank, credit card, or payment sites, when they are actually sophisticated fakes of those sites.
This page tries to describe exactly how the phishing net works. It is pretty complicated, so this description cannot be perfect.
Many of the items listed below handle "obfuscations" of text and URLs, that is, attempts to disguise the real text. These include swapping letters around, using letters that look very much like other letters, using ";" instead of ":", using "," instead of ".", and many similar tricks. I have tried to highlight which rules handle obfuscations, but I have not given the details of exactly what each rule will accept: many variations on the expected text will be detected.
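To give a feel for what handling an obfuscation can look like, here is a minimal Python sketch covering only the two tricks named above (";" for ":" and "," for "."). The function names and the regular expression are assumptions made for illustration; they are not the actual rules, which accept many more variations.

```python
import re

# Hypothetical pattern: http, https or ftp, then ':' or ';', then one or more
# slashes or backslashes. The real rules accept many more variations.
_SCHEME = re.compile(r"^(?:https?|ftp)[:;][/\\]+", re.IGNORECASE)


def strip_obfuscated_scheme(text: str) -> str:
    """Remove a leading scheme, tolerating ';' used in place of ':'."""
    return _SCHEME.sub("", text.strip())


def undo_comma_dots(host: str) -> str:
    """Treat ',' as an obfuscated '.' in a host name."""
    return host.replace(",", ".")


print(undo_comma_dots(strip_obfuscated_scheme("http;//www,example,com")))
# prints: www.example.com
```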
Keep track of all <BASE> tags as they provide a root URL for every relative link on the page.
Attach the <BASE> URL onto the front of all relative URLs contained in every link on the page.
Look for links contained in imagemaps. The imagemap may be inside a link to a safe site, and contain an image of the text of the name of the safe site. But it can have a rectangle defined in it, whose link destination is a fraud site. Reduce these by removing imagemaps so the real destination of the link is used instead of the apparent destination.
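The <BASE> handling can be sketched in Python roughly as follows. This is an illustration only: the `LinkCollector` class is a hypothetical name, and it uses the standard library's `urljoin`, which follows the usual URL resolution rules rather than literally attaching the <BASE> URL onto the front, so results can differ for relative URLs that start with "/". Imagemaps are not covered here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Track the most recent <base href> and resolve each <a href> against it."""

    def __init__(self):
        super().__init__()
        self.base = None   # most recent <BASE> URL seen, if any
        self.links = []    # real destinations, with relative URLs resolved

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if not href:
            return
        if tag == "base":
            self.base = href
        elif tag == "a":
            # urljoin leaves absolute URLs untouched and resolves relative ones.
            self.links.append(urljoin(self.base, href) if self.base else href)


collector = LinkCollector()
collector.feed(
    '<base href="http://example.com/dir/">'
    '<a href="login.html">x</a>'
    '<a href="http://other.example/">y</a>'
)
print(collector.links)
# prints: ['http://example.com/dir/login.html', 'http://other.example/']
```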
Real destination or apparent destination | Operation | |
---|---|---|
apparent | Convert to lower case. | |
apparent | Remove %a0 encoded characters (hard space). | |
apparent | Decode all %-encoded characters. | |
apparent | Remove all white space. | |
apparent | Change any \ to / as many browsers do this quietly to help Windows authors. | |
apparent | Remove all HTML tags. | |
apparent | Remove the username part of email addresses. | |
apparent | Remove all &-encoded symbols such as &lt; and &gt;. | |
real | Insert the BASE URL if the link is relative and the BASE URL is defined. | |
real | Convert to lower case. | |
real | Remove %a0 encoded characters (hard space). | |
real | Decode all %-encoded characters. | |
real | Force "safe" result if it does not contain either a . or a /. | |
real | Remove all white space. | |
real | Force "safe" result if it is an email address. | |
real | Remove all HTML tags. | |
real | Remove "blocked::" labels as inserted by some other products. | |
real | Remove any leading http:// or ftp:// or slight variations on those, including replacing the : with a ;. | |
real | Force "safe" result if it is a mailto: link. | |
real | Remove everything after the first / or ?. | |
real | Remove any trailing br, p or ul tags. | |
real | Force "safe" result if it is a file: link. | |
real | Force "safe" result if it is a link to somewhere else in the same page (internal link). | |
real | Remove any trailing /. | |
real | Force "dangerous" result if URL contains any non-printable-ASCII characters. | |
apparent | Continue searching if any of these are true: | |
apparent | Remove leading strings that look like http:, ftp:, mailto:, and obfuscations of these. | |
apparent | Remove everything after the first /. | |
apparent | Remove all trailing . characters (and obfuscations). | |
apparent | Add www. on the front unless it already starts with www, ftp, mailto or obfuscations of these. | |
real | Force a "dangerous" result if Phishing By Numbers is enabled and the link is a numeric address (IPv4 or IPv6). | |
both | Compare the apparent destination with the real destination, with an optional www on the front. | |

If they do not match, and the real address is not in the Phishing Safe Sites file, trigger a "dangerous" result.
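The overall shape of the comparison can be sketched in Python as below. This is a simplified illustration, not the real implementation: it covers only a subset of the operations in the table, all function names are hypothetical, the `safe_sites` set stands in for the Phishing Safe Sites file, and it assumes the caller has already decided that the link text looks like a URL. The forced "safe" and "dangerous" results are left out.

```python
import re
from urllib.parse import unquote

SCHEME = re.compile(r"^(?:https?|ftp)[:;]/+")  # tolerates ';' in place of ':'
TAGS = re.compile(r"<[^>]*>")


def _common(text: str) -> str:
    """Operations shared by the real and apparent destinations."""
    text = text.lower()
    text = text.replace("%a0", "")    # hard space, removed before decoding
    text = unquote(text)              # decode remaining %-encoded characters
    text = re.sub(r"\s+", "", text)   # remove all white space
    return TAGS.sub("", text)         # remove HTML tags


def normalise_real(href: str) -> str:
    host = SCHEME.sub("", _common(href))
    # Drop everything after the first / or ?.
    return re.split(r"[/?]", host, maxsplit=1)[0]


def normalise_apparent(text: str) -> str:
    host = _common(text).replace("\\", "/")
    host = SCHEME.sub("", host)
    host = host.split("/", 1)[0].rstrip(".")
    if not host.startswith(("www", "ftp", "mailto")):
        host = "www." + host
    return host


def looks_dangerous(href: str, text: str, safe_sites=frozenset()) -> bool:
    """Assumes the link text has already been judged to look like a URL."""
    real = normalise_real(href)
    if real in safe_sites:
        return False
    # Compare, allowing an optional "www." on the front of the real destination.
    return normalise_apparent(text) not in (real, "www." + real)


print(looks_dangerous("http://www.evil.example/login", "www.mybank.example"))  # True
print(looks_dangerous("http://mybank.example/", "http://www.mybank.example/account"))  # False
```

Keeping the two normalisation functions separate mirrors the table, where the real and apparent destinations go through different lists of operations before the final comparison.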