On Sat, Jan 29, 2005 at 01:57:23PM -0800,
stan wrote
a message of 17 lines which said:

> My challenge is to determine the base portion of the URI--stripped
> of subdomains but including top-level domains. E.g., for
> "http://www.google.com" I need to get "google.com", and for
> "subdomain.domain.com.au", I need to get "domain.com.au".


Your examples do not match your requirement. The top-level domain for
www.google.com is "com" and for subdomain.domain.com.au, it is "au".
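
To make the terminology concrete, here is a minimal Python sketch. The
function names are mine, chosen only for illustration. It splits a
name into its labels and returns the rightmost one, which is the
top-level domain:

```python
def labels(hostname):
    """Split a host name into its DNS labels, ignoring a trailing root dot."""
    return hostname.rstrip(".").lower().split(".")


def top_level_domain(hostname):
    """Return the rightmost label, which is the top-level domain."""
    return labels(hostname)[-1]


print(top_level_domain("www.google.com"))           # com
print(top_level_domain("subdomain.domain.com.au"))  # au
```

What you are after is therefore not the top-level domain but
something like "the name that was delegated by the registry", which is
a different thing.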

> My current naive system just takes the last two chunks, which means
> it thinks all web pages from Australia are the same site. (They're
> all from "com.au"!)


There is no better general algorithm. Even hardwiring the number of
labels in a table indexed by registry does not work, because some
registries (such as "fr", "dz" or "af") delegate both second-level and
third-level domains.
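
A rough Python sketch of both approaches shows where they break. The
table values and the example names are assumptions made up for
illustration, not registry data. In particular, I assume here that
"asso.fr" is a second-level zone under which "fr" delegates
third-level names:

```python
# Hypothetical table: how many labels make up a delegated name, per TLD.
# These values are illustrative assumptions, not real registry policy.
LABELS_PER_TLD = {"com": 2, "au": 3, "fr": 2}


def last_n_labels(hostname, n):
    """Return the last n labels of a host name, joined with dots."""
    parts = hostname.rstrip(".").lower().split(".")
    return ".".join(parts[-n:])


def base_by_table(hostname):
    """Pick the label count from the table, defaulting to two labels."""
    tld = last_n_labels(hostname, 1)
    return last_n_labels(hostname, LABELS_PER_TLD.get(tld, 2))


# The naive "last two chunks" rule lumps all of com.au together.
print(last_n_labels("subdomain.domain.com.au", 2))  # com.au

# The table fixes that case...
print(base_by_table("subdomain.domain.com.au"))     # domain.com.au
print(base_by_table("www.example.fr"))              # example.fr

# ...but one number per registry cannot cover both delegation levels.
print(base_by_table("www.example.asso.fr"))         # asso.fr (wrong)
```

Changing the entry for "fr" to 3 would fix the last case and break
the one before it, which is why a single label count per registry
cannot be right.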

> What's the intelligent way to do this?


None. It is a funny question, because there has been a thread on
namedroppers (the mailing list of the IETF working group on DNS
extensions) recently about this very subject (in the context of the
SPF protocol):

http://ops.ietf.org/lists/namedroppe.../msg00039.html