Automate extract domain

On 2019-05-12, Birdep <Birdep at free.net> wrote:
> I am trying to extract the domain name from an adblock rule, so what
> pattern should I use to extract only the domain name?
> import re
> domains = ['ru', 'fr', 'eu', 'com']
> with open('easylist.txt', 'r') as f:
>     a = f.read()
> result = re.findall(r'[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+', a)
> unique_result = list(set(result))
> for r in unique_result:
>     # Extract domain name out of url
>     domain_name = r.split('.')[1]
>     # Check if domain name is in list of domains, only then add it
>     if domain_name in domains:
>         print(r)
> This is a laborious process, because I have to find the extension of
> every domain and then add it to the domains list. So I want something
> that could automatically extract only the domain.
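The regex in the quoted code matches any dotted token, so it also picks up things like file names (`ads.js`). A narrower sketch follows, assuming that only Adblock Plus style domain-anchored rules (`||hostname^`, optionally prefixed by the `@@` exception marker) are of interest; element-hiding rules and plain URL patterns are deliberately skipped. The function name `hostnames_from_rules` is hypothetical.

```python
import re

# ASSUMPTION: only domain-anchored rules such as "||ads.example.com^" or
# "@@||example.org^$document" are considered. Element-hiding rules
# ("example.net##.banner") and plain URL patterns ("/ads.js") are ignored.
ANCHOR_RULE = re.compile(r"^(?:@@)?\|\|([a-z0-9.-]+)", re.IGNORECASE)


def hostnames_from_rules(lines):
    """Collect the hostname from each domain-anchored rule, sorted."""
    hosts = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # blank line or comment
        m = ANCHOR_RULE.match(line)
        # The capture stops at the first character that cannot appear in a
        # hostname, e.g. "^", "/", "$" or "*".
        if m and "." in m.group(1):
            hosts.add(m.group(1).lower().strip("."))
    return sorted(hosts)


rules = [
    "! comment",
    "||ads.example.com^",
    "@@||example.org^$document",
    "example.net##.banner",
    "/ads.js",
]
print(hostnames_from_rules(rules))
```

Since a file object yields lines, the same function can be applied directly with `with open('easylist.txt', encoding='utf-8') as f: hosts = hostnames_from_rules(f)`.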

What do you mean by "domain name"? Do you mean just the top-level domain?
In that case you can just use fullname.rsplit(".", 1)[-1]. If you
mean "the registrable domain" (such as example.com, example.co.uk,
etc.), then you will need to look at https://publicsuffix.org/
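A minimal sketch of the difference between the two meanings. It assumes a tiny hand-picked `SUFFIXES` set standing in for the real Public Suffix List, and it ignores that list's wildcard (`*.`) and exception (`!`) rules; the helper names are hypothetical. In practice the full list from publicsuffix.org would be loaded instead.

```python
from typing import Optional

# ASSUMPTION: a tiny stand-in for the Public Suffix List; the real list has
# thousands of entries plus wildcard and exception rules not handled here.
SUFFIXES = {"com", "org", "uk", "co.uk", "fr", "ru", "eu"}


def top_level(hostname: str) -> str:
    """Return only the last label, e.g. 'uk' for 'www.example.co.uk'."""
    return hostname.rsplit(".", 1)[-1]


def registrable_domain(hostname: str) -> Optional[str]:
    """Return the public suffix plus one more label, or None if the
    hostname is itself a public suffix."""
    labels = hostname.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first, so 'co.uk' wins over 'uk'.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            break
    else:
        # No rule matched: treat the last label as the public suffix,
        # like the list's default "*" rule.
        i = len(labels) - 1
    if i == 0:
        return None  # the whole name is a public suffix
    return ".".join(labels[i - 1:])


print(top_level("www.example.co.uk"))           # uk
print(registrable_domain("www.example.co.uk"))  # example.co.uk
print(registrable_domain("ads.example.com"))    # example.com
```

Note that `top_level` alone cannot tell `example.co.uk` apart from `foo.co.uk`; only the suffix lookup gives the registrable part.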