Quote (CyberGod @ Oct 17 2017 02:43pm)
Yes regex
soup.find_all(href=re.compile("http(?:s)://(.*)/GOODSTUFF"))
this is what someone helped me with I'm terrible with regex.
But it works perfectly aslong as I don't go through a socks5 prozy for some reason it has problems with https but works for http lucky for this case I didn't need a proxy
Interesting. Where ya pulling this html from anyways? You could try putting in a header of some sort. A last resort, if you wer trying to steal the declaration of independence, you could verify=false, but this exposes you to some very sketchy stuff, where I highly recommend you don't resort to.
However, for a simple label like 'goodstuff' where nothing changes and 'goodstuff' is always what you'l be searching for... then regex is most def overkill.
try something of the following:
Code
for link in page.find_all('a'): #this iterates through all <a /a> elements
if 'GOODSTUFF' in str(link.get('href'): #checks for 'GOODSTUFF' in current link iteration
print(link) #and of course instead of print you can do anythign else you'd like. By the way, link.get('href') may be more handy for this line.
hope this helps.
edit: you already solved, sry for pointless solution XD
This post was edited by destroyered on Oct 19 2017 02:38pm