Quote (SelfTaught @ Dec 6 2014 08:44pm)
uhh, the code killg0re posted is about 20 lines if you exclude the comments and it downloads + scrapes? not seeing how python is a bunch of trouble tbh
Quote (killg0re @ Dec 7 2014 01:40am)
Python has a lot of support and almost everyone has written a scraper. Has awesome libraries too, like Scrapy
okay well if that code works as is sure. but python is not the optimal language.
.net can do this without any added libraries.
and how do you know if that code will even work for his situation
is the data really in a div? what else is in the div? what are you scraping, how does the program know what to scrape?
is the data easier to retrieve by doing ElementById? can it all be retrieved in a simple get request?
you don't know if those 20 lines are going to work
Quote
In terms of my Python progress, I'm able to open the webpages I want, but when parsing elements, a lot of my tables / websites are java based, which adds a layer of complexity since the elements don't just appear when you look at the HTML of the page. From my understanding I have to actually click on the element to see the relevant HTML (or something along those lines)??
If it is JavaScript based, his best bet is to use .NET and an HTTP request. Most likely the response he is getting is JSON, which is easily deserialized and can be made into an easy List<object>
Code
[Serializable]
public class MyClass
{
    // Maps the JSON field "json name" onto this property
    [JsonProperty("json name")]
    public string MyData { get; set; }
}
He makes his class structure, and then after he does the GET request, which is 5 lines of code at most, he uses JsonConvert.DeserializeObject (Json.NET) and it's done. He has everything he wants in its own property.
No parsing, no looping, nothing.
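Rough sketch of what I mean, start to finish. Assumptions: the Json.NET (Newtonsoft.Json) NuGet package is installed, the endpoint returns a JSON array, and the field name "json name", the URL in the comment, and the sample payload are all made up for illustration. I don't know what his site actually returns, so treat this as a starting point, not the finished thing.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using Newtonsoft.Json; // Json.NET NuGet package (assumed to be installed)

public class MyClass
{
    // Maps the (hypothetical) JSON field "json name" onto this property.
    [JsonProperty("json name")]
    public string MyData { get; set; }
}

public static class Program
{
    // Plain GET request; returns the response body as a string.
    public static string Fetch(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Turns a JSON array into a typed list, one object per array element.
    public static List<MyClass> Parse(string json)
    {
        return JsonConvert.DeserializeObject<List<MyClass>>(json);
    }

    public static void Main()
    {
        // Made-up sample payload standing in for the real response.
        // With a real endpoint this would be something like:
        //   string json = Fetch("http://example.com/data.json");
        string json = "[{\"json name\": \"first\"}, {\"json name\": \"second\"}]";

        // The loop here is only for printing; the data is already typed.
        foreach (MyClass item in Parse(json))
        {
            Console.WriteLine(item.MyData);
        }
    }
}
```

If the real response is a single object instead of an array, swap List<MyClass> for MyClass in the DeserializeObject call.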
This post was edited by t9x on Dec 7 2014 09:23am