Quote (carteblanche @ Dec 3 2014 05:47pm)
I'm guessing you have some kind of financial incentive to do this, so my suggestion is to hire someone to do it for you. They will finish it faster and provide a more robust solution than you'd build on your own.
With that said, it depends on how complex your data is. If it's as simple as going to a URL and finding an HTML table whose structure is always the same, then any language that can send an HTTP request and receive HTML will be fine. VB.NET can certainly do it, and I'm guessing VBA can too. If it's more complex than that, you'd have to be more specific. E.g.: from that main URL, are you going to search for 20 more URLs, each of which goes 20 URLs deep, and do you have to dynamically figure out which tables hold the data you care about? Is the data inside an HTML table, or is it inside a Flash object / image / popup? Etc.
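To make the "any language that can send an HTTP request and receive HTML" point concrete, here is a minimal Python sketch using only the standard library (`urllib.request` to download, `html.parser` to walk the markup). It assumes the data sits in plain, well-formed, non-nested `<table>` markup; if the numbers are drawn by JavaScript, Flash, or an image, this approach won't see them. `TableExtractor`, `parse_tables`, and `fetch_tables` are hypothetical names for illustration.

```python
# Minimal sketch (Python stdlib only), assuming plain, well-formed, non-nested
# HTML tables rather than data drawn by JavaScript, Flash, or images.
from html.parser import HTMLParser
from urllib.request import urlopen


class TableExtractor(HTMLParser):
    """Collects each <table> as a list of rows; each row is a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables
        self._table = None  # rows of the table being read
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so " 10.5\n" becomes "10.5".
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._table.append(self._row)
            self._row, self._cell = None, None
        elif tag == "table" and self._table is not None:
            self.tables.append(self._table)
            self._table, self._row, self._cell = None, None, None


def parse_tables(html):
    """Return every table in `html` as a list of rows of cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def fetch_tables(url, timeout=30):
    """Download `url` (needs network access) and parse its tables."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return parse_tables(html)


if __name__ == "__main__":
    sample = ("<table><tr><th>Date</th><th>Price</th></tr>"
              "<tr><td>2014-12-02</td><td> 10.5 </td></tr></table>")
    print(parse_tables(sample))  # [[['Date', 'Price'], ['2014-12-02', '10.5']]]
```

Because parsing is separate from downloading, you can test `parse_tables` against a saved copy of a page before pointing `fetch_tables` at the live site.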
Haha
My main goal is to learn how to do this myself, as painlessly as possible!!
I'll try to write it out in more detail:
For ELEMENT A:
1) Go to website 1 from a predetermined list
2) Select the correct table out of many options
3) Select the correct row (the most recent, let's say; the table might only have 5 rows or it could have 15, and I just want the most recent, which is usually the last or first row)
4) Take multiple values from the row based on the column headings
5) Repeat the process for the other relevant tables, still at the same URL
6) Populate the database
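Steps 2 to 4 could look something like the sketch below. It assumes each table has already been pulled into a list of rows whose first row holds the column headings, and that the date column uses ISO dates (`YYYY-MM-DD`); the headings `Date`, `Price`, and `Volume` are made up for the example. Choosing the row by its date rather than by its position sidesteps the "usually last or first row" uncertainty.

```python
# Sketch of steps 2-4, assuming tables are lists of rows with headings in row 0.
from datetime import datetime


def find_table(tables, required_headings):
    """Step 2: return the first table whose header row contains every required heading."""
    wanted = {h.lower() for h in required_headings}
    for table in tables:
        if table and wanted <= {cell.lower() for cell in table[0]}:
            return table
    return None


def most_recent_row(table, date_heading, date_format="%Y-%m-%d"):
    """Step 3: return the data row with the latest date, wherever it sits in the table."""
    header, rows = table[0], table[1:]
    col = [h.lower() for h in header].index(date_heading.lower())
    dated = []
    for row in rows:
        try:
            dated.append((datetime.strptime(row[col], date_format), row))
        except (IndexError, ValueError):
            continue  # skip blank, malformed, or footer rows
    return max(dated, key=lambda pair: pair[0])[1] if dated else None


def pick_columns(table, row, headings):
    """Step 4: map each requested heading to its value in `row`."""
    index = {h.lower(): i for i, h in enumerate(table[0])}
    return {h: row[index[h.lower()]] for h in headings}


if __name__ == "__main__":
    tables = [
        [["Name", "Value"], ["a", "1"]],
        [["Date", "Price", "Volume"],
         ["2014-12-01", "10.0", "500"],
         ["2014-12-03", "10.7", "450"],
         ["2014-12-02", "10.5", "300"]],
    ]
    table = find_table(tables, ["Date", "Price"])
    row = most_recent_row(table, "Date")
    print(pick_columns(table, row, ["Price", "Volume"]))  # {'Price': '10.7', 'Volume': '450'}
```

Step 5 is then just a loop over the different heading sets you care about on the same page.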
Repeat the process for websites 2, 3, 4, 5, etc., still for ELEMENT A. I would expect the row in the database to look something like this:
Element A [key data from website 1, more data from website 1, data from website 2, no data from website 3 (maybe a placeholder or 0), data from website 4]
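One way to handle the "no data from website 3" case is to flatten each element's per-site values into a fixed column order, fill the gaps with a placeholder, and write the rows to a database (step 6). This sketch uses Python's built-in `sqlite3`; the `COLUMNS` layout, the table name `elements`, and the site/field names are all hypothetical. It stores missing values as NULL rather than 0, since a 0 placeholder can be confused with a real zero once you start analyzing the data.

```python
# Sketch: one database row per element, assuming a hypothetical fixed column layout.
import sqlite3

# Hypothetical (website, field) pairs, in the order the columns should appear.
COLUMNS = [
    ("site1", "key_data"), ("site1", "more_data"),
    ("site2", "data"), ("site3", "data"), ("site4", "data"),
]
PLACEHOLDER = None  # stored as NULL; use 0 instead if your analysis code expects it


def build_row(element, per_site):
    """Flatten {site: {field: value}} into one row, filling gaps with PLACEHOLDER."""
    return [element] + [per_site.get(site, {}).get(field, PLACEHOLDER)
                        for site, field in COLUMNS]


def save_rows(conn, rows):
    """Create the table if needed and insert the rows (column names come from COLUMNS)."""
    names = ["element"] + [f"{site}_{field}" for site, field in COLUMNS]
    conn.execute(f"CREATE TABLE IF NOT EXISTS elements ({', '.join(names)})")
    marks = ", ".join("?" * len(names))  # "?, ?, ?, ..." one per column
    conn.executemany(f"INSERT INTO elements VALUES ({marks})", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    row = build_row("Element A", {
        "site1": {"key_data": "k1", "more_data": "m1"},
        "site2": {"data": "d2"},
        "site4": {"data": "d4"},  # site3 returned nothing
    })
    save_rows(conn, [row])
    print(conn.execute("SELECT * FROM elements").fetchall())
```

Only the trusted, hard-coded names in `COLUMNS` are formatted into the SQL; the scraped values always go through `?` parameters.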
Element B [same story]
...
...
...
Element AAAABZ [same story]
It doesn't have to go any deeper than the initial URL, since the predetermined lists are solid (I hope).
Ultimately this would create a dataset I would work with in R or SAS.
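To hand the dataset to R or SAS, a plain CSV export is usually the path of least resistance, since R's `read.csv()` and SAS's `PROC IMPORT` can both read it. A minimal sketch, assuming the data lives in a SQLite table such as the hypothetical `elements`; `csv.writer` writes Python `None` (SQL NULL) as an empty field, which `read.csv()` treats as `NA` in numeric columns.

```python
# Sketch: dump a SQLite table to CSV with a header row, assuming a trusted table name.
import csv
import sqlite3


def export_csv(conn, table, path):
    """Write every row of `table` to `path`; NULLs become empty fields."""
    cursor = conn.execute(f"SELECT * FROM {table}")  # `table` must be a trusted name
    headers = [col[0] for col in cursor.description]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(cursor)
    return path


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE elements (element, price)")
    conn.executemany("INSERT INTO elements VALUES (?, ?)", [("A", 10.7), ("B", None)])
    export_csv(conn, "elements", "elements.csv")  # then read.csv("elements.csv") in R
```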
I have done a lot of this work in VBA. It works OK, though I cheated: I used the Excel macro recorder to retrieve the tables by selecting them manually, and then wrote a separate VBA macro to locate the correct rows.
My motivation for doing this project is twofold.
First, I really want to do this project.
Second, I think learning how to do this would really help me in my current job/career. I'm not trying to be a pure programmer per se, but rather someone who works with data and knows how to create my own complex datasets. Getting the right data is half the battle.
A further complication is that I want the most up-to-date data, so I would be running this macro, like... daily.
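Running daily mostly means making reruns safe. A minimal sketch, again with `sqlite3` and hypothetical table and column names: each value is stored with the date it was scraped, and a primary key on `(element, run_date)` plus `INSERT OR REPLACE` means that running twice on the same day overwrites that day's row instead of duplicating it, while earlier days are kept as history. The script itself could then be scheduled with cron (e.g. `0 7 * * * python3 /path/to/scrape.py`, path hypothetical) or with Windows Task Scheduler.

```python
# Sketch: idempotent daily snapshots, assuming a hypothetical `snapshots` table.
import sqlite3
from datetime import date


def init_db(conn):
    """Create the snapshot table; one row per element per scrape date."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS snapshots (
               element  TEXT NOT NULL,
               run_date TEXT NOT NULL,
               value    TEXT,
               PRIMARY KEY (element, run_date)
           )"""
    )


def save_snapshot(conn, element, value, run_date=None):
    """Store a value for `run_date` (default today); same-day reruns overwrite."""
    run_date = (run_date or date.today()).isoformat()  # ISO dates sort correctly as text
    conn.execute(
        "INSERT OR REPLACE INTO snapshots (element, run_date, value) VALUES (?, ?, ?)",
        (element, run_date, value),
    )
    conn.commit()


def latest(conn, element):
    """Return the most recently stored value for `element`, or None."""
    row = conn.execute(
        "SELECT value FROM snapshots WHERE element = ? ORDER BY run_date DESC LIMIT 1",
        (element,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect("scrape.db")  # hypothetical file name
    init_db(conn)
    save_snapshot(conn, "Element A", "10.7")
    print(latest(conn, "Element A"))
```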

I would definitely appreciate ANY help/comments/advice.
This post was edited by yamamotocannon on Dec 3 2014 05:18pm