d2jsp Forums > Off-Topic > Computers & IT > Programming & Development > Programming Help - Fetching Data From Website > Will Pay Fg For ... Idk
Member
Posts: 32,925
Joined: Jul 23 2006
Gold: 3,804.50
Dec 7 2014 10:20pm
Quote (t9x @ Dec 7 2014 10:14am)
Okay, well, if that code works as is, sure. But Python is not the optimal language.

.NET can do this without any added libraries.

And how do you know if that code will even work for his situation?

Is the data really in a div? What else is in the div? What are you scraping, and how does the program know what to scrape?

Is the data easier to retrieve by element ID? Can it all be retrieved in a simple GET request?

You don't know if those 20 lines are going to work.



If it is JavaScript based, his best bet is to use .NET and an HTTP request. Most likely the response he is getting is JSON, which is easily deserialized and can be made into an easy List<object>.

Code
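using System;
using Newtonsoft.Json; // Json.NET provides [JsonProperty] and JsonConvert
[Serializable]
public class MyClass
{
    [JsonProperty("json name")] // name of the field in the JSON response
    public string MyData { get; set; }
}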


He makes his class structure, and then after he does the GET request, which is 5 lines of code at most, he calls JsonConvert.DeserializeObject and it's done. He has everything he wants in its own property.

No parsing, no looping, nothing.


you can skip the bloated classes altogether if you use groovy ;)
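
For comparison, here is a minimal Python sketch of the same idea: deserialize the JSON response either into a plain dict (no class at all) or into a small typed record. The payload, the field name "json name" and the class name are placeholders carried over from the snippet above, not the real response from any site.

```python
import json
from dataclasses import dataclass

# Hypothetical payload; the real response shape is not known from the thread.
SAMPLE = '{"json name": "some value", "extra": 1}'


@dataclass
class MyClass:
    my_data: str  # filled from the "json name" field


def from_json(text: str) -> MyClass:
    raw = json.loads(text)  # plain dict: usable as-is, no class required
    return MyClass(my_data=raw["json name"])


record = from_json(SAMPLE)
print(record.my_data)
```

The dict returned by json.loads is often enough on its own; the dataclass only earns its keep when you want named, typed properties.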
Member
Posts: 376
Joined: Sep 12 2014
Gold: 1,949.00
Dec 8 2014 06:53am
Here is some example code from my VBA project out of Excel. I am a novice coder in general (trying to get better) so the code isn't as efficient as possible, but it does/did work.

Let's say I had a list of all pitchers from 2004 and was trying to get the "Batted Ball Table" (5th table down on the page) from this website: http://www.fangraphs.com/statss.aspx?playerid=3&position=P#battedball

From that table I would collect all the info from 2004.

Looking at the page with Inspect Element, I now see that the table is probably populated with JavaScript / jQuery type of stuff.

I am trying to re-create my program below but take it out of Excel.

Quote
Sub BBP_Pitchers()
'
' All 2004 Pitchers Batted Ball Profiles

Sheet4.Activate

For i = 4 To 330

    With ActiveSheet.QueryTables.Add(Connection:= _
        "URL;http://www.fangraphs.com/statss.aspx?playerid=" & Sheet25.Cells(i, 5) & "&position=1B#battedball" _
        , Destination:=Range("$A$1"))
        .Name = "statss.aspx?playerid=3&position=P#battedball"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = False
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlSpecifiedTables
        .WebFormatting = xlWebFormattingNone
        .WebTables = """SeasonStats1_dgSeason3_ctl00"""
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With

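    ' Find the 2004 row in the imported table and read its batted ball columns
    For j = 1 To 30

        If Cells(j, 1) = 2004 Then

            LD = Cells(j, 3)
            GB = Cells(j, 4)
            FB = Cells(j, 5)
            IFFB = Cells(j, 6)
            HRFB = Cells(j, 7)
            HRFB1 = Cells(j, 8)
        End If

    Next j

    ' Write the values back to this player's row on the list sheet
    Sheet25.Cells(i, 2) = LD
    Sheet25.Cells(i, 3) = GB
    Sheet25.Cells(i, 4) = FB
    Sheet25.Cells(i, 5) = IFFB
    Sheet25.Cells(i, 6) = HRFB
    Sheet25.Cells(i, 7) = HRFB1

    ' Clear the imported table before the next player
    Sheet4.Cells.Clear

Next i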
End Sub
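
Outside Excel, the same "pick the 2004 row out of one table" step could look roughly like the Python sketch below. It uses only the standard library and assumes the table is present in the HTML the server returns, which may not hold if the page fills it in with JavaScript. The table id is the one from the macro; the sample HTML and its numbers are made up for illustration.

```python
from html.parser import HTMLParser

TABLE_ID = "SeasonStats1_dgSeason3_ctl00"  # table id taken from the macro


class TableRows(HTMLParser):
    """Collect the cell text of every row in the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0    # > 0 while inside the target table
        self.row = None   # cells of the row being read
        self.cell = None  # text pieces of the cell being read
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth or dict(attrs).get("id") == self.table_id:
                self.depth += 1
        elif self.depth and tag == "tr":
            self.row = []
        elif self.depth and tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif tag in ("td", "th") and self.cell is not None and self.row is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def season_row(html, table_id, season):
    """Return the cells of the row whose first cell is `season`, else None."""
    parser = TableRows(table_id)
    parser.feed(html)
    for row in parser.rows:
        if row and row[0] == str(season):
            return row
    return None  # no match: nothing is carried over from a previous player


# Made-up sample page, not real data.
SAMPLE = """
<table id="other"><tr><td>2004</td><td>ignore</td></tr></table>
<table id="SeasonStats1_dgSeason3_ctl00">
  <tr><th>Season</th><th>Team</th><th>LD%</th><th>GB%</th></tr>
  <tr><td>2003</td><td>AAA</td><td>20.0 %</td><td>40.0 %</td></tr>
  <tr><td>2004</td><td>AAA</td><td>21.5 %</td><td>43.1 %</td></tr>
</table>
"""

print(season_row(SAMPLE, TABLE_ID, 2004))
```

Because nothing is pasted into a worksheet, each player costs one parse in memory instead of a QueryTable refresh and a sheet clear.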


This post was edited by yamamotocannon on Dec 8 2014 06:53am
Member
Posts: 15,717
Joined: Aug 20 2007
Gold: 481.00
Dec 8 2014 06:56am
Quote (yamamotocannon @ Dec 8 2014 08:53am)
Here is some example code that did what I wanted it to do in Excel using VBA. I am a novice coder in general (trying to get better) so the code isn't as efficient as possible, but it does/did work.

Let's say I had a list of all pitchers from 2004 and was trying to get the "Batted Ball Table" (5th table down on the page) from this website: http://www.fangraphs.com/statss.aspx?playerid=3&position=P#battedball

From that table I would collect all the info from 2004.

Looking at the page with Inspect Element, I now see that the table is probably populated with JavaScript / jQuery type of stuff.

I am trying to re-create my program below but take it out of Excel


That URL just takes query-string parameters (playerid and position), so you can use it with a plain HTTP request. When I get to my work laptop I'll look at the website and see if .NET is the choice for you.

Also, does your Excel program work and get what you need?
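
A rough Python sketch of requesting that page directly, one player at a time, is below. The base URL and the playerid / position parameters come from the macro; the function names are hypothetical, and whether the plain response actually contains the table is an assumption to check. Note that the #battedball part is a fragment: browsers use it to scroll and never send it to the server, so it is left out.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import urlopen

BASE = "http://www.fangraphs.com/statss.aspx"  # URL taken from the macro


def player_url(player_id, position="P"):
    """Build the page URL for one player from its query-string parameters."""
    return BASE + "?" + urlencode({"playerid": player_id, "position": position})


def fetch_page(player_id, position="P", timeout=30):
    """Download the page for one player and return it as text."""
    with urlopen(player_url(player_id, position), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


print(player_url(3))
# The fragment is parsed separately and is not part of the request.
print(urlsplit(player_url(3) + "#battedball").fragment)
```

Looping fetch_page over the list of player ids replaces the For i = 4 To 330 loop; adding a short pause between requests is a polite default.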

This post was edited by t9x on Dec 8 2014 06:57am
Member
Posts: 376
Joined: Sep 12 2014
Gold: 1,949.00
Dec 8 2014 07:20am
The Excel program does work, it's just really inefficient. I have experimented with things like turning screen updating off, but since the data is physically copied and pasted into the cells at some point, I think it runs slow / can be unwieldy.

Thanks for taking a peek at it

This post was edited by yamamotocannon on Dec 8 2014 07:20am
Member
Posts: 15,717
Joined: Aug 20 2007
Gold: 481.00
Dec 8 2014 08:25am
They aren't hiding anything in JavaScript. I don't know much about baseball, so I don't know what order the columns come in, but it seems like they are doing this:

The "dashboard" is 3 different tables.

Season Team W L SV G GS IP // This is one table
K/9 BB/9 HR/9 BABIP LOB% GB% HR/FB // This is another table
ERA FIP xFIP WAR // This is the 3rd table

Those 3 tables make up the "dashboard". They can be viewed in the page source and retrieved basically any way you want, except by element ID, because they don't have an ID; it's just printed as text.


It's kind of hard to see how they are feeding it, because when you load the page there are 500+ things being loaded into your browser. They use a lot of different ad services and information trackers.

But if you have Google Chrome, or some form of developer console in your browser, open it, look at the Network tab, and try to find the GET request that returns the information that fills the tables. It will only have the information in the tables, and there should be an ID for each row value.

I would find it for you, but I am now at work and there are literally 500+ things being loaded into the browser from that page. This is probably why your app is so slow: the generic request for the page drags too much along with it, like things that track what you click to generate ads.

One of the requests being sent will be the one you are looking for. Using it directly will speed up your app a lot.

Edit: when looking for the correct request, take the URL given and put it in your address bar. It should give you a source that only contains the info in the tables. There could be a different request for each table, as this website is a bit sloppy.
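
Going through hundreds of requests by hand is tedious, so one option is to export the Network tab as a HAR file (a JSON log of every request and response) and search the response bodies for a value you can see in the table. The Python sketch below does that; the sample log, its URLs and the number searched for are made up, and skipping scripts is an assumption you can switch off if nothing turns up.

```python
import json

# Response types that are very unlikely to carry table data.
SKIP_PREFIXES = ("text/css", "image/", "font/", "video/", "audio/")


def find_requests(har, needle, skip_scripts=True):
    """Return the URLs whose text response body contains `needle`."""
    hits = []
    for entry in har["log"]["entries"]:
        content = entry["response"].get("content", {})
        mime = content.get("mimeType", "")
        if mime.startswith(SKIP_PREFIXES):
            continue
        if skip_scripts and "javascript" in mime:
            continue
        if content.get("encoding") == "base64":
            continue  # binary body stored as base64, not searchable text
        if needle in content.get("text", ""):
            hits.append(entry["request"]["url"])
    return hits


# Made-up log in HAR layout; a real one would be loaded with
# json.load(open("page.har", encoding="utf-8")).
SAMPLE_HAR = {"log": {"entries": [
    {"request": {"url": "http://example.com/site.css"},
     "response": {"content": {"mimeType": "text/css",
                              "text": "td{width:43.1%}"}}},
    {"request": {"url": "http://example.com/page?playerid=3"},
     "response": {"content": {"mimeType": "text/html",
                              "text": "<td>43.1 %</td>"}}},
]}}

print(find_requests(SAMPLE_HAR, "43.1"))
```

Any URL this returns is a candidate to paste into the address bar and check that it returns just the table data.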

This post was edited by t9x on Dec 8 2014 08:26am
Member
Posts: 376
Joined: Sep 12 2014
Gold: 1,949.00
Dec 8 2014 08:52am
So the data I am looking for will always appear in some format in one of those network pages? I actually looked through several hundred of them last night and was worried it would be in JavaScript and unintelligible to me.

So for instance I could skip looking at the css / .js files because they will never have the table data reside within them?

Thanks for your time

Edit:

Also I would expect to see the table info in the response body on the network tab?

This post was edited by yamamotocannon on Dec 8 2014 08:53am
Member
Posts: 15,717
Joined: Aug 20 2007
Gold: 481.00
Dec 8 2014 09:00am
Quote (yamamotocannon @ Dec 8 2014 10:52am)
So the data I am looking for will always appear in some format in one of those network pages? I actually looked through several hundred of them last night and was worried it would be in JavaScript and unintelligible to me.

So for instance I could skip looking at the css / .js files because they will never have the table data reside within them?

Thanks for your time

Edit:

Also I would expect to see the table info in the response body on the network tab?


Yes, you will see the table. It will look something like this:

table1
{
id1: value
id2: value
id3: value
}
