I opened a ticket with Imgur to see if I could obtain their comment database for public use.
They said they would add it to their feature list, but there is no saying how long it will take them to get around to it, if ever.
So now I am slowly writing a spider for their API to pull comments off images. For example, here is the partial output from a random image:
Code
>But what does this offer to a casual computer guy, whos only skill is browsing imgur and porn?
>free shit!!
>This guy gets it.
>Does he get free 'shit' ?
>He gets the 'shit' without even knowing it,
>????????????
>Wtf tried to do emojis xD
>*slowly slides back down to the aquarium*
>As a British citizen, I have a legal obligation to download this without the need for it
>Adding huge black bars on top and bottom of pictures.
>You monster
>Not the gum drop buttons!
>Makes porn look better without increasing cost, one would think
>Dank meme making
>"you rang?" -thousands of us.
>-dozens of us
>Photoshopping yourself into the porn and posting to imgur
>You can use up some of that valuable disk space before it expires!
>You can put Hillary's face on anything you like.
>Noise removal on amateur pictures.
>. Same
>Photo editing, for example making color photos look great in monochrome
>well i mean you can already reach a decent chunk of internttable shit from google.. google. thats what they offer.
>Photoshop ur crush in a porno
>Good question.
As you can see, I have managed to abstract the comments into a tree-like format. The way training would be done is that a comment on level N will be the input and its reply on level N+1 will be the response. For example:
Code
>"you rang?" -thousands of us.
>-dozens of us
The first comment is from the 2nd level and the second is from the 3rd level. The former is the input that will trigger the latter as the output.
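A rough sketch of that pairing, written in Python rather than Crystal only to keep it short and self-contained. It assumes each comment carries the same "comment" and "children" fields the crawler reads; the function name reply_pairs and the sample tree (including its placeholder top-level comment) are made up for illustration.

```python
def reply_pairs(comments):
    """Yield (input, response) pairs: each comment paired with its direct replies."""
    for comment in comments:
        children = comment.get("children", [])
        for child in children:
            # Level N text is the input, level N+1 text is the response.
            yield comment["comment"], child["comment"]
        # Descend so that deeper levels are paired up as well.
        yield from reply_pairs(children)


# Hand-made sample; only the two quoted comments come from the output above.
sample = [
    {"comment": "(some top-level comment)", "children": [
        {"comment": '"you rang?" -thousands of us.', "children": [
            {"comment": "-dozens of us", "children": []},
        ]},
    ]},
]

pairs = list(reply_pairs(sample))
for parent, child in pairs:
    print(f"{parent!r} -> {child!r}")
```

Every comment that has replies shows up as an input once per reply, so a comment with several replies produces several pairs.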
This will all be fed into my keywords learning brain once I have scraped a sufficient amount of data.
Here's the source of the crawler so far. Right now it simply fetches and parses a hard-coded ID from their API:
Code
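require "http/client"
require "json"

# Recursively print a list of comments, indenting one space per nesting level.
def printComments(hash, level)
  # JSON.parse returns JSON::Any, so unwrap the array before iterating.
  hash.as_a.each do |child|
    level.times { print " " }
    print ">"
    puts child["comment"]
    # Descend into the replies, if there are any.
    unless child["children"].as_a.empty?
      printComments(child["children"], level + 1)
    end
  end
  puts
end

# Imgur's API expects the client ID in the Authorization header.
headers = HTTP::Headers{"Authorization" => "Client-ID 123456"}
response = HTTP::Client.get("https://api.imgur.com/3/gallery/hRV78Jr/comments", headers)
parsedJson = JSON.parse(response.body)
printComments(parsedJson["data"], 0)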
What needs to happen now is to store this data in an SQL database which I can populate and then learn from. It will likely be a simple schema of "id Integer Autoincrement, input TEXT, response TEXT", a one-to-many mapping, which I will go through row by row to learn from. This means the recursive function will likely evolve into something like "buildDatabase(hash, parentComment)" so that I can still traverse the hash while also having access to the parent's comment, which I can then pair with the child's comment.
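To make the idea concrete, here is a minimal sketch of that schema and of a buildDatabase-style traversal. It is in Python with the standard sqlite3 module rather than Crystal, purely so it runs without extra dependencies. The table name "pairs", the in-memory database and the sample tree are assumptions for illustration, not the final design.

```python
import sqlite3


def build_database(conn, comments, parent_comment=None):
    """Walk the comment tree, storing each (parent, child) pair as one row."""
    for comment in comments:
        text = comment["comment"]
        # Top-level comments have no parent, so they are only ever inputs.
        if parent_comment is not None:
            conn.execute(
                "INSERT INTO pairs (input, response) VALUES (?, ?)",
                (parent_comment, text),
            )
        # This comment becomes the parent for its own replies.
        build_database(conn, comment.get("children", []), text)


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    "CREATE TABLE pairs ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, input TEXT, response TEXT)"
)

# Hand-made sample tree; the placeholder texts are not real Imgur comments.
sample = [
    {"comment": "(some top-level comment)", "children": [
        {"comment": '"you rang?" -thousands of us.', "children": [
            {"comment": "-dozens of us", "children": []},
        ]},
        {"comment": "(another reply)", "children": []},
    ]},
]

build_database(conn, sample)
conn.commit()

rows = conn.execute("SELECT input, response FROM pairs ORDER BY id").fetchall()
for row in rows:
    print(row)
```

The same input text appears in more than one row when a comment has several replies, which is where the one-to-many mapping comes from.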
This post was edited by AbDuCt on Apr 6 2016 07:31pm