Tracking Youtube Statistics > How To Do It?
Member
Posts: 11,643
Joined: Dec 18 2006
Gold: 340.00
Jun 18 2014 08:58am
Hello, I am relatively new to programming but am getting more into it in my free time this summer. One project I will likely be working on is with my professor. For it, we will need to track YouTube views, likes, dislikes, etc. for other people's videos.
Can this be done? What is the best language for this (I assume Python)? Where would be a good place to begin learning how to program something like this?

I am not asking for the code, just to be pointed in the right direction for learning how to do it.

Thank you very much!
Member
Posts: 24,488
Joined: Jul 11 2011
Gold: 1,272.50
Jun 18 2014 10:33am
USE YOUTUBE'S API.
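
As a rough illustration of the API route, here is a minimal sketch. It assumes the YouTube Data API v3 `videos` endpoint with `part=statistics` and an API key from a Google developer account. The class name, method names, and the `YOUTUBE_API_KEY` environment variable are made up for this example and are not from the thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Scanner;

class YoutubeStatsRequest {
    // Assumed endpoint: the "videos" resource of the YouTube Data API v3.
    static final String ENDPOINT = "https://www.googleapis.com/youtube/v3/videos";

    // URL-encodes one query parameter value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java runtime.
            throw new IllegalStateException(e);
        }
    }

    // Builds the request URL asking for the statistics part of one video.
    static String statisticsUrl(String videoId, String apiKey) {
        return ENDPOINT
            + "?part=statistics"
            + "&id=" + encode(videoId)
            + "&key=" + encode(apiKey);
    }

    // Downloads the response body (JSON) as a single string.
    static String fetch(String url) throws IOException {
        try (InputStream in = new URL(url).openStream();
             Scanner scanner = new Scanner(in, "UTF-8")) {
            scanner.useDelimiter("\\A"); // read the whole stream as one token
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical variable name; supply your own key however you prefer.
        String apiKey = System.getenv("YOUTUBE_API_KEY");
        if (apiKey == null) {
            System.err.println("Set YOUTUBE_API_KEY first.");
            return;
        }
        System.out.println(fetch(statisticsUrl("LlB0FqQ0DlQ", apiKey)));
    }
}
```

The response is JSON, so the counts (for example `viewCount` and `likeCount` under `statistics`) are best read with a JSON library rather than by string matching.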
Member
Posts: 2,757
Joined: Nov 26 2007
Gold: 1,214.81
Jun 18 2014 11:40am
Here's an example of how to parse the view count from HTML using Java. Or you can just use the YouTube API like the others suggested.
Code
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Youtube {
    public static void main( String[] args ) throws IOException {
        System.out.println( getViewCount( "LlB0FqQ0DlQ" ) );
    }

    // Scrapes the view count from the watch page. Returns long because an
    // int would overflow past roughly 2.1 billion views.
    static long getViewCount( String videoId ) throws IOException {
        String html = getHtml( "http://www.youtube.com/watch?v=" + videoId );
        // Capture the text of the element whose class attribute ends in watch-view-count.
        Pattern pattern = Pattern.compile( "watch-view-count\">([^<]*)</s" );
        Matcher matcher = pattern.matcher( html );
        if ( !matcher.find( ) ) {
            throw new IllegalStateException( "View count not found; the page markup may have changed" );
        }
        // Strip thousands separators and surrounding whitespace before parsing.
        return Long.parseLong( matcher.group( 1 ).replace( ",", "" ).trim( ) );
    }

    // Downloads the page as one string. Failures propagate instead of being swallowed.
    static String getHtml( String url_ ) throws IOException {
        StringBuilder html = new StringBuilder( );
        URLConnection spoof = new URL( url_ ).openConnection( );
        // Send a browser-like User-Agent header.
        spoof.setRequestProperty( "User-Agent",
            "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)" );
        try ( BufferedReader in = new BufferedReader( new InputStreamReader(
                spoof.getInputStream( ) ) ) ) {
            String line;
            while ( (line = in.readLine( )) != null ) {
                html.append( line );
            }
        }
        return html.toString( );
    }
}


This post was edited by labatymo on Jun 18 2014 11:51am
Member
Posts: 11,637
Joined: Feb 2 2004
Gold: 434.84
Jun 18 2014 11:41am
Quote (labatymo @ Jun 18 2014 12:40pm)
Here's an example of how to parse the view count from HTML using Java. Or you can just use the YouTube API like the others suggested.
Did you write that or did you find that online? Because that code is really terrible. You shouldn't ever parse HTML with regex.
Member
Posts: 2,757
Joined: Nov 26 2007
Gold: 1,214.81
Jun 18 2014 11:45am
Quote (rockonkenshin @ Jun 18 2014 01:41pm)
Did you write that or did you find that online? Because that code is really terrible. You shouldn't ever parse HTML with regex.


Why not? I've been doing it for years lol
Member
Posts: 13,425
Joined: Sep 29 2007
Gold: 0.00
Warn: 20%
Jun 18 2014 11:46am
Quote (rockonkenshin @ Jun 18 2014 01:41pm)
Did you write that or did you find that online? Because that code is really terrible. You shouldn't ever parse HTML with regex.


Does using regex on their feeds API count?

Here's a plugin for my IRC bot, using Cinch as the framework.

Code
require 'cinch'
require 'open-uri'

class YoutubeChannelParser
  include Cinch::Plugin

  listen_to :channel

  def listen(m)
    return unless m.message =~ /youtube\.com\/watch\?|youtu\.be\//
    url = URI.extract(m.message, ['http', 'https']).first
    return unless url
    title, author, duration, date, views = parseUrl(url)
    return unless title

    # Pass every value as a format argument so a '%' in the title cannot break the format string.
    m.reply Format("%s%s%s %s: %s, %s: %s, %s: %s, %s: %s, %s: %s" % [
      Format(:bold, "["), Format(:red, "YouTube"), Format(:bold, "]"),
      Format(:bold, "Title"), title,
      Format(:bold, "Author"), author,
      Format(:bold, "Duration"), Time.at(duration.to_i).gmtime.strftime('%R:%S'),
      Format(:bold, "Date Added"), date,
      Format(:bold, "Views"), views])
  end

  # Returns [title, author, duration, date, views], or nil when no video ID is found.
  def parseUrl(url)
    youtubeID = url[/[?&]v=([^&#]+)/, 1] if url =~ /youtube\.com\/watch\?/
    youtubeID ||= url[/youtu\.be\/([^?&#]+)/, 1]
    return unless youtubeID

    # URI.open needs Ruby 2.5+; older versions call open directly.
    data = URI.open("http://gdata.youtube.com/feeds/api/videos/#{youtubeID}") { |file| file.read }

    title = data.match(/<title type='text'>(.+?)<\/title>/)[1]
    author = data.match(/<author><name>(.+?)<\/name>/)[1]
    duration = data.match(/<yt:duration seconds='(\d+)'\/>/)[1]
    date = data.match(/<published>(.+?)<\/published>/)[1]
    views = data.match(/viewCount='(\d+)'/)[1]

    # Keep only the date portion of the timestamp (everything before the 'T').
    return title, author, duration, date[0..date.index('T')-1], views
  end
end
Member
Posts: 13,425
Joined: Sep 29 2007
Gold: 0.00
Warn: 20%
Jun 18 2014 11:48am
Quote (labatymo @ Jun 18 2014 01:45pm)
Why not? I've been doing it for years lol


HTML changes too readily and will break your regex half the time. With a dedicated API, you are almost guaranteed that it will continue to work.
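
To make that concrete, here is a small sketch that runs the scraper's pattern from earlier in the thread against a few markup variations. The HTML snippets are invented for illustration, not real YouTube markup, and the class name is hypothetical. Each variation would render the same in a browser, but only the first one matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BrittleRegexDemo {
    // The same pattern the Java scraper in this thread relies on.
    static final Pattern VIEW_COUNT = Pattern.compile("watch-view-count\">([^<]*)</s");

    // Returns the captured text, or null when the pattern no longer matches.
    static String extract(String html) {
        Matcher m = VIEW_COUNT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Invented snippets: the markup the regex expects, then three harmless-looking changes.
        String expected = "<span class=\"watch-view-count\">1,234</span>";
        String singleQuotes = "<span class='watch-view-count'>1,234</span>";
        String extraClass = "<span class=\"watch-view-count hd\">1,234</span>";
        String nestedTag = "<span class=\"watch-view-count\"><b>1,234</b></span>";

        System.out.println(extract(expected));     // 1,234
        System.out.println(extract(singleQuotes)); // null
        System.out.println(extract(extraClass));   // null
        System.out.println(extract(nestedTag));    // null
    }
}
```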
Member
Posts: 11,637
Joined: Feb 2 2004
Gold: 434.84
Jun 18 2014 11:48am
Quote (labatymo @ Jun 18 2014 12:45pm)
Why not? I've been doing it for years lol


And you've been doing it poorly for years. Reasons:

1. Regular expressions parse regular languages. HTML is not a regular language (it's a context-free language). This makes most non-trivial HTML incredibly difficult or impossible to parse with a regex.
2. HTML parsing is, by nature, very forgiving with regard to badly formed HTML. HTML that renders in every browser on the market may not be "valid", in which case your regexes will fail.
3. It's much simpler to implement with a real HTML parsing library like JSoup.

Always use the right tool for the job.
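
Point 1 can be seen with nested tags. The sketch below is a toy, with invented input strings and hypothetical names. It shows a lazy and a greedy regex both returning the wrong span, while a scan that counts nesting depth finds the matching close tag. The depth counter only understands bare `<div>` tags and is not a substitute for a real parser. It only shows the kind of bookkeeping a parser does and a regex cannot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedTagDemo {
    // Returns group 1 of the first match, or null if there is none.
    static String firstGroup(String regex, String html) {
        Matcher m = Pattern.compile(regex).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Toy depth counter: returns the content of the first <div>, honoring nesting.
    static String outerDivContent(String html) {
        int start = html.indexOf("<div>");
        if (start < 0) {
            return null;
        }
        int depth = 0;
        int i = start;
        while (i < html.length()) {
            if (html.startsWith("<div>", i)) {
                depth++;
                i += "<div>".length();
            } else if (html.startsWith("</div>", i)) {
                depth--;
                if (depth == 0) {
                    return html.substring(start + "<div>".length(), i);
                }
                i += "</div>".length();
            } else {
                i++;
            }
        }
        return null; // unbalanced input
    }

    public static void main(String[] args) {
        String nested = "<div>outer <div>inner</div> tail</div>";
        String siblings = "<div>a</div><div>b</div>";

        // Lazy match stops at the first </div>, cutting the outer element short.
        System.out.println(firstGroup("<div>(.*?)</div>", nested));
        // Greedy match runs to the last </div>, swallowing the sibling.
        System.out.println(firstGroup("<div>(.*)</div>", siblings));
        // Counting depth pairs each opening tag with its own closing tag.
        System.out.println(outerDivContent(nested));
    }
}
```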

This post was edited by rockonkenshin on Jun 18 2014 11:50am
Member
Posts: 2,757
Joined: Nov 26 2007
Gold: 1,214.81
Jun 18 2014 11:52am
Quote (rockonkenshin @ Jun 18 2014 01:48pm)
And you've been doing it poorly for years. Reasons:

1. Regular expressions parse regular languages. HTML is not a regular language (it's a context-free language). This makes most non-trivial HTML incredibly difficult or impossible to parse with a regex.
2. HTML parsing is, by nature, very forgiving with regard to badly formed HTML. HTML that renders in every browser on the market may not be "valid", in which case your regexes will fail.
3. It's much simpler to implement with a real HTML parsing library like JSoup.

Always use the right tool for the job.


Thanks, I'll give JSoup a try.