Tracking Youtube Statistics - Topic

Member

Posts: 11,643

Joined: Dec 18 2006

Gold: 340.00

Jun 18 2014 08:58am

Hello, I am relatively new to programming but am getting more into it in my free time this summer. One project I will likely be working on is with my professor. For it, we need to be able to track YouTube views, likes, dislikes, etc. for other people's videos.
Can this be done? What is the best language for this (I assume Python)? Where would be a good place to start learning how to program something like this?

I am not asking for the code, just to be pointed in the right direction on how to learn to do it.

Thank you very much!

j0ltk0la

Member

Posts: 62,215

Joined: Jun 3 2007

Gold: 9,039.20

Jun 18 2014 09:04am

https://developers.google.com/youtube/v3/code_samples/python

HighschoolTurd

Member

Posts: 24,488

Joined: Jul 11 2011

Gold: 1,272.50

Jun 18 2014 10:33am

USE YOUTUBE'S API.
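
For reference, here is a minimal sketch of what an API request might look like, written in Java to match the code later in the thread. It assumes the YouTube Data API v3 `videos` endpoint with `part=statistics` and an API key created in the Google developer console. The class name `YouTubeStatsRequest`, its method names and the `YOUR_API_KEY` placeholder are made up for illustration, and `fetch` needs Java 11 or newer.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class YouTubeStatsRequest {
    // Assumed endpoint: the Data API v3 "videos" resource.
    static final String ENDPOINT = "https://www.googleapis.com/youtube/v3/videos";

    // Builds the request URL for one video's statistics.
    static String buildUrl(String videoId, String apiKey) {
        return ENDPOINT
                + "?part=statistics"
                + "&id=" + URLEncoder.encode(videoId, StandardCharsets.UTF_8)
                + "&key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
    }

    // Fetches the raw JSON body; requires a valid API key and network access.
    static String fetch(String videoId, String apiKey) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(buildUrl(videoId, apiKey))).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Prints the URL only; call fetch(...) once you have a real key.
        System.out.println(buildUrl("LlB0FqQ0DlQ", "YOUR_API_KEY"));
    }
}
```

The response is JSON, with fields such as `viewCount` and `likeCount` under each item's `statistics` object, so it is best read with a JSON library rather than picked apart by hand.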

labatymo

Member

Posts: 2,757

Joined: Nov 26 2007

Gold: 1,214.81

Jun 18 2014 11:40am

Here's an example of how to parse the view count from HTML using Java. Or you can just use the YouTube API like the others suggested.

Code

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Youtube {
    // Captures the text inside the element whose class is "watch-view-count".
    private static final Pattern VIEW_COUNT = Pattern.compile( "watch-view-count\">([^<]*)</s" );

    public static void main( String[] args ) throws IOException {
        System.out.println( getViewCount( "LlB0FqQ0DlQ" ) );
    }

    // Returns a long because popular videos can exceed the range of an int.
    static long getViewCount( String videoId ) throws IOException {
        // https avoids a cross-protocol redirect, which URLConnection does not follow.
        String html = getHtml( "https://www.youtube.com/watch?v=" + videoId );
        Matcher matcher = VIEW_COUNT.matcher( html );
        if ( !matcher.find( ) ) {
            throw new IllegalStateException( "View count not found; the page markup may have changed" );
        }
        // Drop the thousands separators before parsing.
        return Long.parseLong( matcher.group( 1 ).replace( ",", "" ).trim( ) );
    }

    static String getHtml( String address ) throws IOException {
        URLConnection spoof = new URL( address ).openConnection( );
        // Pretend to be a browser.
        spoof.setRequestProperty( "User-Agent",
                "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)" );
        StringBuilder html = new StringBuilder( );
        // try-with-resources closes the reader even if reading fails.
        try ( BufferedReader in = new BufferedReader( new InputStreamReader(
                spoof.getInputStream( ), StandardCharsets.UTF_8 ) ) ) {
            String line;
            while ( (line = in.readLine( )) != null ) {
                html.append( line );
            }
        }
        return html.toString( );
    }
}

This post was edited by labatymo on Jun 18 2014 11:51am

rockonkenshin

Member

Posts: 11,637

Joined: Feb 2 2004

Gold: 434.84

Jun 18 2014 11:41am

Quote (labatymo @ Jun 18 2014 12:40pm)

Here's an example of how to parse the view count from HTML using Java. Or you can just use the YouTube API like the others suggested.

Code

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Youtube {
    public static void main( String[] args ) {
        System.out.println( getViewCount( "yourVideoId" ) );
    }

    static String getViewCount( String videoId ) {
        String Html = GetHtml( "http://www.youtube.com/watch?v=" + videoId );
        Pattern pattern = Pattern.compile( "watch-view-count\">[^<]*</s" );
        Matcher matcher = pattern.matcher( Html );
        matcher.find( );
        String viewCount = matcher.group( 0 ).substring(
                matcher.group( 0 ).indexOf( ">" ) + 1, matcher.group( 0 ).indexOf( "<" ) );
        return viewCount;
    }

    static String GetHtml( String url1 ) {
        String str = "";
        try {
            URL url = new URL( url1 );
            URLConnection spoof = url.openConnection( );
            spoof.setRequestProperty( "User-Agent",
                    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)" );
            BufferedReader in = new BufferedReader( new InputStreamReader(
                    spoof.getInputStream( ) ) );
            String strLine = "";
            while ( (strLine = in.readLine( )) != null ) {
                str = str + strLine;
            }
        }
        catch ( Exception e ) {
        }
        return str;
    }
}

Did you write that or did you find that online? Because that code is really terrible. You shouldn't ever parse HTML with regex.

labatymo

Member

Posts: 2,757

Joined: Nov 26 2007

Gold: 1,214.81

Jun 18 2014 11:45am

Quote (rockonkenshin @ Jun 18 2014 01:41pm)

Did you write that or did you find that online? Because that code is really terrible. You shouldn't ever parse HTML with regex.

Why not? I've been doing it for years lol

AbDuCt

Member

Posts: 13,425

Joined: Sep 29 2007

Gold: 0.00

Warn: 20%

Jun 18 2014 11:46am

Quote (rockonkenshin @ Jun 18 2014 01:41pm)

Did you write that or did you find that online? Because that code is really terrible. You shouldn't ever parse HTML with regex.

Does using regex on their feeds API count?

Here's a plugin for my IRC bot, using Cinch as the framework.

Code

require 'cinch'
require 'open-uri'

class YoutubeChannelParser
  include Cinch::Plugin

  listen_to :channel

  def listen(m)
    return unless m.message =~ /youtube\.com\/watch\?/ || m.message =~ /youtu\.be\//
    url = URI.extract(m.message, ['http', 'https']).first
    return unless url
    title, author, duration, date, views = parse_url(url)
    return unless title

    # Build each "Label: value" pair separately so a '%' in the title
    # cannot be mistaken for a format directive.
    fields = {
      'Title'      => title,
      'Author'     => author,
      'Duration'   => Time.at(duration.to_i).gmtime.strftime('%H:%M:%S'),
      'Date Added' => date,
      'Views'      => views
    }.map { |label, value| "#{Format(:bold, label)}: #{value}" }

    m.reply "#{Format(:bold, '[')}#{Format(:red, 'YouTube')}#{Format(:bold, ']')} #{fields.join(', ')}"
  end

  def parse_url(url)
    # The ID is the v= parameter on watch URLs, or the path segment on youtu.be links.
    youtube_id = url[/[?&]v=([^&#]+)/, 1] || url[/youtu\.be\/([^?&#]+)/, 1]
    return unless youtube_id

    # URI.open needs Ruby 2.5+; older versions used plain open here.
    data = URI.open("http://gdata.youtube.com/feeds/api/videos/#{youtube_id}", &:read)

    title    = data[/<title type='text'>(.+?)<\/title>/, 1]
    author   = data[/<author><name>(.+?)<\/name>/, 1]
    duration = data[/<yt:duration seconds='(\d+)'\/>/, 1]
    date     = data[/<published>(.+?)<\/published>/, 1]
    views    = data[/viewCount='(\d+)'/, 1]

    # Keep only the date part of the timestamp (everything before the 'T').
    return title, author, duration, date.to_s.split('T').first, views
  end
end

AbDuCt

Member

Posts: 13,425

Joined: Sep 29 2007

Gold: 0.00

Warn: 20%

Jun 18 2014 11:48am

Quote (labatymo @ Jun 18 2014 01:45pm)

Why not? I've been doing it for years lol

HTML changes too readily and will break your regex half the time. With a dedicated API you are almost guaranteed that it will continue to work.
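
To make the point concrete, here is a small sketch of how a pattern like the one posted earlier stops matching after harmless markup changes. The HTML snippets are hypothetical, not copied from a real page, and the class name `BrittleRegexDemo` and its `extract` helper are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BrittleRegexDemo {
    // Same idea as the earlier pattern: class name, closing quote, '>', then the text.
    static final Pattern VIEW_COUNT = Pattern.compile("watch-view-count\">([^<]*)<");

    // Returns the captured text, or empty when the markup no longer matches.
    static Optional<String> extract(String html) {
        Matcher m = VIEW_COUNT.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String original = "<span class=\"watch-view-count\">1,234</span>";
        String extraAttribute = "<span class=\"watch-view-count\" id=\"views\">1,234</span>";
        String secondClass = "<span class=\"watch-view-count stat\">1,234</span>";

        System.out.println(extract(original));       // Optional[1,234]
        System.out.println(extract(extraAttribute)); // Optional.empty
        System.out.println(extract(secondClass));    // Optional.empty
    }
}
```

All three snippets would look identical in a browser, but only the first one matches. A versioned API response does not shift like this every time the page layout is touched.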

rockonkenshin

Member

Posts: 11,637

Joined: Feb 2 2004

Gold: 434.84

Jun 18 2014 11:48am

Quote (labatymo @ Jun 18 2014 12:45pm)

Why not? I've been doing it for years lol

And you've been doing it poorly for years. Reasons:

1. Regular expressions parse regular languages. HTML is not a regular language (it's a context-free language). This makes most non-trivial HTML incredibly difficult or impossible to parse.
2. HTML parsing is, by nature, very forgiving with regard to badly formed HTML. HTML that renders in every browser on the market may still not be "valid", in which case your regexes will fail.
3. It's much simpler to implement with a real HTML parsing library like JSoup.

Always use the right tool for the job.
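
As a rough illustration of point 1, the sketch below pulls the contents of the first `<div>` out of a nested snippet in two ways. A plain pattern has no way to count how many tags are still open, so it stops at the first closing tag; keeping an explicit depth counter gets the right answer. The class and method names are made up for illustration, and the depth counter only understands bare `<div>` and `</div>` tags, so it is a toy, not a substitute for a real parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestingDemo {
    // Regex attempt: contents of the first <div>, matched non-greedily.
    static String firstDivByRegex(String html) {
        Matcher m = Pattern.compile("<div>(.*?)</div>").matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Depth-counting attempt: track how many <div> tags are currently open.
    static String firstDivByDepth(String html) {
        int start = html.indexOf("<div>");
        if (start < 0) {
            return null;
        }
        int contentStart = start + "<div>".length();
        int depth = 1;
        int i = contentStart;
        while (i < html.length()) {
            if (html.startsWith("<div>", i)) {
                depth++;
                i += "<div>".length();
            } else if (html.startsWith("</div>", i)) {
                depth--;
                if (depth == 0) {
                    return html.substring(contentStart, i);
                }
                i += "</div>".length();
            } else {
                i++;
            }
        }
        return null; // unbalanced input
    }

    public static void main(String[] args) {
        String nested = "<div>a<div>b</div>c</div>";
        System.out.println(firstDivByRegex(nested)); // a<div>b
        System.out.println(firstDivByDepth(nested)); // a<div>b</div>c
    }
}
```

Switching the pattern to a greedy match does not rescue it either: on two sibling blocks it would swallow everything from the first opening tag to the last closing tag.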

This post was edited by rockonkenshin on Jun 18 2014 11:50am

labatymo

Member

Posts: 2,757

Joined: Nov 26 2007

Gold: 1,214.81


Jun 18 2014 11:52am

Quote (rockonkenshin @ Jun 18 2014 01:48pm)

Thanks, I'll give JSoup a try.
