Re: Autodownload script

Posted: March 5th, 2010, 11:50 am
by J03 8LACK
This is how I faked a browser user agent for IMDB (my changes were marked in green):

#Make my custom user agent urlopener; paste this under the imports
class MyOpener(urllib.FancyURLopener):
      version = ' find a legit browser agent version or make your custom agent here '
myopener = MyOpener()



#replace the old urllib.urlopen call with this: sock = myopener.open(imdb_link)

if movie_desc:
        try:
                movie_link = re.search(".*",movie_desc.group())
                imdb_link=movie_link.group()
                sock = myopener.open(imdb_link)
                imdb_movie_page = sock.read()
                #print(imdb_movie_page)
                sock.close()
        # ... the rest of the original try block is unchanged ...

Hope this helps someone.
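
For anyone reading this on Python 3, where urllib.FancyURLopener is deprecated, the same idea can be sketched with urllib.request. This is only a sketch under assumptions: USER_AGENT is a made-up placeholder string, and make_opener and fetch are hypothetical helper names that do not appear in the script.

```python
import urllib.request

# Hypothetical user-agent string; substitute one copied from a real browser.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def make_opener(user_agent=USER_AGENT):
    """Return an opener that sends user_agent instead of Python's default."""
    opener = urllib.request.build_opener()
    # Overwrite the default header list, which only holds Python's own agent.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch(url, opener=None):
    """Download url and return the raw bytes of the response body."""
    opener = opener or make_opener()
    with opener.open(url) as sock:
        return sock.read()
```

In the script this would stand in for the myopener.open(imdb_link) / sock.read() / sock.close() sequence, for example imdb_movie_page = fetch(imdb_link).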

Re: Autodownload script

Posted: March 8th, 2010, 9:46 am
by fzeroe
Thanks
my script works fine now :)

Re: Autodownload script

Posted: March 8th, 2010, 3:20 pm
by Cuchulainn
Thanks for the fix!

Re: Autodownload script

Posted: May 12th, 2010, 10:12 am
by mike171562
There is a Python module, IMDbPY, that makes grabbing the IMDB ratings easier than using the regex. It also lets the user set a variable rather than edit the regex.

http://imdbpy.sourceforge.net/

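I'm using it something like this:

import re
from imdb import IMDb
ia = IMDb()

# inside the feed loop, after htmlSource has been read from imdb_link
i = IMDB({})    # the regex-based IMDB class from the script, not IMDbPY
imdb_code, name, year = i._search(htmlSource)
imdb_rating = re.compile("<b>?/10</b>").search(htmlSource)

print imdb_link
imdb = ia.get_movie(imdb_code)
try:
        imdb_score = imdb['rating']
except KeyError:
        # no rating available for this title
        imdb_score = 0
print 'IMDB Rating is %s' % imdb_score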
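
The "set a variable rather than edit the regex" part could look roughly like the sketch below. It is written for Python 3 and is not taken from any posted script: rating_or_default and passes_minimum are hypothetical names, and a plain dict stands in for the movie object, on the assumption that the object supports movie['rating'] as in the snippet above.

```python
def rating_or_default(movie, default=0.0):
    """Return movie['rating'] as a float, or default when it is unusable."""
    try:
        return float(movie["rating"])
    except (KeyError, TypeError, ValueError):
        # KeyError: no 'rating' entry; TypeError/ValueError: not a number
        return default


def passes_minimum(movie, minimum):
    """True when the movie's rating is at least the user-set minimum."""
    return rating_or_default(movie) >= float(minimum)
```

With this, the only thing a user has to change is the minimum, for example passes_minimum(imdb, 7.5).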

Re: Autodownload script

Posted: May 12th, 2010, 3:43 pm
by binhex
I am using a regex to search the IMDB page for the rating. Once found, the script converts it to a floating-point number and compares the user-defined rating against the IMDB rating. It's simple and doesn't require the user to enter a single regular expression :-). An example is below:

Code:

                def isGoodRatings(imdb_html):

                        # download movies with at least the user-defined minimum rating
                        # (GoodGeneralRatings holds that user-defined rating)
                        movie_ratings = re.compile(r"<b>[0-9]\.[0-9]/10</b>").search(imdb_html)

                        if movie_ratings is not None:

                                movie_ratings = movie_ratings.group()
                                # escape the dot so only a literal decimal point matches
                                movie_ratings = re.compile(r"\d\.\d").search(movie_ratings)
                                movie_ratings = float(movie_ratings.group())

                                GoodGeneralRatingsfloat = float(GoodGeneralRatings)

                                if movie_ratings >= GoodGeneralRatingsfloat:

                                        return 1

                        # no rating found, or the rating is below the minimum
                        return 0
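
The same numeric comparison can be extended to the vote count that the older scripts in this thread check with a second regex. The following is a rough Python 3 sketch, not part of my script: the function names and the default thresholds are hypothetical, and the patterns assume page markup like "<b>7.6/10</b>" and ">12,345 votes<", which is what the scripts here search for.

```python
import re

# Assumed markup, based on the patterns used by the scripts in this thread.
RATING_RE = re.compile(r"<b>(\d{1,2}\.\d)/10</b>")
VOTES_RE = re.compile(r">(\d[\d,]*) votes<")


def parse_rating_and_votes(imdb_html):
    """Return (rating, votes) as (float, int), or None if either is missing."""
    rating = RATING_RE.search(imdb_html)
    votes = VOTES_RE.search(imdb_html)
    if rating is None or votes is None:
        return None
    # Strip the thousands separators before converting the vote count.
    return float(rating.group(1)), int(votes.group(1).replace(",", ""))


def is_good_movie(imdb_html, min_rating=7.5, min_votes=1000):
    """Compare the parsed numbers against user-set thresholds."""
    parsed = parse_rating_and_votes(imdb_html)
    if parsed is None:
        return False
    rating, votes = parsed
    return rating >= min_rating and votes >= min_votes
```

Because both values end up as numbers, changing a threshold means changing min_rating or min_votes and nothing else.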

Re: Autodownload script

Posted: June 24th, 2010, 9:37 pm
by annaj
Hey everyone,
I stumbled on this thread and really like the idea of being able to auto download movies. I've tried using the most recent script that I can find here (from pilGrim), but I'm running into errors.
I know nothing about programming, so I'm hoping I can find some help here.
Here is the output that I am getting:

Code:

218 existing movies
watching for 0 movies

2010-06-24 22:22:15: xxxxxFILENAME1xxxx
No IMDB link - Skipping

2010-06-24 22:22:15: xxxxFILENAME2xxxxx
Traceback (most recent call last):
  File "/home/htpc/movie_script/movie_script_RT.py", line 158, in <module>
    optimized_movie_title=optimized_movie_title.group()
AttributeError: 'NoneType' object has no attribute 'group'
Here is the exact code that I used after making my modifications:

Code:

import urllib
import xml.dom.minidom
import re
import time
import os
import sys

##########
# MOVIES #
##########

#This script will scan a NZBS.org movie feed and test for movie score, year and if you currently have the movie, if the movie is in the process of a dl
#If conditions are true the script will feed the movie to SABnzbd for download.
#Currently set to download IMDB Score of > 75 with > 1,000 votes or Rotton Tomato Score > 75, year is set to > 1995
#You can change these settings in line 109, 185 & 195

#What you need to run: SABnzbd & Python 2.6.  Once installed you can set the script to run on a schedule or just double click.
#A good debug python tool like ActiveState Komodo is helpful

#The script works with NZBS.org news feeds.  Create a NZBS.org feed and then use feedburner to process your feed to ensure all works properly.
#You can find scripts that work with newzbin.com here: http://forums.sabnzbd.org/index.php?topic=1335.0
#The most complete TV Script can be found here: http://forums.sabnzbd.org/index.php?topic=1786.0

#This script is a blend of features found in preveously posted scripts.  Many thanks to all original posters who conrtibuted parts to the feature set:
#popcornarsonist, xxhds, binhex, doubledrat, Cuchulainn
#Maintained by pilGrim
#Version 1.1


################################### change to your own values #####################################

# set this to the location of your downloads.  the first entry must be your live download area
movie_dirs= ["/media/PRORAID_/Movies"]

# a directory that will hold string to match against movies that you want no matter what score they get
watch_dir="/media/PRORAID_/Watched"
# uncomment the following if you don't want to search for a particular set of movies
#watch_dir=""

# your SABNZB address
my_SAB="http://localhost:8080"

# the RSS link to your search
url="http://nzbs.org/rss.php?catid=4&i=xxxxx&h=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx&dl=1&num=100"



################################### change to your own values #####################################


# Search IMDB for movie, bases on the script by Switch
class IMDB:
    def __init__(self, config):
        # Credit for these expressions go to XBMC.org
        self.re = re.compile('><a href="/title/([t0-9]*)/[^>]*>([^<]*)</a> *\(([0-9]*)')
        self.re_alt1 = re.compile('<title>([^\(]*)\((\d{4})')
        self.re_alt2 = re.compile('href="/title/tt([0-9]*)/fullcredits"')

        self.url = 'http://akas.imdb.com/find?s=ttl&q=%s'

        self.config = config

    def _search(self, data):
        ''' Executes imdb search and returns first result, if present, in the form of ttID, name, year '''
        match = self._match_search_page(data)
        if not match:
            match = self._match_redirect(data)
        return match

    def _match_search_page(self, data):
        matches = re.findall(self.re, data)
        if matches:
            return matches[0]
        return matches

    def _match_redirect(self, data):
        m1 = re.search(self.re_alt1, data)
        m2 = re.search(self.re_alt2, data)
        if m1 and m2:
            return (m2.group(1), m1.group(1), m1.group(2))
        else:
            return '','',''



movie_list = []
for dir in movie_dirs:
	for movie in os.listdir(dir):
		movie_list.append(movie)
print str(len(movie_list))+" existing movies"

watch_list = []
if len(watch_dir)>0:
	watch_list = os.listdir(watch_dir)
	print "watching for "+str(len(watch_list))+" movies"

def removeIllegalChars(origstring):
	if sys.platform == "win32":
		newstring=re.sub("[\\?,]", ".", origstring)
		newstring=re.sub("[:]", ";", newstring)
	else:
		newstring=re.sub("[/?,]", ".", origstring)
#	print origstring+" becomes "+newstring
	return newstring



def isGoodMovie(imdb_html):
        #This is set up to only get movies above an 8.0, with over 1000 votes
        if re.compile("<b>[[7]\.[6-9]|[8-9]\.[0-9]]/10</b>").search(imdb_html) and re.compile(">[0-9]*,[0-9][0-9][0-9] votes<").search(imdb_html):
                return 1
        else:
                return 0


def rottenLink2(name):
        tomato_link="http://www.rottentomatoes.com/search/full_search.php?search="+name
        sock = urllib.urlopen(tomato_link)
        htmlSource = sock.read()
        sock.close()
        rotten_link2 = re.compile("href=\"\/m\/.+?\>").search(htmlSource)
        if rotten_link2==None:
                return "error"
        else:
                rotten_link2= re.sub("href=\"","",rotten_link2.group())
                rotten_link2= re.sub("\>","",rotten_link2)
                rotten_link2="http://www.rottentomatoes.com"+rotten_link2
                return rotten_link2

dom=xml.dom.minidom.parse(urllib.urlopen(url))
# go through each entry returned from the above newzbin search
for node in dom.getElementsByTagName("item"):

	report_title=node.getElementsByTagName("title")[0].childNodes[0].data
	report_title = re.sub("\;","\:",report_title)
	report_title = re.sub("\s","\:",report_title)
	legal_movie_title = removeIllegalChars(report_title)
	print ""
	print time.strftime("%Y-%m-%d %H:%M:%S")+": "+ report_title


	report_description=node.getElementsByTagName("description")[0].childNodes[0].data
	movie_desc = re.search("IMDB:.+\/\"\>", report_description)

	if movie_desc:
		movie_link = re.search("http.+\/",movie_desc.group())
		imdb_link=movie_link.group()
		# print(imdb_link)
		sock = urllib.urlopen(imdb_link)
		htmlSource = sock.read()
		sock.close()
		# get the movie code from the imdb_link
		i = IMDB({})
		imdb_code, name, year = i._search(htmlSource)
		imdb_rating=re.compile("<b>?/10</b>").search(htmlSource)

		optimized_movie_title=re.compile(".+[^\s]").search(name)
		optimized_movie_title=optimized_movie_title.group()
		optimized_movie_title=re.sub("\s"," ",optimized_movie_title)
		optimized_movie_title=re.sub(":","",optimized_movie_title)
		optimized_movie_title=re.sub(",","",optimized_movie_title)


		rotten_link="http://www.rottentomatoes.com/alias/?type=imdbid&s="+imdb_code
		rotten_link2=rottenLink2(name)
		sock=urllib.urlopen(rotten_link)
		rotten_source=sock.read()
		sock.close()
		rotten_rating=re.compile("<span.+?percent.+?>.+?<").search(rotten_source)


		if (rotten_rating==None):
			if (rotten_link2<>"error"):
				sock=urllib.urlopen(rotten_link2)
				rotten_source=sock.read()
				sock.close()
				rotten_rating=re.compile("<span.+?percent.+?>.+?<").search(rotten_source)
		if (rotten_rating==None):
			rotten_rating="00"
		else:
			rotten_rating=re.sub(".+percent.+?\>","",rotten_rating.group())
			rotten_rating=re.sub("\<","",rotten_rating)
			print "Tomatometer: " + rotten_rating
		print "IMDB rating OVER 75: ", isGoodMovie(htmlSource) == 1

		if ((isGoodMovie(htmlSource) == 1) or (int(rotten_rating) > 75) ==1):
			print "Good Rating? YES"
			download = 1
		else:
			print "Good Rating? NO, Skipping"
			download = 0


		#Year check.  Only download if newer than year set
		if download == 1:
			if int(year) > 1995:

				download = 1
				print "Newer than 1995? YES"

			else:
				download = 0
				print "Newer than 1995? NO, Skipping"

		# doesn't get a good imdb score, but am I watching for it?
		if download == 0:
			for watch_file in watch_list:
				regex_watch = re.compile(".*"+watch_file+".*", re.IGNORECASE)
				if re.match(regex_watch, optimized_movie_title):
					print optimized_movie_title+" has a low score, but is one your watch list - you're the boss!"
					download = 1
					break


		if download == 1:
			# it's worth a download but do we have it already?
			regex_movie_title_string = removeIllegalChars(optimized_movie_title)
			regex_movie_title = re.compile(regex_movie_title_string)
			#regex_movie_year = re.compile(".*\([0-9][0-9][0-9][0-9]\).*")
			print "Do we have : \"" + regex_movie_title_string+"\"",
			for movie_file in movie_list:
				if re.match(regex_movie_title, movie_file):
					print " - YES have it, Skipping"
					download = 0
					break


			if download == 1:
				print " - NO Download it"

				#we want it, we don't have it, but is it in the queue already?
				queue=xml.dom.minidom.parse(urllib.urlopen(my_SAB+"/sabnzbd/api?mode=qstatus&output=xml"))
				for job in queue.getElementsByTagName("job"):
					filename=job.getElementsByTagName("filename")[0].childNodes[0].data
					if filename == report_title:
						print "Already in queue - Skipping"
						download = 0

			if download == 1:
				print time.strftime("%Y-%m-%d %H:%M:%S")+": ++++++ added "+ optimized_movie_title+" ++++++"
				movie_list.append(legal_movie_title)
				msgid=node.getElementsByTagName("link")[0].childNodes[0].data
				msgid = urllib.quote(msgid)
				urllib.urlopen(my_SAB+"/sabnzbd/api?mode=addurl&name="+msgid+"&cat=movies")

	else:
		print "No IMDB link - Skipping"

dom.unlink()
Any help would be appreciated.
Thanks!!!
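
A note for anyone who hits the same traceback. Reading the posted code, _search() returns three empty strings when neither IMDB pattern matches the downloaded page, so name is '' and the ".+[^\s]" search returns None; calling .group() on that None is what raises the AttributeError. Why the page did not match is a guess (IMDB refusing the default Python user agent, as discussed earlier in the thread, or a layout change would both fit). Either way, the title step can be guarded. The sketch below is Python 3 and optimize_title is a hypothetical helper name, not something from the script.

```python
import re

TITLE_RE = re.compile(r".+[^\s]")  # same pattern the script uses


def optimize_title(name):
    """Return the cleaned title, or None when there is nothing to match."""
    match = TITLE_RE.search(name or "")
    if match is None:
        # Empty or missing name: let the caller skip this feed item.
        return None
    title = re.sub(r"\s", " ", match.group())
    # Drop colons and commas, as the script does.
    return title.replace(":", "").replace(",", "")
```

In the feed loop the caller would then test the result and move on to the next item when it is None, instead of crashing on .group().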

Re: Autodownload script

Posted: June 25th, 2010, 3:22 am
by binhex
Check out my post; it's an automated movie downloader with quite advanced features.

http://forums.sabnzbd.org/index.php?topic=4145.0

binhex.

Re: Autodownload script

Posted: June 25th, 2010, 4:18 am
by annaj
Thanks binhex for the suggestion. I had seen that thread, but since your script is written for Windows and I don't know enough to port it, it's not much use to me. I saw that there were others interested in using it on Linux, but I don't see anyone who has done it yet.

edit: I re-read it and saw that mike seems to have been successful, so I have posted on that thread asking for his help.

Thanks!!