PreProcessing script - parses nzb names with regex to assign categories
Posted: June 28th, 2023, 12:17 am
Figured I'd post up a python script which has helped resolve a nagging issue for me. Hopefully it will be of use to some others out there.
My NZB search provider (just like binsearch) often puts out .nzb files that lack the XML category tagging needed for fully automated category sorting in SAB. When they are tagged properly, my SAB categories kick in and everything is great. When they aren't, the downloads don't get sorted and I'm left with a folder full of mixed files - movies, tv shows, ebooks, audio files, etc.
This script accomplishes the following:
* Checks if a SAB category is already assigned before continuing. If already assigned, the download continues without any changes.
* Uses (case-insensitive) regular expressions in a specific order (specific search terms > TV > Movies > Audio) to determine category sorting.
* Patterns are matched against whole words within the nzb name (e.g. the "epub" search won't match "republic").
* TV shows are evaluated for the (s01e08, S2E14) or (season3, Season02) patterns.
* Movies are evaluated next based on resolution (720p, 1080P, 2160p). If not matching a TV show, it's assumed to be a Movie at this point.
* If all of the pattern matching fails, the download continues without any changes.
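The matching order in the list above can be sketched in a few lines. This is only an illustration, not the script itself: the `classify` function and the `RULES` table are names made up for this example, and the patterns are simplified versions of the ones described in the bullets.

```python
import re

# Hypothetical rule table for illustration: checked top to bottom, first match wins.
RULES = [
    ("ebook", r"\bepub\b"),                                # specific search term
    ("tv", r"\bs\d{1,2}e\d{1,3}\b|\bseason\d{1,3}\b"),     # s01e08, S2E14, season3, Season02
    ("movie", r"\b\d{3,4}p\b"),                            # 720p, 1080P, 2160p
    ("audio", r"\bflac\b"),                                # last, video releases can mention FLAC
]

def classify(nzbname, category=""):
    """Return the category to assign, or None to leave the job untouched."""
    if category:  # a category is already assigned: change nothing
        return None
    for name, pattern in RULES:
        if re.search(pattern, nzbname, re.IGNORECASE):
            return name
    return None  # nothing matched: change nothing

print(classify("The.Republic.2019.1080p.BluRay"))  # "epub" inside "Republic" is not a whole word
print(classify("Some.Show.S01E08.720p.WEB"))       # TV is checked before the resolution rule
```

The `\b` word boundaries are what keep "epub" from matching inside "republic", and the order of the table is what lets a TV episode with "720p" in its name land in the tv category rather than movie.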
This script might not be needed if your categories already work flawlessly and/or you only use Radarr or Sonarr to add content. If you ever search manually, or have to contend with unsorted downloads, this will most likely help with your tv/movie sorting.
The example script below is tailored to my particular needs, but the pattern matching is not overly "fuzzy" and the tv/movie sorting portion should work for most setups. Chances are good that you can run this script as-is and see some benefit.
Enjoy!
Code:
# SABnzbd PreProcessing script - evaluates un-categorized downloads and assigns categories based on nzb name
import sys
import re  # regular expression module

try:
    # Parse the 18 input variables for SABnzbd version >= 4.0.0
    (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename, is_proper, resolution, decade, year, month, day, job_type) = sys.argv
except ValueError:
    try:
        # ...or 11 variables for earlier versions
        (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename) = sys.argv
    except ValueError:
        sys.exit(1)  # a non-zero exit status causes SABnzbd to ignore the output of this script

# Assign nzb name to string
string = nzbname

# The example rules below follow this basic format:
# 1 - Specific searches - Specific search terms which reliably indicate a category (e.g. motogp, or epub)
# 2 - TV search - TV shows have a reliable naming convention (e.g. S01E04, or Season2), so searching these first
# 3 - Movie search - likely a movie if the above rules don't apply and it's a video file (e.g. 720p, 1080p, 2160p)
# 4 - Audio search - If the above rules don't apply and it has "FLAC" in its name, it's likely an audio download
#     ^^^ Listed last since movie and tv releases sometimes include the word "FLAC" in their release names
# 5 - If none of the above rules apply, allow the download to continue without any changes
#
# Summary of return parameter options:
# print('')  # 0 - Refuse, 1 - Accept, 3 - Accept but Fail
# print('')  # Cleaned version of job name (no path or .nzb)
# print('')  # 0 = Download, 1 = +Repair, 2 = +Unpack, 3 = +Delete
# print('')  # Category
# print('')  # Script (no path)
# print('')  # Priority -- -100 = Default, -3 = Duplicate, -2 = Paused, -1 = Low, 0 = Normal, 1 = High, 2 = Force
# print('')  # Group

if category == '':  # Verify a SAB category is not assigned before continuing

    # Searches for "motogp"
    if re.search(r"\bmotogp\b", string, re.IGNORECASE):
        category = 'motogp'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Searches for "epub"
    elif re.search(r"\bepub\b", string, re.IGNORECASE):
        category = 'ebook'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # TV search - matches s01e04, S2E11, Season2, season03, etc (case-insensitive)
    elif re.search(r"[.\s]s\d{1,2}e\d{1,3}|\bseason\d{1,3}", string, re.IGNORECASE):
        category = 'tv'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Movie search - matches 720p, 1080p, 2160P, etc. Assumed to be a movie at this point, if not matching the previous TV pattern.
    elif re.search(r"\b\d{3,4}p\b", string, re.IGNORECASE):
        category = 'movie'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Post TV/movie search, for "flac"
    elif re.search(r"\bflac\b", string, re.IGNORECASE):
        category = 'audio'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # No pattern matches were found, allow the download to continue without assigning a category
    else:
        print('1')
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')

# SAB category is already assigned, allow download to continue without any changes
else:
    print('1')
    print('')
    print('')
    print('')
    print('')
    print('')
    print('')

sys.exit(0)
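Each branch of the script prints the same seven lines, and only the fourth one (the category) ever changes. As a rough sketch of that idea, the repeated print blocks could come from one small helper. `build_response` is a hypothetical name, not something SABnzbd provides; the line order follows the summary of return parameters in the script's comments, and it assumes, as the script itself does, that an empty line leaves that setting unchanged.

```python
def build_response(accept="1", name="", pp="", category="", script="", priority="", group=""):
    """Return the seven output lines in the order the script prints them.

    Hypothetical helper for illustration; accept defaults to "1" (Accept).
    """
    return "\n".join([accept, name, pp, category, script, priority, group])

# Assign the 'tv' category and leave every other field empty
print(build_response(category="tv"))
```

With a helper like this, each rule in the script would shrink to a single `print(build_response(category=...))` call, which makes it harder to miscount the blank lines when adding a new category.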
UPDATE - Here's a simplified version which only detects/sorts the TV and Movie categories.
The regex for TV shows has also been updated to include the somewhat rare 3x01 format.
Refer to the original script if you need to sort additional categories besides just TV and movies.
Code:
# SABnzbd PreProcessing script - evaluates un-categorized downloads and assigns to TV or Movie categories based on nzb name
import sys
import re  # regular expression module

try:
    # Parse the 18 input variables for SABnzbd version >= 4.0.0
    (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename, is_proper, resolution, decade, year, month, day, job_type) = sys.argv
except ValueError:
    try:
        # ...or 11 variables for earlier versions
        (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename) = sys.argv
    except ValueError:
        sys.exit(1)  # a non-zero exit status causes SABnzbd to ignore the output of this script

# Assign nzb name to string
string = nzbname

# Summary of return parameter options:
# print('')  # 0 - Refuse, 1 - Accept, 3 - Accept but Fail
# print('')  # Cleaned version of job name (no path or .nzb)
# print('')  # 0 = Download, 1 = +Repair, 2 = +Unpack, 3 = +Delete
# print('')  # Category
# print('')  # Script (no path)
# print('')  # Priority -- -100 = Default, -3 = Duplicate, -2 = Paused, -1 = Low, 0 = Normal, 1 = High, 2 = Force
# print('')  # Group

if category == '':  # Verify a SAB category is not assigned before continuing

    # TV search - matches s01e04, S2E11, Season2, season03, 2x1, 3x03, etc (case-insensitive)
    # The word boundaries around the 3x01 pattern keep resolutions such as 1920x1080 from matching
    if re.search(r"[.\s]s\d{1,2}e\d{1,3}|\bseason\d{1,3}|\b\d{1,2}x\d{1,3}\b", string, re.IGNORECASE):
        category = 'tv'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # Movie search - matches 720p, 1080p, 2160P, etc
    elif re.search(r"\b\d{3,4}p\b", string, re.IGNORECASE):
        category = 'movie'
        print('')
        print('')
        print('')
        print(category)
        print('')
        print('')
        print('')

    # No pattern matches were found, allow the download to continue without assigning a category
    else:
        print('1')
        print('')
        print('')
        print('')
        print('')
        print('')
        print('')

# SAB category is already assigned, allow download to continue without any changes
else:
    print('1')
    print('')
    print('')
    print('')
    print('')
    print('')
    print('')

sys.exit(0)
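Before dropping the script into SAB, it can help to check which names the TV patterns are meant to catch, including the 3x01 format. The snippet below is a rough sketch for that purpose: the pattern is a simplified, case-insensitive variant written for this example, `looks_like_tv` is a made-up helper name, and the release names are invented.

```python
import re

# Simplified variant of the TV patterns, for illustration only
TV_PATTERN = re.compile(
    r"\bs\d{1,2}e\d{1,3}\b"     # s01e04, S2E11
    r"|\bseason\d{1,3}\b"       # Season2, season03
    r"|\b\d{1,2}x\d{1,3}\b",    # 2x1, 3x03
    re.IGNORECASE,
)

def looks_like_tv(nzbname):
    """Return True if the name contains one of the TV episode/season markers."""
    return TV_PATTERN.search(nzbname) is not None

for sample in ["Some.Show.S01E04.720p", "Some.Show.3x03.HDTV", "Some.Movie.2020.1920x1080"]:
    print(sample, looks_like_tv(sample))
```

The last sample is there on purpose: without word boundaries around the `3x01` style pattern, a resolution written as 1920x1080 would be mistaken for a TV episode.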