4.2.0Beta1 Smart Duplicate detection

Posted: December 2nd, 2023, 5:12 pm
by goldeneyes
I am noticing that when grabbing multiple episodes of a (standard) series, SABnzbd marks every episode after the first as an alternative, even though the season and episode numbers are unique. It looks like the smart matching for the queue items is causing the confusion.

Code:

2023-12-02 15:05:24,083::DEBUG::[nzbstuff:1929] Duplicate checking NZB The Great British Bake Off - 2x05 - Pies avi-xpost (md5sum=ffbd690f3083400e6b3f59fa820f4b86)
2023-12-02 15:05:24,086::DEBUG::[nzbstuff:1932] Duplicate in history: False
2023-12-02 15:05:24,087::DEBUG::[nzbstuff:1936] Duplicate in backup: False
2023-12-02 15:05:24,088::DEBUG::[nzbstuff:1939] Duplicate in queue: False
2023-12-02 15:05:24,109::DEBUG::[nzbstuff:1944] Smart duplicate checking (The Great British Bake Off - 2x05 - Pies avi-xpost): the great british bake off//
2023-12-02 15:05:24,111::DEBUG::[nzbstuff:1947] Duplicate in history: False
2023-12-02 15:05:24,112::DEBUG::[nzbstuff:1950] Duplicate in queue: True

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 3rd, 2023, 3:51 am
by safihre
It seems it's unable to detect the season and episode numbers. That would unfortunately also have failed on 4.1.0 if any were in history, as the same detection is applied.
It does highlight another bug: it shouldn't create a duplicate matching key when just the show is known but the episode and season are not. I'll fix that.
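
A minimal sketch of the guard described above, assuming the `title/season/episode` key layout that the debug log suggests (`the great british bake off//`). The name `build_series_key` is hypothetical and this is not SABnzbd's actual code; it only illustrates refusing to build a key when the season or episode is missing.

```python
from typing import Optional


def build_series_key(title: Optional[str], season: Optional[int], episode: Optional[int]) -> Optional[str]:
    """Return a duplicate-matching key, or None when the job cannot be identified.

    Hypothetical helper: the "title/season/episode" layout is assumed from the log.
    """
    # Without all three parts, every episode of a show would share one key
    if not title or season is None or episode is None:
        return None
    return f"{title.strip().lower()}/{season}/{episode}"


print(build_series_key("The Great British Bake Off", 2, 5))  # the great british bake off/2/5
print(build_series_key("The Great British Bake Off", None, None))  # None
```

With a `None` key, a job would simply be skipped by the smart check instead of matching every other episode of the same show.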

Thanks for testing and reporting!

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 4th, 2023, 5:04 pm
by wilberfan
I noticed this issue last night for the first time. Is there a manual override, or any way to 'force' a download? I'm running inside Docker on a Synology setup. Is it safe to downgrade until the issue is resolved?

[edit] I created a new container from stable, so all seems well for now. ;D

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 4th, 2023, 5:42 pm
by goldeneyes
You can change the smart detection to tag instead of pause in the settings, under the Switches tab.

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 4th, 2023, 8:55 pm
by djones
I was happy to see duplicates finally getting some attention in the latest beta build, after so many years of waiting and seeing "just use Sonarr" replied whenever the issue has been raised. The reality is that there's lots of RSS-fetched content, for example, that falls outside of what Sonarr handles.

I'm hoping the feature will continue to be fleshed out so that some basic user control (overrides, for example) is provided for Smart Duplicate detection, since right now I guess the feature is a black box. I *have* read how GuessIt works, as well as the Wiki page for SABnzbd's duplicate detection.

More immediately, I'm hoping Smart Duplicate detection bypassing the prequeue script gets addressed. I had a lot of renaming logic in a prequeue python script to clean up messy filenames precisely so they would be easier to de-dupe (visually+manually) later.

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 5th, 2023, 4:49 am
by safihre
You can bypass by setting the job to Force, either when adding or when the job is in the queue.

@djones: What if we run the duplicate detection again after the pre-queue result, only if it changed the name?
That seems reasonable.
Or what else would you like to do from the pre-queue script that isn't possible?
Indeed GuessIt is a black box, also to me. It just works, or sometimes it doesn't...
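
The proposal above can be sketched roughly as follows. This is an assumption-laden illustration, not SABnzbd's code: `check_with_prequeue`, `is_duplicate` and `prequeue` are hypothetical names, and the pre-queue script is modelled as a plain function that returns a new name or `None`.

```python
from typing import Callable, Optional, Tuple


def check_with_prequeue(
    name: str,
    is_duplicate: Callable[[str], bool],
    prequeue: Callable[[str], Optional[str]],
) -> Tuple[str, bool]:
    """Return the final job name and whether it counts as a duplicate."""
    duplicate = is_duplicate(name)
    new_name = prequeue(name)
    # Repeat the check only when the script actually changed the name
    if new_name and new_name != name:
        name = new_name
        duplicate = is_duplicate(name)
    return name, duplicate


queued = {"show.s01e02.1080p"}
print(check_with_prequeue(
    "Show.S01E02.1080p-Obfuscated",
    is_duplicate=lambda n: n.lower() in queued,
    prequeue=lambda n: n.replace("-Obfuscated", ""),
))
```

In this toy run the original name is not a duplicate, but the cleaned-up name is, which is the case the re-check is meant to catch.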

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 5th, 2023, 4:56 am
by safihre
@goldeneyes I see that I made a stupid mistake in smart duplicate detection, so it never works for shows. Very stupid.
You can download the fixed release here in a few minutes (it does require a GitHub account).
I will release a new beta soon; I just need to add automated testing so this stupid mistake doesn't happen again.

https://github.com/sabnzbd/sabnzbd/acti ... 7099019691

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 10th, 2023, 3:30 pm
by djones
safihre wrote: December 5th, 2023, 4:49 am You can bypass by setting the job to Force, either when adding or when the job is in the queue.

@djones: What if we run the duplicate detection again after the pre-queue result, only if it changed the name?
That seems reasonable.
Or what else would you like to do from the pre-queue script that isn't possible?
Indeed GuessIt is a black box, also to me. It just works, or sometimes it doesn't...
That's an interesting idea. Can smart duplicate checking somehow always be active for the queue items, rather than checking only once when a new job comes in? Or maybe create a user option for how often to check (on a timer, let's say).

For your reference, here is a shortened version of the pre-queue script I was using, "shortened" meaning my actual script has 90+ rename/substitute lines:

prequeue.py

Code:

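import re
import sys

try:
    # SABnzbd passes the job details as positional command-line arguments
    (scriptname, nzbname, postprocflags, category, script, prio, downloadsize, grouplist, showname, season, episodenumber, episodename, is_proper, resolution, decade, year, month, day, job_type) = sys.argv
    downloadsize = int(downloadsize)
except ValueError:
    # Raised on an unexpected number of arguments or a non-numeric size
    sys.exit(1)    # exit with 1 causes SABnzbd to ignore the output of this script

# Collapse repeated dots, then normalise the job name.
# Raw strings with escaped dots, so that '.' matches a literal dot only.
fwp = nzbname.replace('...', '.').replace('..', '.')
fwp = re.sub(r'(?i)\.4k', '.2160p', fwp)
fwp = re.sub(r'(?i)-Obfuscated$', '', fwp)
fwp = re.sub(r'(?i)\.READ\.NFO', '', fwp)
fwp = re.sub(r'(?i)\.com\.', '.', fwp)
fwp = re.sub(r'(?i)\.par2', '', fwp)

print("1")    # Accept
print(fwp)    # New job name
# Five empty lines: leave the remaining job settings unchanged
print()
print()
print()
print()
print()
# 0 means OK
sys.exit(0)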
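
With 90+ substitutions, one option is to keep the rules in a table and apply them in a loop. The sketch below is only an illustration of that idea: `RENAME_RULES` and `clean_name` are hypothetical names, and the rules are a small sample modelled on the script above rather than the full set.

```python
import re

# Hypothetical rule table: (pattern, replacement) pairs, applied in order
RENAME_RULES = [
    (r"\.{2,}", "."),            # collapse runs of dots
    (r"(?i)\.4k", ".2160p"),
    (r"(?i)-Obfuscated$", ""),
    (r"(?i)\.READ\.NFO", ""),
]
COMPILED_RULES = [(re.compile(pattern), repl) for pattern, repl in RENAME_RULES]


def clean_name(name: str) -> str:
    """Apply every rename rule to the job name, in table order."""
    for pattern, repl in COMPILED_RULES:
        name = pattern.sub(repl, name)
    return name


print(clean_name("Some..Show.4K.READ.NFO-Obfuscated"))  # Some.Show.2160p
```

Keeping the rules as data also makes it easy to test the renaming on sample names outside SABnzbd, without running the pre-queue script itself.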

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 10th, 2023, 3:41 pm
by djones
One more request for feature add/change: a field to define Smart Duplicate detection bypass keywords.

Currently there is simply a checkbox "Allow proper releases" with PROPER, REAL or REPACK words hardcoded. Perhaps it could be changed to "Allow duplicate bypass" and a field for user defined keywords - and PROPER, REAL, REPACK could be pre-populated in the field just for continuity.

For example, I'd define "2160p" as a bypass keyword, because currently my RSS feeds download a 1080p version of a video, and when a 2160p version enters the queue afterward, it's marked as a duplicate.
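
To make the request concrete, here is a rough sketch of what a user-defined bypass list could look like. It is purely illustrative and not how SABnzbd currently decides: `bypasses_duplicate_check` and `DEFAULT_BYPASS_KEYWORDS` are hypothetical names, and the defaults simply mirror the three words named above.

```python
import re
from typing import Iterable

# Hypothetical defaults, mirroring the words behind "Allow proper releases"
DEFAULT_BYPASS_KEYWORDS = ("PROPER", "REAL", "REPACK")


def bypasses_duplicate_check(job_name: str, keywords: Iterable[str] = DEFAULT_BYPASS_KEYWORDS) -> bool:
    """True when the job name contains one of the bypass keywords as a whole token."""
    # Split on non-alphanumerics so that "REAL" does not match "Realtime"
    tokens = {token.lower() for token in re.split(r"[^0-9A-Za-z]+", job_name) if token}
    return any(keyword.lower() in tokens for keyword in keywords)


user_keywords = DEFAULT_BYPASS_KEYWORDS + ("2160p",)
print(bypasses_duplicate_check("Some.Video.2160p.x265", user_keywords))  # True
print(bypasses_duplicate_check("Some.Video.1080p.x265", user_keywords))  # False
```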

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 10th, 2023, 3:45 pm
by safihre
@djones: In the new 4.2.0RC1 release I implemented that if the pre-queue script supplies a new job name, the Duplicate Detection is re-analysed.
Regarding your other request: that is really something that tools like Sonarr/Radarr are made for, they allow exactly such things.
The integration with TV/Movie databases makes them so much better at getting only 1 release even if the names don't really match, or at updating a 1080p to 4K once it becomes available.

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 10th, 2023, 3:48 pm
by djones
safihre wrote: December 10th, 2023, 3:45 pm @djones: In the new 4.2.0RC1 release I implemented that if the pre-queue script supplies a new job name, the Duplicate Detection is re-analysed.
My issue is that the pre-queue py script I supplied does not function anymore since 4.2.0Beta1: the jobname substitutions are no longer happening. It's unclear how to troubleshoot. I've looked in the logs, but I'm not experienced enough to know what to look for.
safihre wrote: December 10th, 2023, 3:45 pm Regarding your other request: that is really something that tools like Sonarr/Radarr are made for, they allow exactly such things.
The integration with TV/Movie-databases make them so much better at getting only 1 release even if the names don't really match, or updating a 1080p to 4K once it comes available.
Understood, there's lots of non-TV/Movie content that Sonarr/Radarr don't handle or I wouldn't be bothering to ask. VR content for example. All I'm suggesting is a whitelist field instead of PROPER, etc being hardcoded.

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 10th, 2023, 4:19 pm
by djones
Another idea to enhance user control of Smart Duplicate detection, again for the vast exabytes of Usenet content that Sonarr/Radarr do not support, is to take file size into account:

A dropdown in Switches with options: Keep smallest size, Keep largest size, etc.

I always want the largest version because it's usually the highest resolution, and would consider anything smaller a dupe. Other people like the smallest version of a video for their mobile device, etc.
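
As a rough illustration of the dropdown idea (hypothetical names throughout, nothing here exists in SABnzbd), the choice could boil down to a small comparison between the existing job and the incoming duplicate:

```python
from enum import Enum


class SizePolicy(Enum):
    """Hypothetical setting values for the proposed dropdown."""
    KEEP_SMALLEST = "smallest"
    KEEP_LARGEST = "largest"


def should_replace(existing_bytes: int, incoming_bytes: int, policy: SizePolicy) -> bool:
    """True when the incoming job should win over the existing duplicate."""
    if policy is SizePolicy.KEEP_LARGEST:
        return incoming_bytes > existing_bytes
    return incoming_bytes < existing_bytes


# A 2160p release (larger) arriving after a 1080p one
print(should_replace(4_000_000_000, 15_000_000_000, SizePolicy.KEEP_LARGEST))   # True
print(should_replace(4_000_000_000, 15_000_000_000, SizePolicy.KEEP_SMALLEST))  # False
```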

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 11th, 2023, 2:39 am
by safihre
djones wrote: December 10th, 2023, 3:48 pm
safihre wrote: December 10th, 2023, 3:45 pm @djones: In the new 4.2.0RC1 release I implemented that if the pre-queue script supplies a new job name, the Duplicate Detection is re-analysed.
My issue is the pre-queue py script I supplied does not function anymore since 4.2.0Beta1, the jobname substitutions are no longer happening. Unclear how to troubleshoot, I've looked in the logs but not experienced enough to know what to look for.
Can you try 4.2.0RC1? I cannot reproduce this, my modified name from the pre-queue script is used.
If you enable Debug logging you can send the logs to me at [email protected]

Regarding the special keywords for override: the proper/real/etc. check is done by GuessIt, so we are not checking specific keywords ourselves.

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 11th, 2023, 9:51 pm
by djones
safihre wrote: December 11th, 2023, 2:39 am Can you try 4.2.0RC1? I cannot reproduce this, my modified name from the pre-queue script is used.
Yep, emailed you the logs. In the meantime, I worked around it by rewriting the Python script in PowerShell. It's jank since it has to be spawned by a Windows batch file, but it works (4.2.0RC1).

Re: 4.2.0Beta1 Smart Duplicate detection

Posted: December 23rd, 2023, 6:39 am
by Scarfaro
The Smart Duplicate detection unfortunately ignores NZBs that are in subfolders of the NZB backup folder.

e.g. /incoming/nzb/M/Movie.gz

Could you please fix this?
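
For illustration only, a recursive scan of a backup folder could look like the sketch below. It assumes nothing about how SABnzbd actually reads its backup folder; `iter_backup_files` is a hypothetical helper that simply shows the difference between a flat listing and one that descends into subfolders such as `M/`.

```python
from pathlib import Path
from typing import Iterator


def iter_backup_files(backup_dir: Path) -> Iterator[Path]:
    """Yield every file below backup_dir, including files in subfolders."""
    # rglob("*") walks the whole tree, whereas iterdir() only lists the top level
    for path in sorted(backup_dir.rglob("*")):
        if path.is_file():
            yield path
```

A file stored as `M/Movie.gz` below the backup folder would be yielded by this walk, while a top-level-only listing would return just the `M` directory entry.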