Header information API
Posted: August 7th, 2011, 10:07 pm
by sbuser
For a variety of reasons it is useful to look inside certain nzb files before sending them to SAB to make sure the contents are as expected.
This is impossible when the contents are in .rar or .zip files. The solution would be to download just the headers for those files to get a list of the contents.
This could be done with external programs but, again, for a variety of reasons it is best if SAB does this for us. This could happen in a few ways:
1) any program seeking header information passes SAB a tiny nzb that only contains a single file and only the 1st segment of that file (which should contain the header information) - SAB then grabs it, analyzes it for things like passwords and file contents and then delivers the results.
2) SAB accepts full .nzb files through the API as usual, but if they are flagged with a header request they are parsed by SAB and paused in the queue until further action is taken. Programs seeking header information then query SAB for that info and send either an unpause or a delete for the paused nzb.
----
Why does this matter? There are some really nice automated options available for content these days but there are a lot of misfires because people do things like wrap what should be a bunch of individual files up into a single file. Things like a single lossless audio file with an embedded cue sheet rather than individual tracks. It would be nice to know what's in those rar files so that appropriate action can be taken.
Re: Header information API
Posted: August 8th, 2011, 2:02 am
by shypike
Maybe I'm a bit dim, but what exactly do you expect SABnzbd to do
differently in this scenario?
Re: Header information API
Posted: August 8th, 2011, 7:26 am
by sbuser
The idea is that when asked SAB will download the first few bytes from the rar file and get the header information out of the file. It would then make this data available to external programs via the API.
So: an nzb is passed to SAB with the headers flag set. SAB downloads part of the nzb (or the whole nzb, depending on implementation, in which case it's up to the other program to deliver a tiny nzb that consists of just the part of the file with the headers). SAB then parses the headers in the .rar with something like Python's rarfile:
http://pypi.python.org/pypi/rarfile/2.3 - The information is then available at something like sabnzbd/api?headers=1&nzb=id - or what have you.
Make sense?
Similar things could be done for a variety of file types once it works with .rars, where I think it's most valuable. It should work for any file that contains header information in the first few bytes/KB.
--
Another option is to leave the parsing completely up to the external program and just pass back the specified number of decoded bytes from the file when requested. So SAB downloads and decodes with high priority X number of bytes from the nzb passed in and then makes those bytes available via the API. The external program then decodes those with rarfile or similar.
--
It doesn't really matter how it's done - but the idea is to get inside of the rar files before the whole nzb is downloaded and make sure they are what we're looking for.
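[Editor's sketch] The "leave the parsing to the external program" option above can be illustrated with a few lines of Python. This is only a minimal sketch, assuming the decoded bytes come from the start of a RAR 4.x volume and follow the publicly documented RAR 4.x block layout. The function name peek_rar4 is made up for this example and is not part of SABnzbd or of the rarfile package. Header CRCs are not verified, and only headers that fit completely inside the supplied bytes are reported.

```python
import struct

RAR4_MARKER = b"Rar!\x1a\x07\x00"  # signature of a RAR 4.x archive


def peek_rar4(data):
    """Hypothetical helper: list what the leading bytes of a RAR 4.x file reveal.

    Returns {"headers_encrypted": bool, "files": [ {...}, ... ]}.
    Stops quietly when the supplied bytes run out.
    """
    start = data.find(RAR4_MARKER)
    if start < 0:
        raise ValueError("no RAR 4.x marker found")
    pos = start + len(RAR4_MARKER)
    info = {"headers_encrypted": False, "files": []}

    while pos + 7 <= len(data):
        # Common 7-byte block header: CRC, type, flags, header size.
        _crc, head_type, flags, head_size = struct.unpack_from("<HBHH", data, pos)
        if head_size < 7 or pos + head_size > len(data):
            break  # header is malformed or cut off by the end of the segment

        if head_type == 0x73:  # main archive header
            if flags & 0x0080:  # all following block headers are encrypted
                info["headers_encrypted"] = True
                break
            pos += head_size
        elif head_type == 0x74:  # file header
            if head_size < 32:
                break
            pack_size, unp_size = struct.unpack_from("<II", data, pos + 7)
            (name_size,) = struct.unpack_from("<H", data, pos + 26)
            name_pos = pos + 32
            if flags & 0x0100:  # sizes above 4 GB: two extra 32-bit fields
                high_pack, high_unp = struct.unpack_from("<II", data, name_pos)
                pack_size |= high_pack << 32
                unp_size |= high_unp << 32
                name_pos += 8
            if name_pos + name_size > pos + head_size:
                break
            raw_name = data[name_pos:name_pos + name_size]
            # With the unicode flag the plain name comes first, then a NUL byte.
            name = raw_name.split(b"\x00", 1)[0].decode("utf-8", "replace")
            info["files"].append({
                "name": name,
                "packed_size": pack_size,
                "unpacked_size": unp_size,
                "encrypted": bool(flags & 0x0004),
            })
            pos += head_size + pack_size  # skip the packed file data
        else:  # any other block type: skip it
            add_size = 0
            if flags & 0x8000 and head_size >= 11:
                (add_size,) = struct.unpack_from("<I", data, pos + 7)
            pos += head_size + add_size
    return info
```

Because the packed data of the first file follows its header directly, a single leading segment will usually reveal only the first file entry, unless the files inside are very small. That is still enough to spot a password flag or an unexpected file type.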
Re: Header information API
Posted: August 8th, 2011, 7:36 am
by shypike
SABnzbd can already pause downloads when it detects encrypted RAR files.
It only does that when the first RAR file is complete.
I'm not going to make the download queue even more complicated than it already is.
Doing early checks is difficult due to all parallel and potentially out-of-order article downloads that take place.
What you can do is create an NZB with just the initial articles of one or more files,
the result of which you can inspect outside of SABnzbd.
Queue that with high priority, with "download-only" and wait for the result (optionally using a user script to signal completeness).
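[Editor's sketch] Building the reduced NZB described above could look roughly like this. It is a minimal sketch, assuming a standard NZB document in the usual newzbin namespace. The function name make_header_nzb and the subject-matching heuristic are assumptions for illustration only; real subjects vary, so the match may need tuning.

```python
import xml.etree.ElementTree as ET

NZB_NS = "http://www.newzbin.com/DTD/2003/nzb"
ET.register_namespace("", NZB_NS)  # keep the default namespace on output


def make_header_nzb(nzb_xml, wanted=(".rar", ".zip")):
    """Hypothetical helper: keep only the first segment of matching files.

    nzb_xml is the text of a full NZB; the return value is the text of a
    much smaller NZB that can be queued on its own.
    """
    def q(tag):
        return "{%s}%s" % (NZB_NS, tag)

    root = ET.fromstring(nzb_xml)
    for file_el in list(root.findall(q("file"))):
        subject = file_el.get("subject", "").lower()
        segments = file_el.find(q("segments"))
        all_segs = segments.findall(q("segment")) if segments is not None else []
        # Drop files we do not want to peek into (heuristic: extension in subject).
        if not all_segs or not any(ext in subject for ext in wanted):
            root.remove(file_el)
            continue
        # The segment with the lowest number holds the start of the file.
        first = min(all_segs, key=lambda s: int(s.get("number", "0")))
        for seg in all_segs:
            if seg is not first:
                segments.remove(seg)
    return ET.tostring(root, encoding="unicode")
```

The resulting file keeps the original poster, date and group information, so it should be accepted like any other NZB. The disk file SAB produces will then contain just the decoded first article of each kept file.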
Re: Header information API
Posted: August 8th, 2011, 8:48 am
by sbuser
Ok... yes, that will work.
Can you talk to me about how "Force" works as a priority? The docs say it will go "straight to the top" and that it will not respect the queue pause.
If multiple items are forced does the 2nd force move ahead of the one that is already being forced? If so - how might this be avoided?
I ask because priority=high is already used by a lot of people for things like sickbeard and one wouldn't want to wait for several gigs to download before getting header information back.
Any idea how best to handle that situation? Ideally we'd want to be able to get the fragments with a high priority (higher than "high" if you will) and also keep them in order.
I also wonder how "force" will work with the download-limits coming in the next version?
Thanks for looking at this.
Re: Header information API
Posted: August 8th, 2011, 8:54 am
by sbuser
Sorry another thing while I'm thinking about it: what would be the best way to keep these out of the history? At the very least these will be duplicates of an eventual download and there might be 10 checks for every actual download - that could lead to a lot of unwanted history spam. Is there a way to keep them out in advance or do we have to somehow go back and delete them?
Re: Header information API
Posted: August 8th, 2011, 8:59 am
by shypike
You'll need the API to remove unwanted entries from history.
"Force" is an extra priority level; the second "force" job will be second in queue.
"Force" will also ignore quota. It's called "Force" for a reason.
There are three prio levels already: High, Normal, Low. You should even be able to use it without "force".
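[Editor's sketch] The two API steps mentioned above (queueing with a chosen priority and cleaning up history afterwards) could be driven with URLs built like this. This is a sketch under assumptions: the numeric priority values, pp=0 for download-only, and the mode names addlocalfile and history/delete are taken from the SABnzbd API documentation as I recall it and should be checked against your version. The base URL, API key, file path and job id in the usage lines are placeholders.

```python
from urllib.parse import urlencode

# Assumed numeric priority values of the SABnzbd API.
PRIORITY = {"low": -1, "normal": 0, "high": 1, "force": 2}


def add_local_nzb_url(base_url, api_key, nzb_path, priority="high"):
    """Build the URL that queues a local NZB as download-only (assumed pp=0)."""
    params = {
        "mode": "addlocalfile",
        "name": nzb_path,
        "priority": PRIORITY[priority],
        "pp": 0,
        "output": "json",
        "apikey": api_key,
    }
    return base_url + "?" + urlencode(params)


def delete_history_url(base_url, api_key, nzo_id):
    """Build the URL that removes one finished job from the history."""
    params = {
        "mode": "history",
        "name": "delete",
        "value": nzo_id,
        "output": "json",
        "apikey": api_key,
    }
    return base_url + "?" + urlencode(params)


# Placeholder values for illustration only.
base = "http://localhost:8080/sabnzbd/api"
print(add_local_nzb_url(base, "APIKEY", "/tmp/peek.nzb", priority="force"))
print(delete_history_url(base, "APIKEY", "SABnzbd_nzo_abc123"))
```

Only the URLs are built here; fetching them (for example with urllib.request.urlopen) and waiting for the job to finish is left to the calling program.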
Re: Header information API
Posted: August 8th, 2011, 9:08 am
by sbuser
Right... it's just that a lot of people use the existing priorities for things already.
Anyway thanks for the help I will give it a shot.
Re: Header information API
Posted: August 8th, 2011, 12:40 pm
by sbuser
Does the "bytes" attribute of a "segment" node in an nzb file not have any effect on how much of the segment SAB downloads?
If a segment starts out as bytes="327680" and I create a new mini nzb to grab the headers and change that attribute to "1024" it looks like SAB is still getting the original 320KB rather than the new, smaller number of bytes.
Is there any way to tell SAB to get only part of a segment?
--
EDIT: Or is SAB maybe just getting the 1024 bytes and the rest are just blank but the file is created on disk with the full number of segment bytes?
Hmm, no - looks like it got the full segment...oof. It will start to add up if we have to pull the full segment every time....
Re: Header information API
Posted: August 8th, 2011, 2:42 pm
by shypike
It doesn't matter. If only article one of a file is available, that will be the content of the file.
Re: Header information API
Posted: August 8th, 2011, 3:30 pm
by sbuser
Sorry, just so I'm clear:
An "article" is the same as a "segment" in the nzb? And SAB will pull down the entire segment/article if it is given and will not (or cannot?) pull down only a certain number of bytes of that segment/article?
The idea is that we don't want to pull down a 300 KB segment (or article) - we just want the first 1024 bytes (or some specifiable number). If the header information is all in the first 1024 bytes and we have to get the full article, that's 300x unnecessary information every time we want to peek inside a RAR.
Is that the only way? Sorry if I'm being obtuse, I'm not intimately familiar with the internals of SAB or usenet and how articles are encoded etc...
Re: Header information API
Posted: August 9th, 2011, 4:09 am
by shypike
Yes, a segment is an article.
Articles can only be downloaded in full; the server is going to send all of it anyway.
SABnzbd will put whatever segments are present in a file section into a disk file.