Two Things About par2/SFV verifying/repairing

Want something added? Ask for it here.
theunforgiven
Newbie
Posts: 3
Joined: January 31st, 2008, 7:50 am

Two Things About par2/SFV verifying/repairing

Post by theunforgiven

I think it would be awesome if SabNZBd could do the following. Nothing else supports this, but I think it would be relatively easy to implement. I propose that each file be checked as soon as it has been downloaded, against an SFV file (if one exists) or otherwise a PAR2 file. SFV takes precedence because a CRC check is easier and quicker to calculate. This tells you right away whether a file is broken, so the correct number of par2 blocks can be fetched automatically at the end. It also lets us skip the par2 step after a download when everything is known to be good, which saves time. And when an archive is broken, we won't have to scan twice at the end (once to find out it is broken and once to repair it); there will only be one scan, on an archive already known to be bad, to fix it.
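To make the SFV half of this proposal concrete, here is a minimal sketch of checking one finished file against an SFV listing. It assumes the common SFV layout of one `filename CRC32` pair per line with `;` starting a comment line; the function names are hypothetical and not part of SabNZBd. `zlib.crc32` computes the standard CRC-32 that SFV files store.

```python
import zlib
from pathlib import Path


def parse_sfv(sfv_text):
    """Map each filename listed in an SFV file to its expected CRC32 (as an int).

    Assumed layout: 'name.part01.rar 1a2b3c4d', one entry per line, with
    ';' starting a comment line. The CRC is the last whitespace-separated
    token, so filenames that contain spaces still parse.
    """
    expected = {}
    for line in sfv_text.splitlines():
        line = line.strip()
        if not line or line.startswith(';'):
            continue
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue  # malformed line: no CRC field
        try:
            expected[parts[0]] = int(parts[1], 16)
        except ValueError:
            continue  # CRC field is not valid hex
    return expected


def file_crc32(path, chunk_size=1 << 20):
    """CRC32 of a file, read in chunks so large rars don't need to fit in memory."""
    crc = 0
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def verify_against_sfv(path, expected):
    """True/False if the file is listed in the SFV, None if it isn't listed."""
    name = Path(path).name
    if name not in expected:
        return None
    return file_crc32(path) == expected[name]
```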

What does everyone think?
nzb_leecher
Full Member
Posts: 211
Joined: January 22nd, 2008, 1:38 pm

Re: Two Things About par2/SFV verifying/repairing

Post by nzb_leecher

I would prefer the NewsLeecher method, where each file is checked directly after it has been downloaded, instead of waiting until all 90 rars are downloaded before starting the par scan. But I believe that one is already on the to-do list.
shypike
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: Two Things About par2/SFV verifying/repairing

Post by shypike

This is indeed on the todo list.
It will take quite some time though.
The idea is to do an on-the-fly par2 verification on each downloaded file and only use the external par2 program if repairs are needed.
Maybe we can sneak in .sfv too.
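The "only use the external par2 program if repairs are needed" decision can be sketched as follows. This assumes each file's on-the-fly check has already produced True (OK), False (damaged) or None (no checksum known), and that the external tool is par2cmdline, whose `par2 r <par2 file>` command verifies and repairs a set. `plan_post_processing` is a hypothetical name, not SABnzbd code.

```python
from typing import Dict, List, Optional


def plan_post_processing(results: Dict[str, Optional[bool]],
                         par2file: Optional[str]) -> Optional[List[str]]:
    """Decide whether the external par2 program has to run for a job.

    'results' maps each downloaded filename to the outcome of its on-the-fly
    check: True (verified OK), False (damaged) or None (no checksum known).
    Returns None when every file verified OK, otherwise the command line
    for a par2cmdline repair run.
    """
    if all(ok is True for ok in results.values()):
        return None  # everything verified while downloading: skip par2 entirely
    if par2file is None:
        raise ValueError("files failed or are unverified, but no par2 file is available")
    # Hand the whole set to the external tool; it verifies again and repairs.
    return ["par2", "r", par2file]
```

Returning the command line instead of running it keeps the decision testable; the caller can pass the list to `subprocess.run` when a repair is actually needed.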
nzb_leecher
Full Member
Posts: 211
Joined: January 22nd, 2008, 1:38 pm

Re: Two Things About par2/SFV verifying/repairing

Post by nzb_leecher

Smart!

One idea: I use the SAB cache set to 150,000 KB (150 MB) to store the parts (mostly 50 MB rars), so decoding goes quicker and doesn't take I/O from the hard disk until a part is decoded; only then is it written to disk. The par check would be superfast if it were done while the file is still in the cache. If the file is written to the hard disk and then read back for the par check, there would be a lot of extra I/O, slowing down the rest of the system, for example if it is extracting a file at the same time (or doing other disk I/O). I don't know whether that is complex to do, but it would make it very, very efficient. Just a thought. :-)
Last edited by nzb_leecher on February 1st, 2008, 3:57 pm, edited 1 time in total.
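The "check it while it's still in the cache" idea works because MD5 can be computed incrementally: each decoded part is fed to the hash on its way to disk, so the finished file never has to be read back. A minimal sketch, assuming parts are fed in file order; `StreamingFileHasher` is a hypothetical helper, not part of SABnzbd's cache. It also tracks the hash of the first 16 KB, which par2 stores alongside the full-file hash.

```python
import hashlib


class StreamingFileHasher:
    """Hypothetical helper: hash a file's data as its decoded parts pass
    through the in-memory cache, so nothing has to be re-read from disk.

    Parts must be fed in file order, because MD5 is computed sequentially.
    """

    FIRST_BLOCK = 16384  # par2 also stores an MD5 of the first 16 KB

    def __init__(self):
        self._md5 = hashlib.md5()
        self._md5_16k = hashlib.md5()
        self._seen = 0

    def feed(self, data: bytes) -> None:
        # Only the bytes that fall inside the first 16 KB go into that hash.
        if self._seen < self.FIRST_BLOCK:
            self._md5_16k.update(data[:self.FIRST_BLOCK - self._seen])
        self._md5.update(data)
        self._seen += len(data)

    def digests(self):
        """Return (MD5 of whole file, MD5 of first 16 KB) as raw 16-byte digests."""
        return self._md5.digest(), self._md5_16k.digest()
```

In practice articles can arrive out of order, so a real implementation would have to buffer them until the next expected part is available before calling `feed`.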
Quadro
Newbie
Posts: 2
Joined: August 30th, 2008, 2:47 am

Re: Two Things About par2/SFV verifying/repairing

Post by Quadro

Because I was bored at work and annoyed that my uber-fast 100 Mbit downloads from Giganews were held back by post-process verification, I started thinking: on-the-fly verification would be very nice. So I looked to see whether anyone had already started working on this feature, and stumbled upon this topic. Anyway, because I always want the things I want ASAP, and since I'm a programmer too, I started looking at how hard it would be to do it myself.

Immediately, I ran into the problem that sabnzbd uses an external par2 command to verify/repair, which forces sabnzbd to do all of this at once at the end of the job, because the tool doesn't support verifying only one file of a set. So I looked for par2 libraries for Python, but there simply weren't any. Then I came across the par2 specification and thought: hey, this isn't too hard.

My plan was the following: from the par2 file, retrieve all the filenames and their corresponding md5 hashes. Then when a file has been downloaded, calculate the md5 hash of the downloaded file and compare it to the value from the par2 file. When all files in the par set have verified, the 'repair' stage can be skipped :)
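The comparison step of this plan could look roughly like the sketch below. One detail worth noting: a par2 File Description packet stores the MD5 as 16 raw bytes, so the downloaded file's hash is compared using `digest()` rather than a hex string. The function names are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the raw 16-byte MD5 digest of a file, read in chunks."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.digest()


def verify_downloaded_file(path, filename, par2_hashes):
    """Check one finished file against hashes taken from a par2 file.

    'par2_hashes' maps filenames to raw 16-byte MD5 digests (the format
    stored in a par2 File Description packet). Returns True/False, or None
    if the par2 file does not describe this filename.
    """
    expected = par2_hashes.get(filename)
    if expected is None:
        return None
    return file_md5(path) == expected
```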

Now, retrieving the filenames and their MD5 hashes wasn't too hard; the code below handles that part:


import struct

def getMD5Hashes(par2file):
    """ Returns a dictionary with the filenames as keys and their MD5 checksums as values
        Argument 'par2file' should be the file path to the par2 file
    """

    with open(par2file, 'rb') as f:
        par2 = f.read()
    par2length = len(par2)

    # dictionary that maps each filename to its corresponding hash
    files = dict()

    # For a full description of the par2 specification, visit:
    # http://parchive.sourceforge.net/docs/specifications/parity-volume-spec/article-spec.html

    # 'offset' is the position of the 'FileDesc' part of the packet type,
    # which lies 56 bytes after the start of its packet

    # for a file description packet:
    # offset - 56   = start of packet (magic 'PAR2\0PKT', then the packet length)
    # offset + 24   = start MD5 hash entire file
    # offset + 40   = start MD5 hash first 16KB
    # offset + 56   = start length file
    # offset + 64   = start name of file

    # Packets are a multiple of 4 bytes long, so stepping by 4 is enough
    for offset in range(56, par2length, 4):
        if par2[offset:offset+8] != b"FileDesc":
            continue
        start = offset - 56
        # Make sure this really is a packet header, not a chance match in data
        if par2[start:start+8] != b"PAR2\0PKT":
            continue
        md5hash = par2[offset+24:offset+40]

        # Because a par2 file is based upon chunks of 4 bytes, but
        # filenames are variable in length, the last bytes of the last
        # chunk where the filename is stored will be filled with '\0's
        # when the filename length isn't x * 4 (e.g. foobar will become foobar\0\0).
        # The packet length says where the name ends, so we never read into
        # the next packet or loop forever at the end of the file.
        length = struct.unpack('<Q', par2[start+8:start+16])[0]
        filename = par2[offset+64:start+length].rstrip(b'\0').decode('utf-8', 'replace')

        files[filename] = md5hash
    return files
For me, the hard part is integrating all this into sabnzbd, because I don't know enough about sabnzbd yet, but I believe switch was working on some kind of documentation on Trac.
Last edited by Quadro on August 30th, 2008, 3:19 am, edited 1 time in total.
shypike
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: Two Things About par2/SFV verifying/repairing

Post by shypike

Thanks, very useful code to get going with the subject.
As you guessed, integration is not easy.

1. The ideal place is in the assembler: verify each file just when it has been completely written to disk.
2. Alas, some files are complete before the first par2 file is, which means an extra step for those files.
3. The current post-processor does not look at the internal admin, only at the folder contents (rework required).
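Point 2 amounts to a small piece of bookkeeping: a file that completes before any par2 file has to be parked and checked once the hashes arrive. A minimal sketch, assuming a `check(path, expected_md5)` callback (for example an MD5 comparison of the file on disk); the `PendingVerifier` class is hypothetical and not SABnzbd's assembler code.

```python
from typing import Callable, Dict, Optional


class PendingVerifier:
    """Hypothetical bookkeeping for files that may finish before any par2
    file does, so their check has to wait until the hashes are known.
    """

    def __init__(self, check: Callable[[str, bytes], bool]):
        # check(path, expected_md5) -> bool, e.g. comparing a file's MD5 digest
        self._check = check
        self._hashes: Optional[Dict[str, bytes]] = None
        self._pending: Dict[str, str] = {}  # filename -> path on disk
        self.results: Dict[str, Optional[bool]] = {}

    def file_completed(self, filename: str, path: str) -> None:
        if self._hashes is None:
            self._pending[filename] = path  # no par2 yet: verify later
        else:
            self._verify(filename, path)

    def par2_completed(self, hashes: Dict[str, bytes]) -> None:
        self._hashes = hashes
        # The extra step: catch up on files that finished before the par2.
        for filename, path in self._pending.items():
            self._verify(filename, path)
        self._pending.clear()

    def _verify(self, filename: str, path: str) -> None:
        expected = self._hashes.get(filename)
        # None means the par2 set does not describe this file at all.
        self.results[filename] = None if expected is None else self._check(path, expected)
```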

Also, your code assumes that the par2 file is OK; this needs to be verified as well.
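One way to check the par2 file itself, sketched here as an assumption rather than as the change that was eventually made: per the par2 specification, every packet starts with the magic `PAR2\0PKT` and an 8-byte little-endian length, and carries an MD5 over everything from the recovery set ID (byte 32 of the packet) to its end. Packets whose checksum fails can be skipped, and File Description packets (type `PAR 2.0\0FileDesc`) are only trusted when intact. The function names are hypothetical.

```python
import hashlib
import struct

MAGIC = b"PAR2\0PKT"
HEADER_SIZE = 64  # magic(8) + length(8) + packet MD5(16) + set ID(16) + type(16)


def iter_valid_packets(data: bytes):
    """Yield (packet_type, body) for every packet whose MD5 checksum matches."""
    pos = data.find(MAGIC)
    while pos != -1 and pos + HEADER_SIZE <= len(data):
        (length,) = struct.unpack_from('<Q', data, pos + 8)
        end = pos + length
        if (length >= HEADER_SIZE and length % 4 == 0 and end <= len(data)
                and hashlib.md5(data[pos + 32:end]).digest() == data[pos + 16:pos + 32]):
            yield data[pos + 48:pos + 64], data[pos + HEADER_SIZE:end]
            pos = data.find(MAGIC, end)
        else:
            pos = data.find(MAGIC, pos + 1)  # damaged packet: look for the next one


def file_hashes_from_valid_packets(data: bytes):
    """Like getMD5Hashes, but only uses File Description packets that verify."""
    files = {}
    for ptype, body in iter_valid_packets(data):
        if ptype == b"PAR 2.0\0FileDesc":
            # body: file ID(16), MD5(16), MD5 first 16KB(16), length(8), name
            name = body[56:].rstrip(b"\0").decode("utf-8", "replace")
            files[name] = body[16:32]
    return files
```

Because damaged packets are skipped rather than treated as fatal, a partly broken par2 file still yields every filename whose description packet survived; par2 sets normally repeat these critical packets in each volume for that reason.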

Anyway, I had already started working on an implementation, but stumbled over par2 file analysis.
Your code fits in quite well and I'll be sure to credit you for it.
shypike
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: Two Things About par2/SFV verifying/repairing

Post by shypike

Integration is complete; I changed your code to add verification of the PAR2 file itself.
This feature will land in the 0.5.0 release (it's now on a branch for extensive testing).