bug in 0.5; it stops

Questions and bug reports for Beta releases should be posted here.
Forum rules
Help us help you:
  • Tell us what system you run SABnzbd on.
  • Adhere to the forum rules.
  • Do you experience problems during downloading?
    Check your connection in Status and Interface settings window.
    Use Test Server in Config > Servers.
    We will probably ask you to do a test using only basic settings.
  • Do you experience problems during repair or unpacking?
    Enable +Debug logging in the Status and Interface settings window and share the relevant parts of the log here using [ code ] sections.
ugokanain
Release Testers
Release Testers
Posts: 89
Joined: March 19th, 2008, 6:45 pm

bug in 0.5; it stops

Post by ugokanain »

Been dl what I hadn't for a while, so I have a lot in queue. Basically every day have to restart sab, for it stops dl. The menu widget displays a dl speed value, but there isn't any activity. After restarting sab is able to continue from where it stopped. Don't recall this happening on previous versions of 0.5, probably alpha 0.7 or higher.

This isn't because of missing or incomplete segments. Some posts that were skipped (though sab continued with no problems) did had missing segments, which I had to manually get the missing parts, because sab didn't assembled them with the missing segments. Most had enough info (actual segments and par2s) to complete the post.

Also another problem, with the unraring or par checking, is that several posts get tagged as failed, but it manages to assemble the post correctly. Other posts were not assembled, but had all the necessary parts. MacPar had trouble deciding what language to use, otherwise everything worked correctly.


2009-11-19 14:20:40,913::ERROR::[downloader:433] Failed to initialize [email protected]:119
2009-11-19 14:20:40,914::DEBUG::[downloader:434] Traceback:
Traceback (most recent call last):
  File "sabnzbd/downloader.pyc", line 428, in run
  File "sabnzbd/newswrapper.pyc", line 204, in init_connect
  File "sabnzbd/newswrapper.pyc", line 111, in __init__
error: [Errno 49] Address not available - Check for internet or DNS problems
2009-11-19 14:20:40,914::WARNING::[downloader:659] Thread [email protected]:119: failed to initialize
2009-11-19 14:20:40,915::DEBUG::[trylist:59] Reseting .__try_list
2009-11-19 14:20:40,916::DEBUG::[trylist:59] Reseting .__try_list
2009-11-19 14:20:41,877::DEBUG::[trylist:45] Appending yellow.octanews.net:119 to .__try_list
2009-11-19 14:20:41,879::DEBUG::[trylist:45] Appending yellow.octanews.net:119 to .__try_list
2009-11-19 14:20:41,883::DEBUG::[trylist:45] Appending free.teranews.com:119 to .__try_list
2009-11-19 14:20:41,885::DEBUG::[trylist:45] Appending free.teranews.com:119 to .__try_list
2009-11-19 14:20:41,889::DEBUG::[trylist:45] Appending nntp.aioe.org:119 to .__try_list
2009-11-19 14:20:41,891::DEBUG::[trylist:45] Appending nntp.aioe.org:119 to .__try_list

2009-11-19 14:20:54,448::DEBUG::[trylist:45] Appending yellow.octanews.net:119 to .__try_list
2009-11-19 14:20:54,450::DEBUG::[trylist:45] Appending yellow.octanews.net:119 to .__try_list
2009-11-19 14:20:54,453::DEBUG::[trylist:45] Appending free.teranews.com:119 to .__try_list
2009-11-19 14:20:54,455::DEBUG::[trylist:45] Appending free.teranews.com:119 to .__try_list

2009-11-19 14:32:20,292::INFO::[__init__:648] Loading data for SABnzbd_article_kCy488 from /cache/SABnzbd_article_kCy488
2009-11-19 14:32:20,299::INFO::[__init__:678] /cache/SABnzbd_article_kCy488 removed
User avatar
shypike
Administrator
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: bug in 0.5; it stops

Post by shypike »

I'll look into it.
Could you supply some more info?
Platform? Specific NZBs? If so, which ones.

There's still the possibility that articles missing on fill-servers create a deadlock in the queue.
This can happen when you set "top-only" on.
I see you have a  large number of servers, did you also designate some servers as "optional"?
Because that is the way to break the missing article deadlock.

BTW: in the mean there is an (ugly) work-around.
You can set a planned restart in the scheduler.
ugokanain
Release Testers
Release Testers
Posts: 89
Joined: March 19th, 2008, 6:45 pm

Re: bug in 0.5; it stops

Post by ugokanain »

This is on a 10.5.8 os x powerbook. The list of the nzbs is really long, it went to 14 screens.

As for the servers, most are set to optional and fills. The two good ones are set without any option, but ssl. Do have the setting so that it doesn't only dl from the top, and have a daily pause and resume scheduled.

Not all the missing segment posts freeze sab.
User avatar
shypike
Administrator
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: bug in 0.5; it stops

Post by shypike »

A daily planned restart might be a good idea anyway.
We intend to overhaul the queue mechanism in the next major release (so after 0.5.x).
User avatar
shypike
Administrator
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: bug in 0.5; it stops

Post by shypike »

Can you tell what happens exactly?
If the downloading stops, are there still server connections open (and you see article-id's)?
If so, it's probably an infinite retry for a missing article.
If not, it's a true deadlock which could have multiple causes.
ugokanain
Release Testers
Release Testers
Posts: 89
Joined: March 19th, 2008, 6:45 pm

Re: bug in 0.5; it stops

Post by ugokanain »

I will try to get more details next time it happens
ugokanain
Release Testers
Release Testers
Posts: 89
Joined: March 19th, 2008, 6:45 pm

Re: bug in 0.5; it stops

Post by ugokanain »

Did you ever saw the log files I sent via email?

RC2 still stops. For instance the last update to the cache files was almost 4 hours ago
User avatar
shypike
Administrator
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: bug in 0.5; it stops

Post by shypike »

I saw the files, but there's nothing in it that gives a clue.

As long as there's no reproducibility in this, it will be very hard to diagnose.
I'm not saying there isn't a problem, but I simply have never
been able to reproduce it.
Still a couple of check questions.
- Does downloading resume when pause and resume?
- Does downloading resume when you restart SABnzbd?
- Does downloading resume when you delete the "stuck" job?
- Do you have the "top-only" option on or off?

So far I have been able to find one potential deadlock,
but haven't been able to come up with a scenario to trigger it.
ugokanain
Release Testers
Release Testers
Posts: 89
Joined: March 19th, 2008, 6:45 pm

Re: bug in 0.5; it stops

Post by ugokanain »

Doesn't resume after pausing the download. However, it does continue after restarting sab (though most often I have to kill it first).

I don't need to delete any files.

The Only Get Articles for Top of Queue is off.
User avatar
shypike
Administrator
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: bug in 0.5; it stops

Post by shypike »

Do you see any open connections in the Connections screen?
Does the stop occur right after a job is finished?
The last thing is what your logs suggest.

The potential deadlock I found had to do with the downloader
not notifying the post processor when a job completes.
That will result in a job that will never leave the queue.
However, it will not result in a complete lockup, other jobs
should simply continue (since you have top-only off).
If the queue is "stuck", what happens if you add a new NZB?

You say you need to "kill" SABnzbd.
Does this mean it hangs up in the restart attempt?
What does the logging say at this point?
ugokanain
Release Testers
Release Testers
Posts: 89
Joined: March 19th, 2008, 6:45 pm

Re: bug in 0.5; it stops

Post by ugokanain »

Both the menu and the connections display activity. Though there isn't any. Last time I let it stand for over 4 hours, and it didn't had any activity. After that time, the servers drop the connection (indeed my servers show no activity).

Usually it stops in the middle. Currently there are over 200 MB left of the first dl, and all of the next.

The last lines in the log are
2010-01-18 04:33:09,768::ERROR::[downloader:735] Exception?
2010-01-18 05:09:00,377::ERROR::[misc:764] Failed moving /Volumes/nntp/downloads/Appaloosa (2008)/Appaloosa.1.nfo to /Volumes/Downloads/usenet/_UNPACK_Appaloosa (2008)/Appaloosa.1.nfo

There is a gap of 5 hours in the last log entry and the last updated cache file.


As for the killing sab, I issue a stop command (either by the menu or the web interface) and usually nothing happens. The log hasn't been updated (not since 5 am the log from above).  Sometimes it takes a long time, but most often it doesn't stop sab.
User avatar
shypike
Administrator
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: bug in 0.5; it stops

Post by shypike »

You may have hit upon a downloader crash.
I suspect that the exception handler in downloader:735 isn't always safe.
It may crash depending on what exactly happened.
Unfortunately we don't have a traceback logging statement here, which would tell more.
It's also a very unlikely event, otherwise we would have had many more complaints.
(But unlikely is not impossible :) )

The stop of logging may mean one or both of two issues.
The infamous Python logging bug popped up or logging stopped
because the downloader crashed.

If the downloader crashes, the health monitor should restart SABnzbd within a minute.
At least it does so when I force a crash of the downloader.
I have no idea why it doesn't happen on your system (cannot test on OSX unfortunately).

I'm looking at improving the code.
Would you be willing to test a special version?
May take some time, because I depend on another team member to create the OSX binaries.
ugokanain
Release Testers
Release Testers
Posts: 89
Joined: March 19th, 2008, 6:45 pm

Re: bug in 0.5; it stops

Post by ugokanain »

Sure
ugokanain
Release Testers
Release Testers
Posts: 89
Joined: March 19th, 2008, 6:45 pm

Re: bug in 0.5; it stops

Post by ugokanain »

Have the impression that it happens when the post is problematic. There was a large file I had been trying to get, and only one of my servers had it. All of the segments were available.  When it is the top most file, sab seems to have trouble dling the other files. When I changed the priority to low, the dling went ok.
User avatar
shypike
Administrator
Administrator
Posts: 19774
Joined: January 18th, 2008, 12:49 pm

Re: bug in 0.5; it stops

Post by shypike »

Maybe it's still that old deadlock that haunts us.
If articles are missing from all non-optional servers and the
only optional server left is not responding, there's a deadlock.
This is why we tell people to mark all servers that are
less than very reliable as optional.
That way they are temporarily removed from the servers list when troublesome.
Servers that say an article is absent are OK, the trouble starts when no answer arrives.

When I think about it, one sure way to create a deadlock is to have a server
that operates normally, but that needs more time than server-timeout allows, before it gives
the "not found" answer. But if it's only asked for one specific atricle, the server
will be eventually be blocked (if optional!)

I don't quite understand why setting the priority to low would help.
Unless of course the fact that a higher priority NZB starts delivering
articles is enough to break the deadlock. But why?

The reason why we do not work with a limited nimber of retries (like other Usenet clients do)
is that a failing server may "burn" through all of your queued jobs without downloading anything.
The disadvantage of our method is that deadlocks occur when the design (and the user) isn't careful.

I'll keep this on the backburner for a while.
Post Reply