newsbin.com

by **stavros** » Mon Nov 25, 2024 1:40 pm

Hi Quade,

When a download fails and gets put into the "Failed Files" tab/list, the details of a rolled-up file are not kept.

The top line, with the + sign, is correct, but when it is expanded, the top line is repeated on every row instead of showing the details for each file.

You can't tell what was downloaded, what wasn't and what the reason for failure was.

The failure is most often caused by a small file of boilerplate text that is included in many different posts, combined with the "Use Duplicate Detector" setting being enabled.
Would it be possible to have a "Bypass Option" for small readme, sfv and info files, since the RARs and PARs are the files that matter?

regards
Stavros

by **Quade** » Mon Nov 25, 2024 9:58 pm

The dup detector is kind of useless these days. I keep it turned off.

The issue is it was created when everyone was on slow connections. Newsbin would dump the connection when it encountered a dup to save bandwidth. These days connections are fast, and checking just the first 24K of the file isn't enough. That's why I added another dup detector which works at the file level. If a file already exists on disk with the same data as in the entire first chunk, it'll duplicate detect out.

These days Newsbin always downloads at least one chunk of a file, so I could check much more data for dups, 200K-1M, but that would invalidate all the old signatures.

Maybe make that an optional change? For people who turn on the new dup detector, it'll check the whole first chunk or 200K of the first chunk.

The reasons for failure these days are:

1 - Dup check, either the old or the new one.

2 - Encrypted but no password.

3 - More than 10% of the files 430 out.

4 - The download doesn't have any PARs, so Newsbin can't check it. In this case the download might still succeed: it will attempt to decode, but it can't tell ahead of time whether the set should succeed.
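
The four reasons above can be sketched as a small classifier. This is only an illustration: `FailureReason`, `classify_failure` and all of its parameters are hypothetical names, not Newsbin internals. The only value taken from the post is the 10% threshold.

```python
from enum import Enum


class FailureReason(Enum):
    DUPLICATE = 1         # old or new dup detector matched
    NO_PASSWORD = 2       # encrypted set, no password available
    TOO_MANY_MISSING = 3  # more than 10% of the files returned 430
    NO_PARS = 4           # nothing to verify against; decode may still work


def classify_failure(total_files, missing_files, is_duplicate=False,
                     encrypted=False, has_password=False, has_pars=True):
    """Return every reason that applies to a download (hypothetical model)."""
    reasons = []
    if is_duplicate:
        reasons.append(FailureReason.DUPLICATE)
    if encrypted and not has_password:
        reasons.append(FailureReason.NO_PASSWORD)
    # Integer comparison avoids float rounding: missing/total > 10%.
    if missing_files * 10 > total_files:
        reasons.append(FailureReason.TOO_MANY_MISSING)
    if not has_pars:
        reasons.append(FailureReason.NO_PARS)
    return reasons


print([r.name for r in classify_failure(100, 11)])  # ['TOO_MANY_MISSING']
print([r.name for r in classify_failure(100, 10)])  # []
```

Reason 4 is listed here like the others, but as noted it is more of a warning: the set cannot be verified ahead of time, yet the decode may still succeed.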

by **stavros** » Tue Nov 26, 2024 12:29 am

Hi Quade,

My first issue was that the failed file details are displayed incorrectly in the Failed Files tab.

My second issue was the duplicate checking causing the failure. I agree with your reasoning and the 4 main causes, but I usually suffer from a 5th failure cause, or a variation of case 1, where the file set includes an info file that contains boilerplate text.

This common info file has been downloaded hundreds of times and is in the dupe database. Every time I download a file set that contains it, the whole file set gets failed, but not until some or even most of the files have been downloaded and the info file is hit.
That brings my first issue into play: I can't easily tell which files have been downloaded and which have failed.
I then have to either hunt around and work out which files are missing, or re-add the file set and ignore dups, which re-downloads everything that already came down OK. That could be multiple GB!

The new optional dup detector would maybe help with some of the causes, but I don't think it will help with what I have described above,
and I would hate to invalidate a dup database that has taken 20 years to build up :-)

I like the dup checker as it is useful for individual file downloads like JPGs, EPUBs, PDFs etc, but I also agree that it is less useful now than it used to be.

Would it be possible, perhaps, to have a user maintained list of file types to be included/excluded from the dup check?
Maybe defaulting to everything included in the dup check to start with, but allowing for fine tuning?

Or, my preferred option, maybe in a large file set, with a mixture of large files, PARs and other small files (.info, .sfv, .jpg, .png etc),
an option to only include large files and PARs as the reason for failing the whole file set download?
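
To make the idea concrete, here is a rough sketch of the rule I have in mind. Everything in it is made up for illustration: the function name, the size threshold and the extension lists are assumptions, not existing Newsbin settings.

```python
from pathlib import PurePath

# Hypothetical user-maintained lists; the extensions are the ones named above.
EXEMPT_EXTENSIONS = {".info", ".sfv", ".jpg", ".png"}
PAR_EXTENSIONS = {".par", ".par2"}
LARGE_FILE_BYTES = 1_000_000  # assumed cut-off for a "large" file


def dup_can_fail_set(filename: str, size_bytes: int) -> bool:
    """Decide whether a dup hit on this file may fail the whole file set."""
    ext = PurePath(filename).suffix.lower()
    if ext in PAR_EXTENSIONS:
        return True            # PARs always count
    if ext in EXEMPT_EXTENSIONS:
        return False           # small side files never fail the set
    return size_bytes >= LARGE_FILE_BYTES  # otherwise only large files count


print(dup_can_fail_set("blah!blah!.info", 2_048))      # False
print(dup_can_fail_set("set.part01.rar", 50_000_000))  # True
```

With a rule like this, a dup hit on the info file would just skip that one file, and the RARs and PARs would carry on downloading.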

As you said, you're going to download the small files in full anyway before you can know whether they are dupes, and I think you can't easily stop the download from the server once a file is found to be a dup.
So saving bandwidth isn't possible, and nowadays it matters a lot less for the smaller files, as long as they don't amplify the failure rate and increase the bandwidth usage out of proportion.

Anyway, thanks for reading. If you could please look at issue 1, that would make it easier for me to deal with issue 2, even if you can't fix them both.

regards
Stavros.

by **Quade** » Tue Nov 26, 2024 1:21 pm

stavros wrote: The new optional dup detector would maybe help with some of the causes, but I don't think it will help with what I have described above,
and I would hate to invalidate a dup database that has taken 20 years to build up

It would likely solve it because the common text file is probably smaller than the size of the first chunk. The issue is that the old dup detector sample size is so small, a common smallish text file is always matched.

Newsbin doesn't look for files when it's dup checking the old way. It just compares the first 24K of the file to a database of the first 24K of every other file you've already downloaded. I'm assuming this common text file is inside a RAR. Newsbin doesn't decode the RAR to dup check.
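
As a rough sketch of why the sample size matters: the code below hashes only the first N bytes of a file and looks the result up in a set. The hash (SHA-1), the `DupDatabase` class and the in-memory set are assumptions made for illustration; the thread does not describe the real signature format or storage.

```python
import hashlib

OLD_SAMPLE = 24 * 1024   # the 24K sample used by the old detector
NEW_SAMPLE = 200 * 1024  # lower end of the 200K-1M range mentioned earlier


def signature(data: bytes, sample_size: int) -> str:
    """Hash only the first sample_size bytes (SHA-1 is an assumption)."""
    return hashlib.sha1(data[:sample_size]).hexdigest()


class DupDatabase:
    """Hypothetical in-memory stand-in for the signature database."""

    def __init__(self, sample_size: int) -> None:
        self.sample_size = sample_size
        self.seen = set()

    def is_dup(self, data: bytes) -> bool:
        """Return True if this sample was seen before; otherwise record it."""
        sig = signature(data, self.sample_size)
        if sig in self.seen:
            return True
        self.seen.add(sig)
        return False


# Two different files that share their first 24K.
shared_prefix = b"x" * OLD_SAMPLE
file_a = shared_prefix + b"A" * 1000
file_b = shared_prefix + b"B" * 1000

old_db = DupDatabase(OLD_SAMPLE)
new_db = DupDatabase(NEW_SAMPLE)

print(old_db.is_dup(file_a), old_db.is_dup(file_b))  # False True
print(new_db.is_dup(file_a), new_db.is_dup(file_b))  # False False
print(new_db.is_dup(file_a))                         # True
```

The first line shows the false positive from a 24K sample, and the second shows the larger sample telling the two files apart. The last line is a reminder that a byte-identical file still matches at any sample size. The sketch also shows why changing the sample size invalidates old signatures: `signature(file_a, OLD_SAMPLE)` and `signature(file_a, NEW_SAMPLE)` hash different byte ranges, so the stored values no longer line up.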

by **stavros** » Tue Nov 26, 2024 11:32 pm

Quade wrote: I'm assuming this common text file is inside a RAR. Newsbin doesn't decode the rar to dupcheck.

Ah, I see. No, it is a separate file - blah!blah!.info.

The rolled-up post is just a bunch of headers in a loaded group containing a few large RARs (partnn etc.), a few PARs, and a few other files, e.g. .JPG, .INFO, .TXT, .SFV. No NZB is involved.

The RARs start downloading, as do the smaller files. As soon as the common info file is downloaded, the whole rolled up post gets failed as a duplicate and stops downloading.
This leaves a partially downloaded set of files in the target download directory to deal with, along with a failed files tab entry (see issue 1).

regards
Stavros

by **Moondawgie** » Thu Nov 28, 2024 3:13 am

With an updated version of the SigGenerator (SigGenV6.exe), I'd use such an option.

Quade wrote: The dup detector is kind of useless these days. I keep it turned off.

The issue is it was created when everyone was on slow connections. Newsbin would dump the connection when it encountered a dup to save bandwidth. These days connections are fast, and checking just the first 24K of the file isn't enough. That's why I added another dup detector which works at the file level. If a file already exists on disk with the same data as in the entire first chunk, it'll duplicate detect out.

These days Newsbin always downloads at least one chunk of a file, so I could check much more data for dups, 200K-1M, but that would invalidate all the old signatures.

Maybe make that an optional change? For people who turn on the new dup detector, it'll check the whole first chunk or 200K of the first chunk.

The reasons for failure these days are:

1 - Dup check, either the old or the new one.

2 - Encrypted but no password.

3 - More than 10% of the files 430 out.

4 - The download doesn't have any PARs, so Newsbin can't check it. In this case the download might still succeed: it will attempt to decode, but it can't tell ahead of time whether the set should succeed.

Failed Files details not displayed properly
