SBNews: News Robot Utility
(C) Scott M Baker
I refer to this program as both "NewsBot" and "SBNews"
throughout this document. For the most part you can consider
them to be the same. Eventually, I plan on including extra
utilities in this package. At that point,
"SBNews" will be the name of the full package and
"NewsBot" will be the name of the News Robot utility
within the package.
The SBNews online help is usually more specific and up-to-date
if you are looking for information on a specific command or
feature. This document should be considered more of an
installation and setup guide.
The purpose of this program is to automatically download
and uudecode files from newsgroups via a winsock
connection. If you've found this program, then you probably know
what this means, so I won't go into detail. Newsbot will attempt
to piece together multi-part files. Both uuencode
and mime base-64 formats are supported,
although piecing together of multiple files works best with
uuencoded data.
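For readers who are curious what "uudecoding" involves, here is a rough sketch in Python. It is not Newsbot's own code, and the function name uudecode is made up for this illustration. It reads the lines of one uuencoded block (from the "begin" line to the "end" line) and returns the file name and the decoded bytes, using the standard binascii module.

```python
import binascii


def uudecode(lines):
    """Return (filename, data) for one uuencoded block.

    `lines` is an iterable of text lines such as the body of a news
    article. Anything before the "begin" line is ignored.
    """
    filename = None
    chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if filename is None:
            # Header looks like: begin 644 picture.jpg
            parts = line.split(" ", 2)
            if len(parts) == 3 and parts[0] == "begin":
                filename = parts[2]
            continue
        if line == "end":
            break
        if line:
            # Each data line decodes to at most 45 bytes.
            chunks.append(binascii.a2b_uu(line))
    if filename is None:
        raise ValueError("no 'begin' line found")
    return filename, b"".join(chunks)
```

A multi-part file would be handled by decoding the data lines of each part in order and joining the results; that bookkeeping is left out here.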
If you're new to binary newsgroups, then here are a few
G-rated newsgroups that you might wish to try:
alt.binaries.pictures.astro
alt.binaries.pictures.animals
alt.binaries.pictures.cartoons
alt.binaries.pictures.fractals
There is a wealth of adult-oriented newsgroups, but I'm not
going to list them in this document.
I have included a sample file, SAMPLE_G.TXT, which includes a
listing of binary newsgroups. There are no descriptions, but you
can probably find out from the newsgroup name what subject matter
the newsgroup deals with. Newsbot will download a newsrc
file for you automatically the first time you connect to your
news server. You can access the newsrc file by using the
<Add> button and selecting <Browse>.
Newsbot is primarily intended for automatic unattended
download. Newsbot can be used to gather a large number of files
with little or no user intervention, allowing the user to spend
most of his/her time viewing the files. Nevertheless, there are
several manual features supported to give the user control and
status information, such as thumbnails displayed during download,
anti-SPAM controls, and the ability to manually select headers to
filter out unwanted messages.
- Windows 3.1 or greater
- Winsock connection capability (e.g. SLIP or PPP)
- 16-bit (Windows 3.1) Version: SBNEWSxx.ZIP (xx denotes
version number)
- 32-bit (Windows 95) Version: SBN32_xx.ZIP (xx denotes
version number)
- Unattended download - do other work (or go out to lunch)
while downloads and uudecoding are all processed for you!
- Automatic HTTP address logging -- NewsBot will keep track
of any HTTP URLs that were found while downloading
messages. These will be saved to a convenient HTM file,
HTTPGRAB.HTM.
- Built in JPEG viewer. Allows you to see postage-stamp
size images of the files you are downloading in progress.
- Delete/View previously downloaded JPEG files.
- Comprehensive logging capabilities - save the subject and
from information for files that you might want to follow
up on later.
- Dupe Checkers -- avoid downloading the same files
multiple times.
- Auto-Encryption -- can be set to automatically encrypt
files when saving to disk to keep their contents private.
- Automatic FILES.BBS generation for sysops
- Eliminate advertisements/SPAM with the Maximum XRef Limit
(see preferences) and the XRef Lockout Filter (see
lockout lists)
Here are some very quick notes on installing and using
Newsbot:
Installation:
- Unzip the distribution archive into the (temporary)
directory of your choice.
- Run the file SETUP.EXE -- this is an automatic
InstallShield setup program and will copy all files and
do everything else necessary to get the program installed
for you.
Running:
- Run NEWSBOT.EXE -- this can be done in various ways -
from the Win-95 command prompt, the Windows Explorer
(and/or file manager), the Windows Run Program command,
etc.
- On your first session, you'll be prompted for a
"News Host Name". This is the name of your news
host, and should be given to you by your ISP (internet
service provider).
- On your first session, you'll be prompted for a
"download path". This is the path where you
want the files you download to be placed.
- Once NewsBot is loaded, you'll want to add some
newsgroups. This is done with the "Add" button.
You'll need to know the names of the groups you want; you
might want to consult SAMPLE_G.TXT for some sample
newsgroup names.
- Press the "CONNECT" button to start everything.
All else should be mostly automatic from here on. Feel
free to tinker around with the menu items.
- "-i INI_FILE_NAME". Sets the name
of the INI file to use. By default, newsbot uses the file
newsbot.ini in your windows directory. Specifying a
filename here will cause newsbot to use the file under
that name in newsbot's home directory. For example "Newsbot
-i news1.ini" would use the filename
"news1.ini" in newsbot's home directory.
- "-auto". Will automatically begin
downloading news articles as soon as newsbot has loaded. Note:
For unregistered users, about screen will still be
displayed!
- "-autoexit". Automatically exits
SBNews after all groups have been received.
- "-noabout". Omits display of the
about dialog on startup of Newsbot. Note: Only
supported in registered version!
- "-autorestart". Automatically
restarts after all groups have been received. Note:
Incompatible with -autoexit and -autocat.
- "-autocat". Generates image
catalogs in each group after all groups have been
received.
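If you launch Newsbot from a script, it can help to check the option list before starting the program. The following Python sketch is only an illustration: the helper name conflicting_options is invented here, and the only rule it encodes is the one stated above, that -autorestart is incompatible with -autoexit and -autocat.

```python
def conflicting_options(args):
    """Return the options in `args` that clash with -autorestart.

    `args` is the list of command-line words that would be passed to
    NEWSBOT.EXE, e.g. ["-i", "news1.ini", "-auto", "-autoexit"].
    An empty list means no documented conflict was found.
    """
    flags = {word for word in args if word.startswith("-")}
    if "-autorestart" in flags:
        return sorted(flags & {"-autoexit", "-autocat"})
    return []
```

For example, the option list for "Newsbot -i news1.ini -auto -autoexit" gives an empty result, while adding -autorestart to it reports -autoexit as a conflict.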
SBNews has a built-in JPEG viewer for viewing JPEG files
offline. The JPEG viewer can be operated in several ways:
- By double-clicking on a [RECEIVED] entry in Newsbot's log
window
- By single-clicking on one of the preview pictures on the
right hand side of Newsbot's main window.
- By using the FILE:VIEW JPEG option from Newsbot's
pulldown menu.
Once loaded, the JPEG viewer window will display the JPG file
selected and include a listing of filenames and directories on
the left-hand side of the window. This listing is provided to let
you easily select other files to view. Double clicking on a
directory in the directory list will change to that directory and
double clicking on a filename in the file list will display that
file. The CHDIR and VIEW buttons perform the same functions,
respectively.
The "<" and ">" buttons will view
the immediately previous and next files in the currently selected
directory. You may use these to rapidly view a list of files that
were downloaded overnight for example. The ">"
button will activate a slideshow of the current directory. Each
file will be displayed, there will be a slight delay, and the
next file will be displayed.
The FILE menu of the viewer has several options:
- Open File: Allows you to open a new file using the
windows common-dialog open method.
- Delete File: Deletes the current file that you are
viewing.
- Save as BMP: Saves the current image as a BMP image,
which may later be used as windows wallpaper, for
example.
- Set as Wallpaper: Saves the current file as a BMP and
tells Windows to use it as wallpaper.
The SlideShow menu has a few options as well:
- Start/Stop: Starts and stops the slideshow, equivalent to
the ">" button.
- Set Interval: Sets the amount of delay between slideshow
pictures.
- "Normal Mode": Images will be displayed in the
JPEG viewer's window.
- "Wallpaper Mode": Images will also be set as
WallPaper. This sort-of turns your entire desktop into a
slideshow display. You can minimize the JPEG viewer
window and continue to do work while images display in
the background as wallpaper.
Appearance:
Newsbot has several different main window styles. You may
select whichever style appeals to you most; they all convey more
or less the same information:
- Small: Designed to display as much information as
possible in a small footprint. The "file log"
is unavailable, but all other information is present.
- Large: Default configuration for systems with 800x600
resolution or above.
- Windows-95 Tabbed: Same footprint as the small dialog,
but uses a tab control to select different panels of
information. Somewhat less cluttered, but you will need
to use the tabs to switch between the information you want
to see. Large current/previous thumbnails.
The host name is the name of the nntp host which NewsBot will
connect to. If you don't already know this, then you may wish to
contact your Internet Service Provider (ISP) or check an existing
news program on your system for the name.
The base path where downloaded files will be placed. If you
enable any of the download path expansion (see misc.
preferences), then files may be downloaded into sub-directories
of this base path.
By default, all files will be downloaded into the download
directory that you have specified. However, this can lead to
confusion, since you will not know which files came from which
newsgroups. Thus, there are several "path expansion"
options that will create a sub-directory tree for the individual
groups. The options are:
- None: All files go into the same
subdirectory
- Group Number: Files are placed in
numerical directories, according to their order in your
newsgroup list.
- Expanded Group Name: Each field of the
Newsgroup name is made into a subdirectory name. For
example, "alt.binaries.pictures.misc" would
become "alt\binaries\pictures\misc". This
option is useful for Windows-3.1 systems, since
subdirectory names may only be a short length.
- Full Group Name: The entire Newsgroup
name is used as one big subdirectory name. This is only
available in Windows-95 with long filename support.
Path Expansion Example (alt.binaries.pictures.misc)

Path Exp Setting  | Resultant Path
------------------+----------------------------------------
None              | c:\download\
Group Number      | c:\download\group1\
Exp Group Name    | c:\download\alt\binaries\pictures\misc\
Full Group Name   | c:\download\alt.binaries.pictures.misc\
For the "group name" options, you may wish to trim
repetitive prefixes from the front of the path name. For example,
there's no need to include "alt.binaries.pictures" in
the beginning of each pathname -- it just leads to wasted space.
The Path Prefix Removal box is used for this
purpose. Any string(s) that you enter in this box will be removed
from the front of the newsgroup name before the path is
generated.
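The path expansion settings and the Path Prefix Removal box can be pictured with a short Python sketch. This is an illustration under assumptions, not Newsbot's implementation: the function name expand_path, the mode strings, and the exact prefix-matching rules (case-sensitive, first matching prefix wins, a leftover leading dot is dropped) are all invented for the example.

```python
def expand_path(base, group, group_number, mode, strip_prefixes=()):
    """Build the download directory for one newsgroup.

    base           -- base download path, ending in a backslash
    group          -- newsgroup name, e.g. "alt.binaries.pictures.misc"
    group_number   -- position of the group in the newsgroup list
    mode           -- "none", "group_number", "expanded_group_name"
                      or "full_group_name" (hypothetical names)
    strip_prefixes -- strings removed from the front of the group name
    """
    if mode == "none":
        return base
    if mode == "group_number":
        return f"{base}group{group_number}\\"

    name = group
    for prefix in strip_prefixes:
        if name.startswith(prefix):
            # Drop the prefix and any dot left at the front.
            name = name[len(prefix):].lstrip(".")
            break
    if not name:
        return base

    if mode == "expanded_group_name":
        return base + name.replace(".", "\\") + "\\"
    if mode == "full_group_name":
        return base + name + "\\"
    raise ValueError(f"unknown path expansion mode: {mode}")
```

With a base path of c:\download\ and no prefixes, the four modes reproduce the four rows of the example table above. With "alt.binaries.pictures" entered as a prefix, the Full Group Name setting would give c:\download\misc\ under these assumptions.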
Miscellaneous preference options are located under the
Configure menu under the heading Preferences.
- Logging: Logging options control the
creation of the newsbot.log file. The log may be useful
so that you have more information about the files that
you have downloaded. The log filename is
"newsMMDD.log", where MM is the current month
and DD is the current day. For example, the log for June
6 would be saved in "news0606.log". The log is
plain-ascii and may be viewed with any text editor, or by
using the dos "type" command.
- Log Enabled: if checked, then a log file
will be written. Otherwise, no log will be written.
- From user name: records the name
of the person who posted the message
- Subject: records the subject
line of the message
- Comments: records the first few
non-file lines of the message. Useful if the
sender has prefaced the message with some
explanation of its content.
- Too Small: Messages that are
below the minimum message line limit (see below).
- Too Long: Messages that are
above the maximum message line limit (see below).
- Dupe File: Files caught by the
dupe-file checker.
- Dupe Subject: Messages caught by
the dupe-subject checker.
- Mime Headers: Really just
debugging info for me....
- Delete old Logs: If checked,
then any log files not equal to the current date
will be deleted when newsbot is run. This
minimizes
- Case Conversion: These options will
convert the case of the filename to a uniform format. Due
to limitations in the 16-bit application model, filenames
will probably always be upper case in the 16-bit
executable.
- None: Leave filename case as it
appears over the modem.
- Lower: Convert all filenames to
lower case.
- Upper: Convert all filenames to
upper case.
- Disconnect/Reconnect: The
disconnect/reconnect system will automatically disconnect
and reconnect the current nntp session in order to abort
a message that is being transmitted. This is used to save
time, rather than downloading entire messages which are
not stored on disk. If you disable disconnect/reconnect,
then Newsbot will receive the entire message, although
the message will not be written to disk. You may toggle
disconnect/reconnect on or off for several subcases:
- No-Data: Messages with more
non-encoded lines than the no-data threshold set
below.
- Dupe-File: Files caught by the
dupe-file checker (not relevant to the
dupe-subject checker, as the dupe-subject checker
won't download the message in the first place)
- No-Mask: Files which do not
match an acceptable file mask.
- Current-Delete: Files which were
deleted during transfer by a user request.
- Lockout-File: Files whose names
are matched by the lockout filename system.
- Lockout-XRef: Messages where a
string in the XRef field matches the lockout-xref
system.
- Lockout-Host: Messages where the
NNTP-Posting-Host name matches the lockout-host
system.
Note: Sometimes the Disconnect/Reconnect options may
cause your news server to record an excessive load. This
is because Newsbot will disconnect the connection, but
your news server may continue to keep the connection open
for a short period of time. Thus, the news server may
think that you are using more connections than you really
are. This situation is unlikely, but if it does present a
problem, then you may wish to disable the
disconnect/reconnect options.
- Message Line Limits: Newsbot can limit
which messages are downloaded based on the length (number
of lines) of the message.
- Minimum Lines: Messages with
fewer than the specified number of lines will not
be downloaded. The rationale is that small
messages do not contain any useful information
(pictures are big!) and some efficiency may be
obtained by not downloading them.
- Maximum Lines: Messages with
greater than the specified number of lines will
not be downloaded. The rationale is that really
huge messages are a waste of time. It's usually
the case of someone who scanned in a picture at
too fine a resolution.
- "No Data" Threshold:
This specifies the number of lines that can be in
a message when no attached files (e.g. images) can
be found before the message is skipped. For
example, if 500 lines are scanned, and no
attached image is present, then the message is
probably grunged, a misplaced part of a multipart
encode, or some other useless data.
- Maximum XRef Limit: Many people have
complained about an excess of off-topic advertisement
messages (i.e. SPAM) being present in the newsgroups.
This option is designed in an attempt to avoid these
advertisements. Typically, advertisers will post their
messages to a large number of newsgroups, and thus the
XREF list for the article will have many entries. Newsbot
can be set to filter out messages with too many XREFs.
The default is set at 999 which effectively disables XREF
filtering. You can enable XREF filtering by entering a
lower number here (10 is a good place to start).
- Preload XRef hdrs: By default, the
"lockout XRef" and "Maximum XRef
Limit" systems do not filter out an invalid message
until the download has begun. This is because the XREF
line is not transmitted until the message is sent.
However, you can tell Newsbot to download all the XREF
headers ahead of time, before any messages are
downloaded. Thus, Newsbot will know the XRef information
ahead of time and can avoid the unwanted messages
entirely. However, XRef information is somewhat bulky,
and does require a fair amount of time to download the
headers. Thus, there is a trade-off involved. Feel free
to experiment.
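The Message Line Limits and Maximum XRef Limit checks described above boil down to a few comparisons. The Python sketch below is a guess at the logic, not Newsbot's source: the function names and the "Too Many XRefs" label are invented, and it assumes that "too many" means strictly more entries than the limit. It relies on the usual layout of an Xref header value, which is the server name followed by one group:number entry per newsgroup.

```python
def count_xref_groups(xref_value):
    """Count the newsgroups listed in an Xref header value.

    Example value: "news.somehost.com alt.test:123 alt.misc:456"
    The first field is the server name, so it is not counted.
    """
    fields = xref_value.split()
    return max(len(fields) - 1, 0)


def skip_reason(line_count, xref_value, min_lines, max_lines, max_xref=999):
    """Return why a message would be skipped, or None to download it."""
    if line_count < min_lines:
        return "Too Small"
    if line_count > max_lines:
        return "Too Long"
    if count_xref_groups(xref_value) > max_xref:
        return "Too Many XRefs"  # hypothetical label
    return None
```

With the default of 999 the XRef check practically never fires, which matches the note that the default effectively disables XREF filtering. Passing max_xref=10 corresponds to the suggested starting value.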
The lockout lists are used to lock out messages that contain
certain text strings. For example, you may not wish to download
messages written by a certain person, or messages whose title
contains a certain string. There are several types of lockout
lists supported:
- Lockout Poster (i.e. Author): Applies to the
"From:" line of the nntp message. Use this if
you wish not to receive messages from a specific person.
- Lockout Subject: Applies to the
"Subject:" line of the news message. This is
useful if there are certain keywords which you want to
lockout if they are present in the subject field.
- Lockout FileName: Prevents specific filenames from
being downloaded. You can type a specific filename or a
wildcard specification to lockout a group of files.
- Lockout Host: Checks the
"NNTP-Posting-Host" field of the news message.
This is useful to eliminate posts from a specific news
server. Many obnoxious posters will alter their email
name in the from field, making them impossible to lockout
by normal means. However, the news server usually will
put its correct host name in the
"NNTP-Posting-Host". Thus, this field may be
used to weed out the obnoxious poster. However, be warned
that locking out a NNTP-Posting-Host will lockout all users
who post from that host!
- Lockout XRef: Checks the "XRef " field
of the news message. The XRef field contains a list of
other newsgroups that the message is cross-posted in.
Sometimes a person will post a message into several
groups at once -- this is called crossposting. Sometimes
an obnoxious poster will cross-post their messages into
groups that they don't belong in.
All lockout lists are case insensitive (i.e. capitalization
does not matter). All lockout lists support wildcard characters,
* and ?. "*" is interpreted as "any sequence of
zero or more characters" and "?" is interpreted as
"any single character". Here are a few sample wildcard
strings:
- (Lockout-Poster) "joe@fubar.com" would refuse
any messages from joe@fubar.com
- (Lockout-Poster) "*@fubar.com" would refuse
messages from ANYONE at fubar.com
- (Lockout-Poster) "joe@*" would refuse messages
from "joe" no matter where he posts from
- (Lockout-Filename) "*.zip" would refuse any
file that had an extension of ZIP
- (Lockout-Filename) "fastcash*.*" would refuse
any file which had "fastcash" as the first 8
characters.
- (Lockout-Filename) "index-?.jpg" would refuse
any file which had "index-", exactly one more
character, and then ".jpg" as an extension.
- (Lockout-Host) "news.somehost.com" would refuse
any messages posted on "news.somehost.com"
- (Lockout-XRef) "alt.binaries.pictures.erotica"
would refuse any messages that are cross-posted into
"alt.binaries.pictures.erotica"
There is an additional toggle which lets you switch between an
"exact match" and "match any position in
text".
- Exact-Match: The search pattern must match all of
the text exactly; there can be no leading or trailing
characters. For example, "abc" would not match
"abcd" or "xabc".
- Match-Any-Position: The search pattern can match
any position in the text. There can be leading or
trailing characters. For example, "abc" would
match "abcd" and "xabc".
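The wildcard rules and the two matching modes can be tried out with a small Python sketch. It is an approximation for experimenting, not the matcher Newsbot actually uses; the names wildcard_to_regex and is_locked_out are invented for the example. It translates "*" and "?" into a regular expression, ignores case, and either requires the whole text to match (Exact-Match) or accepts a match anywhere in the text (Match-Any-Position).

```python
import re


def wildcard_to_regex(pattern):
    """Translate a lockout pattern using * and ? into a regex string."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any sequence of zero or more characters
        elif ch == "?":
            parts.append(".")    # any single character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def is_locked_out(pattern, text, exact=True):
    """Return True if `text` is caught by the lockout `pattern`."""
    regex = re.compile(wildcard_to_regex(pattern), re.IGNORECASE | re.DOTALL)
    if exact:
        return regex.fullmatch(text) is not None
    return regex.search(text) is not None
```

For instance, is_locked_out("abc", "xabc") is False in exact mode and True with exact=False, which mirrors the two examples above.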
There is a TEST button on the lockout dialog which will let
you enter a string and see if it matches anything. This is useful
if you're a little confused about the wildcard strings and want to
make sure what you entered does actually do what you think it
should.
Note #1: The "NNTP-Posting-Host" and
"XRef" lines of a message are not normally displayed by
SBNews, so you might be wondering where to get the information to
type into those lockout lists. If you notice a particularly
annoying message, you can use the Headers
button to bring up a header listing for the group, then use the Read
button to display the full text of the annoying message. Inside
the read screen will be a header listing where you can find
NNTP-Posting-Host and XRef fields.
Note #2: Some lockout settings, such as Lockout
Poster and Lockout Subject can be
determined before the message is downloaded. Thus, Newsbot will
bypass the message completely. Others, such as Lockout
File, Lockout Host, and Lockout
XRef, cannot be determined until the message has
begun to be received. Thus, Newsbot must start receiving the
message and then abort the message while the download is in
progress.
The authentication options are used for news servers which
require a user name and password to access the newsgroups. If
your news server does not require a name and password, then you
should leave this option alone. Authentication is located under
the Configure menu.
Newsbot allows you to specify which types of files will be
downloaded. Specifying "*.*" will enable any file to be
downloaded, and this is the default. For example, if you only
wanted to receive images, then you may wish to remove *.* and add
in *.gif and *.jpg. If you only wanted archives, add in *.zip.
etc.
Two different types of dupe checking are supported. They may
be used independently, or you can use both of them at the same
time if you wish. By default, the dupe checkers maintain a list
of approximately the most recent 2048 messages received.
- Subject Dupe Checker: Remembers
duplicates by keeping track of the subject of the
message. Normally, the Dupe-Subject checker should be
used with the "Consider 'From'" setting checked
as enabled. Thus, if the same person posts multiple
messages with the same subject, the message will be
flagged as a dupe. This is useful for preventing download
of crossposted messages (i.e. identical messages posted
in multiple groups)
- Filename Dupe Checker: Remembers
duplicates by keeping track of the names of the files
that were downloaded. This is very effective at weeding
out dupes, but has the side effect of sometimes deleting
messages that are not really duplicates. For example,
there are a lot of files named "1.JPG" out
there!
Both of the two above mentioned dupe checkers have some
options that control their behaviour:
- Reject Duplicates: If checked, then dupe
checking will be performed as stated above. If not
checked, then no dupe checking will be performed.
- Save dupe list: If checked, then dupe
information will be saved from session to session. If not
checked, then dupe information will only be kept for the
current session.
- Items to Keep: This is the number of
items to "remember" for the dupe checker. A
bigger number here will remember more messages (or
files), but will also require more memory and/or disk
space to hold the dupe checker information.
- Consider "From": If enabled,
requires the from fields of two messages to be identical
for the files to be considered duplicates. A good idea
for the dupe subject checker, but probably not advisable
for the dupe file checker.
- Consider "Lines": If enabled,
requires the number of lines of two messages to be within
a 10% tolerance for the messages to be considered
duplicates. For multi-part files, only the first message
is considered.
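To make the dupe checker options more concrete, here is a simplified Python model. It is a sketch under assumptions and not the real implementation: the class name DupeChecker is invented, the key is compared exactly as given, and the 10% tolerance is measured against the larger of the two line counts, which is only one possible reading of the description above. The same class can stand in for either checker by passing a subject or a filename as the key.

```python
from collections import deque


class DupeChecker:
    """Remember recent items and report duplicates."""

    def __init__(self, items_to_keep=2048, consider_from=True,
                 consider_lines=False):
        # A bounded deque forgets the oldest entry once it is full.
        self.seen = deque(maxlen=items_to_keep)
        self.consider_from = consider_from
        self.consider_lines = consider_lines

    def is_dupe(self, key, sender, lines):
        """Return True if (key, sender, lines) matches a remembered item.

        `key` is the subject line or the filename, depending on which
        checker is being modelled. New items are remembered.
        """
        for old_key, old_sender, old_lines in self.seen:
            if old_key != key:
                continue
            if self.consider_from and old_sender != sender:
                continue
            if self.consider_lines:
                tolerance = 0.10 * max(old_lines, lines)
                if abs(old_lines - lines) > tolerance:
                    continue
            return True
        self.seen.append((key, sender, lines))
        return False
```

A larger items_to_keep value lets the checker remember more history at the cost of memory, which is the trade-off described under Items to Keep.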
Automatic save is used to automatically save the current state
of Newsbot periodically during operation. This is useful since if
Newsbot crashes, Newsbot will be able to start back up at the
correct position in the newsgroups and with all of your dupe
checking and lockout information enabled. There are two types of
autosave:
- AutoSave Pointers: Saves message
pointers. These pointers tell newsbot your current
position in the newsgroups, i.e. how many messages you
have already seen in each group.
- AutoSave Everything: This saves all
state information, including the dupe checkers (assuming
the dupe checkers have saving enabled), HTTP grabber,
lockout lists, message pointers, etc.
Auto-Encryption is intended to keep files that you download
"private". This is primarily in case you are
downloading sensitive information that you don't want
unauthorized people to be able to access. If Auto-Encryption is
enabled, then files will be encrypted with a key that you
specify. Only those persons who have this key will be able to
retrieve the information in these files.
Note #1: Be careful when you pick an encryption key. If you
forget what you picked, then there is no way to "find
out" what it was. Case is important, so make sure you know
what parts (if any) are capitalized. Longer encryption keys do
offer better protection. However, my algorithm is by no means
perfect, and someone with enough time and computing power could
undoubtedly crack the code.
Note #2: Encrypted files cannot be "read" by
any other programs until they are decrypted in some manner!
No other program except Newsbot and its associated utilities can
deal with files encrypted in this manner.
There are two ways to retrieve (or "decrypt")
encrypted files:
- With the built-in jpeg viewer. The built
in jpeg viewer will automatically detect encrypted files
and prompt you for the decryption key if necessary.
- With decrypt.exe. Decrypt.exe (and for
that matter, Encrypt.exe) are standalone utilities that
you can use to encrypt and/or decrypt a series of files.
Once decrypted with decrypt.exe, a file may be read by
any other program. To decrypt files, use a command line
similar to the following:
    decrypt key filename.ext
Where "key" is the decryption key and
"filename.ext" is the filename that you want
decrypted. You may use wildcards in the filename and
decrypt.exe will decrypt all files matching the
specification. Decrypt.exe will not "mess up"
unencrypted files, nor will it improperly decrypt a file
if an improper key is supplied.
Decrypt.exe and Encrypt.exe have some additional
options. Run the program without any command line
parameters to get a list of applicable options.
FILES.BBS files are used by bulletin board systems to catalog
files in download directories. If you're the sysop of a bbs, then
you may find the automatic FILES.BBS generation useful.
Otherwise, it probably won't be of much interest to you.
Newsbot can create a variety of different FILES.BBS styles;
you may select whichever one works best with your bbs. If none of
them do, then you can probably find a conversion or import
program for your bbs software which will do the trick. The
various styles are listed in the listbox. Selecting
"none" will disable files.bbs generation.
By default, Newsbot will generate a FILES.BBS file in each
directory that files are downloaded into. The FILES.BBS file will
be "appended", not "overwritten". You may
specify an alternate filename if you wish. If you include a
drive/directory in the filename, then all groups will be listed
in one big file.
The "Convert to 8.3" option will put standard DOS
8.3 (filename.ext) filenames in the FILES.BBS rather than the
Windows-95 long filenames. For example, "longfilename.jpg"
would be entered as "longfi~1.jpg". This may be useful
for bbs programs which do not understand long filenames.
There are a variety of options listed under the statistics
menu item. Most of these simply return information that SBNews
has collected while processing newsgroups.
- Similar NewsGroups: Newsgroup headers
contain an "Xref:" line which lists other
newsgroups to which a specific article has been posted.
Many times, a poster will "cross-post" an
article to multiple groups which share the same
interests. The Similar Newsgroups list will display all
of the accumulated Xrefs for the current area.
- HTTP Grabber: The HTTP Grabber
automatically keeps a list of any WWW addresses that are
found in the text and/or subject lines of downloaded
messages. In addition, an HTML file called HTTPGRAB.HTM
will be written to the newsbot directory. You may load
this file with your WWW browser and look up any
references that SBNews found.
- Dupe File List: The list that the
dupe-checker keeps in memory to catch dupe files.
- Dupe Subject List: The list that the
dupe-checker keeps in memory to catch duplicate subject
lines.
- Latency: A measurement of how long it
takes your news server to respond to a command message.
Times listed are in milliseconds (1 second = 1000
milliseconds). Slower or highly-used servers will have
greater latency times, whereas faster servers will have
low latency times. If you are using other Internet
applications at the same time as NewsBot, then the
latency figure may rise.
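The idea behind the HTTP Grabber can be sketched in a few lines of Python. This is only an illustration: the function names, the regular expression, and the layout of the generated page are assumptions, and the real HTTPGRAB.HTM may look quite different.

```python
import html
import re

# Anything starting with http:// up to whitespace, a quote or an angle bracket.
URL_RE = re.compile(r"http://[^\s<>\"']+", re.IGNORECASE)


def grab_urls(text):
    """Return the distinct HTTP addresses found in `text`, in order."""
    found = []
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;:)")  # drop trailing punctuation
        if url not in found:
            found.append(url)
    return found


def render_html(urls):
    """Build a simple HTML page with one clickable link per address."""
    items = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(html.escape(url, quote=True))
        for url in urls
    )
    return "<html><body><ul>\n" + items + "\n</ul></body></html>\n"
```

Feeding each message's subject and text through grab_urls and writing the output of render_html to a file would produce a page that a WWW browser can open.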
Coord.Exe (only available in 32-bit version) may be used to
synchronize the dupe-checking capabilities of multiple news
robots. Running multiple Newsbots at the same time may allow you
to better utilize some of the slower news servers out there. To
do this, you must run COORD.EXE before loading any instances of
Newsbot.
Some notes about using Coord:
- Coord should be considered EXTREMELY BETA. It seems to be
working for most of our testers, but there hasn't been
time to conduct widespread testing at this time.
- Make sure you run it before loading any newsbots.
- Coord is used to run multiple newsbots at a time -- if
you only want to run one newsbot, then don't bother with
it!
- Coord has its own preferences for the dupe checkers --
you might want to be sure to set them to what you want.
- Dupe Checking Statistics in Newsbot will be unavailable
while running coord (use coord's dupe statistics instead)
Although most of the core of Newsbot is finished, there are
still quite a few bells and whistles that I'm planning to add. If
you have any features that you would like to see that aren't
listed below, then please email me. The following are some of the
more important things on my list:
- Ability to specify search filters (i.e. only retrieve
messages with xxx in the subject field)
- User specified download paths
- Multithreaded operation in 32-bit version
SBNews/Newsbot is a shareware program and as such, you are
only granted the right to operate it for a limited time to
evaluate its performance. Continued usage requires registration
in the amount of $15.00.
For information on registration, please see REGISTER.DOC (or
REGISTER.HTM).