CPT - The Primer

1. Introduction
    1.1 What is CPT for?
    1.2 Who will need it?
    1.3 Multi-platform support?
    1.4 What is the primer for?
2. The Crossword Power Tools Ideology
3. Crossword Set
    3.1 Header Elements
    3.2 Crossword Elements
4. The Source-Target Paradigm
5. The Data
    5.1 Data Folders
    5.2 Data Formats
    5.3 Encoding
6. Crossword Properties
    6.1 Styles
    6.2 Words
    6.3 Blacks and Whites
    6.4 Symmetries
    6.5 Structure
7. General Notes about the GUI

1. Introduction

1.1 What is CPT for?

The Crossword Power Tools (CPT for short) is software for creating crosswords, from scratch through to PostScript or HTML files. It is a collection of tools that lets you see the printout of a newly generated crossword after just a few mouse clicks.

The features include:

1.2 Who will need it?

It is for all crossword setters. The CPT programs do not demand many resources and can be run on a home PC. The dictionary tools are stand-alone applications and can be used by anyone.

1.3 Multi-platform support?

CPT modules are written in Java and in ANSI C, but the current programs run only on x86 PCs under Windows and Linux. Java versions 1.1 and above are supported.

1.4 What is the primer for?

First of all, it is an overview of the current version of CPT. Second, it is a 'glossary' - all basic notions are described here. Each of the programs has its own manual, but that manual should be used together with this document.

The version of this document is 1.0.

2. The Crossword Power Tools Ideology

We do not think that software alone can produce high-quality crosswords. This is a creative human task. Our goal is to support the users with flexible tools.

There are three major stages in creating a crossword with CPT. The first is to create the layout of the diagram, the second is to fill the grid with words, and the third is to add the clues.

All steps can be done by hand (with the Editor) or with the generators. The generators write their output in different data formats called 'diagrams', 'grids', and 'crosswords'. The formats differ in their contents and in the way the diagram is represented. For example, B&W diagrams are simple bitmaps. The 'grids' hold the diagrams and the words. The 'crosswords' contain all the information of a complete crossword.

The 'minor' stages include handling the word lists and the clues. The dictionary tools are the programs CPT Word Lists and CPT Dictionary. The first creates highly compressed word lists/dictionaries, and the second is a dictionary browser.

For the major stages we have two applications: CPT Diagrams and CPT Crosswords. The second program includes the CPT-Diagrams, CPT-Words, and CPT-Clues generators, CPT-Editor, and a small subset of the dictionary tools.

The data processing in CPT is pipelined. For most tasks the input is a collection of items, or set, and the output is a new set. The unit of processing is therefore not a single crossword but a crossword set.

3. Crossword Set

Strictly speaking, a crossword set is a collection of items that have the same size (columns x rows) and the same data format. Here 'set' is used in the mathematical sense - the software ensures that no item is repeated when the set is saved after any operation. The set has a header and contains one or more crosswords.
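The two rules above (one size per set, no repeated items) can be pictured with a minimal Java sketch. The class name `CrosswordSetSketch`, the one-string-per-item representation, and the '#'/'.' cell characters are assumptions made for illustration only; they do not reflect CPT's real data structures.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of a crossword set; not CPT's real data structure. */
final class CrosswordSetSketch {
    private final int columns;
    private final int rows;
    // Each item is stored as one string of columns*rows cells, row by row.
    private final Set<String> items = new LinkedHashSet<>();

    CrosswordSetSketch(int columns, int rows) {
        this.columns = columns;
        this.rows = rows;
    }

    /** Adds an item; returns false if an identical item is already present. */
    boolean add(String cells) {
        if (cells.length() != columns * rows) {
            throw new IllegalArgumentException("item size differs from the set size");
        }
        return items.add(cells);
    }

    int size() {
        return items.size();
    }

    public static void main(String[] args) {
        CrosswordSetSketch set = new CrosswordSetSketch(3, 3);
        set.add("#...#...#");
        set.add("#...#...#"); // duplicate, silently ignored
        set.add(".#.....#.");
        System.out.println(set.size()); // prints 2
    }
}
```

Backing the collection with a `Set` makes the "no repetitions" rule hold at every insertion rather than only at save time, which is a simplification of what the text describes.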

3.1 Header Elements

3.2 Crossword Elements

The elements described here do not reflect the real data structures, but this is how they appear in the dialogs where the user can see or edit the data.

4. The Source-Target Paradigm

The Source and the Target are merely working files that serve all modules in the system. For example, for any generator you have to select the input data into a Source, and the result of the generation is written into a Target. The results of automated queries and of manual selections are put into a Source. Here is a simple picture of these operations:

Each of these working files is a 'crossword set', as described in the previous chapter. Once again, they are working files: if you want to keep the result of an operation, save it into a library file, because the working files might be overwritten by the next operation. The library files are also sets, and they hold your data. During the save operation the contents of the Source/Target are added to the selected library, with conversion if necessary, and all duplicates are removed.

When the target operation includes supporting data (dictionaries), this data is called the 'Base'. The notion of 'Base' is used in the CPT Word Lists and CPT Crosswords programs.

5. The Data

Data management involves numerous details that are of no interest to the user, but the notions described here should be clear.

5.1 Data Folders

In general, a Data Folder is a filtered set of the files in a directory defined by the user. The Data Folder types are:

To browse or edit something, first select the folder type, then define the directory and the filters if necessary, and then start the Editor.

5.2 Data Formats

The Data Formats of crossword library files are:

Why so many data formats? First, because this way a library file of just 120 KB can hold, for example, 10000 B&W diagrams of size 15x15; and second, for efficient data processing. The penalty is that conversions are needed. When the user follows the standard steps, the conversions are done automatically, but there are cases where the user has to specify the conversion.
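To put the 120 KB figure in perspective, the sketch below packs a B&W diagram at one bit per cell. This is only a baseline for comparison: the class `DiagramPackingSketch` and its bit layout are hypothetical, and the actual layout of a CPT library file is not described in this document.

```java
/** Hypothetical bit-packing of a B&W diagram; not CPT's library format. */
final class DiagramPackingSketch {
    /** Packs a row-major bitmap into bytes, one bit per cell (true = black). */
    static byte[] pack(boolean[][] black) {
        int rows = black.length;
        int cols = black[0].length;
        byte[] out = new byte[(rows * cols + 7) / 8];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (black[r][c]) {
                    int bit = r * cols + c;
                    out[bit / 8] |= (byte) (1 << (bit % 8));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = pack(new boolean[15][15]);
        System.out.println(packed.length);         // prints 29
        System.out.println(packed.length * 10000); // prints 290000
    }
}
```

A 15x15 diagram has 225 cells, so plain bit-packing needs 29 bytes, or about 290 KB for 10000 diagrams. The quoted 120 KB works out to roughly 12 bytes per diagram, which suggests the library format is more compact than a plain bitmap.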

Single crossword files

Files in a Files-type folder can have different data formats as well. Files with the extensions INI, PAT, LAT, and XBM can hold diagrams only. The others can hold diagrams or complete crosswords. Our native single-crossword text file formats/extensions are INI and CPT. They can hold the data elements described in the Crossword Set chapter. This may not be true for external files produced by programs similar to the CPT kit. The only file format/extension that supports Locale and Encoding is CPT. On the other hand, non-native external files might contain properties that are not supported, and are therefore ignored, by the CPT modules. These are the scrambling of answers, non-standard word numbers, and more than one letter in a cell.

The Data Formats of dictionary files are:

CTree is a binary dictionary file format supported by all CPT programs. 'Crossword form' means that, when the file is created, the words are converted to lower case and all non-letter characters are ignored. The data attached to a word can be one or more clues with the 'xc' tag, and one or more answers with the 'xa' tag.
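The conversion to 'crossword form' can be sketched in a few lines of Java. The method name `toCrosswordForm` is hypothetical, and using `Character.isLetter` to decide what counts as a letter is an assumption (digits, for instance, are dropped here); the exact rule used by the CPT programs may differ.

```java
/** Hypothetical helper; CPT's exact definition of a letter may differ. */
final class CrosswordFormSketch {
    /** Lower-cases the word and drops every character that is not a letter. */
    static String toCrosswordForm(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            if (Character.isLetter(ch)) {
                sb.append(Character.toLowerCase(ch));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCrosswordForm("Rock-'n'-Roll")); // prints rocknroll
        System.out.println(toCrosswordForm("Vice Versa"));    // prints viceversa
    }
}
```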

5.3 Encoding

Computers support the writing systems of different languages through hundreds of encoding schemes. The schemes are classified by the scripts they support and by the number of bytes used per character. For example, European languages can use one byte, while some Asian languages and the Unicode schemes use two or more bytes.

Text in different encodings is processed through a pair of converters. The input converter is byte-to-character (btc); it translates bytes in the source encoding to Unicode characters. The output converter is character-to-byte (ctb); it translates Unicode characters to the target encoding. Most of the encoding converter names from Sun's Java international RTE are built into the CPT programs. The 'User Defined Encoding' mechanism makes it possible to use any available converter that is not in the built-in list. The same mechanism is used to select the converters developed by the CPT programmers. In our dialog boxes that show the built-in list, the encoding names are ordered as follows:
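The btc/ctb pair can be illustrated with the standard `java.nio.charset` classes. This is a sketch of the idea only: the class and method names are hypothetical, and `StandardCharsets` is a modern Java API, not necessarily the converter mechanism the CPT programs use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the btc/ctb converter pair. */
final class EncodingRoundTripSketch {
    /** Byte-to-character: decodes bytes in the source encoding to Unicode text. */
    static String btc(byte[] input, Charset source) {
        return new String(input, source);
    }

    /** Character-to-byte: encodes Unicode text to the target encoding. */
    static byte[] ctb(String text, Charset target) {
        return text.getBytes(target);
    }

    /** Converts bytes from one encoding to another by going through Unicode. */
    static byte[] convert(byte[] input, Charset source, Charset target) {
        return ctb(btc(input, source), target);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e, encoded in ISO-8859-1 (one byte each)
        byte[] latin1 = { 0x63, 0x61, 0x66, (byte) 0xE9 };
        byte[] utf8 = convert(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // prints 5: the accented letter takes two bytes
    }
}
```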

The input files can be in any encoding, but internally our programs work only in 'one-byte' and 'two-bytes Unicode' modes. This means that the crosswords and the dictionaries can be in any one-byte encoding or in Unicode, and in some dialogs the bottom half of the list will contain only one entry - Unicode.

The crossword modules support only a single character (one-byte or two-bytes Unicode) in a letter cell. This means that an encoding that uses two or more characters per letter should not be used. If there is no suitable encoding converter, a new converter can be created or a custom Unicode normalization can be applied. Our samples include Vietnamese crosswords using the VN1 converter, and Thai crosswords using custom Unicode normalization.
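The one-character-per-cell constraint can be illustrated with standard Unicode composition (NFC) from `java.text.Normalizer`. This is a stand-in chosen for the sketch, not the custom normalization mentioned above; it helps only where Unicode defines a precomposed character, and the class and method names are hypothetical.

```java
import java.text.Normalizer;

/** Hypothetical check for the one-character-per-cell constraint. */
final class SingleCharCellSketch {
    /** Merges base letters with combining marks where a precomposed form exists. */
    static String compose(String word) {
        return Normalizer.normalize(word, Normalizer.Form.NFC);
    }

    /** True if every char is a letter, so each one can occupy exactly one cell. */
    static boolean fitsOneCharPerCell(String word) {
        for (int i = 0; i < word.length(); i++) {
            if (!Character.isLetter(word.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String decomposed = "cafe\u0301"; // 'e' followed by a combining acute accent
        System.out.println(fitsOneCharPerCell(decomposed));          // prints false
        System.out.println(fitsOneCharPerCell(compose(decomposed))); // prints true
    }
}
```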

When a Unicode crossword is saved as a text file (CPT format), it is converted to the custom UnicodeASCII encoding.

6. Crossword Properties

For any crossword, in any data format, the CPT modules maintain a list of attributes or properties, which are shown in the windows, used in queries, or used to evaluate the quality of the crossword.

6.1 Styles

Tradition and the crossword publishers have imposed some restrictions on the structure of the diagrams. We have summarized these restrictions in several styles and built them into the software. Note that the responsibility for high-quality crosswords lies on the shoulders of the designers; we support them only with low-level technical details.

NY Times

In this style the minimum word length is 3. The diagram should be square and should have standard symmetry. There should be no unches (unchecked letters), and there are restrictions on the number of words and blacks depending on the grid size. The usual standard sizes are 15x15, 19x19, and 21x21. In CPT this restriction is relaxed, and sizes such as 12x12, 30x30, etc. are accepted as well.
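Two of these rules are easy to express in code. The sketch below checks that a diagram is square and that every run of white cells, across or down, is at least 3 long; a shorter run would be either a too-short word or a letter that is not crossed in that direction. The representation (one string per row, '#' for black, '.' for white) and all names are assumptions for illustration. The limits on the number of words and blacks are not shown, because the text does not give the numbers.

```java
/** Hypothetical checks for two NY Times style rules; not CPT's code. */
final class NyTimesStyleSketch {
    static boolean isSquare(String[] rows) {
        for (String row : rows) {
            if (row.length() != rows.length) {
                return false;
            }
        }
        return rows.length > 0;
    }

    /** True if every maximal run of white cells in every row and column has length >= min. */
    static boolean allRunsAtLeast(String[] rows, int min) {
        int h = rows.length;
        int w = rows[0].length();
        for (int r = 0; r < h; r++) {
            if (!lineOk(rows[r], min)) {
                return false;
            }
        }
        for (int c = 0; c < w; c++) {
            StringBuilder col = new StringBuilder();
            for (int r = 0; r < h; r++) {
                col.append(rows[r].charAt(c));
            }
            if (!lineOk(col.toString(), min)) {
                return false;
            }
        }
        return true;
    }

    private static boolean lineOk(String line, int min) {
        int run = 0;
        for (int i = 0; i <= line.length(); i++) {
            boolean white = i < line.length() && line.charAt(i) == '.';
            if (white) {
                run++;
            } else {
                // A run just ended (at a black cell or at the border).
                if (run > 0 && run < min) {
                    return false;
                }
                run = 0;
            }
        }
        return true;
    }
}
```

For example, the 5x5 diagram `#...#`, `.....`, `.....`, `.....`, `#...#` passes both checks, while a diagram whose top row is `..#..` fails the run-length check.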

Scandy

This style has very few restrictions. The top row and the left column should start with a black, and every odd position on these lines should contain a black. The structure of the diagram should allow the clues to be drawn inside the black boxes, and the clue arrows should point only right or down.
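The border rule can be sketched as follows. It assumes that positions are counted from 1, so that "odd positions" are the 1st, 3rd, 5th, ... cells of the top row and the left column; the text does not state the counting base, and the names and the '#'/'.' representation are hypothetical.

```java
/** Hypothetical check of the Scandy border rule, counting positions from 1. */
final class ScandyBorderSketch {
    static boolean hasScandyBorder(String[] rows) {
        // Odd positions (1st, 3rd, ...) of the top row are indices 0, 2, ...
        for (int c = 0; c < rows[0].length(); c += 2) {
            if (rows[0].charAt(c) != '#') {
                return false;
            }
        }
        // Odd positions of the left column.
        for (int r = 0; r < rows.length; r += 2) {
            if (rows[r].charAt(0) != '#') {
                return false;
            }
        }
        return true;
    }
}
```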

Clues Inside

The only restriction of this style is that the structure of the diagram must allow the clues to be placed inside the black boxes. The Scandy style is a subset of this one, but the diagrams can have a very different layout, and the algorithms for allocating clue positions are different (see below).

Black Grid

The diagram contains blacks in every even column of the even rows (as a minimum). These diagrams are usually used for cryptic crosswords. The style is supported only by the CPT Diagrams generator.
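A minimal check of this lattice rule is shown below. As with any reading of "even", it assumes rows and columns are counted from 1, so the required blacks sit at the 2nd, 4th, ... cells of the 2nd, 4th, ... rows; the names and the '#'/'.' representation are hypothetical.

```java
/** Hypothetical check of the Black Grid lattice, counting rows and columns from 1. */
final class BlackGridSketch {
    static boolean hasBlackLattice(String[] rows) {
        // Even rows and columns (2nd, 4th, ...) are indices 1, 3, ...
        for (int r = 1; r < rows.length; r += 2) {
            for (int c = 1; c < rows[r].length(); c += 2) {
                if (rows[r].charAt(c) != '#') {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Additional blacks elsewhere do not affect the result, which matches the "as a minimum" wording.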

Free

In this style no built-in restrictions are imposed. The diagram generation process depends only on the parameters given by the user.

Clue Allocation

Clue allocation inside black boxes is supported for B&W diagrams that have enough blacks. Any black cell can contain one or two clues. The program uses two algorithms: a 'light' one for the Scandy style and a 'heavy' one for the Clues Inside style. The user can also start either algorithm manually for any diagram. The first one is relatively fast because the clue arrow points only down or right - for any clue the black box used is above or to the left of the word's starting cell (and similarly for reversed words: for a horizontal word - above or to the right, and for a vertical word - below or to the right). The second algorithm can take a very long time because all possible combinations are checked - the clue arrow can point in any direction and can be at the word's ending cell as well.

The allocated clue positions can be saved if the crossword contains clues. In all other cases they are used only for display and printing.
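The placement rule of the 'light' algorithm (black box above or to the left of the starting cell, at most two clues per black cell) can be sketched for non-reversed words as below. The greedy pass is a simplification chosen for this example and is not CPT's actual algorithm: it may report failure for a diagram where a smarter assignment exists. All names and the '#'/'.' representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical greedy sketch of the 'light' clue placement rule. */
final class LightClueAllocationSketch {
    /** Returns {row, col} of black cells directly above or left of a starting cell. */
    static List<int[]> candidates(String[] rows, int r, int c) {
        List<int[]> result = new ArrayList<>();
        if (r > 0 && rows[r - 1].charAt(c) == '#') {
            result.add(new int[] { r - 1, c });
        }
        if (c > 0 && rows[r].charAt(c - 1) == '#') {
            result.add(new int[] { r, c - 1 });
        }
        return result;
    }

    /** Greedy pass: true if every word start found a black cell with a free clue slot. */
    static boolean allocate(String[] rows) {
        int h = rows.length;
        int w = rows[0].length();
        int[][] used = new int[h][w]; // clues already placed in each black cell
        for (int r = 0; r < h; r++) {
            for (int c = 0; c < w; c++) {
                if (rows[r].charAt(c) != '.') {
                    continue;
                }
                boolean across = (c == 0 || rows[r].charAt(c - 1) == '#')
                        && c + 1 < w && rows[r].charAt(c + 1) == '.';
                boolean down = (r == 0 || rows[r - 1].charAt(c) == '#')
                        && r + 1 < h && rows[r + 1].charAt(c) == '.';
                int needed = (across ? 1 : 0) + (down ? 1 : 0);
                for (int[] cell : candidates(rows, r, c)) {
                    while (needed > 0 && used[cell[0]][cell[1]] < 2) {
                        used[cell[0]][cell[1]]++;
                        needed--;
                    }
                }
                if (needed > 0) {
                    return false; // no free slot next to this word start
                }
            }
        }
        return true;
    }
}
```

For the 3x3 diagram `###`, `#..`, `#..` every word start has a neighbouring black cell with a free slot, so `allocate` returns true; for a 2x2 diagram with no blacks it returns false.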

6.2 Words

The structural properties of the words are:

6.3 Blacks and Whites

These properties include:

6.4 Symmetries

The list includes:

A diagram that has both horizontal and vertical symmetry also has standard symmetry, but the reverse is not always true.
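This relationship can be checked with a small Java sketch. It assumes that 'standard' symmetry means symmetry under a 180-degree rotation, which is the usual meaning in crossword construction but is not spelled out here. The two mirror checks are named by what they compare, since the text does not say which axis 'horizontal' refers to; all names are hypothetical.

```java
/** Hypothetical symmetry checks on a diagram given as one string per row. */
final class SymmetrySketch {
    /** Each row reads the same from left to right and from right to left. */
    static boolean mirrorLeftRight(String[] g) {
        for (String row : g) {
            if (!new StringBuilder(row).reverse().toString().equals(row)) {
                return false;
            }
        }
        return true;
    }

    /** The top half is the mirror image of the bottom half. */
    static boolean mirrorTopBottom(String[] g) {
        for (int r = 0; r < g.length / 2; r++) {
            if (!g[r].equals(g[g.length - 1 - r])) {
                return false;
            }
        }
        return true;
    }

    /** The diagram is unchanged when rotated by 180 degrees. */
    static boolean rotational180(String[] g) {
        int n = g.length;
        for (int r = 0; r < n; r++) {
            String opposite = new StringBuilder(g[n - 1 - r]).reverse().toString();
            if (!g[r].equals(opposite)) {
                return false;
            }
        }
        return true;
    }
}
```

The diagram `#.#`, `...`, `#.#` passes all three checks. The diagram `#..`, `...`, `..#` passes only the rotational check, which is an example of the reverse not holding.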

6.5 Structure

These properties include:

7. General Notes about the GUI

When you start some of the programs, at first you will see only the small top window with tabs such as Source, Target, and Browse, and a button bar. Very soon the screen will not be big enough.

There are no menus (with a few exceptions in CPT Editor and CPT Word Lists). All options are selected via numerous dialogs started by buttons. There is a general rule for numeric parameters: a value of -1 means "set the default value or ignore". In non-modal dialogs the OK button only saves the data and does not close the window. In these cases the Dismiss button hides the window. The OK button in the top window saves the current state of all parameters you have set.

The buttons show only pictures (without text), and the documentation refers to them by the text of their tool tips, which are shown when the mouse is over a button. Some of the buttons are context sensitive - especially the Start button in the top window, which can run many different operations depending on the current tab and the options set.

The font used by all modules is defined in the View Options dialog in CPT Editor.

In all windows the input focus follows the mouse movements. The 'keyboard focus' (using the Tab key) is also supported. When the focus is on a button, the Enter key or the Space Bar can be used to 'press' the button.

Communication with the clipboard is in Unicode, and in logical order for RTL scripts (under Linux you can use a one-byte keyboard/clipboard encoding as well).

In most text fields/areas of the programs you can click the right mouse button to show the Popup menu, which has the following items, depending on the context:

Note: In the Linux version, to select an item from the Popup menu, move the mouse pointer over the item and then release the right button.


© 1998-2004 CPT Software. All rights reserved.