Factor Snob, commands and hints

Chris Wallace's 'Factor Hierarchical Snob'

The computer program: [Factor Snob], download.
 
Please cite:
C. S. Wallace, Statistical and Inductive Inference by Minimum Message Length, Springer, 2005,
and
C. S. Wallace and P. R. Freeman, Single Factor Estimation by MML, J. Royal Stat. Soc. B, Vol.54, No.1, pp.195-209, 1992.

Commands

adjust         deldad    insdad     prclass     splitleaf      
assign         doall     killclass  ranclass    sto            
bestdeldad     dodads    killpop    readsamp    stop           
bestinsdad     dogood    killsons   rebuild     thing          
bestmoveclass  doleaves  moveclass  restorepop  tree           
binhier        file      nosubs     samps       trep           
copypop        flatten   pickpop    savepop     trymoves       
crosstab       help      pops       select         

Data

The following works. (It is possible that factor Snob may be more flexible about input format than suggested here.)
datafile.vset describes the "type" of the data, and datafile.samp gives the precision and the data values.
 
datafile.vset
Variables_for_Hybrids
6
 
height  1
sex     2       2
colour  2       6
yield   1
orientation     4
dry_weight      1
description of variables (columns), # of columns, blank line, then one line per column, 1 ~ continuous, 2 ~ discrete plus arity, 3 ~ Poisson(?), 4 ~ von Mises(?).
 
datafile.samp
Name_of_the_dataset
Variables_for_Hybrids
0.3
 
 
.01
1  10
2.0
 
5
101 37.1 1 3   18.2  73  13
115     22      2       6       ==      -50     14
103     =       =       1       30.5    100     30
-555    16.2    1       2       12      -30     8.4
501     40      2       4       12.2    200     ===
Name of the data set, description of variables (as in datafile.vset), then one line per column: precision (of a real variable), blank for a discrete variable, blank line, then number of data (rows), data (NB. 1st col is datum number, -ve ~ don't use).
NB. Discrete values are coded from 1 up.
(=+ for missing values.)

Hints and Tactics:

repeat a few times:
doall 50
trymoves 2
 
In interactive mode, experiment to find a sequence of commands that does what you want, then put that sequence into a command file. Then run factor snob with std-input redirected (<) from the command file and std-output redirected (>) to an output file. (It looks like csw had not gotten around to making file versions of all of the reporting commands. Also see 'file' command.) It works for me, e.g.,
 
cmdfile:
datafile.vset
datafile.samp
 
doall 50
trymoves 2
 
doall 50
trymoves 2
 
doall 50
tree
 
prclass -2 1
trep thingReportFile
stop
 
then run:
factorsnob < cmdfile > outputfile
-- L.A., 2008.
adjust controls which aspects of a population model will be changed. Follow it with one or more characters from:
'a'll, 's'cores, 't'ree, 'p'arams, each followed by '+' or '-' to turn adjustment on or off. Thus "adjust t-p+" disables tree structure adjustment and enables class parameter adjustment. 'p' does not include scores.
assign <c> controls how things are assigned to classes. The character c should be one of 'p'artial, 'm'ost_likely or 'r'andom; default is 'p'.
Partial is usually what you want. I'm guessing that random is in proportion to posterior prob. of membership.
bestdeldad guesses the most profitable dad to delete and tries it.
bestinsdad makes a guess at the most profitable insdad and tries it.
bestmoveclass guesses the most profitable moveclass and tries it.
binhier <N> inserts dads to convert tree to a binary hierarchy, then deletes dads to improve. If N>0, first flattens tree.
copypop <N, P> copies the "work" model to a new model. N selects the level of detail:
N=0 copies no thing weights or scores, leaving the new popln unattached to a sample.
P is the new popln name.
crosstab <P> shows the overlap between the leaves of work and the leaves of model P (which must be attached to the current sample.) The overlap is shown as a permillage of the number of active things, in a cross-tabulation table. A table entry for leaf serial Sw of work and leaf serial Sp of P shows the permillage of all active things which are in both leaves. Crosstab <work> is also meaningful. Here, an entry for leaves S1, S2 shows the permillage of thing which are partially assigned to both classes.
deldad <S> replaces dad class S by its sons
doall <N> does N steps of top-down re-estimation and assignment.
Seems to produce only binary trees; seem to need 'trymoves' to change tree structure.
dodads <N> iterates adjustment of parameter costs and dad parameters till stability or for N cycles
dogood <N> does N cycles of doleaves(3) and dodads(2) or until stable.
doleaves <N> does N steps of re-estimation and assignment to leaf classes subclasses and tree structure usually unaffected. Bottom-up reassignment of weights to dad classes. Doleaves gives faster refinement than Doall, but no tree adjustment. Always followed by one Doall step.
file <F> switches command input to file name F. Command input will return to keyboard at end of F or error, unless F contains a "file" command. F may contain a file command with its own name, returning to the start of F.
flatten makes all twigs (non-root classes with no sons) immediate leaves
insdad <S1, S2> where classes S1,S2 are sibs inserts a new Dad with S1,S2 as childen. The new Dad is son of original Dad od S1, S2.
killclass <S> kills class serial S. Descendants also die.
killpop <P> destroys a popln model. P must be the popln index or FULL name
killsons <S> kills all sons, grandsons etc. of class serial S. S becomes leaf
moveclass <S1 S2> moves class S1 (and any dependent subtree) to be a son of class S2, which must be a dad. S1 may not be an ancestor of S2.
nosubs <N> kills and prevents birth of subclasses if N>0
pickpop <P> copies popln P to "work". P = popln index or FULL name
pops lists the defined models.
prclass <S, N> prints properties of class S. If N>0, prints parameters If S=-1, prints all dads and leaves. If S=-2, includes subs.
'prclass -2 1' works (?there may be a bug in prclass -1?)
ranclass <N> destroys the current model and inserts N random leaves
readsamp <F> reads in a new sample from file F. Sample must use a Vset already loaded.
rebuild flattens the tree, then greedily rebuilds it.
restorepop <F> reads a model from file named F, as saved by 'savepop'. If model unattached, or attached to an unknown sample, it is attached to the current sample if any.
samps lists the known samples.
savepop <P, N, F> records model P on file named F, unattached if N=0
select <S> first copies the current work model to an unattached model called OldWork, replaces the current sample by sample <S> where S is either name or index, then picks OldWork to model it, getting thing weights and scores. The 'adjust' state is left as adjusting only scores, not params or tree.
splitleaf <S>, if class S is a leaf, makes S a dad and its subclasses leaves
sto , to stop, please use the full word "stop".
stop stops both sprompt and cnob
thing <N> finds the sample thing with identifier N.
tree prints a summary of control settings and the hierarchic tree
trep <F> writes a thing report on file F. Assigns each thing to smallest class in which thing has weight >0.5, and to most likely leaf.
trymoves <N> attempts moveclasses until N successive failures; converts a binary-tree into a better structure.