tnt.metrics
Class TnKSPC

java.lang.Object
  extended by tnt.metrics.TnKSPC
All Implemented Interfaces:
TnMetric

public class TnKSPC
extends java.lang.Object
implements TnMetric

This class calculates the KSPC (Keystrokes per Character) for a given corpus and ambiguous keyboard mapping. It combines much of the functionality of previously written classes EncodedWord.java, KSPCWords.java and T9.java, but also expands functionality by allowing user-defined letter-key mappings.

An ambiguous keyboard is one in which multiple letters map to the same key, such as a telephone keypad. Unlike a standard, full-sized QWERTY keyboard, where each letter has its own key, an ambiguous keyboard requires a disambiguating algorithm to determine which letters the user intended. (A common technology for use with the telephone keypad is T9 by Tegic Communications, Inc. [http://www.tegic.com].)

A typical approach is to analyze the keystrokes on a word-by-word basis. For each sequence of keystrokes representing a word, the system determines all possible words matching that sequence and presents them to the user in decreasing order of frequency within the language. The user then cycles through the list and selects the intended word. The current word is terminated with a space and input of the next word begins. This implementation assumes the user presses the 'NEXT' key to cycle through the list and the 'SPACE' key to select the word and terminate it with a space.
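
The word-level approach described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the TnKSPC source: the class and method names (DisambiguationSketch, encode, keystrokes) are hypothetical, the telephone-keypad mapping is hard-coded, and N and S stand for presses of NEXT and SPACE.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of word-level disambiguation; not the TnKSPC source.
class DisambiguationSketch {

    // Telephone-keypad mapping, indexed by (letter - 'a').
    static final String KEYS = "22233344455566677778889999";

    // Encodes a lower-case word as the keys pressed to enter its letters.
    static String encode(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            sb.append(KEYS.charAt(c - 'a'));
        }
        return sb.toString();
    }

    // Returns word -> full keystroke string (N = press of NEXT, S = SPACE).
    static Map<String, String> keystrokes(Map<String, Integer> wordFreq) {
        // Group the words that share a key sequence.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String word : wordFreq.keySet()) {
            groups.computeIfAbsent(encode(word), k -> new ArrayList<>()).add(word);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            List<String> candidates = e.getValue();
            // Most frequent candidate first; it needs no presses of NEXT.
            candidates.sort(Comparator.comparingInt((String w) -> wordFreq.get(w)).reversed());
            for (int i = 0; i < candidates.size(); i++) {
                result.put(candidates.get(i), e.getKey() + "N".repeat(i) + "S");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordFreq = new LinkedHashMap<>();
        wordFreq.put("bald", 569);
        wordFreq.put("able", 26890);
        wordFreq.put("cake", 2256);
        keystrokes(wordFreq).forEach((w, ks) -> System.out.println(w + "\t" + ks));
    }
}
```

All three sample words share the key sequence 2253, so the most frequent one (able) is entered with SPACE alone, while each less frequent candidate costs one more press of NEXT.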

This approach requires two inputs prior to running: a mapping of letters to keys, and a list of words and their frequencies within a corpus (a body of work representative of a language).

File Formats:

Letter-Key Mapping:

The text file containing mapping information should have a descriptive name (e.g. T9.txt) and contain two lines: the first lists the letters of the alphabet in lower case, and the second lists the keys. Each key must be represented by a single character, and each letter must align with its corresponding key.

                abcdefghijklmnopqrstuvwxyz
                22233344455566677778889999
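
One possible way to turn the two aligned lines into a lookup table is sketched below. The names MappingSketch, parseMapping and encode are hypothetical and are not part of the tnt.metrics API; the sketch assumes the two lines have already been read from the file, and it signals errors with IllegalArgumentException rather than the FormatException used by TnKSPC.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the tnt.metrics API.
class MappingSketch {

    // Pairs each letter on the first line with the key at the same index on the second line.
    static Map<Character, Character> parseMapping(String letters, String keys) {
        if (letters.length() != keys.length()) {
            throw new IllegalArgumentException("letters and keys must have equal length");
        }
        Map<Character, Character> map = new HashMap<>();
        for (int i = 0; i < letters.length(); i++) {
            map.put(letters.charAt(i), keys.charAt(i));
        }
        return map;
    }

    // Encodes a lower-case word as the sequence of keys that would be pressed.
    static String encode(String word, Map<Character, Character> map) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            Character key = map.get(c);
            if (key == null) {
                throw new IllegalArgumentException("unmapped letter: " + c);
            }
            sb.append(key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Character, Character> t9 = parseMapping(
                "abcdefghijklmnopqrstuvwxyz",
                "22233344455566677778889999");
        System.out.println(encode("able", t9)); // prints 2253
    }
}
```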
        
Word Frequency Data:

The text file containing word frequency information should have a descriptive name (e.g. the name of the corpus used). Each line must contain a word in lower case followed by its frequency, separated by whitespace.

                ...
                able    26890
                cake    2256
                bald    569
                ...
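
A minimal sketch of reading lines in this format follows. WordFreqSketch and parse are hypothetical names, the lines are assumed to have been read from the file already, and rejecting malformed lines with IllegalArgumentException is an assumption of the sketch (TnKSPC itself declares FormatException).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the tnt.metrics API.
class WordFreqSketch {

    // Parses "word <whitespace> frequency" lines, preserving file order.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected 'word frequency': " + line);
            }
            freq.put(parts[0], Integer.parseInt(parts[1]));
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("able    26890", "cake    2256", "bald    569");
        parse(lines).forEach((w, f) -> System.out.println(w + " -> " + f));
    }
}
```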
        
Invocation:

                PROMPT>java TnKSPC mapping wordfreq [-e] [-s] [-a] [-k]

                where:
                        mapping  = file containing letter-key mapping
                        wordfreq = file containing word and frequency values
                        -e = outputs KCME value
                        -s = outputs summary data
                        -a = outputs ambiguous word sets
                        -k = outputs word-freq-keystroke data

                Default output is KSPC26 and KSPC27 values only.

                See JavaDoc for more information.
        

Version:
1.0 - 2001 (EncodedWord.java, KSPCWords.java, and T9.java),
1.1 - 08/2005 (Streamlined and renamed TnKSPC.java.),
1.2 - 02/2006 (Progress reporting and halting functionality added.),
1.3 - 06/2006 (Added KCME calculation and reporting.)
Author:
Scott MacKenzie,
Steven J. Castellucci
See Also:
KSPC (Keystrokes per Character) as a characteristic of text entry techniques

Constructor Summary
TnKSPC(java.lang.String mapping, java.lang.String wordfreq)
          Reads the mapping file and initializes parameters to calculate KSPC.
TnKSPC(java.lang.String alpha, java.lang.String keys, java.lang.String wordfreq)
          Initializes parameters to calculate KSPC.
 
Method Summary
 void cancel()
          Signals calculation of this metric to stop.
 boolean cancelled()
          Returns whether or not cancel() was called on this metric.
 java.lang.String[] getAmbigWords()
          Returns an array of strings; each string is a space-delimited list of similarly ambiguous words (i.e. words represented by the same keystrokes, not including presses of NEXT).
 double getKCME()
          Returns the KCME value for this key-character mapping.
 double getKSPC26()
          Returns the KSPC value for the 26 letters of the English alphabet.
 double getKSPC27()
          Returns the KSPC value for the 26 letters of the English alphabet, plus the space character.
 int getMaxNext()
          Returns the maximum number of presses of NEXT used to input any word in the corpus.
 int getNumAmbigWords()
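          Returns the number of ambiguous words (i.e. the number of words that require at least one press of NEXT) in the corpus when using the defined key-letter mapping.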
 int[] getPressesOfNext()
          Returns a breakdown of the number of words requiring presses of NEXT.
 float getProgress()
          Returns a float in the range [0..1], representing the progress of the process() method.
 int getTotalChars26()
          Returns the total number of characters in the corpus (not including the space character).
 int getTotalChars27()
          Returns the total number of characters in the corpus (including the space character).
 int getTotalKs26()
          Returns the total number of keystrokes represented by the corpus (not including the space character).
 int getTotalKs27()
          Returns the total number of keystrokes represented by the corpus (including the space character).
 int getTotalWords()
          Returns the number of words in the corpus.
 WordFreqKs[] getWordFreqKs()
          Returns the array of entries, each consisting of a word, its frequency and the keystrokes required to input it.
 java.lang.String kcmeText()
          Returns the KCME value, formatted to be printed to the console.
 java.lang.String kspcText()
          Returns the KSPC values, formatted to be printed to the console.
static void main(java.lang.String[] args)
          Allows this class to be run from the command-line.
 void printAmbiguousWordSets()
          Outputs ambiguous word sets to the console.
 void printKeystrokeData()
          Outputs word-freq+keystroke data to the console.
 void process()
          Performs the required calculations and actions to determine the value and result of this metric.
 java.lang.String summaryDataText()
          Returns summary data, formatted to be printed to the console.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TnKSPC

public TnKSPC(java.lang.String mapping,
              java.lang.String wordfreq)
       throws FormatException,
              java.io.FileNotFoundException,
              java.io.IOException
Reads the mapping file and initializes parameters to calculate KSPC.

Parameters:
mapping - the name of the mapping file.
wordfreq - the name of the word-frequency file.
Throws:
FormatException - if the file has a formatting error.
java.io.FileNotFoundException - if the file cannot be read.
java.io.IOException - if an IO error occurs.

TnKSPC

public TnKSPC(java.lang.String alpha,
              java.lang.String keys,
              java.lang.String wordfreq)
       throws FormatException,
              java.io.FileNotFoundException,
              java.io.IOException
Initializes parameters to calculate KSPC.

Parameters:
alpha - a String with all the letters of the alphabet.
keys - a String with the keys in the same index as their corresponding letters of the alphabet.
wordfreq - the name of the word-frequency file.
Throws:
FormatException - if the file has a formatting error.
java.io.FileNotFoundException - if the file cannot be read.
java.io.IOException - if an IO error occurs.
Method Detail

process

public void process()
Description copied from interface: TnMetric
Performs the required calculations and actions to determine the value and result of this metric.

Specified by:
process in interface TnMetric

getProgress

public float getProgress()
Description copied from interface: TnMetric
Returns a float in the range [0..1], representing the progress of the process() method.

Specified by:
getProgress in interface TnMetric
Returns:
a float in the range [0..1], representing the progress of the process() method.

cancel

public void cancel()
Description copied from interface: TnMetric
Signals calculation of this metric to stop.

Specified by:
cancel in interface TnMetric

cancelled

public boolean cancelled()
Description copied from interface: TnMetric
Returns whether or not cancel() was called on this metric.

Specified by:
cancelled in interface TnMetric
Returns:
true iff cancel() was called on this metric object.
See Also:
TnMetric.cancel()
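
The four methods above (process, getProgress, cancel, cancelled) suggest running the calculation on a worker thread while another thread polls progress and may request a stop. The sketch below illustrates that pattern under stated assumptions: MetricSketch is a hypothetical stand-in that mirrors only the documented method signatures, CountingMetric is a dummy workload, and whether TnKSPC is meant to be driven from a separate thread is not stated in this documentation.

```java
// Hypothetical stand-in mirroring the four TnMetric methods documented above.
interface MetricSketch {
    void process();
    float getProgress();
    void cancel();
    boolean cancelled();
}

// Dummy workload: counts up to a fixed number of steps unless cancelled.
class CountingMetric implements MetricSketch {
    private final int steps;
    private volatile int done = 0;
    private volatile boolean cancelled = false;

    CountingMetric(int steps) {
        this.steps = steps;
    }

    public void process() {
        // Check the cancel flag before each unit of work.
        for (int i = 0; i < steps && !cancelled; i++) {
            done = i + 1;
        }
    }

    public float getProgress() {
        return steps == 0 ? 1f : (float) done / steps;
    }

    public void cancel() {
        cancelled = true;
    }

    public boolean cancelled() {
        return cancelled;
    }
}

class ProgressDemo {
    public static void main(String[] args) throws InterruptedException {
        CountingMetric metric = new CountingMetric(1_000_000);
        Thread worker = new Thread(metric::process);
        worker.start();
        // Poll progress until the worker finishes.
        while (worker.isAlive()) {
            System.out.printf("progress: %.0f%%%n", metric.getProgress() * 100);
            Thread.sleep(10);
        }
        worker.join();
        System.out.println("cancelled: " + metric.cancelled());
    }
}
```

The flag and the counter are declared volatile so that the polling thread sees updates made by the worker; a real metric would check the flag at whatever granularity keeps cancellation responsive.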

getKSPC26

public double getKSPC26()
Returns the KSPC value for the 26 letters of the English alphabet.

Returns:
the KSPC value for the 26 letters of the English alphabet.

getKSPC27

public double getKSPC27()
Returns the KSPC value for the 26 letters of the English alphabet, plus the space character.

Returns:
the KSPC value for the 26 letters of the English alphabet, plus the space character.
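
As a rough illustration of how the two values relate, KSPC can be computed as a frequency-weighted ratio: KSPC = sum(K_w * F_w) / sum(C_w * F_w), where K_w is the number of keystrokes needed to enter word w, C_w is its number of characters and F_w is its frequency. The sketch below applies that ratio to keystroke strings in which N is a press of NEXT and a trailing S is SPACE. KspcSketch is a hypothetical name, and the treatment of the 26-letter case (dropping only the trailing SPACE keystroke while keeping presses of NEXT) is an assumption, not a statement about the TnKSPC source.

```java
// Hypothetical illustration; not the TnKSPC source.
class KspcSketch {

    // keystrokes[i] is the full sequence for words[i], e.g. "2253NS",
    // where N is a press of NEXT and the trailing S is SPACE.
    static double kspc(String[] words, long[] freqs, String[] keystrokes, boolean includeSpace) {
        long totalKeystrokes = 0;
        long totalChars = 0;
        for (int i = 0; i < words.length; i++) {
            long ks = keystrokes[i].length();
            long chars = words[i].length();
            if (includeSpace) {
                chars += 1; // count the space that terminates the word
            } else {
                ks -= 1;    // assumption: drop only the trailing SPACE keystroke
            }
            totalKeystrokes += ks * freqs[i];
            totalChars += chars * freqs[i];
        }
        return (double) totalKeystrokes / totalChars;
    }

    public static void main(String[] args) {
        String[] words = {"able", "cake", "bald"};
        long[] freqs = {26890, 2256, 569};
        String[] keystrokes = {"2253S", "2253NS", "2253NNS"};
        System.out.println("KSPC26 = " + kspc(words, freqs, keystrokes, false));
        System.out.println("KSPC27 = " + kspc(words, freqs, keystrokes, true));
    }
}
```

For this three-word sample both values come out slightly above 1, because cake and bald each need extra presses of NEXT.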

getKCME

public double getKCME()
Returns the KCME value for this key-character mapping.

Returns:
the KCME value for this key-character mapping.

getMaxNext

public int getMaxNext()
Returns the maximum number of presses of NEXT used to input any word in the corpus.

Returns:
the most presses of NEXT used to input any word in the corpus.

getNumAmbigWords

public int getNumAmbigWords()
Returns the number of ambiguous words (i.e. the number of words that require at least one press of NEXT) in the corpus when using the defined key-letter mapping.

Returns:
the number of ambiguous words.

getTotalChars26

public int getTotalChars26()
Returns the total number of characters in the corpus (not including the space character).

Returns:
the number of characters in the corpus.

getTotalChars27

public int getTotalChars27()
Returns the total number of characters in the corpus (including the space character).

Returns:
the number of characters (including spaces) in the corpus.

getTotalKs26

public int getTotalKs26()
Returns the total number of keystrokes represented by the corpus (not including the space character).

Returns:
the number of keystrokes represented by the corpus.

getTotalKs27

public int getTotalKs27()
Returns the total number of keystrokes represented by the corpus (including the space character).

Returns:
the number of keystrokes represented by the corpus.

getTotalWords

public int getTotalWords()
Returns the number of words in the corpus.

Returns:
the number of words in the corpus.

getPressesOfNext

public int[] getPressesOfNext()
Returns a breakdown of the number of words requiring presses of NEXT.

Returns:
an array of int representing the number of words that require the number of presses of NEXT represented by the array index.
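
To make the index convention concrete, the sketch below builds such a breakdown from keystroke strings in which N marks a press of NEXT: element i of the result is the number of words that need exactly i presses. PressesOfNextSketch is a hypothetical name, and the sketch counts every word it is given; whether TnKSPC counts all words or only ambiguous ones at index 0 is not specified here.

```java
import java.util.Arrays;

// Hypothetical illustration; not the TnKSPC source.
class PressesOfNextSketch {

    // Assumes no key in the mapping is itself labelled 'N' (keys are digits here).
    static int[] histogram(String[] keystrokes) {
        int[] presses = new int[keystrokes.length];
        int max = 0;
        for (int i = 0; i < keystrokes.length; i++) {
            for (char c : keystrokes[i].toCharArray()) {
                if (c == 'N') {
                    presses[i]++;
                }
            }
            max = Math.max(max, presses[i]);
        }
        // hist[p] = number of words needing exactly p presses of NEXT.
        int[] hist = new int[max + 1];
        for (int p : presses) {
            hist[p]++;
        }
        return hist;
    }

    public static void main(String[] args) {
        String[] keystrokes = {
            "2253S", "2253NS", "2253NNS", "2253NNNS", "22536327S", "22537S", "22537NS"
        };
        System.out.println(Arrays.toString(histogram(keystrokes))); // prints [3, 2, 1, 1]
    }
}
```

In this sketch the length of the array minus one is the largest number of presses of NEXT needed by any word.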

getAmbigWords

public java.lang.String[] getAmbigWords()
Returns an array of strings; each string is a space-delimited list of similarly ambiguous words (i.e. words represented by the same keystrokes, not including presses of NEXT).

Returns:
an array of String, each of which lists words represented by the same keystrokes.

getWordFreqKs

public WordFreqKs[] getWordFreqKs()
Returns the array of entries, each consisting of a word, its frequency and the keystrokes required to input it. The keystrokes include the appropriate presses of NEXT and SPACE.

Returns:
an array of WordFreqKs objects.

kspcText

public java.lang.String kspcText()
Returns the KSPC values, formatted to be printed to the console. For example:
                KSPC26 = 1.0078596059054359
                KSPC27 = 1.0064113710167126
                

Returns:
a String representing KSPC values formatted to be printed to the console.

kcmeText

public java.lang.String kcmeText()
Returns the KCME value, formatted to be printed to the console. For example:
                KCME = 2.680683434
                

Returns:
a String representing KCME values formatted to be printed to the console.

summaryDataText

public java.lang.String summaryDataText()
Returns summary data, formatted to be printed to the console. For example:
                Number of words: 9022
                Ambiguous words: 1064 (11.8%)
                Ambiguous words requiring...
                        0 presses of NEXT: 476
                        1 presses of NEXT: 476
                        2 presses of NEXT: 83
                        3 presses of NEXT: 23
                        4 presses of NEXT: 5
                        5 presses of NEXT: 1
                Words requiring at least one press of NEXT: 588 ( 6.5%)
                

Returns:
a String representing summary data formatted to be printed to the console.

printAmbiguousWordSets

public void printAmbiguousWordSets()
Outputs ambiguous word sets to the console. For example:
                ... (truncated)
                car bar cap
                case care base card bare cape
                based cared
                cases cards acres bases
                basin cargo
                cars bars bass
                ... (truncated)
                


printKeystrokeData

public void printKeystrokeData()
Outputs word-freq+keystroke data to the console. For example:
                ... (truncated)
                able    26890   2253S
                cake    2256    2253NS
                bald    569     2253NNS
                calf    561     2253NNNS
                calendar        1034    22536327S
                baker   1716    22537S
                cakes   828     22537NS
                ... (truncated)
                


main

public static void main(java.lang.String[] args)
                 throws FormatException,
                        java.io.FileNotFoundException,
                        java.io.IOException
Allows this class to be run from the command-line.

Throws:
FormatException
java.io.FileNotFoundException
java.io.IOException


Copyright © 2006 Steven Castellucci and Scott MacKenzie. All Rights Reserved.