java.lang.Object
  tnt.metrics.TnKSPC
public class TnKSPC
This class calculates the KSPC (Keystrokes per Character) for a given corpus and ambiguous keyboard mapping. It combines much of the functionality of previously written classes EncodedWord.java, KSPCWords.java and T9.java, but also expands functionality by allowing user-defined letter-key mappings.
An ambiguous keyboard is one on which multiple letters map to the same key, such as a telephone keypad. Unlike a standard, full-sized QWERTY keyboard, which has a one-to-one mapping of letters to keys, an ambiguous keyboard requires a disambiguating algorithm to determine which letters the user intended. (A common technology for use with the telephone keypad is T9 by Tegic Communications, Inc. [http://www.tegic.com].)
A typical approach is to analyze the keystrokes on a word-by-word basis. For each cluster of keystrokes representing a word, the algorithm determines all possible words represented by that sequence of keystrokes and presents them to the user in decreasing order of frequency within the language. The user then cycles through the list and selects the intended word. The current word is terminated with a space and the input of the next word begins. This implementation assumes the user presses the 'NEXT' key to cycle through the list and the 'SPACE' key to select the word and terminate it with a space.
This approach requires two inputs prior to running: a mapping of letters to keys, and a list of words and their frequencies within a corpus (a body of work representative of a language).
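The word-by-word approach described above can be sketched in a few lines of Java. This is an illustration only, not the TnKSPC source: the class name DisambiguationSketch and its methods are hypothetical, the mapping is the telephone-keypad example from this page, and words are assumed to contain only the lower-case letters a to z.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; not the TnKSPC source. */
final class DisambiguationSketch {

    // Telephone-keypad mapping, as in the letter-key mapping example on this page.
    static final String ALPHA = "abcdefghijklmnopqrstuvwxyz";
    static final String KEYS = "22233344455566677778889999";

    /** Encodes a lower-case word (a-z only) as the key sequence that types it. */
    static String encode(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            sb.append(KEYS.charAt(ALPHA.indexOf(c)));
        }
        return sb.toString();
    }

    /**
     * Returns every word that encodes to the given key sequence, in decreasing
     * order of frequency. A word's index in the list is the number of NEXT
     * presses needed before SPACE selects it.
     */
    static List<String> candidates(Map<String, Integer> wordFreq, String keySequence) {
        List<String> matches = new ArrayList<>();
        for (String word : wordFreq.keySet()) {
            if (encode(word).equals(keySequence)) {
                matches.add(word);
            }
        }
        matches.sort(Comparator.comparing((String w) -> wordFreq.get(w)).reversed());
        return matches;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordFreq = new LinkedHashMap<>();
        wordFreq.put("bald", 569);
        wordFreq.put("able", 26890);
        wordFreq.put("cake", 2256);
        List<String> list = candidates(wordFreq, "2253");
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.get(i) + " needs " + i + " press(es) of NEXT");
        }
    }
}
```

With the three sample words, "able" is the most frequent candidate for the key sequence 2253 and so needs no presses of NEXT, while "cake" and "bald" need one and two.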
File Formats:
Letter-Key Mapping:
The text file containing mapping information should have a descriptive name (e.g. T9.txt) and contain two lines: the letters of the alphabet in lower case, followed by the keys. Each key must be represented by a single character and each letter must align with its corresponding key.
    abcdefghijklmnopqrstuvwxyz
    22233344455566677778889999

Word Frequency Data:
The text file containing word frequency information should have a descriptive name (e.g. the name of the corpus used) and each line must be a whitespace delimited list of a word in lower case and its frequency.

    ...
    able 26890
    cake 2256
    bald 569
    ...

Invocation:

    PROMPT>java TnKSPC mapping wordfreq [-s] [-a] [-k]

    where:
      mapping  = file containing letter-key mapping
      wordfreq = file containing word and frequency values
      -s = outputs summary data
      -a = outputs ambiguous word sets
      -k = outputs word-freq-keystroke data

    Default output is KSPC26 and KSPC27 values only.
    See JavaDoc for more information.
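As a rough illustration of the two file formats above, the following sketch parses the two mapping lines and a list of word-frequency lines that have already been read into memory. The class FileFormatSketch and its methods are hypothetical and are not part of TnKSPC. The sketch reports malformed input with IllegalArgumentException, whereas the constructors documented on this page throw FormatException.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative reader for the two file formats; not the TnKSPC source. */
final class FileFormatSketch {

    /** Builds a letter-to-key map from the two lines of a mapping file. */
    static Map<Character, Character> parseMapping(String letters, String keys) {
        if (letters.length() != keys.length()) {
            throw new IllegalArgumentException("each letter needs exactly one key");
        }
        Map<Character, Character> map = new LinkedHashMap<>();
        for (int i = 0; i < letters.length(); i++) {
            map.put(letters.charAt(i), keys.charAt(i));
        }
        return map;
    }

    /** Parses whitespace-delimited "word frequency" lines, skipping blank lines. */
    static Map<String, Integer> parseWordFreq(List<String> lines) {
        Map<String, Integer> wordFreq = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected 'word frequency': " + line);
            }
            wordFreq.put(parts[0], Integer.parseInt(parts[1]));
        }
        return wordFreq;
    }

    public static void main(String[] args) {
        Map<Character, Character> mapping = parseMapping(
                "abcdefghijklmnopqrstuvwxyz",
                "22233344455566677778889999");
        Map<String, Integer> wordFreq = parseWordFreq(
                Arrays.asList("able 26890", "cake 2256", "bald 569"));
        System.out.println("c -> " + mapping.get('c'));
        System.out.println(wordFreq);
    }
}
```

To read real files, the lines could be obtained with java.nio.file.Files.readAllLines before being passed to these methods.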
Constructor Summary | |
---|---|
TnKSPC(java.lang.String mapping, java.lang.String wordfreq) - Reads the mapping file and initializes parameters to calculate KSPC. | |
TnKSPC(java.lang.String alpha, java.lang.String keys, java.lang.String wordfreq) - Initializes parameters to calculate KSPC. | |
Method Summary | |
---|---|
void | cancel() - Signals calculation of this metric to stop. |
boolean | cancelled() - Returns whether or not cancel() was called on this metric. |
java.lang.String[] | getAmbigWords() - Returns an array of strings, each a space-delimited list of similarly ambiguous words (i.e. words represented by the same keystrokes). |
double | getKSPC26() - Returns the KSPC value for the 26 letters of the English alphabet. |
double | getKSPC27() - Returns the KSPC value for the 26 letters of the English alphabet, plus the space character. |
int | getMaxNext() - Returns the maximum number of presses of NEXT used to input any word in the corpus. |
int | getNumAmbigWords() - Returns the number of ambiguous words. |
int[] | getPressesOfNext() - Returns a breakdown of the number of words requiring presses of NEXT. |
float | getProgress() - Returns a float in the range [0..1], representing the progress of the process() method. |
int | getTotalChars26() - Returns the total number of characters in the corpus (not including the space character). |
int | getTotalChars27() - Returns the total number of characters in the corpus (including the space character). |
int | getTotalKs26() - Returns the total number of keystrokes represented by the corpus (not including the space character). |
int | getTotalKs27() - Returns the total number of keystrokes represented by the corpus (including the space character). |
int | getTotalWords() - Returns the number of words in the corpus. |
WordFreqKs[] | getWordFreqKs() - Returns the array of entries, each consisting of a word, its frequency and the keystrokes required to input it. |
java.lang.String | kspcText() - Returns the KSPC values, formatted to be printed to the console. |
static void | main(java.lang.String[] args) - Allows this class to be run from the command-line. |
void | printAmbiguousWordSets() - Outputs ambiguous word sets to the console. |
void | printKeystrokeData() - Outputs word-freq+keystroke data to the console. |
void | process() - Performs the required calculations and actions to determine the value and result of this metric. |
java.lang.String | summaryDataText() - Returns summary data, formatted to be printed to the console. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public TnKSPC(java.lang.String mapping, java.lang.String wordfreq) throws FormatException, java.io.FileNotFoundException, java.io.IOException
Reads the mapping file and initializes parameters to calculate KSPC.
Parameters:
mapping - the name of the mapping file.
wordfreq - the name of the word-frequency file.
Throws:
FormatException - if the file has a formatting error.
java.io.FileNotFoundException - if the file cannot be read.
java.io.IOException - if an IO error occurs.

public TnKSPC(java.lang.String alpha, java.lang.String keys, java.lang.String wordfreq) throws FormatException, java.io.FileNotFoundException, java.io.IOException
Initializes parameters to calculate KSPC.
Parameters:
alpha - a String with all the letters of the alphabet.
keys - a String with the keys in the same index as their corresponding letters of the alphabet.
wordfreq - the name of the word-frequency file.
Throws:
FormatException - if the file has a formatting error.
java.io.FileNotFoundException - if the file cannot be read.
java.io.IOException - if an IO error occurs.

Method Detail |
---|
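public void process()
Description copied from interface: TnMetric
Performs the required calculations and actions to determine the value and result of this metric.
Specified by: process in interface TnMetric

public float getProgress()
Description copied from interface: TnMetric
Returns a float in the range [0..1], representing the progress of the process() method.
Specified by: getProgress in interface TnMetric
Returns: a float in the range [0..1], representing the progress of the process() method.

public void cancel()
Description copied from interface: TnMetric
Signals calculation of this metric to stop.
Specified by: cancel in interface TnMetric

public boolean cancelled()
Description copied from interface: TnMetric
Returns whether or not cancel() was called on this metric.
Specified by: cancelled in interface TnMetric
Returns: true iff cancel() was called on this metric object.
See Also: TnMetric.cancel()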
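The four TnMetric methods above suggest that process() may run for a while, with getProgress() polled and cancel() called from another thread. The sketch below shows that pattern under stated assumptions: MetricSketch is a hypothetical stand-in that mirrors only the four method signatures listed on this page, and CountingMetric is a toy implementation unrelated to TnKSPC. Whether TnKSPC itself is intended to be used across threads is not stated here.

```java
/** Toy metric that "processes" a fixed number of items and honours cancel(). */
final class CountingMetric implements MetricSketch {
    private final int total;
    private volatile int done = 0;
    private volatile boolean cancelled = false;

    CountingMetric(int total) {
        this.total = total;
    }

    @Override
    public void process() {
        // Stop early if cancel() has been called, possibly from another thread.
        for (int i = 0; i < total && !cancelled; i++) {
            done = i + 1; // one more unit of work finished
        }
    }

    @Override
    public float getProgress() {
        return total == 0 ? 1f : (float) done / total;
    }

    @Override
    public void cancel() {
        cancelled = true;
    }

    @Override
    public boolean cancelled() {
        return cancelled;
    }

    public static void main(String[] args) throws InterruptedException {
        CountingMetric metric = new CountingMetric(1_000_000);
        Thread worker = new Thread(metric::process);
        worker.start();
        System.out.printf("progress so far: %.2f%n", metric.getProgress());
        worker.join();
        System.out.printf("finished: %.2f, cancelled: %b%n",
                metric.getProgress(), metric.cancelled());
    }
}

/** Hypothetical stand-in for TnMetric, limited to the four methods this page lists. */
interface MetricSketch {
    void process();
    float getProgress();
    void cancel();
    boolean cancelled();
}
```

The volatile fields let the polling thread see updates made by the worker thread without further synchronization, which is sufficient here because only one thread writes each field.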
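public double getKSPC26()
Returns the KSPC value for the 26 letters of the English alphabet.

public double getKSPC27()
Returns the KSPC value for the 26 letters of the English alphabet, plus the space character.

public int getMaxNext()
Returns the maximum number of presses of NEXT used to input any word in the corpus.

public int getNumAmbigWords()
Returns the number of ambiguous words.

public int getTotalChars26()
Returns the total number of characters in the corpus (not including the space character).

public int getTotalChars27()
Returns the total number of characters in the corpus (including the space character).

public int getTotalKs26()
Returns the total number of keystrokes represented by the corpus (not including the space character).

public int getTotalKs27()
Returns the total number of keystrokes represented by the corpus (including the space character).

public int getTotalWords()
Returns the number of words in the corpus.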
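
public int[] getPressesOfNext()
Returns a breakdown of the number of words requiring presses of NEXT.
Returns: an array of int, each element representing the number of words that require the number of presses of NEXT represented by the array index.

public java.lang.String[] getAmbigWords()
Returns an array of strings, each a space-delimited list of similarly ambiguous words.
Returns: an array of String, each of which lists words represented by the same keystrokes.

public WordFreqKs[] getWordFreqKs()
Returns the array of entries, each consisting of a word, its frequency and the keystrokes required to input it.
Returns: an array of WordFreqKs objects.

public java.lang.String kspcText()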
Returns the KSPC values, formatted to be printed to the console, for example:

    KSPC26 = 1.0078596059054359
    KSPC27 = 1.0064113710167126

Returns: a String representing KSPC values formatted to be printed to the console.

public java.lang.String summaryDataText()
Returns summary data, formatted to be printed to the console, for example:

    Number of words: 9022
    Ambiguous words: 1064 (11.8%)

    Ambiguous words requiring...
      0 presses of NEXT: 476
      1 presses of NEXT: 476
      2 presses of NEXT: 83
      3 presses of NEXT: 23
      4 presses of NEXT: 5
      5 presses of NEXT: 1

    Words requiring at least one press of NEXT: 588 ( 6.5%)

Returns: a String representing summary data formatted to be printed to the console.

public void printAmbiguousWordSets()
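Outputs ambiguous word sets to the console, one set of words sharing the same keystrokes per line, for example:

    ... (truncated)
    car bar cap
    case care base card bare cape
    based cared
    cases cards acres bases
    basin cargo
    cars bars bass
    ... (truncated)

public void printKeystrokeData()
Outputs word-freq+keystroke data to the console. Each line lists a word, its frequency and the keystrokes required to input it, where N denotes a press of NEXT and S a press of SPACE. For example:

    ... (truncated)
    able 26890 2253S
    cake 2256 2253NS
    bald 569 2253NNS
    calf 561 2253NNNS
    calendar 1034 22536327S
    baker 1716 22537S
    cakes 828 22537NS
    ... (truncated)
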
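The keystroke data above is enough to illustrate how a KSPC value can be derived. The sketch below assumes the usual definition of KSPC as frequency-weighted keystrokes divided by frequency-weighted characters. It further assumes that the "27" variant counts the terminating space as both one character and one keystroke, and that the "26" variant leaves both out. These formulas are assumptions and are not taken from the TnKSPC source. KspcSketch and Entry are hypothetical names, and Entry is only a stand-in for a row of this data, not the WordFreqKs class.

```java
/** Illustrative KSPC arithmetic; the formulas are assumptions, not the TnKSPC source. */
final class KspcSketch {

    /** Hypothetical stand-in for one row of keystroke data, e.g. "cake 2256 2253NS". */
    static final class Entry {
        final String word;
        final long freq;
        final String keystrokes; // key presses, then one N per NEXT, then S for SPACE

        Entry(String word, long freq, String keystrokes) {
            this.word = word;
            this.freq = freq;
            this.keystrokes = keystrokes;
        }
    }

    /** Assumed KSPC27: the terminating space counts as a character and a keystroke. */
    static double kspc27(Entry[] entries) {
        long totalKs = 0;
        long totalChars = 0;
        for (Entry e : entries) {
            totalKs += e.freq * e.keystrokes.length();
            totalChars += e.freq * (e.word.length() + 1);
        }
        return (double) totalKs / totalChars;
    }

    /** Assumed KSPC26: the trailing S keystroke and the space character are left out. */
    static double kspc26(Entry[] entries) {
        long totalKs = 0;
        long totalChars = 0;
        for (Entry e : entries) {
            totalKs += e.freq * (e.keystrokes.length() - 1);
            totalChars += e.freq * e.word.length();
        }
        return (double) totalKs / totalChars;
    }

    public static void main(String[] args) {
        Entry[] sample = {
            new Entry("able", 26890, "2253S"),
            new Entry("cake", 2256, "2253NS"),
            new Entry("bald", 569, "2253NNS"),
        };
        System.out.println("KSPC26 = " + kspc26(sample));
        System.out.println("KSPC27 = " + kspc27(sample));
    }
}
```

Under these assumptions every press of NEXT adds a keystroke without adding a character, so a corpus with no ambiguous words would give a value of exactly 1 for both variants.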
public static void main(java.lang.String[] args) throws FormatException, java.io.FileNotFoundException, java.io.IOException
Allows this class to be run from the command-line.
Throws:
FormatException
java.io.FileNotFoundException
java.io.IOException