Parses a stream into a set of defined tokens, one at a time. The different types of tokens that can be found are numbers, identifiers, quoted strings, and different comment styles. The class can be used for limited processing of source code of programming languages like Java, although it is nowhere near a full parser.
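As a rough illustration of the token-at-a-time model described above, the following sketch sums every number token in a string and skips everything else. The class name `NumberSummer` and the method `sumNumbers` are hypothetical names chosen for this example; only the `StreamTokenizer` calls come from this class.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class NumberSummer {
    // Adds up every number token in the text; words and other tokens are skipped.
    static double sumNumbers(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        double sum = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    sum += st.nval;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumNumbers("3 apples 4.5 pears")); // prints 7.5
    }
}
```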
Constant Summary
int | TT_EOF | The constant representing the end of the stream. |
int | TT_EOL | The constant representing the end of the line. |
int | TT_NUMBER | The constant representing a number token. |
int | TT_WORD | The constant representing a word token. |
Field Summary
public double | nval | Contains a number if the current token is a number (ttype == TT_NUMBER). |
public String | sval | Contains a string if the current token is a word (ttype == TT_WORD). |
public int | ttype | After calling nextToken(), ttype contains the type of token that has been read. |
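Public Constructor Summary
StreamTokenizer(InputStream is) | This constructor is deprecated. Use StreamTokenizer(Reader). |
StreamTokenizer(Reader r) | Constructs a new StreamTokenizer with r as source reader. |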
Public Method Summary
void | commentChar(int ch) | Specifies that the character ch shall be treated as a comment character. |
void | eolIsSignificant(boolean flag) | Specifies whether the end of a line is significant and should be returned as TT_EOL in ttype by this tokenizer. |
int | lineno() | Returns the current line number. |
void | lowerCaseMode(boolean flag) | Specifies whether word tokens should be converted to lower case when they are stored in sval. |
int | nextToken() | Parses the next token from this tokenizer's source stream or reader. |
void | ordinaryChar(int ch) | Specifies that the character ch shall be treated as an ordinary character by this tokenizer. |
void | ordinaryChars(int low, int hi) | Specifies that the characters in the range from low to hi shall be treated as ordinary characters by this tokenizer. |
void | parseNumbers() | Specifies that this tokenizer shall parse numbers. |
void | pushBack() | Indicates that the current token should be pushed back and returned again the next time nextToken() is called. |
void | quoteChar(int ch) | Specifies that the character ch shall be treated as a quote character. |
void | resetSyntax() | Specifies that all characters shall be treated as ordinary characters. |
void | slashSlashComments(boolean flag) | Specifies whether "slash-slash" (C++-style) comments shall be recognized. |
void | slashStarComments(boolean flag) | Specifies whether "slash-star" (C-style) comments shall be recognized. |
String | toString() | Returns the state of this tokenizer in a readable format. |
void | whitespaceChars(int low, int hi) | Specifies that the characters in the range from low to hi shall be treated as whitespace characters by this tokenizer. |
void | wordChars(int low, int hi) | Specifies that the characters in the range from low to hi shall be treated as word characters by this tokenizer. |
Constants
public static final int TT_EOF
The constant representing the end of the stream.
public static final int TT_EOL
The constant representing the end of the line.
public static final int TT_NUMBER
The constant representing a number token.
public static final int TT_WORD
The constant representing a word token.
Fields
public double nval
Contains a number if the current token is a number (ttype == TT_NUMBER).
public String sval
Contains a string if the current token is a word (ttype == TT_WORD).
public int ttype
After calling nextToken(), ttype contains the type of token that has been read. When a single character is read, its value converted to an integer is stored in ttype. For a quoted string, the value is the quote character. Otherwise, its value is one of the following:
- TT_WORD - the token is a word.
- TT_NUMBER - the token is a number.
- TT_EOL - the end of line has been reached. Depends on whether eolIsSignificant is true.
- TT_EOF - the end of the stream has been reached.
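The cases listed above can be sketched as a switch on ttype. This is a minimal illustration under the default syntax; `TokenDescriber`, `describe`, and the label strings are hypothetical names introduced here.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenDescriber {
    // Labels each token according to the value found in ttype.
    static List<String> describe(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD(" + st.sval + ")");
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER(" + st.nval + ")");
                        break;
                    case '"':
                    case '\'':
                        // For a quoted string, ttype is the quote character
                        // and sval holds the text between the quotes.
                        out.add("QUOTED(" + st.sval + ")");
                        break;
                    default:
                        // Any other single character is returned as its own value.
                        out.add("CHAR(" + (char) st.ttype + ")");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("say \"hi\" + 2"));
    }
}
```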
Public Constructors
public StreamTokenizer (InputStream is)
This constructor is deprecated. Use StreamTokenizer(Reader) instead.
Constructs a new StreamTokenizer with is as the source input stream.
Parameters
is | the source stream from which to parse tokens. |
---|
Throws
NullPointerException | if is is null . |
---|
public StreamTokenizer (Reader r)
Constructs a new StreamTokenizer with r as source reader.
The tokenizer's initial state is as follows:
- All byte values 'A' through 'Z', 'a' through 'z', and '\u00A0' through '\u00FF' are considered to be alphabetic.
- All byte values '\u0000' through '\u0020' are considered to be white space.
- '/' is a comment character.
- Single quote '\'' and double quote '"' are string quote characters.
- Numbers are parsed.
- End of lines are considered to be white space rather than separate tokens.
- C-style and C++-style comments are not recognized.
Parameters
r | the source reader from which to parse tokens. |
---|
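Two consequences of the initial state listed above are easy to overlook: a lone '/' starts a comment that runs to the end of the line, and line breaks are plain white space. The sketch below shows both on an untouched tokenizer; `DefaultSyntaxDemo` and `words` are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DefaultSyntaxDemo {
    // Collects word tokens using the tokenizer's initial state, unchanged.
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Everything after '/' on the first line is skipped as a comment.
        System.out.println(words("alpha / ignored\nbeta")); // prints [alpha, beta]
    }
}
```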
Public Methods
public void commentChar (int ch)
Specifies that the character ch shall be treated as a comment character.
Parameters
ch | the character to be considered a comment character. |
---|
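A possible use is reading configuration-like text where '#' introduces a comment that runs to the end of the line. The following is a sketch only; `HashComments` and `words` are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class HashComments {
    // Collects word tokens, skipping text from '#' to the end of each line.
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.commentChar('#');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("host example # primary\nport"));
    }
}
```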
public void eolIsSignificant (boolean flag)
Specifies whether the end of a line is significant and should be returned as TT_EOL in ttype by this tokenizer.
Parameters
flag | true if EOL is significant, false otherwise. |
---|
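With end-of-line reporting enabled, a caller can group tokens by line. The sketch below counts tokens per line; `LineTokenCounter` and `tokensPerLine` are hypothetical names, and the handling of a final line without a trailing newline is a choice made for this example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LineTokenCounter {
    // Returns the number of tokens found on each line of the text.
    static List<Integer> tokensPerLine(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.eolIsSignificant(true);
        List<Integer> counts = new ArrayList<>();
        int current = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_EOL) {
                    counts.add(current); // a line just ended
                    current = 0;
                } else {
                    current++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current > 0) {
            counts.add(current); // last line had no trailing newline
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(tokensPerLine("a b\nc\n")); // prints [2, 1]
    }
}
```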
public int lineno ()
Returns the current line number.
Returns
- this tokenizer's current line number.
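One way lineno() might be used is to report where a word first appears. In this sketch, `LineFinder` and `lineOf` are hypothetical names, and -1 is an arbitrary "not found" value chosen for the example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LineFinder {
    // Returns the line number on which the word first occurs, or -1 if absent.
    static int lineOf(String text, String word) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD && word.equals(st.sval)) {
                    return st.lineno();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(lineOf("one\ntwo\nthree", "three")); // prints 3
    }
}
```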
public void lowerCaseMode (boolean flag)
Specifies whether word tokens should be converted to lower case when they are stored in sval.
Parameters
flag | true if sval should be converted to lower case, false otherwise. |
---|
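Lower-case mode can be handy for case-insensitive keyword matching. A minimal sketch, with `LowerCaseWords` and `words` as hypothetical names:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LowerCaseWords {
    // Collects word tokens after the tokenizer has lower-cased them.
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.lowerCaseMode(true);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello WORLD")); // prints [hello, world]
    }
}
```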
public int nextToken ()
Parses the next token from this tokenizer's source stream or reader. The type of the token is stored in the ttype field; additional information may be stored in the nval or sval fields.
Returns
- the value of ttype.
Throws
IOException | if an I/O error occurs while parsing the next token. |
---|
public void ordinaryChar (int ch)
Specifies that the character ch shall be treated as an ordinary character by this tokenizer. That is, it has no special meaning as a comment character, word component, white space, string delimiter or number.
Parameters
ch | the character to be considered an ordinary character. |
---|
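Because '/' is a comment character in the initial state, a tokenizer that should see it as a division operator needs it marked ordinary first. The sketch below handles only the form `number / number`; `Divider` and `divide` are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class Divider {
    // Evaluates an expression of the form "<number> / <number>".
    static double divide(String expr) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(expr));
        st.ordinaryChar('/'); // '/' is now returned as a token, not a comment start
        try {
            if (st.nextToken() != StreamTokenizer.TT_NUMBER) {
                throw new IllegalArgumentException("expected a number");
            }
            double left = st.nval;
            if (st.nextToken() != '/') {
                throw new IllegalArgumentException("expected '/'");
            }
            if (st.nextToken() != StreamTokenizer.TT_NUMBER) {
                throw new IllegalArgumentException("expected a number");
            }
            return left / st.nval;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(divide("6 / 2")); // prints 3.0
    }
}
```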
public void ordinaryChars (int low, int hi)
Specifies that the characters in the range from low to hi shall be treated as ordinary characters by this tokenizer. That is, they have no special meaning as a comment character, word component, white space, string delimiter or number.
Parameters
low | the first character in the range of ordinary characters. |
---|---|
hi | the last character in the range of ordinary characters. |
public void parseNumbers ()
Specifies that this tokenizer shall parse numbers.
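Numbers are already parsed in the initial state, so an explicit call mainly matters after the syntax has been cleared. The sketch below resets the syntax, restores white space, and re-enables number parsing. `NumberOnly` and `numbers` are hypothetical names, and the handling of a leading '-' or '.' reflects the behavior of the standard JDK implementation as far as I am aware.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class NumberOnly {
    // Collects number tokens from a tokenizer that recognizes only
    // white space and numbers; everything else is an ordinary character.
    static List<Double> numbers(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();
        st.whitespaceChars(0, ' ');
        st.parseNumbers();
        List<Double> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(st.nval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(numbers("12 -3 .5"));
    }
}
```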
public void pushBack ()
Indicates that the current token should be pushed back and returned again the next time nextToken() is called.
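pushBack() gives a simple one-token lookahead: read a token, inspect it, then push it back so the next read returns it again. A sketch, with `PushBackDemo` and `peekAndRead` as hypothetical names:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

class PushBackDemo {
    // Peeks at the first word, then reads the first two words normally.
    static List<String> peekAndRead(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        try {
            st.nextToken();
            String peeked = st.sval;
            st.pushBack(); // the same token is returned by the next call

            st.nextToken();
            String first = st.sval;
            st.nextToken();
            String second = st.sval;
            return Arrays.asList(peeked, first, second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(peekAndRead("alpha beta")); // prints [alpha, alpha, beta]
    }
}
```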
public void quoteChar (int ch)
Specifies that the character ch shall be treated as a quote character.
Parameters
ch | the character to be considered a quote character. |
---|
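Any character can serve as a string delimiter. The sketch below collects the strings delimited by a caller-supplied quote character; as described for ttype, a quoted token reports the quote character as its type, with the enclosed text in sval. `QuotedFields` and `quoted` are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class QuotedFields {
    // Collects the contents of every string delimited by the given quote character.
    static List<String> quoted(String text, int quote) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.quoteChar(quote);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == quote) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(quoted("name |John Smith| 7", '|')); // prints [John Smith]
    }
}
```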
public void resetSyntax ()
Specifies that all characters shall be treated as ordinary characters.
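A common reason to reset the syntax is to build a custom one from scratch. The sketch below splits text on white space only, with number parsing, quoting, and comments all switched off, so that tokens such as "007" keep their exact spelling. `WhitespaceSplitter` and `split` are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class WhitespaceSplitter {
    // Splits text on white space, treating every other character as part of a word.
    static List<String> split(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();            // every character is now ordinary
        st.whitespaceChars(0, ' ');  // restore white space
        st.wordChars('!', 255);      // everything printable becomes a word character
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("007 v1.2 -x")); // prints [007, v1.2, -x]
    }
}
```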
public void slashSlashComments (boolean flag)
Specifies whether "slash-slash" (C++-style) comments shall be recognized. This kind of comment ends at the end of the line.
Parameters
flag | true if // should be recognized as the start of a comment, false otherwise. |
---|
public void slashStarComments (boolean flag)
Specifies whether "slash-star" (C-style) comments shall be recognized. Slash-star comments cannot be nested and end when a star-slash combination is found.
Parameters
flag | true if /* should be recognized as the start of a comment, false otherwise. |
---|
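The two comment flags can be combined. In the sketch below, '/' is first made ordinary so that a lone slash survives as a token while `//` and `/* */` comments are skipped. `SlashComments` and `tokens` are hypothetical names, and number tokens are deliberately not handled to keep the example short.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SlashComments {
    // Returns words and single-character tokens, skipping // and /* */ comments.
    static List<String> tokens(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.ordinaryChar('/');        // a lone '/' is no longer a comment start
        st.slashSlashComments(true);
        st.slashStarComments(true);
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                } else if (st.ttype >= 0) {
                    out.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a / b // c\n/* d */ e")); // prints [a, /, b, e]
    }
}
```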
public String toString ()
Returns the state of this tokenizer in a readable format.
Returns
- the current state of this tokenizer.
public void whitespaceChars (int low, int hi)
Specifies that the characters in the range from low to hi shall be treated as whitespace characters by this tokenizer.
Parameters
low | the first character in the range of whitespace characters. |
---|---|
hi | the last character in the range of whitespace characters. |
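A range of one character is allowed, which makes it easy to add a single separator. The sketch below treats commas as white space so that a comma-separated list yields plain words; `CommaSeparated` and `words` are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommaSeparated {
    // Collects word tokens, with ',' acting as an additional separator.
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.whitespaceChars(',', ',');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("red,green, blue")); // prints [red, green, blue]
    }
}
```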
public void wordChars (int low, int hi)
Specifies that the characters in the range from low to hi shall be treated as word characters by this tokenizer. A word consists of a word character followed by zero or more word or number characters.
Parameters
low | the first character in the range of word characters. |
---|---|
hi | the last character in the range of word characters. |
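As an illustration of the word rule above, the sketch below adds '_' to the word characters so that identifiers such as `max_value2` come back as a single word; the trailing digit is accepted because number characters may follow a word character. `IdentifierWords` and `words` are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class IdentifierWords {
    // Collects word tokens, with '_' accepted as part of a word.
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.wordChars('_', '_');
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("max_value2 = other")); // prints [max_value2, other]
    }
}
```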