How To Use Scanf
The Scan For field in the Import/Export field specification is used to pull out parts of a string before importing or exporting.
This field uses the scanf function in 'C', which offers very flexible ways to parse strings.
We use a single string argument, so the simplest expression you can use in the Scan For field is %s. This
extracts the first white space delimited word in the target string.
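As a minimal sketch of what %s does, the C function below uses the standard sscanf function, which applies the same format rules as scanf but reads from a string instead of stdin. The helper name first_word and the 64-byte buffer are assumptions for illustration only, not part of Collect!.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the first white-space-delimited word of
   `input` into `out` (at least 64 bytes) and returns its length,
   or 0 if `input` holds no word. */
size_t first_word(const char *input, char out[64])
{
    assert(input != NULL && out != NULL);
    /* %s skips leading white space, then reads up to the next white space;
       the width 63 leaves room for the terminating '\0'. */
    if (sscanf(input, "%63s", out) != 1) {
        out[0] = '\0';
        return 0;
    }
    return strlen(out);
}
```

For example, first_word("  John Doe", buf) stores "John" in buf and returns 4.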
Description
The scanf function reads data from the standard input stream and writes the data into the locations given by its arguments.
Each argument must be a pointer to a variable; this variable must be of a type that corresponds to a type specifier in format.
If copying takes place between strings that overlap, the behavior is undefined.
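To illustrate the pointer rule in plain C (outside Collect!, which passes only one string argument), the sketch below uses sscanf with two arguments. It shows that each argument is a pointer whose type matches its specifier, and that the return value counts the fields actually assigned. The function name parse_count_and_word is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses "<number> <word>".
   &count style pointers are required: `count` points to an int for %d,
   and `word` (an array, passed as a char pointer) receives the %s field.
   Returns the number of fields assigned (0, 1 or 2), or EOF. */
int parse_count_and_word(const char *input, int *count, char word[32])
{
    assert(input != NULL && count != NULL && word != NULL);
    memset(word, 0, 32);
    return sscanf(input, "%d %31s", count, word);
}
```

A call such as parse_count_and_word("42 widgets", &n, w) returns 2, while "abc" returns 0 because the first character cannot be converted by %d.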
Format Specification
A format specification has the following form:
% [ * ] [ width ] [ { h | l | L } ] type
The format argument specifies the interpretation of the input and can contain one or more of the following.
* White space characters: blank (' '); tab ('\t'); or newline ('\n').
A white space character causes scanf to read, but not store, all consecutive white space characters in the input up to the next
non-white space character. One white space character in the format matches any number (including 0) and combination of
white space characters in the input.
* Non white space characters, except for the percent sign (%).
A non white space character causes scanf to read, but not store, a matching non white space character. If the next character in
stdin does not match, scanf terminates.
* Format specifications, introduced by the percent sign (%).
A format specification causes scanf to read and convert characters in the input into values of a specified type.
The value is assigned to an argument in the argument list.
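The three kinds of format content listed above can be seen together in one format string. In this hypothetical sketch (plain C with sscanf, not a Collect! Scan For string), the leading blank matches any amount of white space, the '-' must match literally, and %d and %3s are conversions. When the literal does not match, scanning stops and only the fields read so far are counted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses a date such as "  7-Jan".
   " "  : matches zero or more white space characters
   "-"  : must match a literal '-'
   %d, %3s : conversions stored in `day` and `month`
   Returns the number of fields assigned. */
int parse_day_month(const char *input, int *day, char month[4])
{
    assert(input != NULL && day != NULL && month != NULL);
    memset(month, 0, 4);
    return sscanf(input, " %d-%3s", day, month);
}
```

With "7/Jan" the call returns 1: the day is stored, but '/' conflicts with '-' and scanning stops.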
The format is read from left to right. Characters outside format specifications are expected to match the sequence of
characters in stdin; the matching characters in stdin are scanned but not stored. If a character in stdin conflicts
with the format specification, scanf terminates, and the character is left in stdin as if it had not been read.
When the first format specification is encountered, the value of the first input field is converted according to this
specification. It is then stored in the location that is specified by the first argument. The second format
specification causes the second input field to be converted and stored in the second argument,
and so on through the end of the format string.
An input field is defined as all characters up to the first white space character (space, tab, or newline), or up to the first
character that cannot be converted according to the format specification, or, finally, until the field width (if specified) is
reached. If there are too many arguments for the given specifications, the extra arguments are evaluated but ignored. The
results are unpredictable if there are not enough arguments for the format specification.
Each field of the format specification is a single character or a number signifying a particular format option. The type
character, which appears after the last optional format field, determines whether the input field is interpreted as a character,
a string, or a number. The simplest format specification contains only the percent sign and a type character (for example, %s).
If a percent sign (%) is followed by a character that has no meaning as a format-control character, that character and the
following characters (up to the next percent sign) are treated as an ordinary sequence of characters; in other words, a sequence
of characters that must match the input. For example, to specify that a percent-sign character is to be input, use %%.
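A minimal sketch of %% in plain C: the format below reads a number and then requires a literal percent sign. The %n specifier (described in the type table) records how many characters were consumed, which is used here to confirm the '%' was actually matched, since matching a literal does not change the return value. The helper name read_percent is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads "<number>%" such as "50%".
   %% matches one literal '%'; %n stores the count of characters read so far
   and is not included in sscanf's return value.
   Returns 1 if the whole pattern matched, otherwise 0. */
int read_percent(const char *input, int *value)
{
    int consumed = 0;
    assert(input != NULL && value != NULL);
    if (sscanf(input, "%d%%%n", value, &consumed) != 1)
        return 0;
    /* consumed stays 0 if scanning stopped before the '%' was matched */
    return consumed > 0;
}
```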
An asterisk (*) following the percent sign suppresses assignment of the next input field, which is interpreted as a field of the
specified type. The field is scanned but not stored. See the examples at the end of this document for clarification of this
point.
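As a short plain-C sketch of assignment suppression, the helper below skips two integers with %*d and stores only the third. Suppressed fields are scanned but not stored, and they are not counted in the return value, so a successful call returns 1. The name third_number is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns sscanf's count of assigned fields and
   stores the third integer of input such as "10 20 30" in `out`.
   The two %*d fields are read and discarded. */
int third_number(const char *input, int *out)
{
    assert(input != NULL && out != NULL);
    return sscanf(input, "%*d %*d %d", out);
}
```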
Type
The type character is the only required format field; it appears after any optional format fields. The type character determines
whether the associated argument is interpreted as a character, string, or number.
In the Collect! "Scan For" function, only the 's' type specifier is used. A 1024-character buffer stores the
text extracted by the scanf function. The resulting text is then read into the specified field, or exported to a file
using standard Collect! formatting specifiers.
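As an illustration only, the behavior described above can be modeled with sscanf and a 1024-byte buffer. The function scan_for and the SCAN_BUF constant are hypothetical; this document does not show Collect!'s actual code. The sketch treats the call as successful only when exactly one string was stored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SCAN_BUF 1024 /* size of the result buffer described above */

/* Hypothetical model of Scan For: applies a user-supplied format that
   contains a single string conversion to `input`, storing the result in
   `out`. Returns 1 if one string was extracted, otherwise 0.
   Caution: a format such as %s with no width can overflow `out` if the
   input field is longer than SCAN_BUF - 1 characters. */
int scan_for(const char *input, const char *format, char out[SCAN_BUF])
{
    assert(input != NULL && format != NULL && out != NULL);
    memset(out, 0, SCAN_BUF);
    return sscanf(input, format, out) == 1;
}
```

For example, scan_for("John Doe", "%*s %s", out) stores "Doe".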
Type Characters for scanf Functions
Character | Type of Input Expected | Type of Argument |
c | When used with scanf functions, specifies a single-byte character. White space characters that are ordinarily skipped are read when c is specified. To read the next non-white space single-byte character, use %1s; to read the next non-white space wide character, use %1ws. | Pointer to char when used with scanf functions; pointer to wchar_t when used with wscanf functions. |
C | When used with scanf functions, specifies a wide character. White space characters that are ordinarily skipped are read when C is specified. To read the next non-white space single-byte character, use %1s; to read the next non-white space wide character, use %1ws. | Pointer to wchar_t when used with scanf functions; pointer to char when used with wscanf functions. |
d | Decimal integer. | Pointer to int. |
i | Decimal, hexadecimal, or octal integer. | Pointer to int. |
o | Octal integer. | Pointer to unsigned int. |
u | Unsigned decimal integer. | Pointer to unsigned int. |
x | Hexadecimal integer. | Pointer to unsigned int. |
e, E, f, g, G | Floating-point value consisting of an optional sign (+ or -), a series of one or more decimal digits containing a decimal point, and an optional exponent ('e' or 'E') followed by an optionally signed integer value. | Pointer to float. |
n | No input is read from the stream or buffer. | Pointer to int, into which is stored the number of characters successfully read from the stream or buffer up to that point in the current call to scanf or wscanf functions. |
s | String, up to the first white space character (space, tab, or newline). To read strings not delimited by space characters, use a set of square brackets ([ ]), as described later in this document. | When used with scanf functions, signifies a single-byte character array; when used with wscanf functions, signifies a wide-character array. In either case, the character array must be large enough for the input field plus the terminating null character, which is automatically appended. |
S | String, up to the first white space character (space, tab, or newline). To read strings not delimited by space characters, use a set of square brackets ([ ]), as described later in this document. | When used with scanf functions, signifies a wide-character array; when used with wscanf functions, signifies a single-byte character array. In either case, the character array must be large enough for the input field plus the terminating null character, which is automatically appended. |
The types C and S are Microsoft extensions and are not ANSI-compatible; c and s are standard ANSI C types.
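The difference between c and %1s noted in the table can be checked with a small plain-C sketch: %c takes the very next character even if it is white space, while %1s skips white space first and stores one character plus a terminating null. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the next character, even a blank,
   or '\0' if the input is empty. */
char next_char(const char *input)
{
    char c;
    assert(input != NULL);
    if (sscanf(input, "%c", &c) != 1)
        c = '\0';
    return c;
}

/* Hypothetical helper: returns the next non-white space character,
   or '\0' if there is none. %1s needs room for one char plus '\0'. */
char next_non_space(const char *input)
{
    char s[2] = "";
    assert(input != NULL);
    if (sscanf(input, "%1s", s) != 1)
        s[0] = '\0';
    return s[0];
}
```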
To read single-byte or wide characters with scanf functions, use the following format specifiers.
To Read Character As | Use This Function | With These Format Specifiers |
single byte | scanf functions | c, hc, or hC |
wide | scanf functions | C, lc, or lC |
To scan strings with scanf functions and wscanf functions, use the prefixes h and l analogously with the format
type specifiers s and S.
Width
Width is a positive decimal integer which controls the maximum number of characters to be read from stdin. No more than width
characters are converted and stored at the corresponding argument. Fewer than width characters may be read if a white space
character (space, tab, or newline) or a character that cannot be converted according to the given format occurs before
width is reached.
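A minimal plain-C sketch of width: each %2d converts at most two characters, so the second conversion starts where the first stopped. A white space character also ends a field early. The helper name split_hhmm is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: splits a time such as "0930" into hours and
   minutes. Returns the number of fields assigned. */
int split_hhmm(const char *input, int *hh, int *mm)
{
    assert(input != NULL && hh != NULL && mm != NULL);
    /* %2d reads no more than two digits per field */
    return sscanf(input, "%2d%2d", hh, mm);
}
```

Both "0930" and "9 30" give 9 and 30: in the second input the first field ends at the space before its width is reached.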
The optional prefixes h, l, and L indicate the size of the argument (long or short, single-byte character or wide character,
depending upon the type character that they modify). These format-specification characters are used with type characters in
scanf or wscanf functions to specify interpretation of arguments as shown in the table below. The h, l, and L prefixes are
part of ANSI C when used with the numeric type characters; using h with c, C, s, or S is a Microsoft extension and is not
ANSI-compatible. The type characters and their meanings are described in ANSI C documentation.
Size Prefixes for scanf and wscanf Format-Type Specifiers
To Specify | Use Prefix | With Type Specifier |
double | l | e, E, f, g, or G |
long int | l | d, i, o, x, or X |
long unsigned int | l | u |
short int | h | d, i, o, x, or X |
short unsigned int | h | u |
Single-byte character with scanf | h | c or C |
Single-byte character with wscanf | h | c or C |
Wide character with scanf | l | c or C |
Wide character with wscanf | l | c or C |
Single-byte character string with scanf | h | s or S |
Wide-character string with scanf | l | s or S |
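The numeric rows of the size-prefix table can be sketched in plain C: the prefix makes the pointer type match the variable being filled. The helper name read_sizes is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper showing the size prefixes:
   %hd -> short *, %ld -> long *, %lf -> double *.
   Returns the number of fields assigned. */
int read_sizes(const char *input, short *s, long *l, double *d)
{
    assert(input != NULL && s != NULL && l != NULL && d != NULL);
    return sscanf(input, "%hd %ld %lf", s, l, d);
}
```

Using %d with a short, or %f with a double, would pass a pointer of the wrong size, which is why the prefix is required.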
Examples
Following are examples of the use of scanf functions.
Scan For: %s
// Reads a string.
To read strings not delimited by space characters, a set of characters in brackets ([ ]) can be substituted for the s (string)
type character. The corresponding input field is read up to the first character that does not appear in the bracketed character
set. If the first character in the set is a caret (^), the effect is reversed. The input field is read up to the first character
that does appear in the rest of the character set.
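A small plain-C sketch of the two kinds of bracketed set: %[a-z] keeps reading while characters are in the set, and %[^,] keeps reading until it meets a comma. The helper names are hypothetical, and the a-z range form relies on the common extension noted in the next paragraph.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads lowercase letters until the first character
   not in the set. Returns 1 if at least one letter was stored, else 0. */
int read_lowercase(const char *input, char out[32])
{
    assert(input != NULL && out != NULL);
    memset(out, 0, 32);
    return sscanf(input, "%31[a-z]", out);
}

/* Hypothetical helper: reads everything up to (not including) the first comma. */
int read_until_comma(const char *input, char out[32])
{
    assert(input != NULL && out != NULL);
    memset(out, 0, 32);
    return sscanf(input, "%31[^,]", out);
}
```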
Note that %[a-z] and %[z-a] are interpreted as equivalent to %[abcde...z]. This is a common scanf function extension, but note
that the ANSI standard does not require it.
To store a string without storing a terminating null character ('\0'), use the specification %nc where n is a decimal integer.
In this case, the c type character indicates that the argument is a pointer to a character array. The next n characters are read
from the input stream into the specified location, and no null character ('\0') is appended. If n is not specified, its
default value is 1.
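The %nc behavior is general scanf behavior (Collect! itself uses only 's'). In this plain-C sketch the buffer is pre-filled so that it is visible that %3c copies exactly three characters and appends no null character. The helper name read_three is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: pre-fills `buf` with "xxxxx" (plus '\0'), then
   copies exactly three characters from `input` over the first three
   bytes. Bytes after them are left unchanged because %3c adds no '\0'.
   Returns 1 on success, otherwise 0. */
int read_three(const char *input, char buf[6])
{
    assert(input != NULL && buf != NULL);
    memcpy(buf, "xxxxx", 6);
    return sscanf(input, "%3c", buf) == 1;
}
```

Given "ABCDEF", buf ends up holding "ABCxx".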
The scanf function scans each input field, character by character. It may stop reading a particular input field before it
reaches a space character for any of the following reasons:
1. The specified width has been reached.
2. The next character cannot be converted as specified.
3. The next character conflicts with a character in the control string that it is supposed to match.
4. The next character fails to appear in a given character set.
For whatever reason, when the scanf function stops reading an input field, the next input field is considered to begin at the
first unread character. The conflicting character, if there is one, is considered unread and is the first character of the next
input field or the first character in subsequent read operations.
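The stopping rules and the "first unread character" rule can be seen in one plain-C sketch. %3d ends after three digits, at a character it cannot convert, or at white space, and whatever it leaves unread becomes the start of the %15s field. The helper name split_number is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads a number of up to three digits, then the
   next string field. Returns the number of fields assigned. */
int split_number(const char *input, int *num, char rest[16])
{
    assert(input != NULL && num != NULL && rest != NULL);
    memset(rest, 0, 16);
    return sscanf(input, "%3d%15s", num, rest);
}
```

For "12345" the width stops the number at 123; for "12x9" the 'x' cannot be converted, stays unread, and starts the next field; for "1 2" the space ends the first field.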
Example 1
Scan For: %*s %s
// Omits the first string and reads the next string.
This command will skip over the first string and output the second. This is useful for extracting the last name from a name
string such as John Doe.
Input: John Doe
Using %*s %s the output is Doe.
Using %s the output is John.
Example 2
Scan For: %*s %[a-zA-Z,. ]
// Omits the first string, then reads every character that appears in the [ ] set (including the space) and stops at the first character that does not.
This command will skip over the first string and output the remaining text. This is useful for dropping a leading word from
a field and bringing in the rest of a Debtor Company name such as 567 Collections, Inc.
Input: 567 Collections, Inc.
Using %*s %[a-zA-Z,. ] the output is Collections, Inc.
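A sketch of Example 2 using sscanf in place of Collect!'s internal call. Because a bracketed set reads only the characters listed, the space must be in the set for reading to continue past "Collections,". The test compares %[a-zA-Z,. ] with %[a-zA-Z,.]. The helper name company_name and the width of 1023 (added to protect the 1024-byte buffer) are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: skips the first word, then keeps letters, commas,
   periods and spaces. Returns 1 if text was stored, otherwise 0 or EOF. */
int company_name(const char *input, char out[1024])
{
    assert(input != NULL && out != NULL);
    memset(out, 0, 1024);
    return sscanf(input, "%*s %1023[a-zA-Z,. ]", out);
}
```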
Example 3
Scan For: %[^*]
// Brings in all characters before the first asterisk (*).
This command will output all the text from a field that is before the first asterisk (*). This is useful for extracting text
from a field with extra unneeded text such as John Doe **.
Input: John Doe **
Using %[^*] the output is John Doe
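A sketch of Example 3 with sscanf. Note that the raw sscanf result keeps the space that precedes the asterisks ("John Doe "); whether Collect! trims it is not stated in this document. The helper name before_asterisk and the width 1023 are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies everything before the first '*'.
   Returns 1 if text was stored, otherwise 0 or EOF. */
int before_asterisk(const char *input, char out[1024])
{
    assert(input != NULL && out != NULL);
    memset(out, 0, 1024);
    return sscanf(input, "%1023[^*]", out);
}
```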
Example 4
Scan For: %[^,]
// Outputs all the text before the first comma.
This command will output all the text before the first comma.
This is useful for extracting a last name with a generation from a Name field such as Doe III, John.
Input: Doe III, John
Using %[^,] the output is Doe III
Example 5
Scan For: %*[^,]%*[,]%[^\"]
// Omits everything before the comma. Then omits the comma. Lastly outputs all remaining text to end of field.
This command will skip over all the text before and including the comma, then output the remaining text. This is useful for
extracting a first name with middle names from a Name field such as Doe, John Harry William.
Input: Doe, John Harry William
Using %*[^,]%*[,]%[^\"] the output is John Harry William
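A sketch of Example 5 with sscanf. In C source, \" is an escaped double quote, so the last set means "anything except a double quote"; if the same characters are typed literally into a field, the set also excludes the backslash, which makes no difference for this input. The raw sscanf result keeps the space after the comma (" John Harry William"), because a bracketed set does not skip leading white space; whether Collect! trims it is not stated here. The helper name after_comma and the width 1023 are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: discards the text before the comma and the comma
   itself (both suppressed, so not counted), then stores the rest.
   Returns 1 if text was stored. */
int after_comma(const char *input, char out[1024])
{
    assert(input != NULL && out != NULL);
    memset(out, 0, 1024);
    return sscanf(input, "%*[^,]%*[,]%1023[^\"]", out);
}
```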
Phone Number Import
You can use scanf strings in imports for outputting different parts of phone numbers. The commands will extract
a specific number of characters from a phone number string.
Your import map needs three fields: Area Code, Exchange and Number. You can use the append, default value and other parameters
in the field specifications.
Scan For: %3s // Outputs Area Code
Scan For: %*3s %3s // Outputs Exchange
Scan For: %*6s %4s // Outputs Number
You cannot add a default value such as a dash in a field specification that uses Scan For. The dash must be a
separate field specification.
Input: 2503910466
Using %3s for Area Code, the output is 250
Using %*3s %3s for Exchange, the output is 391
Using %*6s %4s for Number, the output is 0466
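The sketch below applies the three Scan For strings above to the sample input with sscanf. The function phone_parts is hypothetical and assumes an unformatted number of at least ten digits with no dashes or spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits "2503910466" into area code, exchange and
   number using the same three formats as the import map.
   Returns 1 if all three parts were extracted, otherwise 0. */
int phone_parts(const char *phone, char area[4], char exch[4], char num[5])
{
    assert(phone != NULL && area != NULL && exch != NULL && num != NULL);
    if (strlen(phone) < 10)
        return 0;
    return sscanf(phone, "%3s", area) == 1        /* first 3 characters */
        && sscanf(phone, "%*3s %3s", exch) == 1   /* skip 3, take 3 */
        && sscanf(phone, "%*6s %4s", num) == 1;   /* skip 6, take 4 */
}
```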
Troubleshooting
Scan For supports only one result. If you try to load more than one result into a
field, Collect! displays the message below, indicating the Record Definition and
Field that need to be corrected.
ScanF Message