== One-liners ==
<pre>
HANDY ONE-LINERS FOR AWK                                  22 July 2003
compiled by Eric Pement <pemente@northpark.edu>           version 0.22
   Latest version of this file is usually at:
   http://www.student.northpark.edu/pemente/awk/awk1line.txt

USAGE:

    Unix:  awk '/pattern/ {print "$1"}'    # standard Unix shells
 DOS/Win:  awk '/pattern/ {print "$1"}'    # okay for DJGPP compiled
           awk "/pattern/ {print \"$1\"}"  # required for Mingw32

Most of my experience comes from version of GNU awk (gawk) compiled for

FILE SPACING:

 # double space a file
 awk '1;{print ""}'
 awk 'BEGIN{ORS="\n\n"};1'

 # double space a file which already has blank lines in it. Output file
 # should contain no more than one blank line between lines of text.
 # NOTE: On Unix systems, DOS lines which have only CRLF (\r\n) are
 # often treated as non-blank, and thus 'NF' alone will return TRUE.
 awk 'NF{print $0 "\n"}'

 # triple space a file
 awk '1;{print "\n"}'

NUMBERING AND CALCULATIONS:

 # precede each line by its line number FOR THAT FILE (left alignment).
 # Using a tab (\t) instead of space will preserve margins.
 awk '{print FNR "\t" $0}' files*

 # precede each line by its line number FOR ALL FILES TOGETHER, with tab.
 awk '{print NR "\t" $0}' files*

 # number each line of a file (number on left, right-aligned)
 # Double the percent signs if typing from the DOS command prompt.
 awk '{printf("%5d : %s\n", NR,$0)}'

 # number each line of file, but only print numbers if line is not blank
 # Remember caveats about Unix treatment of \r (mentioned above)
 awk 'NF{$0=++a " :" $0};{print}'
 awk '{print (NF? ++a " :" :"") $0}'

 # count lines (emulates "wc -l")
 awk 'END{print NR}'

 # print the sums of the fields of every line
 awk '{s=0; for (i=1; i<=NF; i++) s=s+$i; print s}'

 # add all fields in all lines and print the sum
 awk '{for (i=1; i<=NF; i++) s=s+$i}; END{print s}'

 # print every line after replacing each field with its absolute value
 awk '{for (i=1; i<=NF; i++) if ($i < 0) $i = -$i; print }'
 awk '{for (i=1; i<=NF; i++) $i = ($i < 0) ? -$i : $i; print }'

 # print the total number of fields ("words") in all lines
 awk '{ total = total + NF }; END {print total}' file

 # print the total number of lines that contain "Beth"
 awk '/Beth/{n++}; END {print n+0}' file

 # print the largest first field and the line that contains it
 # Intended for finding the longest string in field #1
 awk '$1 > max {max=$1; maxline=$0}; END{ print max, maxline}'

 # print the number of fields in each line, followed by the line
 awk '{ print NF ":" $0 } '

 # print the last field of each line
 awk '{ print $NF }'

 # print the last field of the last line
 awk '{ field = $NF }; END{ print field }'

 # print every line with more than 4 fields
 awk 'NF > 4'

 # print every line where the value of the last field is > 4
 awk '$NF > 4'

TEXT CONVERSION AND SUBSTITUTION:

 # IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
 awk '{sub(/\r$/,"");print}'   # assumes EACH line ends with Ctrl-M

 # IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format
 awk '{sub(/$/,"\r");print}'

 # IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format
 awk 1

 # IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
 # Cannot be done with DOS versions of awk, other than gawk:
 gawk -v BINMODE="w" '1' infile >outfile

 # Use "tr" instead.
 tr -d \r <infile >outfile            # GNU tr version 1.22 or higher

 # delete leading whitespace (spaces, tabs) from front of each line
 # aligns all text flush left
 awk '{sub(/^[ \t]+/, ""); print}'

 # delete trailing whitespace (spaces, tabs) from end of each line
 awk '{sub(/[ \t]+$/, "");print}'

 # delete BOTH leading and trailing whitespace from each line
 awk '{gsub(/^[ \t]+|[ \t]+$/,"");print}'
 awk '{$1=$1;print}'           # also removes extra space between fields

 # insert 5 blank spaces at beginning of each line (make page offset)
 awk '{sub(/^/, "     ");print}'

 # align all text flush right on a 79-column width
 awk '{printf "%79s\n", $0}' file*

 # center all text on a 79-character width
 awk '{l=length();s=int((79-l)/2); printf "%"(s+l)"s\n",$0}' file*

 # substitute (find and replace) "foo" with "bar" on each line
 awk '{sub(/foo/,"bar");print}'           # replaces only 1st instance
 gawk '{$0=gensub(/foo/,"bar",4);print}'  # replaces only 4th instance
 awk '{gsub(/foo/,"bar");print}'          # replaces ALL instances in a line

 # substitute "foo" with "bar" ONLY for lines which contain "baz"
 awk '/baz/{gsub(/foo/, "bar")};{print}'

 # substitute "foo" with "bar" EXCEPT for lines which contain "baz"
 awk '!/baz/{gsub(/foo/, "bar")};{print}'

 # change "scarlet" or "ruby" or "puce" to "red"
 awk '{gsub(/scarlet|ruby|puce/, "red"); print}'

 # reverse order of lines (emulates "tac")
 awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' file*

 # if a line ends with a backslash, append the next line to it
 # (fails if there are multiple lines ending with backslash...)
 awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1' file*

 # print and sort the login names of all users
 awk -F ":" '{ print $1 | "sort" }' /etc/passwd

 # print the first 2 fields, in opposite order, of every line
 awk '{print $2, $1}' file

 # switch the first 2 fields of every line
 awk '{temp = $1; $1 = $2; $2 = temp; print}' file

 # print every line, deleting the second field of that line
 awk '{ $2 = ""; print }'

 # print in reverse order the fields of every line
 awk '{for (i=NF; i>0; i--) printf("%s ",$i);printf ("\n")}' file

 # remove duplicate, consecutive lines (emulates "uniq")
 awk 'a != $0; {a=$0}'

 # remove duplicate, nonconsecutive lines
 awk '! a[$0]++'                     # most concise script
 awk '!($0 in a) {a[$0];print}'      # most efficient script

 # concatenate every 5 lines of input, using a comma separator
 # between fields
 awk 'ORS=NR%5?",":"\n"' file

SELECTIVE PRINTING OF CERTAIN LINES:

 # print first 10 lines of file (emulates behavior of "head")
 awk 'NR < 11'

 # print first line of file (emulates "head -1")
 awk 'NR>1{exit};1'

 # print the last 2 lines of a file (emulates "tail -2")
 awk '{y=x "\n" $0; x=$0};END{print y}'

 # print the last line of a file (emulates "tail -1")
 awk 'END{print}'

 # print only lines which match regular expression (emulates "grep")
 awk '/regex/'

 # print only lines which do NOT match regex (emulates "grep -v")
 awk '!/regex/'

 # print the line immediately before a regex, but not the line
 # containing the regex
 awk '/regex/{print x};{x=$0}'
 awk '/regex/{print (x=="" ? "match on line 1" : x)};{x=$0}'

 # print the line immediately after a regex, but not the line
 # containing the regex
 awk '/regex/{getline;print}'

 # grep for AAA and BBB and CCC (in any order)
 awk '/AAA/ && /BBB/ && /CCC/'

 # grep for AAA and BBB and CCC (in that order)
 awk '/AAA.*BBB.*CCC/'

 # print only lines of 65 characters or longer
 awk 'length > 64'

 # print only lines of less than 65 characters
 awk 'length < 65'

 # print section of file from regular expression to end of file
 awk '/regex/,0'
 awk '/regex/,EOF'

 # print section of file based on line numbers (lines 8-12, inclusive)
 awk 'NR==8,NR==12'

 # print line number 52
 awk 'NR==52'
 awk 'NR==52 {print;exit}'          # more efficient on large files

 # print section of file between two regular expressions (inclusive)
 awk '/Iowa/,/Montana/'             # case sensitive

SELECTIVE DELETION OF CERTAIN LINES:

 # delete ALL blank lines from a file (same as "grep '.' ")
 awk NF
 awk '/./'

"sed & awk, 2nd Edition," by Dale Dougherty and Arnold Robbins
  O'Reilly, 1997
"UNIX Text Processing," by Dale Dougherty and Tim O'Reilly
  Hayden Books, 1987
"Effective awk Programming, 3rd Edition," by Arnold Robbins
  O'Reilly, 2001

To fully exploit the power of awk, one must understand "regular
expressions." For detailed discussion of regular expressions, see
"Mastering Regular Expressions, 2d edition" by Jeffrey Friedl
(O'Reilly, 2002).

The manual ("man") pages on Unix systems may be helpful (try "man awk",

#---end of file---
</pre>
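A minimal sketch of how a few of the one-liners behave on concrete input. The sample data and the shell variable names (`sample`, `sums`, `unique`, `reversed`) are assumptions made for illustration and are not part of the original file; only POSIX awk features are used.

```shell
# Hypothetical sample input: four lines, the third repeating the first.
sample='3 4 5
10 20
3 4 5
7'

# Print the sum of the fields of every line.
sums=$(printf '%s\n' "$sample" | awk '{s=0; for (i=1; i<=NF; i++) s=s+$i; print s}')

# Remove duplicate, nonconsecutive lines (first occurrence is kept).
unique=$(printf '%s\n' "$sample" | awk '!a[$0]++')

# Reverse the order of the lines (emulates "tac").
reversed=$(printf '%s\n' "$sample" | awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--]}')

printf '%s\n' "--- sums ---" "$sums" "--- unique ---" "$unique" "--- reversed ---" "$reversed"
```

Piping a small, known input through each command like this is a quick way to check a one-liner before running it on real files.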
== Explained ==
# http://www.catonmat.net/blog/awk-one-liners-explained-part-one/
# http://www.catonmat.net/blog/awk-one-liners-explained-part-two/
# http://www.catonmat.net/blog/awk-one-liners-explained-part-three/
# http://www.catonmat.net/blog/update-on-famous-awk-one-liners-explained/
# http://www.catonmat.net/download/awk.cheat.sheet.pdf
== Cheatsheet ==
http://www.catonmat.net/blog/wp-content/plugins/wp-downloadMonitor/user_uploads/awk.cheat.sheet.pdf
<pre>
.-----------------------------------------------------------------------.
|                                                                       |
|                            AWK Cheat Sheet                            |
|                                                                       |
'-----------------------------------------------------------------------'
| Peteris Krumins (peter@catonmat.net), 2007.08.22                      |
| http://www.catonmat.net - good coders code, great reuse               |
'-----------------------------------------------------------------------'

===================== Predefined Variable Summary =====================
.-------------+-----------------------------------.---------------------.
|             |                                   |      Support:       |
| Variable    | Description                       '-----.-------.-------'
|             |                                   | AWK | NAWK  | GAWK  |
'-------------+-----------------------------------+-----+-------+-------'
| FS          | Input Field Separator, a space by |  +  |   +   |   +   |
|             | default.                          |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| OFS         | Output Field Separator, a space   |  +  |   +   |   +   |
|             | by default.                       |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| NF          | The Number of Fields in the       |  +  |   +   |   +   |
|             | current input record.             |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| NR          | The total Number of input Records |  +  |   +   |   +   |
|             | seen so far.                      |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| RS          | Record Separator, a newline by    |  +  |   +   |   +   |
|             | default.                          |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| ORS         | Output Record Separator, a        |  +  |   +   |   +   |
|             | newline by default.               |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| FILENAME    | The name of the current input     |     |       |       |
|             | file. If no files are specified   |     |       |       |
|             | on the command line, the value of |     |       |       |
|             | FILENAME is "-". However,         |  +  |   +   |   +   |
|             | FILENAME is undefined inside the  |     |       |       |
|             | BEGIN block (unless set by        |     |       |       |
|             | getline).                         |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| ARGC        | The number of command line        |     |       |       |
|             | arguments (does not include       |     |       |       |
|             | options to gawk, or the program   |  -  |   +   |   +   |
|             | source). Dynamically changing the |     |       |       |
|             | contents of ARGV controls the     |     |       |       |
|             | files used for data.              |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| ARGV        | Array of command line arguments.  |     |       |       |
|             | The array is indexed from 0 to    |  -  |   +   |   +   |
|             | ARGC - 1.                         |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| ARGIND      | The index in ARGV of the current  |  -  |   -   |   +   |
|             | file being processed.             |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| BINMODE     | On non-POSIX systems, specifies   |     |       |       |
|             | use of "binary" mode for all file |     |       |       |
|             | I/O. Numeric values of 1, 2, or 3,|     |       |       |
|             | specify that input files, output  |     |       |       |
|             | files, or all files, respectively,|     |       |       |
|             | should use binary I/O. String     |     |       |       |
|             | values of "r", or "w" specify     |  -  |   -   |   +   |
|             | that input files, or output files,|     |       |       |
|             | respectively, should use binary   |     |       |       |
|             | I/O. String values of "rw" or     |     |       |       |
|             | "wr" specify that all files       |     |       |       |
|             | should use binary I/O. Any other  |     |       |       |
|             | string value is treated as "rw",  |     |       |       |
|             | but generates a warning message.  |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| CONVFMT     | The CONVFMT variable is used to   |     |       |       |
|             | specify the format when           |  -  |   -   |   +   |
|             | converting a number to a string.  |     |       |       |
|             | Default: "%.6g"                   |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| ENVIRON     | An array containing the values    |  -  |   -   |   +   |
|             | of the current environment.       |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| ERRNO       | If a system error occurs either   |     |       |       |
|             | doing a redirection for getline,  |     |       |       |
|             | during a read for getline, or     |     |       |       |
|             | during a close(), then ERRNO will |  -  |   -   |   +   |
|             | contain a string describing the   |     |       |       |
|             | error. The value is subject to    |     |       |       |
|             | translation in non-English        |     |       |       |
|             | locales.                          |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| FIELDWIDTHS | A white-space separated list of   |     |       |       |
|             | field widths. When set, gawk      |     |       |       |
|             | parses the input into fields of   |  -  |   -   |   +   |
|             | fixed width, instead of using the |     |       |       |
|             | value of the FS variable as the   |     |       |       |
|             | field separator.                  |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| FNR         | Contains number of lines read,    |  -  |   +   |   +   |
|             | but is reset for each file read.  |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| IGNORECASE  | Controls the case-sensitivity of  |     |       |       |
|             | all regular expression and string |     |       |       |
|             | operations. If IGNORECASE has a   |     |       |       |
|             | non-zero value, then string       |     |       |       |
|             | comparisons and pattern matching  |     |       |       |
|             | in rules, field splitting         |     |       |       |
|             | with FS, record separating        |     |       |       |
|             | with RS, regular expression       |     |       |       |
|             | matching with ~ and !~, and the   |  -  |   -   |   +   |
|             | gensub(), gsub(), index(),        |     |       |       |
|             | match(), split(), and sub()       |     |       |       |
|             | built-in functions all ignore     |     |       |       |
|             | case when doing regular           |     |       |       |
|             | expression operations.            |     |       |       |
|             | NOTE: Array subscripting is not   |     |       |       |
|             | affected. However, the asort()    |     |       |       |
|             | and asorti() functions are        |     |       |       |
|             | affected.                         |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| LINT        | Provides dynamic control of the   |     |       |       |
|             | --lint option from within an AWK  |  -  |   -   |   +   |
|             | program. When true, gawk prints   |     |       |       |
|             | lint warnings.                    |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| OFMT        | The default output format for     |  -  |   +   |   +   |
|             | numbers. Default: "%.6g"          |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| PROCINFO    | The elements of this array        |     |       |       |
|             | provide access to information     |     |       |       |
|             | about the running AWK program.    |     |       |       |
|             | PROCINFO["egid"]:                 |     |       |       |
|             | the value of the getegid(2)       |     |       |       |
|             | system call.                      |     |       |       |
|             | PROCINFO["euid"]:                 |     |       |       |
|             | the value of the geteuid(2)       |     |       |       |
|             | system call.                      |     |       |       |
|             | PROCINFO["FS"]:                   |     |       |       |
|             | "FS" if field splitting with FS   |     |       |       |
|             | is in effect, or "FIELDWIDTHS"    |     |       |       |
|             | if field splitting with           |     |       |       |
|             | FIELDWIDTHS is in effect.         |     |       |       |
|             | PROCINFO["gid"]:                  |  -  |   -   |   +   |
|             | the value of the getgid(2) system |     |       |       |
|             | call.                             |     |       |       |
|             | PROCINFO["pgrpid"]:               |     |       |       |
|             | the process group ID of the       |     |       |       |
|             | current process.                  |     |       |       |
|             | PROCINFO["pid"]:                  |     |       |       |
|             | the process ID of the current     |     |       |       |
|             | process.                          |     |       |       |
|             | PROCINFO["ppid"]:                 |     |       |       |
|             | the parent process ID of the      |     |       |       |
|             | current process.                  |     |       |       |
|             | PROCINFO["uid"]:                  |     |       |       |
|             | the value of the getuid(2) system |     |       |       |
|             | call.                             |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| RT          | The record terminator. Gawk sets  |     |       |       |
|             | RT to the input text that matched |  -  |   -   |   +   |
|             | the character or regular          |     |       |       |
|             | expression specified by RS.       |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| RSTART      | The index of the first character  |  -  |   +   |   +   |
|             | matched by match(); 0 if no match.|     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| RLENGTH     | The length of the string matched  |  -  |   +   |   +   |
|             | by match(); -1 if no match.       |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| SUBSEP      | The character used to separate    |     |       |       |
|             | multiple subscripts in array      |     |       |       |
|             | elements. Default: "\034"         |  -  |   +   |   +   |
|             | (non-printable character,         |     |       |       |
|             | dec: 28, hex: 1C)                 |     |       |       |
'-------------+-----------------------------------+-----+-------+-------'
| TEXTDOMAIN  | The text domain of the AWK        |     |       |       |
|             | program; used to find the         |  -  |   -   |   +   |
|             | localized translations for the    |     |       |       |
|             | program's strings.                |     |       |       |
'-------------'-----------------------------------'-----'-------'-------'
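
# Illustrative sketch, not part of the original cheat sheet: FS, OFS, NR and
# NF used together. The two input lines and the shell variable vars_out are
# made-up names for this example; only POSIX awk features are assumed.
```shell
# Assigning $1 = $1 makes awk rebuild $0 with OFS between the fields;
# print with commas also joins its arguments with OFS.
vars_out=$(printf 'root:x:0\ndaemon:x:1\n' |
    awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print NR, NF, $0 }')
printf '%s\n' "$vars_out"
```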

============================ I/O Statements ===========================
.---------------------.-------------------------------------------------.
|                     |                                                 |
| Statement           | Description                                     |
|                     |                                                 |
'---------------------+-------------------------------------------------'
| close(file [, how]) | Close file, pipe or co-process. The optional    |
|                     | how should only be used when closing one end of |
|                     | a two-way pipe to a co-process. It must be a    |
|                     | string value, either "to" or "from".            |
'---------------------+-------------------------------------------------'
| getline             | Set $0 from next input record; set NF, NR, FNR. |
|                     | Returns 0 on EOF and -1 on an error. Upon an    |
|                     | error, ERRNO contains a string describing the   |
|                     | problem.                                        |
'---------------------+-------------------------------------------------'
| getline <file       | Set $0 from next record of file; set NF.        |
'---------------------+-------------------------------------------------'
| getline var         | Set var from next input record; set NR, FNR.    |
'---------------------+-------------------------------------------------'
| getline var <file   | Set var from next record of file.               |
'---------------------+-------------------------------------------------'
| command |           | Run command piping the output either into $0 or |
| getline [var]       | var, as above. If using a pipe or co-process    |
|                     | to getline, or from print or printf within a    |
|                     | loop, you must use close() to create new        |
|                     | instances.                                      |
'---------------------+-------------------------------------------------'
| command |&          | Run command as a co-process piping the output   |
| getline [var]       | either into $0 or var, as above. Co-processes   |
|                     | are a gawk extension.                           |
'---------------------+-------------------------------------------------'
| next                | Stop processing the current input record.       |
|                     | The next input record is read and processing    |
|                     | starts over with the first pattern in the AWK   |
|                     | program. If the end of the input data is        |
|                     | reached, the END block(s), if any, are executed.|
'---------------------+-------------------------------------------------'
| nextfile            | Stop processing the current input file. The     |
|                     | next input record read comes from the next      |
|                     | input file. FILENAME and ARGIND are updated,    |
|                     | FNR is reset to 1, and processing starts over   |
|                     | with the first pattern in the AWK program. If   |
|                     | the end of the input data is reached, the END   |
|                     | block(s), if any, are executed.                 |
'---------------------+-------------------------------------------------'
| print               | Prints the current record. The output record is |
|                     | terminated with the value of the ORS variable.  |
'---------------------+-------------------------------------------------'
| print expr-list     | Prints expressions. Each expression is          |
|                     | separated by the value of the OFS variable.     |
|                     | The output record is terminated with the value  |
|                     | of the ORS variable.                            |
'---------------------+-------------------------------------------------'
| print expr-list     | Prints expressions on file. Each expression is  |
| >file               | separated by the value of the OFS variable. The |
|                     | output record is terminated with the value of   |
|                     | the ORS variable.                               |
'---------------------+-------------------------------------------------'
| printf fmt,         | Format and print.                               |
| expr-list           |                                                 |
'---------------------+-------------------------------------------------'
| printf fmt,         | Format and print on file.                       |
| expr-list >file     |                                                 |
'---------------------+-------------------------------------------------'
| system(cmd-line)    | Execute the command cmd-line, and return the    |
|                     | exit status.                                    |
'---------------------+-------------------------------------------------'
| fflush([file])      | Flush any buffers associated with the open      |
|                     | output file or pipe file. If file is missing,   |
|                     | then stdout is flushed. If file is the null     |
|                     | string, then all open output files and pipes    |
|                     | have their buffers flushed.                     |
'---------------------+-------------------------------------------------'
| print ... >> file   | Appends output to the file.                     |
'---------------------+-------------------------------------------------'
| print ... | command | Writes on a pipe.                               |
'---------------------+-------------------------------------------------'
| print ... |&        | Sends data to a co-process.                     |
| command             |                                                 |
'---------------------'-------------------------------------------------'
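
# Illustrative sketch, not part of the original cheat sheet: reading from a
# command with "command | getline var" and re-running it after close().
# The command string and the names cmd, line, first and io_out are made up
# for this example; only POSIX awk features are assumed.
```shell
io_out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    # Read every line the command produces; getline returns 0 at EOF.
    while ((cmd | getline line) > 0) n++
    # After close(), the same command string starts a fresh instance.
    close(cmd)
    cmd | getline first
    close(cmd)
    print n, first
}')
printf '%s\n' "$io_out"
```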

=========================== Numeric Functions =========================
.---------------------.-------------------------------------------------.
|                     |                                                 |
| Function            | Description                                     |
|                     |                                                 |
'---------------------+-------------------------------------------------'
| atan2(y, x)         | Returns the arctangent of y/x in radians.       |
'---------------------+-------------------------------------------------'
| cos(expr)           | Returns the cosine of expr, which is in radians.|
'---------------------+-------------------------------------------------'
| exp(expr)           | The exponential function.                       |
'---------------------+-------------------------------------------------'
| int(expr)           | Truncates to integer.                           |
'---------------------+-------------------------------------------------'
| log(expr)           | The natural logarithm function.                 |
'---------------------+-------------------------------------------------'
| rand()              | Returns a random number N, between 0 and 1,     |
|                     | such that 0 <= N < 1.                           |
'---------------------+-------------------------------------------------'
| sin(expr)           | Returns the sine of expr, which is in radians.  |
'---------------------+-------------------------------------------------'
| sqrt(expr)          | The square root function.                       |
'---------------------+-------------------------------------------------'
| srand([expr])       | Uses expr as a new seed for the random number   |
|                     | generator. If no expr is provided, the time of  |
|                     | day is used. The return value is the previous   |
|                     | seed for the random number generator.           |
'---------------------'-------------------------------------------------'
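
# Illustrative sketch, not part of the original cheat sheet: a few of the
# numeric functions above evaluated in a BEGIN block. The seed value 42 and
# the names r and num_out are arbitrary choices for this example.
```shell
num_out=$(awk 'BEGIN {
    print int(3.9), int(-3.9)        # int() truncates toward zero
    print sqrt(16), exp(0), log(1)
    printf "%.4f\n", atan2(0, -1)    # pi, rounded to four decimal places
    srand(42); r = rand()
    print (r >= 0 && r < 1)          # 1 when rand() is in the range [0, 1)
}')
printf '%s\n' "$num_out"
```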

====================== Bit Manipulation Functions =====================
.---------------------.-------------------------------------------------.
|                     |                                                 |
| Function            | Description                                     |
|                     |                                                 |
'---------------------+-------------------------------------------------'
| and(v1, v2)         | Return the bitwise AND of the values provided   |
|                     | by v1 and v2.                                   |
'---------------------+-------------------------------------------------'
| compl(val)          | Return the bitwise complement of val.           |
'---------------------+-------------------------------------------------'
| lshift(val, count)  | Return the value of val, shifted left by        |
|                     | count bits.                                     |
'---------------------+-------------------------------------------------'
| or(v1, v2)          | Return the bitwise OR of the values provided by |
|                     | v1 and v2.                                      |
'---------------------+-------------------------------------------------'
| rshift(val, count)  | Return the value of val, shifted right by       |
|                     | count bits.                                     |
'---------------------+-------------------------------------------------'
| xor(v1, v2)         | Return the bitwise XOR of the values provided   |
|                     | by v1 and v2.                                   |
'---------------------'-------------------------------------------------'
=========================== String Functions ========================== | |||
.---------------------.-------------------------------------------------. | |||
| | | | |||
| Function | Description | | |||
| | | | |||
'---------------------+-------------------------------------------------' | |||
| asort(s [, d]) | Returns the number of elements in the source | | |||
| | array s. The contents of s are sorted using | | |||
| | gawk's normal rules for comparing values, and | | |||
| | the indexes of the sorted values of s are | | |||
| | replaced with sequential integers starting with | | |||
| | 1. If the optional destination array d is | | |||
| | specified, then s is first duplicated into d, | | |||
| | and then d is sorted, leaving the indexes of | | |||
| | the source array s unchanged. | | |||
'---------------------+-------------------------------------------------' | |||
| asorti(s [, d]) | Returns the number of elements in the source | | |||
| | array s. The behavior is the same as that of | | |||
| | asort(), except that the array indices are | | |||
| | used for sorting, not the array values. When | | |||
| | done, the array is indexed numerically, and the | | |||
| | values are those of the original indices. The | | |||
| | original values are lost; thus provide a second | | |||
| | array if you wish to preserve the original. | | |||
'---------------------+-------------------------------------------------' | |||
| gensub(r, s, | Search the target string t for matches of the | | |||
| h [, t]) | regular expression r. If h is a string | | |||
| | beginning with g or G, then replace all matches | | |||
| | of r with s. Otherwise, h is a number | | |||
| | indicating which match of r to replace. If t is | | |||
| | not supplied, $0 is used instead. Within the | | |||
| | replacement text s, the sequence \n, where n is | | |||
| | a digit from 1 to 9, may be used to indicate | | |||
| | just the text that matched the n'th | | |||
| | parenthesized subexpression. The sequence \0 | | |||
| | represents the entire matched text, as does the | | |||
| | character &. Unlike sub() and gsub(), the | | |||
| | modified string is returned as the result of | | |||
| | the function, and the original target string | | |||
| | is not changed. | | |||
'---------------------+-------------------------------------------------' | |||
| gsub(r, s [, t]) | For each substring matching the regular | | |||
| | expression r in the string t, substitute the | | |||
| | string s, and return the number of | | |||
| | substitutions. If t is not supplied, use $0. | | |||
| | An & in the replacement text is replaced with | | |||
| | the text that was actually matched. Use \& to | | |||
| | get a literal &. (This must be | | |||
| | typed as "\\&") | | |||
'---------------------+-------------------------------------------------' | |||
| index(s, t) | Returns the index of the string t in the | | |||
| | string s, or 0 if t is not present. (This | | |||
| | implies that characterindices start at one.) | | |||
'---------------------+-------------------------------------------------' | |||
| length([s]) | Returns the length of the string s, or the | | |||
| | length of $0 if s is not supplied. | | |||
'---------------------+-------------------------------------------------' | |||
| match(s, r [, a]) | Returns the position in s where the regular | | |||
| | expression r occurs, or 0 if r is not present, | | |||
| | and sets the values of RSTART and RLENGTH. | | |||
| | Note that the argument order is the same as for | | |||
| | the ~ operator: str ~ re. If array a is | | |||
| | provided, a is cleared and then elements 1 | | |||
| | through n are filled with the portions of s | | |||
| | that match the corresponding parenthesized | | |||
| | subexpression in r. The 0'th element of a | | |||
| | contains the portion of s matched by the entire | | |||
| | regular expression r. Subscripts a[n, "start"], | | |||
| | and a[n, "length"] provide the starting index | | |||
| | in the string and length respectively, of each | | |||
| | matching substring. | | |||
'---------------------+-------------------------------------------------' | |||
| split(s, a [, r]) | Splits the string s into the array a on the | | |||
| | regular expression r, and returns the number of | | |||
| | fields. If r is omitted, FS is used instead. | | |||
| | The array a is cleared first. Splitting behaves | | |||
| | identically to field splitting. | | |||
'---------------------+-------------------------------------------------' | |||
| sprintf(fmt, | Prints expr-list according to fmt, and returns | | |||
| expr-list) | the resulting string. | | |||
'---------------------+-------------------------------------------------' | |||
| strtonum(str) | Examines str, and returns its numeric value. | | |||
| | If str begins with a leading 0, strtonum() | | |||
| | assumes that str is an octal number. If str | | |||
| | begins with a leading 0x or 0X, strtonum() | | |||
| | assumes that str is a hexadecimal number. | | |||
'---------------------+-------------------------------------------------' | |||
| sub(r, s [, t]) | Just like gsub(), but only the first matching | | |||
| | substring is replaced. | | |||
'---------------------+-------------------------------------------------' | |||
| substr(s, i [, n]) | Returns the at most n-character substring of s | | |||
| | starting at i. If n is omitted, the rest of s | | |||
| | is used. | | |||
'---------------------+-------------------------------------------------' | |||
| tolower(str) | Returns a copy of the string str, with all the | | |||
| | upper-case characters in str translated to | | |||
| | their corresponding lower-case counterparts. | | |||
| | Non-alphabetic characters are left unchanged. | | |||
'---------------------+-------------------------------------------------' | |||
| toupper(str) | Returns a copy of the string str, with all the | | |||
| | lower-case characters in str translated to | | |||
| | their corresponding upper-case counterparts. | | |||
| | Non-alphabetic characters are left unchanged. | | |||
'---------------------'-------------------------------------------------' | |||
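A few of the string functions above can be tried from a shell. This is only a minimal sketch: the sample string and the wrapper name string_demo are made up for illustration, and only POSIX awk features are used (no third argument to match(), no strtonum()).

```shell
# Hypothetical helper: exercises several string functions on a sample string.
string_demo() {
  awk 'BEGIN {
    s = "foo=bar;baz=qux"
    print index(s, "bar")              # position of "bar", counting from 1
    print length(s)                    # number of characters in s
    if (match(s, /ba[a-z]/))           # sets RSTART and RLENGTH on success
      print RSTART, RLENGTH
    n = split(s, parts, ";")           # parts[1] .. parts[n] hold the pieces
    print n, parts[2]
    print toupper(substr(s, 1, 3))     # first three characters, upper-cased
    t = s
    sub(/ba[a-z]/, "[&]", t)           # & stands for the matched text
    print t
    c = gsub(/ba[a-z]/, "[&]", s)      # gsub returns the number of replacements
    print c, s
  }'
}
string_demo
```

Working on a copy (t) before calling sub() keeps the original string intact, since sub() and gsub() modify their target in place.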
============================ Time Functions =========================== | |||
.---------------------.-------------------------------------------------. | |||
| | | | |||
| Function | Description | | |||
| | | | |||
'---------------------+-------------------------------------------------' | |||
| mktime(datespec) | Turns datespec into a time stamp of the same | | |||
| | form as returned by systime(). The datespec is | | |||
| | a string of the form YYYY MM DD HH MM SS[ DST]. | | |||
| | The contents of the string are six or seven | | |||
| | numbers representing respectively the full year | | |||
| | including century, the month from 1 to 12, the | | |||
| | day of the month from 1 to 31, the hour of the | | |||
| | day from 0 to 23, the minute from 0 to 59, and | | |||
| | the second from 0 to 60, and an optional | | |||
| | daylight saving flag. The values of these | | |||
| | numbers need not be within the ranges | | |||
| | specified; for example, an hour of -1 means 1 | | |||
| | hour before midnight. The origin-zero Gregorian | | |||
| | calendar is assumed, with year 0 preceding year | | |||
| | 1 and year -1 preceding year 0. The time is | | |||
| | assumed to be in the local timezone. If the | | |||
| | daylight saving flag is positive, the time is | | |||
| | assumed to be daylight saving time; if zero, | | |||
| | the time is assumed to be standard time; and if | | |||
| | negative (the default), mktime() attempts to | | |||
| | determine whether daylight saving time is in | | |||
| | effect for the specified time. If datespec does | | |||
| | not contain enough elements or if the resulting | | |||
| | time is out of range, mktime() returns -1. | | |||
'---------------------+-------------------------------------------------' | |||
| strftime([format | Formats timestamp according to the | | |||
| [, timestamp]]) | specification in format. The timestamp should | | |||
| | be of the same form as returned by systime(). | | |||
| | If timestamp is missing, the current time of | | |||
| | day is used. If format is missing, a default | | |||
| | format equivalent to the output of date(1) is | | |||
| | used. See the specification for the strftime() | | |||
| | function in ANSI C for the format conversions | | |||
| | that are guaranteed to be available. A | | |||
| | public-domain version of strftime(3) and a man | | |||
| | page for it come with gawk; if that version was | | |||
| | used to build gawk, then all of the conversions | | |||
| | described in that man page are available to | | |||
| | gawk. | | |||
'---------------------+-------------------------------------------------' | |||
| systime() | Returns the current time of day as the number | | |||
| | of seconds since the Epoch (1970-01-01 00:00:00 | | |||
| | UTC on POSIX systems). | | |||
'---------------------'-------------------------------------------------' | |||
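The time functions can be sketched the same way. This assumes that the awk on the PATH is gawk (or another awk that provides mktime() and strftime(), which are not part of POSIX awk). TZ=UTC is set only so that the result does not depend on the local timezone, and time_demo is a hypothetical name.

```shell
# Hypothetical helper: round-trips a datespec through mktime() and strftime().
time_demo() {
  TZ=UTC awk 'BEGIN {
    ts = mktime("1970 01 02 00 00 00")        # one day after the Epoch
    print ts                                  # seconds since the Epoch
    print strftime("%Y-%m-%d %H:%M:%S", ts)   # format the same time stamp
  }'
}
time_demo
```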
=============== Internationalization (I18N) Functions ================ | |||
.---------------------.-------------------------------------------------. | |||
| | | | |||
| Function | | | |||
| | | | |||
| Description | | | |||
| | | | |||
'---------------------+-------------------------------------------------' | |||
| bindtextdomain(directory [, domain]) | | |||
| | | |||
| Specifies the directory where gawk looks for the .mo files. It | | |||
| returns the directory where domain is ``bound.'' The default domain | | |||
| is the value of TEXTDOMAIN. If directory is the null string (""), | | |||
| then bindtextdomain() returns the current binding for the given domain. | |||
'---------------------+-------------------------------------------------' | |||
| dcgettext(string [, domain [, category]]) | | |||
| | | |||
| Returns the translation of string in text domain domain for locale | | |||
| category category. The default value for domain is the current value | | |||
| of TEXTDOMAIN. The default value for category is "LC_MESSAGES". If | | |||
| you supply a value for category, it must be a string equal to one of | | |||
| the known locale categories. You must also supply a text domain. Use | | |||
| TEXTDOMAIN if you want to use the current domain. | | |||
'---------------------+-------------------------------------------------' | |||
| dcngettext(string1 , string2 , number [, domain [, category]]) | | |||
| | | |||
| Returns the plural form used for number of the translation of string1 | | |||
| and string2 in text domain domain for locale category category. The | | |||
| default value for domain is the current value of TEXTDOMAIN. The | | |||
| default value for category is "LC_MESSAGES". If you supply a value | | |||
| for category, it must be a string equal to one of the known locale | | |||
| categories. You must also supply a text domain. Use TEXTDOMAIN if | | |||
| you want to use the current domain. | | |||
'---------------------'-------------------------------------------------' | |||
=============== GNU AWK's Command Line Argument Summary =============== | |||
.-------------------------.---------------------------------------------. | |||
| | | | |||
| Argument | Description | | |||
| | | | |||
'-------------------------+---------------------------------------------' | |||
| -F fs | Use fs for the input field separator | | |||
| --field-separator fs | (the value of the FS predefined variable). | | |||
'-------------------------+---------------------------------------------' | |||
| -v var=val | Assign the value val to the variable var, | | |||
| --assign var=val | before execution of the program begins. | | |||
| | Such variable values are available to the | | |||
| | BEGIN block of an AWK program. | | |||
'-------------------------+---------------------------------------------' | |||
| -f program-file | Read the AWK program source from the file | | |||
| --file program-file | program-file, instead of from the first | | |||
| | command line argument. Multiple -f | | |||
| | (or --file) options may be used. | | |||
'-------------------------+---------------------------------------------' | |||
| -mf NNN | Set various memory limits to the value NNN. | | |||
| -mr NNN | The f flag sets the maximum number of | | |||
| | fields, and the r flag sets the maximum | | |||
| | record size. (Ignored by gawk, since gawk | | |||
| | has no pre-defined limits) | | |||
'-------------------------+---------------------------------------------' | |||
| -W compat | Run in compatibility mode. In compatibility | | |||
| -W traditional | mode, gawk behaves identically to UNIX awk; | | |||
| --compat | none of the GNU-specific extensions are | | |||
| --traditional | recognized. | | |||
'-------------------------+---------------------------------------------' | |||
| -W copyleft | Print the short version of the GNU copyright| | |||
| -W copyright | information message on the standard output | | |||
| --copyleft | and exit successfully. | | |||
| --copyright | | | |||
'-------------------------+---------------------------------------------' | |||
| -W dump-variables[=file]| Print a sorted list of global variables, | | |||
| --dump-variables[=file] | their types and final values to file. If no | | |||
| | file is provided, gawk uses a file named | | |||
| | awkvars.out in the current directory. | | |||
'-------------------------+---------------------------------------------' | |||
| -W help | Print a relatively short summary of the | | |||
| -W usage | available options on the standard output. | | |||
| --help | | | |||
| --usage | | | |||
'-------------------------+---------------------------------------------' | |||
| -W lint[=value] | Provide warnings about constructs that | | |||
| --lint[=value] | are dubious or non-portable to other AWK | | |||
| | implementations. With argument fatal, lint | | |||
| | warnings become fatal errors. With an optional | | |||
| | become fatal errors. With an optional | | |||
| | argument of invalid, only warnings about | | |||
| | things that are actually invalid are | | |||
| | issued. (This is not fully implemented yet.)| | |||
'-------------------------+---------------------------------------------' | |||
| -W lint-old | Provide warnings about constructs that are | | |||
| --lint-old | not portable to the original version of | | |||
| | Unix awk. | | |||
'-------------------------+---------------------------------------------' | |||
| -W gen-po | Scan and parse the AWK program, and | | |||
| --gen-po | generate a GNU .po format file on standard | | |||
| | output with entries for all localizable | | |||
| | strings in the program. The program itself | | |||
| | is not executed. | | |||
'-------------------------+---------------------------------------------' | |||
| -W non-decimal-data | Recognize octal and hexadecimal values in | | |||
| --non-decimal-data | input data. | | |||
'-------------------------+---------------------------------------------' | |||
| -W posix | This turns on compatibility mode, with the | | |||
| --posix | following additional restrictions: | | |||
| | o \x escape sequences are not recognized. | | |||
| | o Only space and tab act as field | | |||
| | separators when FS is set to a single | | |||
| | space; newline does not. | | |||
| | o You cannot continue lines after ? and :. | | |||
| | o The synonym func for the keyword function| | |||
| | is not recognized. | | |||
| | o The operators ** and **= cannot be used | | |||
| | in place of ^ and ^=. | | |||
| | o The fflush() function is not available. | | |||
'-------------------------+---------------------------------------------' | |||
| -W profile[=prof_file] | Send profiling data to prof_file. | | |||
| --profile[=prof_file] | The default is awkprof.out. When run with | | |||
| | gawk, the profile is just a "pretty | | |||
| | printed" version of the program. When run | | |||
| | with pgawk, the profile contains execution | | |||
| | counts of each statement in the program | | |||
| | in the left margin and function call counts | | |||
| | for each user-defined function. | | |||
'-------------------------+---------------------------------------------' | |||
| -W re-interval | Enable the use of interval expressions in | | |||
| --re-interval | regular expression matching. Interval | | |||
| | expressions were not traditionally | | |||
| | available in the AWK language. | | |||
'-------------------------+---------------------------------------------' | |||
| -W source program-text | Use program-text as AWK program source | | |||
| --source program-text | code. This option allows the easy | | |||
| | intermixing of library functions (used via | | |||
| | the -f and --file options) with source code | | |||
| | entered on the command line. | | |||
'-------------------------+---------------------------------------------' | |||
| -W version | Print version information for this | | |||
| --version | particular copy of gawk on the standard | | |||
| | output. | | |||
'-------------------------+---------------------------------------------' | |||
| -- | Signal the end of options. This is useful | | |||
| | to allow further arguments to the AWK | | |||
| | program itself to start with a "-". This | | |||
| | is mainly for consistency with the argument | | |||
| | parsing convention used by most other POSIX | | |||
| | programs. | | |||
'-------------------------'---------------------------------------------' | |||
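The -F and -v options from the table above can be combined as in the following sketch. The input data, the variable names min and greeting, and the wrapper name cli_demo are all made up for illustration. Only the short option forms are used, since the long forms are specific to gawk.

```shell
# Hypothetical helper: shows -F (field separator) and -v (variable assignment).
cli_demo() {
  # Split each record on ":" and print names whose second field is >= min.
  printf 'alice:30\nbob:25\n' |
    awk -F: -v min=28 '$2 >= min { print $1 }'

  # A variable assigned with -v is already set when the BEGIN block runs.
  awk -v greeting=hi 'BEGIN { print greeting }'
}
cli_demo
```

Passing values with -v avoids splicing shell variables into the awk program text, which keeps the program inside a single pair of single quotes.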
======================================================================= | |||
.-----------------------------------------------------------------------. | |||
| Peteris Krumins (peter@catonmat.net), 2007.08.22 | | |||
| http://www.catonmat.net - good coders code, great reuse | | |||
'-----------------------------------------------------------------------' | |||
</pre> | </pre> | ||