AWK quick reference                 January 16, 1998 (Update 05/22/98)

:: ABBREVIATIONS:
   cmd      = command (shell)
   expr(s)  = expression(s)
   fmt      = format
   param(s) = parameter(s)
   patt(s)  = pattern(s)
   stat(s)  = statement(s)
   var(s)   = variable(s)

:: Command line:
   awk [-Fs] <'prog'|-f progfile> [var=value] [file list]

:: Programs:
   patt { action }
   function name(param) { stat }

:: Patterns:
   BEGIN
   END
   expr
   /regex/
   patt && patt
   patt || patt
   !patt
   (patt)
   patt, patt             <-- a range

:: Actions:
   break
   continue
   delete array[expr]
   do stat while (expr)
   exit [expr]
   expr
   if (expr) stat [else stat]
   input-output stat
   for (expr; expr; expr) stat
   for (var in array) stat
   next
   return [expr]
   while (expr) stat
   { stats }

:: Input-output:
   close(expr)             close pipe/file expr
   getline                 sets $0,NF,NR,FNR
   getline <file           read from file; sets $0,NF
   print exprs >file       print to file
   printf fmt,exprs        format and print
   printf fmt,exprs >file  (idem) to file
   system(cmd)             exec cmd, return status

:: Print format conversions:
   %c      ASCII char
   %d      decimal
   %e      [-]d.d*E[+-]dd
   %f      [-]d*.d*
   %g      %e or %f, whichever is shorter
   %o      unsigned octal
   %s      string
   %x      unsigned hexadecimal
   %%      print a %, no argument converted
   additional parameters:
   -       left-justify expr
   width   pad field to width; a leading 0 pads with zeros
   .prec   max string width, or digits after decimal point

:: Built-in variables:
   ARGC      # of comm-line args
   ARGV      array of comm-line args [0..ARGC-1]
   FILENAME  name of curr input file
   FNR       number of input recs read from curr file
   FS        input field separator (blank)
   NF        number of fields in input rec
   NR        number of input recs read since the beginning
   OFMT      output fmt for numbers (%.6g)
   OFS       output field separator (blank)
   ORS       output rec separator (\n)
   RLENGTH   length of string matched by regex in match
   RS        input rec separator (\n)
   RSTART    beginning position of string matched by match
             (also the value returned by match)
   SUBSEP    separator for array subscripts of form [i,j,...] (\034)

:: Built-in string functions:
   In the following, s and t are strings, r is a regex, and i and n
   are integers. An "&" in the replacement string s in sub and gsub
   is replaced by the matched string.
   gsub(r,s,t)          substitute s for every match of r in t,
                        return # of substitutions. If no t, use $0
   index(s,t)           index of t in s, 0 if not found
   length(s)            length of s
   match(s,r)           index of where s matches r, 0 if no match.
                        RSTART and RLENGTH are set
   split(s,a,fs)        split s into array a on fs, return # of
                        fields. If no fs, use FS
   sprintf(fmt,exprs)   return exprs formatted according to fmt
   sub(r,s,t)           like gsub, but only the first match
   substr(s,i,n)        return the n-char substring of s starting
                        at i. If no n, return the suffix of s
                        starting at i

:: Built-in arithmetic functions:
   atan2(y,x)  arctan of y/x in radians
   cos(x)      cosine (angle in radians)
   exp(x)      e^x
   int(x)      truncate to integer
   log(x)      natural logarithm
   rand()      pseudo-random number r, 0 <= r < 1
   sin(x)      sine (angle in radians)
   sqrt(x)     square root
   srand(x)    new seed for rand, time of day used if no x

:: Expression operators (in increasing precedence):
   = += -= *= /= %= ^=   assignment
   ? :                   conditional operation
   ||                    logical OR
   &&                    logical AND
   in                    array membership
   ~ !~                  regex match, negated match
   < <= > >= != ==       relationals
                         string concat, no explicit operator
   + -                   add, subtract
   * / %                 multiply, divide, mod
   + - !                 unary plus/minus, logical NOT
   ^                     exponentiation
   ++ --                 increment/decrement (pre/postfix)
   $                     field
   All operators are left associative, except assignment, ?: and ^,
   which are right associative. Use parentheses to group and to
   change evaluation order.

:: Regular expressions:
   The regex metacharacters are  \ ^ $ . [ ] | ( ) * + ?
   Summary of metacharacters and matching:
   c           matches the nonmetacharacter c
   \c          matches the escape sequence or literal character c
   ^           beginning of string
   $           end of string
   .           any single character
   [abc...]    char class
   [^abc...]   negated char class
   r1|r2       alternation: matches r1 or r2
   (r1)(r2)    concatenation
   (r)*        zero or more of r
   (r)+        one or more of r
   (r)?        zero or one of r
   (r)         grouping

:: Escape sequences:
   \b     backspace
   \f     formfeed
   \n     newline
   \r     carriage return
   \t     tab
   \ddd   octal value ddd (1-3 digits)
   \c     any other char c literally, e.g. \"

:: Limits: (for the "one true awk")
   100   fields
   3000  chars per input record
   3000  chars per output record
   1024  chars per field
   3000  chars per printf string
   400   chars max literal string
   400   chars in char class
   15    open files
   1     pipe
   double-precision floating point

:: Initialization, comparison & type coercion:
   * Variables can potentially be a string or a number, or both at
     any time.
   * Assignment sets its type: var = expr
   * In comparisons, if both operands are numeric, the comparison is
     made numerically. Otherwise the operands are coerced to string,
     and the comparison is made on strings.
   * Numeric coercion: expr + 0
   * String coercion: expr ""
   * Uninitialized vars have the numeric value 0 and the string
     value "", so if x is uninitialized:
       (x) and (x=="0") are false
       (!x), (x==0) and (x=="") are true
   * The type of a field is determined by context when possible:
     $1++ coerces $1 to numeric, and $3 = $1 "," $2 coerces $1 and
     $2 to string
   * Null fields: string ""
   * Null array elements: string ""
   * Mentioning a variable causes it to exist with the values 0 and
     "". For an arr[i] that does not exist yet:
       if (arr[i] == "")  is true, and creates arr[i] as a side effect
       if (i in arr)      tests whether arr[i] exists without
                          creating it
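
The coercion rules above can be checked from the command line. What follows is a minimal sketch, not part of the original card: the shell function name coercion_demo and the variable names x, a, b and arr are chosen only for illustration, and it assumes a POSIX-style awk is on the PATH. Each print shows 1 for true and 0 for false.

```sh
# coercion_demo is a hypothetical wrapper name; the awk program is the point.
coercion_demo() {
    awk 'BEGIN {
        # x is never assigned, so it is both the number 0 and the string ""
        print "x == 0   ->", (x == 0)                       # 1
        print "x == \"\"  ->", (x == "")                      # 1
        print "x == \"0\" ->", (x == "0")                     # 0

        # a and b hold strings, so a plain comparison is a string comparison;
        # expr + 0 forces the numeric comparison
        a = "10"; b = "9"
        print "a < b as strings ->", (a < b)                # 1
        print "a < b as numbers ->", (a + 0 < b + 0)        # 0

        # (i in arr) does not create the element; mentioning arr[i] does
        print "(\"k\" in arr) before ->", ("k" in arr)        # 0
        if (arr["k"] == "") created = 1
        print "(\"k\" in arr) after  ->", ("k" in arr)        # 1
    }'
}

coercion_demo
```

The comparisons are wrapped in parentheses because an unparenthesized > inside a print statement would be read as output redirection.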