greencardamom/awk


Nim for awk programmers. A library of awk functions in nim

License

MIT license

Nim for awk programmers

A library of GNU awk functions for nim. Standard awk library functions written in and for nim.

Program in nim using the familiar regex-enabled awk toolset.
For nim programmers, a small set of powerful regex tools from the awk world.
Convert GNU awk scripts to C (and binary executable) without coding in C by using the nim macro language.

Awk and nim can look very similar. Example awk program that prints the word "text":

    BEGIN {
      str = "This is <a href=\"my text\">here</a>"
      if (match(str, "<a href=\"my text\">", dest)) {
        split(dest[0], arr, "\"")
        if (arr[2] ~ /text/)
          print substr(arr[2], 4, length(arr[2]))
      }
    }

nim version:

    import awk

    var str = "This is <a href=\"my text\">here</a>"
    if (match(str, "<a href=\"my text\">", dest) > 0):
      awk.split(dest, arr, "\"")
      if (arr[1] ~ "text"):
        echo awk.substr(arr[1], 3, len(arr[1]) - 1)

Nim compiles to C source, which compiles to a standalone binary executable using gcc. The nim compile (c) and run (-r) command:

    nim c -r "test.nim"
    text

Versions

Most of the nim procs in this package deal with awk's regex functionality.

Two versions are included: awk.nim uses the "re" module and awknre.nim uses the "nre" module.

The re module is significantly faster and is recommended. awknre.nim is included for backward compatibility, since the first version of this package used it and there may be some differences in regex options.

Functions

~ and !~

Emulate awk's ~ and !~ operators, which can be thought of as a regex-enabled version of contains() in nim.

    proc `~`*(source, pattern: string): bool
    proc `!~`*(source, pattern: string): bool

Nim does not have an equivalent of awk's // to signify that text is a regex. Therefore all text to the right of ~ is treated as a regex. To do a literal string test use == instead of ~.

Use grouping () when building a string with '&', for example:

    if s ~ ("^" & re & "$"):

Example:

    import awk

    if "george" ~ "ge.*?rge":
      echo "true"  #=> true
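The grouping advice above follows from Nim's operator precedence: an operator that begins with ~ binds more tightly than &, so an ungrouped `s ~ "^" & re & "$"` would be read as `(s ~ "^") & re & "$"`. The standalone sketch below uses only the standard library's re module. The two procs are hypothetical stand-ins written for illustration, not this package's implementation.

```nim
import std/re

# Hypothetical stand-ins for illustration only, not the package's own procs.
proc `~`(source, pattern: string): bool =
  ## True when the regex `pattern` matches anywhere in `source`.
  source.contains(re(pattern))

proc `!~`(source, pattern: string): bool =
  ## Negation of `~`.
  not (source ~ pattern)

let middle = "eor"
# The parentheses are required: `~` binds more tightly than `&`.
echo "george" ~ ("^g" & middle & "ge$")   # true
echo "george" !~ "^x"                     # true
```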

>* and >>

Write text to a file (append or overwrite)

    proc `>*`(text, filename: string): bool

Write text to filename, overwrite previous content. Close on finish.

    proc `>>`(text, filename: string): bool

Append text to filename. Close on finish.

Example:

    "Hello" & " world" >* "/tmp/test.txt"
    "Hello" >* "/dev/stderr"

Note that awk's ">" is refactored as ">*" to avoid conflicting with nim's ">"

match

Find regex pattern in source and optionally store the result in dest.

    proc match(source: string, pattern: string [, dest: string]): int

source is the string to match against.
pattern is the regex pattern.
dest is an optional string to hold the matched text.
If dest was not declared previously, it will be created. If it exists, match() will overwrite the contents with the results of the match.
The return value is the number of characters from the start, starting with 1, where the matched text is located, or 0 if no match.
If pattern is not a regex and dest is not needed, consider using index() instead; it is faster.

Example:

    import awk

    if match("this is a test a", "s.*?a", a) > 0:
      echo a  #=> "s is a"
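For readers porting awk code, the return convention described above (1-based position, 0 for no match) can be sketched with the standard library's re.findBounds. The name matchSketch is hypothetical, and clearing dest on a failed match is an assumption rather than documented behaviour of this package.

```nim
import std/re

# Hypothetical helper for illustration; not the package's match().
proc matchSketch(source, pattern: string, dest: var string): int =
  ## Returns the 1-based position of the first match, or 0 if none.
  let bounds = source.findBounds(re(pattern))
  if bounds.first < 0:
    dest = ""          # assumption: clear dest when nothing matches
    return 0
  dest = source[bounds.first .. bounds.last]
  bounds.first + 1     # findBounds is 0-based, awk positions are 1-based

var found: string
echo matchSketch("this is a test a", "s.*?a", found)  # 4
echo found                                            # s is a
```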

split

Split source along regex match and store segments in dest.

    template split(source: string, dest: untyped, match: string): int

source is the source string to be split.
dest is a seq[] filled with results of the split.
match is a string (regex or not) that will be used to split source.

The function behaves much like awk:

Returns the number of splits (discardable).
The dest seq is created by split; it does not need to exist before calling split().
If the seq does exist, the contents will be overwritten.
If there are 0 splits, dest will be 0-length, i.e. check the return value of split and/or the length of dest before accessing dest.
The first element of dest is at index 0 (unlike awk, where it is 1).
Because nim's strutils.split() has the same order and type of arguments it should be invoked as awk.split() to avoid ambiguity.

Example:

    import awk

    awk.split("This is a string", arr, "is")
    echo arr[0]  #> "Th"
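When porting awk code, the main adjustment is the index shift: awk's arr[1] becomes arr[0]. The sketch below shows the same shift using the standard library's re.split rather than this package, so it illustrates the convention only and should not be read as how awk.split() is implemented.

```nim
import std/re

# Standard-library illustration of 0-based results; not awk.split() itself.
let parts = split("This is a string", re"is")
echo parts.len   # 3
echo parts[0]    # Th   (awk would call this arr[1])
```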

gsub

Globally substitute the regex pattern with replacement in the source string.

    gsub(pattern: string, replacement: string, source: string): string

pattern is a regex string. For literal strings use gsubs()
replacement is the new text to replace the pattern text.
source is the source string.

gsub() returns the new string in addition to changing the source string in-place. It is discardable.

If the source string is not a var (let, const or literal string), it is not modified in-place.

Example 1:

    var str = "this is is string"
    gsub("[ ]is.*?st", " is a st", str)
    echo str  #=> "this is a string"

Example 2:

    echo gsub("[ ]is.*?st", " is a st", "this is is string")  #=> "this is a string"

Caution: a self-reference will not produce the expected result. For example, this does not raise an error but does not work:

    var str = "abc"
    str = gsub("b", "z", str)

gsubi

Globally substitute the regex pattern with replacement in the source string, leaving the source string unmodified.

    gsubi(pattern: string, replacement: string, source: string): string

pattern is a regex string. For literal strings use gsubs()
replacement is the new text to replace the pattern text.
source is the source string.

gsubi() returns the new string but leaves the source string untouched.

Example:

    var str = "this is is string"
    echo gsubi("[ ]is.*?st", " is a st", str)  #=> "this is a string"
    echo str  #=> "this is is string"

gsubs

Globally substitute the non-regex pattern with replacement in the source string. A literal-string version of gsub().

    gsubs(pattern: string, replacement: string, source: string): string

pattern is a literal string
replacement is the new text to replace the pattern text.
source is the source string.

gsubs() returns the new string in addition to changing the source string in-place. It is discardable.

Example 1:

    var str = "this is is string"
    gsubs(" is is st", " is a st", str)
    echo str  #=> "this is a string"

Example 2:

    echo gsubs(" is is st", " is a st", "this is is string")  #=> "this is a string"

sub

    sub(pattern: string, replacement: string, source: string [, occurance: int]): string

Substitute in-place the first occurrence of regex pattern with replacement in the source string. The optional occurance argument substitutes at the Xth occurrence.

pattern is a regex used in making the substitution
replacement is the new string
source is the string matched against
occurance is optional (default 1): which occurrence to substitute

If source is not a pre-declared variable, sub returns the new string but does not substitute in-place.
Substitutions are non-overlapping, e.g. sub("22","33","222222") => "333333" not "3333333333"

Example:

    var str = "This is a string"
    sub("[ ]is[ ]", " or ", str)  # substitute 'str' in-place.
    echo str  #=> "This or a string"
    echo sub("[ ]is[ ]", " or ", "This is a string")  # doesn't sub "This is a string" in-place, returns a new string

subs

Single substitute non-regex pattern with replacement in the source string. A literal-string version of sub(). See gsubs() for documentation.

patsplit

Divide source into pieces defined by regex pattern and store the pieces in seq field. Optional sep stores the separators.

    patsplit(source: string, field: seq, pattern: string [, sep: seq]): int

source is the source string
field is a sequence containing the field pieces
pattern is a regex (or literal) pattern string
sep is a sequence containing the separator pieces. Optional.

patsplit() behaves as follows:

The field (and sep) sequences must be created beforehand (see the examples for how).
Returns number of field elements found.
If no match found, field is set to the value of source

Example 1:

    var str = "This is <!--comment1--> a string <!--comment2--> with comments."
    var field = newSeq[string](0)
    if patsplit(str, field, "<[ ]{0,}[!].*?>") > 0:
      echo field[0]  #=> "<!--comment1-->"
      echo field[1]  #=> "<!--comment2-->"

Example 2:

    var ps = "This is <!--comment--> a string <!--comment2--> with comments."
    var field, sep = newSeq[string](0)
    patsplit(ps, field, "<[ ]{0,}[!].*?>", sep)
    echo sep[1]  #=> " a string "
    echo unpatsplit(field, sep)

unpatsplit

Recombine two sequences created by patsplit()

    unpatsplit(field: seq, sep: seq)

Given two seqs created by patsplit, recombine them into a single string in alternating sequence, i.e. field[0] & sep[0] & field[1] & sep[1] etc.

If field has more elements than sep, return ""

substr

Return a length-character-long substring of source starting at character number start.

    substr(source: string, start: int [, length: int]): string

The first character is 0 (different from awk, where it is 1)
If length is not present, return the string from start to the end
If start < 0, treat as 0
If start > length of source, return ""
If length < 1, return ""
Because nim's system.substr() has the same order and type of arguments this proc should be invoked as awk.substr() to avoid ambiguity.

Example:

    echo awk.substr("Hello World", 3)  #> "lo World"
    echo awk.substr("Hello World", 3, 2)  #> "lo"
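The rules listed above can be restated as a few lines of plain Nim. This is a sketch of the documented behaviour under the assumption that the rules are applied in the order shown; substrSketch is a hypothetical name and this is not the package's source.

```nim
# Hypothetical re-statement of the documented rules; not the package's code.
proc substrSketch(source: string, start: int, length = int.high): string =
  let first = max(start, 0)                # start < 0 is treated as 0
  if first >= source.len or length < 1:    # past the end, or nothing requested
    return ""
  let count = min(length, source.len - first)
  source[first ..< first + count]

echo substrSketch("Hello World", 3)      # lo World
echo substrSketch("Hello World", 3, 2)   # lo
```

When translating an awk script, a call such as substr(s, 4, 2) in awk corresponds to a start of 3 here, because the first character is numbered 0.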

index

Return the start location (index) of the first occurrence of non-regex target in source.

    index(source: string, target: string): int

First character is 0 (not 1 as in awk)
If none found or on error, return -1

Example:

    var loc = index("This is string", "is")
    echo loc  #=> 2
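The standard library's strutils.find uses the same convention described above (0-based, -1 when nothing is found), which makes it a convenient reference point. The helper awkStyleIndex below is hypothetical; it only shows how the two numbering schemes relate when checking a ported script against its awk original.

```nim
import std/strutils

# strutils.find is 0-based and returns -1 when the target is absent.
echo "This is string".find("is")    # 2
echo "This is string".find("xyz")   # -1

# Hypothetical helper: convert to awk's convention (1-based, 0 = not found).
proc awkStyleIndex(source, target: string): int =
  source.find(target) + 1

echo awkStyleIndex("This is string", "is")   # 3
```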

Techniques

associative arrays

Awk uses associative arrays. Nim also supports associative arrays, called "tables".

For example, to de-duplicate a list of words in awk:

    BEGIN {
      split("Blue Blue Red Green", arr, " ")  # Whoops, let's get rid of the extra "Blue"
      for (i in arr)
        uarr[arr[i]] = 1   # key on the word itself, not on its index
      for (i in uarr)
        print i
    }

The equivalent in Nim:

    import strutils, tables

    var
      arr = split("Blue Blue Red Green", " ")  # list of words containing a duplicate
      uarr = initTable[string, int]()          # create empty table (associative array) to hold words

    for i in arr:        # unique the list
      uarr[i] = 1
    for j in uarr.keys:  # print the list
      echo j
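A closely related awk idiom is counting with count[word]++. One possible Nim counterpart, sketched here with the standard library's CountTable rather than anything from this package, looks like this:

```nim
import std/[strutils, tables]

# Count occurrences, the Nim counterpart of awk's count[word]++ idiom.
var counts = initCountTable[string]()
for word in "Blue Blue Red Green".split(" "):
  counts.inc(word)

echo counts["Blue"]   # 2
echo counts.len       # 3 distinct words
```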

Getting started with nim

How I Start has good instructions for installing nim. It takes 5 minutes and everything is contained in a single directory.
Nim Language, official website.
GNU awk manual
