Tuesday, November 13, 2018

Go-Lang Learn to Code GR



A while back (last summer) a coding group member recommended Go as a new language.  Since I have now run out of things to do regarding the Expense It application, I decided to start looking into Go.

At the end of July I had downloaded version 1.10.3, and recently I downloaded 1.11.2, which seems to be the latest version, to a different PC.  So these comments will mainly be about 1.10.3 but could also apply to 1.11.2 since it seems to have the same problems.

The comments that I make here will be about writing an application in the Go language as well as a lesson learned about the Learn to Code GR application that was the subject of a number of posts from September 2017 through February 2018, mostly under titles such as Max Column Sum by Key.

Comments concerning Go

First off, upon starting to use the Go compiler the second of this month (it is now the twelfth) I tried the hello.go project as instructed in How to Write Go Code as found online (golang.org/doc/code.html).  This site provides the code example
package main

import "fmt"

func main() {
    fmt.Printf("Hello, world.\n")
}

Except that it wouldn't build using the 1.10.3 version of the compiler on a Windows PC.

I guess I should backtrack a bit.  Like many of the compilers that I used for the Learn to Code GR Max Column Sum by Key, the Go compiler is not visual.  So an editor, such as UltraEdit, has to be used and then the compiler has to be run from a DOS command window the old-fashioned way.  (My preferred way is a visual compiler such as those of GNAT or Microsoft C#, where code can be entered in a window of the compiler and then built, with errors and warnings identified and located in the file being compiled, and where a clean compile can be run via the debugger of the visual compiler.)

Go didn't provide any of that.  First the hello.go file had to be formatted as a MAC file – that is, with only Carriage Return characters at the end of lines.  Luckily this conversion is no big deal with UltraEdit and once the file is converted it can remain in that format.

I then got the error
C:\Go\src>go build hello.go
# command-line-arguments
.\hello.go:4:30: newline in string
.\hello.go:4:30: syntax error: unexpected newline, expecting comma or )
And noticed that line 4 column 30 was just past the end of the
      fmt.Println("Hello, world.)
line.  That was when I switched to MAC formatting.  Then I got
C:\Go\src>go build hello.go
can't load package: package main:
hello.go:1:14: expected ';', found 'import'

How to Write Go Code had no ';' in the code that it specified as the sample to try.  Checking further, Wikipedia stated that semi-colons still terminate lines but are implicit.  I added one anyway to see what would happen.  Then I got
C:\Go\src>go build hello.go
# command-line-arguments
.\hello.go:1:28: syntax error: unexpected func, expecting semicolon or newline
.\hello.go:1:53: string not terminated

This result appeared to be that the separate lines of the hello.go file were all strung together as if all were on line 1.  Not very helpful if the file had been of greater length.  However, I added a semi-colon at the end of the
import "fmt"
line and got
C:\Go\src>go build hello.go
# command-line-arguments
.\hello.go:1:54: string not terminated
.\hello.go:1:74: syntax error: unexpected EOF, expecting comma or )
so I put all the code on one line and saw that I had a missing closing double quote after "world\n" of the line
package main; import "fmt"; func main() { fmt.Printf("Hello, world.\n); }
from entering the example into UltraEdit.  So I added it to complete the Printf statement.

It then compiled.  So I had managed to find my coding error in spite of the inadequate error output.  And also that I needed a semi-colon where Go wasn't supposed to need them.  Note, by the time that I finished, the code was
package main;
import "fmt";
func main() {
fmt.Printf("Hello, world.\n");
}
with semi-colons after the first, second, and fourth lines. 

To double check for this post I removed them again and got the error message
can't load package: package main:
hello.go:1:14: expected ';', found 'import'
The same is true if the file is all on one line
package main import "fmt" func main() { fmt.Printf("Hello, world.\n") }
where position 14 is the 'i' of import.  That is, assuming that position 1 is the p of package.
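
One way to check how the line terminators of a source file affect the reported positions is to run the text through the standard go/scanner package.  The following is a minimal sketch and not something taken from How to Write Go Code; the helper name funcLine is made up for illustration and the sample text is the hello.go program from above.  It scans the same program once with Line Feed terminators and once with Carriage Return only terminators and reports the line on which the func keyword is found.  If the scanner treats a lone Carriage Return as ordinary white space, as the go/scanner source suggests, that would be consistent with everything being reported on line 1 and with no implicit semi-colons being supplied.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// funcLine (hypothetical helper) scans src and returns the line number
// that the scanner reports for the first func keyword, or 0 if none is found.
func funcLine(src string) int {
	var s scanner.Scanner
	fset := token.NewFileSet()
	file := fset.AddFile("hello.go", fset.Base(), len(src))
	s.Init(file, []byte(src), nil, 0)
	for {
		pos, tok, _ := s.Scan()
		if tok == token.EOF {
			return 0
		}
		if tok == token.FUNC {
			return fset.Position(pos).Line
		}
	}
}

func main() {
	// The hello.go program with Line Feed terminators.
	lf := "package main\nimport \"fmt\"\nfunc main() {\nfmt.Printf(\"Hello, world.\\n\")\n}\n"
	// The same text with every Line Feed replaced by a Carriage Return.
	cr := strings.Replace(lf, "\n", "\r", -1)

	fmt.Println("LF terminators: func on line", funcLine(lf))
	fmt.Println("CR terminators: func on line", funcLine(cr))
}
```

If the two runs report different lines for func, the terminator used by the editor is worth checking before blaming the compiler's position reporting.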

Note:  The size of hello.go text file is 74 bytes in Windows and hello.exe is 2,058,752 bytes.  Quite a difference.

Before continuing I captured the contents of a number of Go / Golang internet sites into Word documents and read them.  I also tried their Rectangle struct example, where I discovered the need to also put semi-colons after all the trailing } brackets (except the last).

I then began writing a somewhat larger package to redo the Learn to Code GR application in Go.  This resulted in harder-to-locate errors due to the inadequate identification of the location of the errors.  Not only were the errors all specified as being on line 1, but many didn't identify the reason that the code was in error or provide any position whatsoever (other than line 1).  This would be a significant handicap in writing an application of any size and was problematic even with such a small application.

Not only did I find that semi-colons were needed in the expected places but also following the trailing } brackets except for the very last closing bracket.

Other differences I found writing the Learn to Code GR package are
  • // comments couldn't be used.  Instead they had to be /* */ comments.
  • The construct
      import (
           "bufio"
           "fmt"
           "io"
           "io/ioutil"
           "os"
      ); that an example gave didn't work.  Separate import lines with trailing semi-colons were needed.
  • Variables had to have a leading var keyword and the type had to be supplied even though the Go documentation says that Go will recognize what the type has to be from its use.
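
For comparison, this is how the three constructs are written in the Go documentation: a // comment, a grouped import, and a short variable declaration with := where the type is inferred from the value.  It is only a small sketch, and it assumes that the source file is saved with Line Feed (or Carriage Return plus Line Feed) line terminators; the function name describe and the values are made up for illustration.

```go
package main

// A line comment runs to the end of the line.
import (
	"fmt"
	"strings"
)

// describe (hypothetical) reports the types that Go infers for three
// short variable declarations, followed by the upper-cased name.
func describe() string {
	count := 3    // inferred as int
	name := "key" // inferred as string
	ratio := 1.5  // inferred as float64
	return fmt.Sprintf("%T %T %T %s", count, name, ratio, strings.ToUpper(name))
}

func main() {
	fmt.Println(describe())
}
```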

Comments concerning Max Column Sum by Key

After the reading of the web sites and doing the two simple examples I began the new version of Max Column Sum by Key on Nov 8 (after starting looking into Go on the 2nd).

I found an example for reading a text file (this Go package needs to read the max-col-sum-by-key.tsv file) at https://gobyexample.com.  Using this example, the entire file was read into a data buffer and could be displayed via
    /* Open and read the entire file into data */
    data, err := ioutil.ReadFile("C:/Source/LearnToCodeGR-Ada/max-col-sum-by-key.tsv");
    check(err);
    fmt.Print(string(data));
    fmt.Print("\n");
so that use of Go went quickly.  Note: check is a function to determine if an error occurred.

Like in past implementations I then invoked a parse function to obtain the key and values from the file so that the key with the associated value with the most references could be found and reported.

At this point I found the helpful Scanner of the go/scanner package of the Go programming language, where
  var s scanner.Scanner;
  fset := token.NewFileSet();
  file := fset.AddFile("", fset.Base(), len(data));
  s.Init(file, data, nil, scanner.ScanComments);
initializes the scanner for the data passed to my parse function via the Init function.  Note: public functions have a leading upper case letter in Go while private functions have a lower case letter.

Then I added a for loop to scan each line of the data from the file.  That is, each line is terminated by a new line character.  Each line is decoded in the loop via
    position, tokenFound, literal := s.Scan();
    if tokenFound == token.EOF {
      if position > 0 { /* just to use position to avoid error for non-use */
      };
      break;
    };
where the Scan function requires that three variables be declared for the function to return. 

This is where a rule of Go comes into play.  While debugging via Print output I had used all three variables to check what was happening.  But when I was satisfied with my code I no longer needed the position of the location of the token and the literal.  So I stopped referencing position in my debug output.  This resulted in a Go error because Go doesn't allow a variable to be declared and not used.  This is the reason for the dummy use of the position variable in
      if position > 0 { /* just to use position to avoid error for non-use */
above.  Although, I would think that since the Scan function requires that all three variables be declared (I did try the use of s.Scan() returning only tokenFound and literal and it failed to compile) that the use by Scan could be interpreted as satisfying the Go requirement.
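
Go does have a documented way to discard a returned value that isn't needed: the blank identifier _.  A minimal sketch follows, using the same scanner calls as above; countInts is a made-up helper name and the input text is only an illustration.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// countInts (hypothetical helper) counts the INT tokens in data.
// The position and literal returned by Scan are discarded with the
// blank identifier, so no dummy use of them is needed.
func countInts(data []byte) int {
	var s scanner.Scanner
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(data))
	s.Init(file, data, nil, scanner.ScanComments)

	count := 0
	for {
		_, tokenFound, _ := s.Scan()
		if tokenFound == token.EOF {
			break
		}
		if tokenFound == token.INT {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(countInts([]byte("3000\t1\t1\n")))
}
```

Scan still returns three values; the blank identifier only tells the compiler that the value in that position is intentionally ignored.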

While I still had the debugging output a portion of the sample results were
%!s(token.Pos=400)      ;       "\n"
%!s(token.Pos=401)      INT     "0"
%!s(token.Pos=402)      ,       ""
%!s(token.Pos=403)      INT     "912"
%!s(token.Pos=406)      IDENT   "_NUM"
%!s(token.Pos=411)      INT     "3000"
%!s(token.Pos=416)      INT     "1"
%!s(token.Pos=418)      INT     "1"
%!s(token.Pos=420)      ;       "\n"
where the positions in the data are 400, 401, etc; the tokens are ';', ',', INT, and IDENT; and the literals are strings of \n, 0, an empty string, 912, _NUM, 3000, and 1.  These results are from the file record
0,912_NUM       3000    1       1
where there is a \t (horizontal tab) character after "0,912_NUM", "3000", and the first "1" and a \n (new line) after the second "1".

From this output I could tell that the literals that I needed were those of the last three INT tokens of each line/file record.  It's a mystery to me why \n was returned as a token but the three instances of  \t were not.  But since the INT tokens were associated with the needed items the Go Scanner is quite usable and limits what has to be done.
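
A possible explanation, based on the go/scanner documentation rather than on anything tested for this post: the package is a scanner for Go source text, so it skips tabs as white space, and where a line ends after a token such as an INT it returns the implicit semi-colon as a ';' token whose literal is "\n".  The sketch below lists the tokens for a shortened, made-up record so that this can be observed; the helper name tokens is an assumption for illustration.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokens (hypothetical helper) returns one "TOKEN literal" string for
// each token that the scanner finds in data, stopping at EOF.
func tokens(data []byte) []string {
	var s scanner.Scanner
	fset := token.NewFileSet()
	file := fset.AddFile("", fset.Base(), len(data))
	s.Init(file, data, nil, scanner.ScanComments)

	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		out = append(out, fmt.Sprintf("%s %q", tok, lit))
	}
	return out
}

func main() {
	// Tabs separate the fields; the record ends with a new line.
	for _, t := range tokens([]byte("3000\t1\t1\n")) {
		fmt.Println(t)
	}
}
```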

Therefore, in the for loop, I counted the number of times the token was INT.  Then, if the third time, the associated literal was captured as the key; if the fourth time, it was captured as the first value of the key; and if the fifth time, as the second value of the key.
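
Since the file is tab separated, another way to pick out the key and the two values would be to split each line on tabs rather than counting INT tokens.  This is an alternative sketch, not what the code in this post does; it assumes each record has four tab separated fields laid out like the sample record above, and splitRecord is a made-up name.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRecord (hypothetical) returns the key and the two values from one
// tab separated record, or ok == false if the record has too few fields.
func splitRecord(line string) (key, value1, value2 string, ok bool) {
	// Drop the line terminator, then split on the tab characters.
	fields := strings.Split(strings.TrimRight(line, "\r\n"), "\t")
	if len(fields) < 4 {
		return "", "", "", false
	}
	return fields[1], fields[2], fields[3], true
}

func main() {
	key, v1, v2, ok := splitRecord("0,912_NUM\t3000\t1\t1\n")
	fmt.Println(key, v1, v2, ok)
}
```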

Prior to this I had, following previous Learn to Code GR examples in other languages, created a struct of
type KeyValues struct {
   key string;
   count1 int;
   value2 string;
   count2 int;
};
while preparing to get all the results
type keys [30]KeyValues;
type Keys struct {
  count int;
  keys;
};
var items Keys;

While beginning to write the captureKeyAndValue function to keep track of the results I suddenly had an epiphany.  I was again adding the extra code to determine whether the second value was the same as or different from the first, along with the code necessary to handle each such case and whether another array entry was going to need to be added.  And it suddenly occurred to me that all I had to do was call the function twice; once for the key and the first value and again for the key and the second value.  A neat solution that hadn't occurred to me when I was doing one language example after another back at the end of 2017 and the beginning of this year.

At that time I was, on many occasions, illustrating the use of a class (which Go doesn't have) as well as the language constructs of yet another language, so I was looking at how the same concept could be written in that language rather than for a better way.  But now, about a year later, the fact that the extra code wasn't necessary suddenly occurred to me.  An example, I suppose, of the importance of code reviews.

Hence the first struct is reduced to
type KeyValues struct {
   key string;
   value string;
   count int;
};
and, with the help of the scanner functions, the parse function is reduced to
/* This function scans each line and finds the key at the 3rd INT token.
   There are 5 INT tokens in all per line where the line ends in \n.  It
   then finds the two values for the key in the next two INT tokens. */
func parse(data []byte) {
  var s scanner.Scanner;
  fset := token.NewFileSet();
  file := fset.AddFile("", fset.Base(), len(data));
  s.Init(file, data, nil, scanner.ScanComments);
   
  var intCount int = 0;
  var key string = "";
  var value1 string = "";
  var value2 string = "";
    
  for {
    position, tokenFound, literal := s.Scan();
    if tokenFound == token.EOF {
      if position > 0 { /* just to use position to avoid error for non-use */
      };
      break;
    };
    if tokenFound == token.INT {
      intCount++; /* count INT tokens found */
    };
    /* capture literals found for 3rd, 4th, and 5th INT tokens */
    if intCount == 3 {
      key = literal;
    };
    if intCount == 4 {
      value1 = literal;
    };
    if intCount == 5 {
      value2 = literal;
      captureKeyAndValue(key, value1); /* capture the two    */
      captureKeyAndValue(key, value2); /*   key, value pairs */
      intCount = 0;
      key = "";
      value1 = "";
      literal = "";
    };
    if literal == "\n" { /* end of the line */
    }
  };
}; /* end parse */

Of course, clearing key, value1, and literal isn't really necessary since they will be captured again the next time through the loop.

With this change, captureKeyAndValue is reduced to
/* Capture the key and value pair as data is parsed */
func captureKeyAndValue(key string, value string) {
  /* Capture initial entry */
  if items.count == 0 {
    items.keys[items.count].key = key;
    items.keys[items.count].value = value;
    items.keys[items.count].count = 1;
    items.count++;
    return;
  };
  /* Capture item with duplicate entry */
  for i := 0; i < items.count; i++ {
    if items.keys[i].key == key { /* then key already captured */
       if items.keys[i].value == value {
         items.keys[i].count++;
         return;
       };
    };
  };
  /* Capture new key, value pair */
  if items.count < 30 {
     items.keys[items.count].key = key;
     items.keys[items.count].value = value;
     items.keys[items.count].count = 1;
     items.count++;
  };
}; /* end captureKeyAndValue */

Note: The above illustrates one advantage of Go that is actually met.  That is, unlike C and C# the function doesn't need to declare void as the return value when no value is to be returned.
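
A fixed array of 30 entries works for this file, but the same tally could also be kept in a map keyed by the key, value pair, which removes the search loop and the size limit.  This is only a sketch of an alternative, not the code used for the results in this post; the names pair, tally, and capture are made up for illustration.

```go
package main

import "fmt"

// pair (hypothetical) identifies one key, value combination.
type pair struct {
	key   string
	value string
}

// tally maps each pair to the number of times it has been captured.
var tally = map[pair]int{}

// capture records one more reference to the key, value pair.
// A pair that is not yet in the map starts from a count of zero.
func capture(key, value string) {
	tally[pair{key, value}]++
}

func main() {
	capture("3000", "1")
	capture("3000", "1")
	capture("4000", "2")
	fmt.Println(tally[pair{"3000", "1"}], tally[pair{"4000", "2"}], len(tally))
}
```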

Thus the entire code is
package main;
import "fmt";
import "go/scanner";
import "go/token";
import "io/ioutil";
func check(e error) {
    if e != nil {
        panic(e);
    };
};
/* Struct to save parsed data */
type KeyValues struct {
   key string;
   value string;
   count int;
};
/* Array and struct to save parsed data */
type keys [30]KeyValues;
type Keys struct {
  count int;
  keys;
};
/* Saved data */
var items Keys;
/* Capture the key and value pair as data is parsed */
func captureKeyAndValue(key string, value string) {
  /* Capture initial entry */
  if items.count == 0 {
    items.keys[items.count].key = key;
    items.keys[items.count].value = value;
    items.keys[items.count].count = 1;
    items.count++;
    return;
  };
  /* Capture item with duplicate entry */
  for i := 0; i < items.count; i++ {
    if items.keys[i].key == key { /* then key already captured */
       if items.keys[i].value == value {
         items.keys[i].count++;
         return;
       };
    };
  };
  /* Capture new key, value pair */
  if items.count < 30 {
     items.keys[items.count].key = key;
     items.keys[items.count].value = value;
     items.keys[items.count].count = 1;
     items.count++;
  };
}; /* end captureKeyAndValue */
/* This function scans each line and finds the key at the 3rd INT token.
   There are 5 INT tokens in all per line where the line ends in \n.  It
   then finds the two values for the key in the next two INT tokens. */
func parse(data []byte) {
  var s scanner.Scanner;
  fset := token.NewFileSet();
  file := fset.AddFile("", fset.Base(), len(data));
  s.Init(file, data, nil, scanner.ScanComments);
   
  var intCount int = 0;
  var key string = "";
  var value1 string = "";
  var value2 string = "";
    
  for {
    position, tokenFound, literal := s.Scan();
    if tokenFound == token.EOF {
      if position > 0 { /* just to use position to avoid error for non-use */
      };
      break;
    };
    if tokenFound == token.INT {
      intCount++; /* count INT tokens found */
    };
    /* capture literals found for 3rd, 4th, and 5th INT tokens */
    if intCount == 3 {
      key = literal;
    };
    if intCount == 4 {
      value1 = literal;
    };
    if intCount == 5 {
      value2 = literal;
      captureKeyAndValue(key, value1); /* capture the two    */
      captureKeyAndValue(key, value2); /*   key, value pairs */
      intCount = 0;
      key = "";
      value1 = "";
      literal = "";
    };
    if literal == "\n" { /* end of the line */
    }
  };
}; /* end parse */
/* main entry point */
func main() {
    /* Open and read the entire file into data */
    data, err := ioutil.ReadFile("C:/Source/LearnToCodeGR-Ada/max-col-sum-by-key.tsv");
    check(err);
    fmt.Print(string(data));
    fmt.Print("\n");
  
    /* Parse the data read from the file */
    parse(data);
   
    /* Display parsed results */
    fmt.Printf("Table values count %d\n", items.count);
    for i := 0; i < items.count; i++ {
       fmt.Printf("%d\t%s\t%s\t%d\n", i,items.keys[i].key,items.keys[i].value,items.keys[i].count);
    };
   
    /* Determine key value combination with largest count and display */
    var max KeyValues;
    max.key = "";
    max.value = "";
    max.count = 0; /* how to initialize in the declaration? */
    for i := 0; i < items.count; i++ {
       if items.keys[i].count > max.count {
         max = items.keys[i];
       };
    };
    fmt.Printf("Key and Value of maximum references: %s\t%s\t%d\n",max.key,max.value,max.count);
   
} /* end main */
with the output
Table values count 13
0       1000    1       13
1       1000    2       7
2       2000    1       16
3       2000    2       4
4       3000    1       20
5       4000    1       15
6       4000    3       2
7       4000    2       2
8       4000    4       1
9       5000    2       8
10      5000    1       7
11      5000    3       4
12      5000    4       1
Key and Value of maximum references: 3000       1       20
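
On the question in the comment in main of how to initialize in the declaration: Go's composite literal syntax allows the fields to be set where the variable is declared.  A minimal sketch follows, repeating the KeyValues struct from above with made-up entries; maxOf is a hypothetical helper and uses a slice instead of the fixed array.

```go
package main

import "fmt"

// KeyValues mirrors the struct used in the post.
type KeyValues struct {
	key   string
	value string
	count int
}

// maxOf (hypothetical) returns the entry with the largest count.
func maxOf(items []KeyValues) KeyValues {
	// The fields are initialized in the declaration with a composite literal.
	largest := KeyValues{key: "", value: "", count: 0}
	for _, item := range items {
		if item.count > largest.count {
			largest = item
		}
	}
	return largest
}

func main() {
	items := []KeyValues{
		{key: "1000", value: "1", count: 13},
		{key: "3000", value: "1", count: 20},
	}
	fmt.Println(maxOf(items))
}
```

Since the zero value of a string is "" and of an int is 0, a plain var declaration of the struct would start from the same values.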

This code is simpler than before, but the past applications could have been as well had the improvement of using a separate function invocation for each of the key, value pairs been made.  Each would, however, have needed something to take the place of the Go scanner functions.

What next?

I was going to do some larger application in Go to learn it further.  But, what with the compiler not implementing what the documentation claims and the difficulty in determining where an error actually is located or even what it is, it seems to me that it would be too early to do so.  Of course, I could download a Linux version and see if it is closer to the documentation.  Otherwise, what to do? What to do?


