Thursday, October 5, 2017

Max Column Sum by Key (part 4)


Further continuing on from my previous explorations that followed Learn to Code Grand Rapids “Building a Real World App in VS 2017, Part I” of August 10 I decided to try Pascal - a language that I first (and last) used in 1990 at SCI in Huntsville Alabama.  At that time I had liked it for the brief time that I was part of a small project.  Maybe since after a couple of weeks I was able to guide others who had been on the project longer about how to program in it.

I had seen that GNAT GPS had Pascal as one of its options as a language.  However, in trying to use it GPS would give me an error that it didn't recognize a compiler for Pascal.  So I did an internet search and found Free Pascal and downloaded its Win32 version.  It installs to C:\FPC\3.0.2 and the install did put the path to its bin subfolder into the Windows system path. 

However running it was extremely ugly.  I tried GNAT GPS again but even with the path to it in the system path, GNAT GPS didn't find it to use as the Pascal compiler.

So I went searching for something else and in doing so I came across Lazarus.  I downloaded lazarus-1.6.4-fpc-3.0.2-win32.exe (128.8 MB) into C:\lazarus.  This proved to be a Windows IDE for the previously installed Free Pascal compiler.  A much, much better solution.  So I was off and running.  (Up and running?)  I sure had no memories of what the language looked like.

It assumed that the source code would be in C:\lazarus\fpc\3.0.2\source.  Multiple kinds of projects can be selected.  I forget which I used at the beginning that created .lpr Lazarus Project Main Source file.  After I had problems with the debugger I tried a Console application project that is a .pas Pascal file for the source.

Since Pascal has both a record structure and the ability to declare and implement a class I have included examples of both.  The record structure for the KeyData record type and the KeyTable class that includes the number of different keys and an array of the KeyData record type.  Unlike in C# and Java the implementation of a procedure or function is not included within the class declaration.  Instead a declaration of the procedure/function is included within the class and separate code has to be provided to implement the procedure or function like happens for an Ada package.  Although, with Ada, the private procedures and functions don't need a declaration in the package specification or even in the package body if the implementation is done prior to any invocation of it.

It took me a while to get the use of a class worked out.  As I found problems with the code that caused exceptions I don't know whether this was because I wasn't storing values within the bounds of an array or because I hadn't created the KeyTable class as associated with a Pascal class; in my code TCustomApplication with a constructor and destructor.

In any case, the example of a record is
// Global types
type

  fieldArray = array[1..3] of integer;
  valueArray = array[1..20] of integer;

  // Data about a particular key
  KeyData = record
    key : integer;        // numeric key
    valueCount : integer; // number of different values associated with key
    values : valueArray;  // different values associated with key
    sums : valueArray;    // number of references to a value
  end;
  keyArray = array[1..30] of KeyData;
where keyArray is an array of the KeyData record while the record has two fields of the valueArray.

The KeyTable class is
// KeyTable class declaration
  KeyTable = class(TCustomApplication)

    public
    keyCount : integer; // number of different keys
    keys : keyArray;    // value of key - one in each array position

    // Data to be retained as the key and its associated value with the
    // maximum number of references
    maxKey : Integer;
    maxValue : Integer;
    maxSum : Integer;


    constructor Create(TheOwner: TComponent); override;
    destructor Destroy; override;

    // Initialize
    procedure Clear();
    // Update keyTable with data from a parsed line
    procedure Update(data : fieldArray);
    // Add the values or increment their sums
    procedure addValues(keyIndex : Integer; value1 : Integer; value2 : Integer);
    procedure add2ndValue(keyIndex : Integer; value2 : Integer);
    procedure updateTotals(key : Integer; value : Integer; sum : Integer);
    // Report the results
    procedure Report();

  end; // KeyTable class

Each of the procedures are separately implemented as with
procedure KeyTable.Clear();

var
  k : integer;

begin // Clear
  keyCount := 0;
  for k := 1 to 30 do
  begin
    keys[k].valueCount := 0;
  end;
  maxKey := 0;
  maxValue := 0;
  maxSum := 0;
  savedData[1] := 0;
  savedData[2] := 0;
  savedData[3] := 0;
end; // Clear
where the name of the class precedes the name of the procedure.  Each has to name its local variables first following the var keyword.  This is also similar to Ada where the variables are named prior to the begin statement.  However, unlike Ada where a loop index type can frequently be determined by the context without needing to be declared, it has to be included after the var keyword for Pascal.  In C kinds of languages it is specified in the loop statement.

Instances of the class are named following a var keyword visible to the none class procedure calls that invoke one of the procedures of the class.  Such as
// Static variables
var
  Table : KeyTable;  // instance of class

// Entry point from operating system
begin // LearnToCodeGR program

  // Initialize
  Table := KeyTable.Create(nil);

  . . .

  // Report the results
  Table.Report();
Here, of course, the name of the class is used in setting the instance of the class while the second dotted notation has been used to specify that that instance of the class is to be invoked.

It should be noted here that, like Ada and unlike C, case makes no difference.  Therefore I named the instance of the class as Table rather than keyTable as I would have in a language where case mattered.

However, initially (as with the other languages) I first determined how to read the max-col-sum-by-Key.tsv file.  In doing so I found that the
Readln(FileIn,buffer);
statement, where buffer is a string, inputs up to the end-of-line LF and/or CR but doesn't include these characters in the data string.  Therefore, the Parse procedure (when it was added) had to change in recognizing the end of the final field - the second value and the third field of interest.

Note that, like Ada, Pascal has both functions, that return a value, and procedures that don't.   Also, in many other ways creating this Pascal application there were reminders of the syntax of Ada.  Which must have been the reason that I liked Pascal on that long ago occasion when I had reason to use it.  I did find the need to surround the code following an if or loop statement with begin … end; blocks a little excessive.  Ada doesn't need the begin but does have the end statement (end if; and end loop;).  C and the like have { } brackets of course.  Whereas { } brackets can be used to surround a comment like /* */ in C.

There is some required sequence of keywords.  For instance, the internal procedures and functions have to be included after the variables (that follow the 'var' keyword) of the program.

The arrays can start anywhere so I have switched to begin some arrays with an index of 1 while leaving others to start at 0 to correspond to the C code.

The code in Pascal is
program LearnToCodeGR;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}{$IFDEF UseCThreads}
  cthreads,
  {$ENDIF}{$ENDIF}
  Classes, SysUtils, CustApp;

// Global types
type

  fieldArray = array[1..3] of integer;
  valueArray = array[1..20] of integer;

  // Data about a particular key
  KeyData = record
    key : integer;        // numeric key
    valueCount : integer; // number of different values associated with key
    values : valueArray;  // different values associated with key
    sums : valueArray;    // number of references to a value
  end;
  keyArray = array[1..30] of KeyData;

// KeyTable class declaration
  KeyTable = class(TCustomApplication)

    public
    keyCount : integer; // number of different keys
    keys : keyArray;    // value of key - one in each array position

    // Data to be retained as the key and its associated value with the
    // maximum number of references
    maxKey : Integer;
    maxValue : Integer;
    maxSum : Integer;


    constructor Create(TheOwner: TComponent); override;
    destructor Destroy; override;

    // Initialize
    procedure Clear();
    // Update keyTable with data from a parsed line
    procedure Update(data : fieldArray);
    // Add the values or increment their sums
    procedure addValues(keyIndex : Integer; value1 : Integer; value2 : Integer);
    procedure add2ndValue(keyIndex : Integer; value2 : Integer);
    procedure updateTotals(key : Integer; value : Integer; sum : Integer);
    // Report the results
    procedure Report();

  end; // KeyTable class

// Static variables
var
  Table : KeyTable;  // instance of class

  savedData : fieldArray; // data to be visible between multiple procedures

// KeyTable constructor and destructor
constructor KeyTable.Create(TheOwner: TComponent);
begin
  inherited Create(TheOwner);
  StopOnException:=True;
  Initialize();
end; // Create

destructor KeyTable.Destroy;
begin
  inherited Destroy;
end;


// KeyTable procedures
procedure KeyTable.Clear();

var
  k : integer;

begin // Clear
  keyCount := 0;
  for k := 1 to 30 do
  begin
    keys[k].valueCount := 0;
  end;
  maxKey := 0;
  maxValue := 0;
  maxSum := 0;
  savedData[1] := 0;
  savedData[2] := 0;
  savedData[3] := 0;
end; // Clear

procedure KeyTable.Update(data : fieldArray);
// This procedure first checks if the key is new and, if so, adds it to
// the keys array.  It then does similar for the values associated with it.
// Note: data[1] is the key while data[2] and data[3] are the two values
//       associated with the key.

var
  keyIndex : integer;
  k : integer;

begin // Update

  // Save data to be updated for use by updateTotals since needed by multiple functions
  savedData[1] := data[1];
  savedData[2] := data[2];
  savedData[3] := data[3];

  // Check whether the key is already in the table
  keyIndex := -1;
  for k := 1 to Table.keyCount do
  begin
    if (keys[k].key = data[1]) then
    begin
      keyIndex := k; // key already in the table
      break; // exit loop
    end;
  end; // for loop

  if keyIndex < 0 then // key not in the table
  begin // add the key
    keyCount := keyCount + 1;
    keyIndex := keyCount;
    keys[keyIndex].key := data[1];
  end; // end if

  // Add the values for the key
  addValues(keyIndex, savedData[2], savedData[3]);

end; // Update

procedure KeyTable.addValues(keyIndex : Integer; value1 : Integer; value2 : Integer);

var
  valueIndex : integer;
  v : integer;

begin // addValues

  // Check whether the first value is already in the table
  valueIndex := -1;
  for v := 1 to keys[keyIndex].valueCount do
  begin
    if (keys[keyIndex].values[v] = value1) then
    begin // value already in the table
      valueIndex := v;
      keys[keyIndex].sums[v] :=  // increment its number of references
      keys[keyIndex].sums[v] + 1;
      if (value1 = value2) then // 2nd value the same
      begin
        keys[keyIndex].sums[v] := // increment again
          keys[keyIndex].sums[v] + 1;
      end;
      // check if new max
      Table.updateTotals(savedData[1],value1,keys[keyIndex].sums[v]);
      break; // exit loop
    end;
  end; // end loop

  if (valueIndex < 0) then // value not yet in table
  begin // add value to the table - index points to last value checked
    keys[keyIndex].valueCount := keys[keyIndex].valueCount + 1;
    v := keys[keyIndex].valueCount;
    keys[keyIndex].values[v] := value1;
    keys[keyIndex].sums[v] := 1;
    if (value1 = value2) then
    begin
      keys[keyIndex].sums[v] := // increment
        keys[keyIndex].sums[v] + 1;
      // check if new max
      Table.updateTotals(savedData[1],value1,keys[keyIndex].sums[v])
    end;
//  else
    if (value1 <> value2) then
    begin
      Table.add2ndValue(keyIndex, value2);
    end;
  end; // end outer if

end; // addValues

// Add the second value or increment its sum
procedure KeyTable.add2ndValue(keyIndex : integer; value2 : integer);

var
  valueIndex : integer;
  v : integer;

begin
  // Check whether the second value is already in the table
  valueIndex := -1;
  for v := 1 to keys[keyIndex].valueCount do
  begin
    if (keys[keyIndex].values[v] = value2) then
    begin // value already in the table
      valueIndex := v;
      keys[keyIndex].sums[valueIndex] := // increment its number of references
      keys[keyIndex].sums[valueIndex] + 1;
      // check if new max
      updateTotals(savedData[1],value2,keys[keyIndex].sums[valueIndex]);
      break; // exit loop
    end;
  end; // end loop

  if (valueIndex < 0) then // value not yet in table
  begin // add value to the table
    v := keys[keyIndex].valueCount;
    keys[keyIndex].values[v] := value2;
    keys[keyIndex].sums[v] := 1;
    keys[keyIndex].valueCount := keys[keyIndex].valueCount + 1;
  end;

end; // add2ndValue

procedure KeyTable.updateTotals(key : Integer; value : Integer; sum : Integer);
begin
  if (sum > maxSum) then
  begin
    maxKey := key;
    maxValue := value;
    maxSum := sum;
  end;
end; // updateTotals

procedure KeyTable.Report();
begin
  WriteLn( 'Key ', maxKey, ' with Value ', maxValue, ' has maximum Sum of ', maxSum );
end; // Report


// General functions and procedures

function toInt(data : string; iStart : integer; iEnd : integer) : integer;
  const
    NINE = '9';
    ZERO = '0';
    numZero = integer('0');

  var
    index : integer;
    digit : integer;
    m : integer = 1;      // multiplier for shift
    number : integer = 0; // Numeric result

begin // toInt
  index := iEnd; // loop in reverse
  while (index >= iStart) do
  begin
    if ((data[index] >= ZERO) and (data[index] <= NINE)) then
    begin
      digit := integer(data[index]) - numZero; // convert ASCII to digit
      number := number + (m * digit);
      m := m * 10;
      index := index - 1;
    end;
  end;
  toInt := number; // return converted value
end; // toInt

// Parse each line of data in the buffer to obtain the three fields of
// interest, converting those fields to integers into an array, and
// then updating a data structure to retain the data for evaluation
// when the complete buffer has been parsed.
procedure Parse(count : Integer; data : string);
  const
    HT : char = #9; // horizontal tab

  var
    convertedValue : integer;
    nextField : integer = 0;  // index of the beginning of next field
    startField : integer;     // range of indexes of
    numFields : integer = 0;  // index into dataFields array
    dataFields : fieldArray;  // Integer values of the three fields of interest

    index : integer = 0;

begin // Parse
  while index < count do
  begin
    // Parse the buffer line
    if (data[index] = HT) then // beginning of a field
    begin
      startField := nextField; // save starting index
      nextField := index + 1;  // the next byte will contain part of next field
      if (numFields > 0) then
      begin
        //convert and store in dataFields
        dataFields[numFields] := toInt(data, startField, index - 1);
      end; // end if numFields > 0
      numFields := numFields + 1;
    end; // end if HT
    index := index + 1;
  end; // while loop
  // Convert and store last field since it ends at the last char of data
  convertedValue := toInt(data, nextField, count);
  dataFields[numFields] := convertedValue; //toInt(data, nextField, count);
  // Save the data to determine the key, value combination with
  // maximum number of references
  Table.Update(dataFields);
end; // Parse

// Variables needed by the main entry point program
var
  FileIn : Text;
  buffer : string;   // line of data from the Text file

// Entry point from operating system
begin // LearnToCodeGR program

  // Initialize
  Table := KeyTable.Create(nil);
  Table.Title:='LearnToCodeGR';

  // Open file
  AssignFile(FileIn, 'C:\\Source\\LearnToCodeGR\\max-col-sum-by-Key.tsv');
  Reset(FileIn); // Ensure that at the start of the file

  // Read and Parse each line
  while not Eof(FileIn) do
  begin
    Readln(FileIn,buffer); // input the string of the file line
    WriteLn(Length(buffer));
    WriteLn(buffer);

    // Parse the line and update to retain all the necessary data.
    Parse(Length(buffer), buffer);
  end;

  Close(FileIn);

  // Report the results
  Table.Report();

  // Terminate
  Table.Free;

end. // LearnToCodeGR program

Note that in KeyTable.addValues I wanted to use an else statement but had trouble getting the compiler to accept it.  So the else is commented out and is followed by a second if statement to obtain the opposite result.  That is, the two values being equal prior to the //else and being unequal following it.