Tuesday, September 19, 2017

Max Column Sum by Key

Following the Learn to Code Grand Rapids meeting “Building a Real World App in VS 2017, Part I”, I decided to write my own code in Visual Studio C# and then to write the application once again in Ada, to illustrate the same application in two different languages.  A third version in Python followed, as a way to learn something about that language.

Since I didn’t completely follow what the goal was from the class I had to decide on a modification of the goal to solve.  (I had the disadvantage of not being able to see the display and my notebook at the same time since I wear different glasses for distance and close-up.  Therefore, I was continually attempting to switch back and forth and missing the switches on the display as well as losing my attention.)

A large share of the reason for writing my own applications was just to have something to do.  A second reason was to show actual code that does more than supply a wrapper around a solution that had already been created.  That solution wasn’t visible to the class, so there was no way to see what coding it might have required.

Below is a discussion of what I did.  I wrote an initial C# application without knowing what the disk file used in the class contained.  I then got a truncated copy of that file from Jeffrey Fuller and wrote an Ada application that uses it, and afterwards redid the C# application to use the supplied file.  So you may want to just skip the description of the initial C# application, although it does illustrate a problem I ran into with C# (or at least with the version we were able to use).

After that, thinking about what else to do, I decided to see what the Python language was all about.  So I rewrote the application in Python as a way of learning something about it.

The sections below discuss each of these attempts and provide the code.

Discussion of the Visual Studio C# Application

When I did the VS C# project I wanted to produce what in Ada would be a nested record structure to contain the data parsed from the supplied file.  The structure was to hold one record for each Key found in the records of the file and, for each Key, an array of further records identifying each Value associated with that Key and the number of times that particular Value occurred for the Key.

Since I didn’t have access to the file to begin with I created a small text file of my own from my assumptions of what it was supposed to look like. 

I then created a C# Windows application that lets the user select the file via a File menu with a drop-down Open item.  The OpenFileDialog class provided with C# lets the user search through the folders for the one that contains the file and then select the file to be opened.  When doing a Windows application, Visual C# allows widgets to be selected from a toolbox panel and dragged to a form panel of the application (in this case, Form1 [Design]) to build what the form is to look like when the application is run.  When the widget, in this case the Open item, is then double-clicked, C# inserts the outline of an event handler that will be executed when Open is clicked in the running application.  Within this outline the coder/programmer, in this case me, adds the code to be executed.  C# names these handlers and they can then be renamed as desired, which I have done with the event handler below, which shows the code that I added.

private void openToolStripMenuItem_Click(object sender, EventArgs e)
{ // to open the file
  Stream myStream = null;

  // Get an instance of a FileDialog.
  OpenFileDialog openFileDialog = new OpenFileDialog();

  // Set the starting folder and use a filter to allow only certain file extensions.
  openFileDialog.InitialDirectory = "c:\\";
  openFileDialog.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
  openFileDialog.FilterIndex = 2;
  openFileDialog.RestoreDirectory = true;

  if (openFileDialog.ShowDialog() == DialogResult.OK)
  {
    try
    {
      if ((myStream = openFileDialog.OpenFile()) != null)
      {
        using (myStream)
        {
          // Read the file and build tables from the data.
          ReadAndParse(myStream);
          // Report the results (call reconstructed; its arguments, if any, are not shown here).
          ReportResults();
        }
      }
    }
    catch (Exception ex)
    {
      MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
    }
  }
} // openToolStripMenuItem_Click
where everything between the opening and closing {} braces was added by me.

Stream is a C# class.  OpenFileDialog is also provided by C#, while the coder supplies the starting folder (C:\\), the Filter, the decision of when to invoke the C#-provided ShowDialog, the check that the file was opened, etc.  In addition, ReadAndParse and ReportResults are methods that I wrote for this particular application.

private void ReadAndParse(Stream file)
{
  System.Byte[] buffer = new byte[file.Length]; // buffer to contain file

  int numBytesRead = 0;
  try
  {
    // Read everything in file.
    numBytesRead = file.Read(buffer, 0, (int)file.Length);
  }
  catch (Exception ex)
  {
    MessageBox.Show("Error reading file: " + ex.Message);
  }

  Parse(numBytesRead, buffer); // Parse the bytes read
} // ReadAndParse
where this method expects that there is sufficient memory to contain the contents of the entire file (a buffer created with new is allocated on the heap rather than the stack).  Of course, if the expected file is too large this code would need to be different, with a portion read at a time and the Parse method invoked to examine each portion as it is read.  Special code would be needed to move the data remaining at the end of the buffer (a partial record) to the beginning and to read the next portion of the file into the buffer from that location onward.  (Or else use a pair of buffers, where the next bytes of the file are read into the second buffer when running out of the first, and the code reads up to the end of the first and then switches the buffer being examined to continue.)
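The "read a portion at a time and carry the remainder forward" idea can be sketched briefly.  The sketch is in Python (the third language of this post) rather than C#, and it is only an illustration under assumptions: the function name iter_records, the chunk_size parameter and the tiny in-memory sample are all made up for the example and are not part of my application.

```python
import io


def iter_records(stream, chunk_size=4096):
    """Yield complete records (line terminators removed) from a binary stream.

    At most chunk_size bytes are read at a time.  A partial record left at
    the end of one chunk is carried over and placed in front of the next.
    """
    leftover = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break                        # end of file
        data = leftover + chunk
        lines = data.split(b"\n")
        leftover = lines.pop()           # partial record (possibly empty)
        for line in lines:
            yield line.rstrip(b"\r")     # accept CR LF as well as LF
    if leftover:
        yield leftover.rstrip(b"\r")     # final record without a terminator


# Hypothetical sample; the small chunk size forces records to span chunks.
sample = io.BytesIO(b"2017 5\r\n2016 4\r\n2017 1\r\n2017 5")
records = list(iter_records(sample, chunk_size=5))
print(records)
```

With a chunk size of 5 every record straddles at least one chunk boundary, so the carry-over path is exercised on each record.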

The Parse method assumed that the Key began at the beginning of the file record and then had a separator followed by a single Value.
private void Parse(int count, Byte[] data)
{
  // Build a table of keys (first field of a record) and, for each key,
  // the number of items of the key.  Also, for each key, a table that
  // contains each of the unique values (second field of a record) with
  // the number of times the value is contained in the data.
  // The first field ends with a space or a TAB.  The second field ends
  // the same way or at the end of record (in this case the CR LF pair).
  // All other data is ignored until end-of-file (EOF) or end of record.
  const Byte Space = 32;
  const Byte Tab = 9;
  const Byte CR = 13;
  const Byte LF = 10;
  //enum Fields { Key, Value, RecEnd };
  //int pos = (int)Fields.Key;
  const int Key = 0;
  const int Value = 1;
  const int Bypass = 2;
  const int RecEnd = 3;
  int pos = Key;
  Byte[] keyField = new byte[16]; // max of 16 bytes for a key
  int index = 0; // index into keyField for next Byte

  int keyValue = 0;
  int valueValue = 0;

  for (int i = 0; i < count; i++) // examine each byte of data
  {
    byte debugxx = data[i];
    // Parse key
    switch (pos)
    {
      case Key:
        if ((data[i] != Space) & (data[i] != Tab) & (data[i] != CR) & (data[i] != LF))
        {
          keyField[index] = data[i];
          index += 1;
        }
        else // at end of field
        {
          // Convert the byte string to an integer.
          // Check the StructClasses to find the Key or add a new one.
          // Remember the index into the KeyValueRec to use for the field Value.
          keyValue = ConvertToNumeric(keyField, index);
          index = 0; // reinitialize to capture value associated with key
          pos = Bypass;
          if ((i + 1) < count)
          {
            if ((data[i + 1] >= 48) & (data[i + 1] <= 57))
            { pos = Value; }
          }
        }
        break;
      case Bypass: // ignore text until a numeric is next
        if ((i + 1) < count)
        {
          if ((data[i + 1] >= 48) & (data[i + 1] <= 57))
          { pos = Value; }
        }
        break;
      case Value:
        if ((data[i] != Space) & (data[i] != Tab) & (data[i] != CR) & (data[i] != LF))
        {
          keyField[index] = data[i];
          index += 1;
        }
        else // at end of field
        {
          // Convert the byte string to an integer.
          // Check the StructClasses to find the Key or add a new one.
          // Remember the index into the KeyValueRec to use for the field Value.
          valueValue = ConvertToNumeric(keyField, index);
          index = 0; // reinitialize to capture next key
          pos = RecEnd; // expecting no more numeric fields before next Key
          if ((i + 1) < count)
          {
            if ((data[i + 1] >= 48) & (data[i + 1] <= 57))
            { pos = Key; }
          }

          // Update Value for Key in table
          Update(keyValue, valueValue);
        }
        break;
      case RecEnd:
        if ((i + 1) < count)
        {
          if ((data[i + 1] >= 48) & (data[i + 1] <= 57))
          { pos = Key; }
        }
        index = 0;
        break;
    } // end of switch
    if (i >= count)
    { break; } // guard (body assumed); never true inside this loop
  } // end for
} // Parse
This method invokes two more methods: ConvertToNumeric, and Update, which adds the new Key, Value pair into a structure that can be examined after the file data has been completely parsed.  Note that the Key and Value fields are the only ones assumed to contain digits (the ASCII values 48 through 57); constants Zero and Nine should have been declared for these values.  Also note that the commented-out enum was rejected by the compiler, so the constants Key, Value, Bypass and RecEnd were created instead.  (C# does not allow an enum to be declared inside a method body; declared at class level it is a valid construct.)
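For comparison, here is a minimal sketch of the same kind of byte-by-byte state machine with the named constants and the enumeration that the C# code went without.  It is written in Python and is a simplification under assumptions, not a translation: it handles the current byte in each state instead of looking ahead, and the names Field, ZERO, NINE, LF and parse_pairs are invented for this example.

```python
from enum import Enum, auto

ZERO, NINE = ord("0"), ord("9")   # ASCII 48 and 57
LF = 10                           # end of record


class Field(Enum):
    KEY = auto()       # collecting the digits of the key
    BYPASS = auto()    # skipping separators until the value starts
    VALUE = auto()     # collecting the digits of the value
    REC_END = auto()   # ignoring everything up to the end of record


def is_digit(byte):
    return ZERO <= byte <= NINE


def parse_pairs(data):
    """Return the (key, value) pairs found in 'key separator value' records."""
    pairs = []
    state = Field.KEY
    digits = bytearray()
    key = 0
    for byte in data + b"\n":     # sentinel LF flushes a final unterminated record
        if state is Field.KEY:
            if is_digit(byte):
                digits.append(byte)
            elif digits:          # separator after at least one digit
                key = int(digits.decode("ascii"))
                digits.clear()
                state = Field.KEY if byte == LF else Field.BYPASS
        elif state is Field.BYPASS:
            if is_digit(byte):
                digits.append(byte)
                state = Field.VALUE
            elif byte == LF:      # record had a key but no value
                state = Field.KEY
        elif state is Field.VALUE:
            if is_digit(byte):
                digits.append(byte)
            else:
                pairs.append((key, int(digits.decode("ascii"))))
                digits.clear()
                state = Field.KEY if byte == LF else Field.REC_END
        elif byte == LF:          # Field.REC_END: wait for the end of record
            state = Field.KEY
    return pairs


print(parse_pairs(b"2017 5\r\n2016 4\r\n2017 1\r\n2017 5\r\n"))
```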

private int ConvertToNumeric(byte[] keyField, int index)
{
  // Convert the byte string to an integer.
  // 'index' is the number of bytes captured in keyField.
  int start;
  start = 0;
  int finish;
  finish = index - 1;
  if (finish < 0) // debug
  {
    MessageBox.Show("ConvertToNumeric error 1");
    return 0; // empty field; avoid indexing keyField[-1]
  }
  int keyInt = 0;
  if ((keyField[finish] >= 48) & (keyField[finish] <= 57))
  {
    keyInt = keyField[finish] - 48; // convert ASCII to digit
  }
  finish--; // units digit done; step back to the tens digit
  int m = 10;
  while (finish >= start)
  {
    if (finish < 0) // debug
    { MessageBox.Show("ConvertToNumeric error 2"); }
    if ((keyField[finish] >= 48) & (keyField[finish] <= 57))
    {
      keyInt = keyInt + (m * (keyField[finish] - 48));
    }
    m = m * 10;
    finish--; // next more significant digit
  }
  return keyInt;
} // ConvertToNumeric

The Update method that follows is not what I wanted and, since the file that I created had only a few records, I don’t know whether it is correct.  (The Ada application does parse the actual file, which I obtained later.)  With C# I wanted to use a class to implement the structure that I wanted.  There is a struct keyword in C#, but a C# struct is closer to a lightweight class (it can have constructors and methods) and is not the same as a struct in C.  Therefore I created a KeyValueRec class and a Keys class where KeyValueRec was

Don't take this code seriously: at this time I don’t remember which code was commented out while trying to find something that would execute and which was commented out after I found that the code just wouldn’t execute no matter what.  It compiled without error, but any use of an array of the KeyValueRec class resulted in an exception at run time.  (In C#, new KeyValueRec[100] allocates 100 null references rather than 100 objects, so each element has to be created with new KeyValueRec() before it is used.  The code below never does that, which is the likely cause.)

    public class KeyValueRec
    { // to be used as a record structure
        public int key;        // key associated with values
        public int valueCount; // number of different values
        public int[] values = new int[30];
        public int[] sums = new int[30];
    //    public KeyValueRec() // constructor
    //    {
    //        key = -1;
    //        valueCount = 0;
    //    } // end constructor

    } // end class KeyValueRec
where key identifies the key and valueCount specifies the number of values added to the values and sums arrays.

Keys had to be mostly commented out because C# compiled OK but any use of an object of the class resulted in an exception.  (Not an ideal thing for a compiler to do.  If it isn't going to process a structure, a compiler should disallow it when attempting to build the solution.)
   public class Keys
    { // to be used as a record structure
        public int keyCount;  // number of different keys
     // public static int[] keyTable = new int[100]; // space for 100 different keys
     // public static KeyValueRec[] Value = new KeyValueRec[100]; // values associated with each key
     // public KeyValueRec[] keyTable = new KeyValueRec[100]; // space for 100 different keys with values
     // public KeyValueRec keyTable1 = new KeyValueRec(); // space for 1 key with values
     // public static KeyValueRec[] Value = new KeyValueRec[100]; // values associated with each key

        public Keys() // constructor
        { // Initialize to no keys and no values for a key -- do when add a key
            keyCount = 0;
        //  keyTable1.key = 0;
        //  keyTable1.valueCount = 0;
        //  keyTable1.values[0] = 0;
        //  keyTable1.sums[0] = 0;

            for (int i = 0; i < 100; i++)
            {
                //keyTable[i].valueCount = 0;
                //keyTable[i].key = 0;
                //keyTable[i].values[0] = 0;
                //keyTable[i].sums[0] = 0;
                ////     Value[i].valueCount = 0;
                ////  keyTable[i].
            }
        } // end constructor

        // Update tables with 'key' and 'value'
     // public void Update(int key, int value)
     // {
            // Search keyTable
     //     int keyIndex = 0;
     //     for (int i = 0; i < keyCount; i++)
     //     {
     //         if (key == keyTable[i].key)
     //         {
     //             keyIndex = i;
     //             int valueIndex = 0;
                    // Search array of values associated with key
     ////           for (int j = 0; j < Value[i].valueCount; j++)
     //             for (int j = 0; j < keyTable[i].valueCount; j++)
     //             {
     ////               if (value == Value[i].value[j])
     //                 if (value == keyTable[i].values[j])
     //                 { // Increment number of references to the value associated with key
     //                    valueIndex = j;
     //                    Value[i].sum[j]++;
     //                    keyTable[i].sums[j]++;
     //                 }
     //             }
     //             if (valueIndex == 0)
     //             { // value not found -- add new value to list
     //                 Value[i].value[Value[i].valueCount] = value;
     //                 Value[i].sum[Value[i].valueCount] = 1; // first instance
     //                 Value[i].valueCount++;
     //                 keyTable[i].values[keyTable[i].valueCount] = value;
     //                 keyTable[i].sums[keyTable[i].valueCount] = 1; // first instance
     //                 keyTable[i].valueCount++;
     //             }
     //         }
     //     }

     //     if (keyIndex == 0) // key not found
     //     {   try
     //         {// add new key with its value to the tables
     //             keyTable1.key = key;
     //             keyTable1.values[0] = value;

     //             keyTable[keyCount].key = key;
     //             keyTable[keyCount].values[0] = value;
     //             keyTable[keyCount].sums[0] = 1; // first instance of value for key
     //             keyTable[keyCount].valueCount = 1; // first array entry
     //             keyCount++;
     //         }
     //         catch (Exception ex)
     //         {
     //             MessageBox.Show("Error: Could not add to keyTable " + ex.Message);
     //         }

     //     }

     // } // end Update

   } // end class Keys
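A likely explanation for the run-time exception, offered as an assumption since I did not record the exception itself: for a class type, the C# expression new KeyValueRec[100] creates 100 empty (null) slots, not 100 objects.  The same situation can be reproduced in a few lines of Python, where a list of None plays the role of the array of null references.  The Python class below only mirrors the fields of KeyValueRec for the illustration.

```python
class KeyValueRec:
    """Mirror of the C# KeyValueRec fields, for illustration only."""

    def __init__(self):
        self.key = -1
        self.value_count = 0
        self.values = [0] * 30
        self.sums = [0] * 30


# Slots only, no objects: comparable to `new KeyValueRec[100]` in C#.
empty_slots = [None] * 100
try:
    empty_slots[0].key = 2017          # nothing there to hold the field
except AttributeError as err:
    failure = str(err)
    print("failed:", failure)

# One object per slot: comparable to a loop doing `table[i] = new KeyValueRec()`.
key_table = [KeyValueRec() for _ in range(100)]
key_table[0].key = 2017
print(key_table[0].key, key_table[1].key)
```

Filling every slot with its own object before the first use is the step that the commented-out constructor loop would have needed.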

The Update method below still contains commented-out code from the attempts to use these classes.  Instead, the following objects (keyTable, valueKey, etc.) were added at the beginning of the Form1 class that Visual C# provided.
namespace MaxColSumbyKey
{
  public partial class Form1 : Form
  {
    public Form1()
    {
      InitializeComponent();
    }

    Keys keyTable = new Keys(); // instantiation of table to contain the keys and their values
    // Note: Each object begins with "value" to associate all the objects as part of the same
    int[] valueKey = new int[100];   // key associated with values
    int[] valueCount = new int[100]; // number of different values
    int[,] valueValues = new int[100, 30]; // different values of the key
    int[,] valueSums = new int[100, 30];   // sum of values of particular key
    // (the event handler and methods of Form1 follow, then the closing braces)
Here, following the lines that C# provided to begin the Form1 class in the MaxColSumbyKey namespace, is an instantiation of the Keys class as it ended up without instantiations of arrays of the KeyValueRec class.  In place of the KeyValueRec class the separate arrays of valueKey, valueCount and the double indexed arrays of valueValues and valueSums are declared where the prefix “value” was used with each in an attempt to associate them.  The Update method below then references them although it most likely needs some work.

// Update tables with 'key' and 'value'
public void Update(int key, int value)
{
  // Search keyTable
  int keyIndex = -1;
  for (int i = 0; i < keyTable.keyCount; i++)
  {
    //if (key == keyTable.keyTable[i].key)
    //if (key == valueTable[i].key)
    if (key == valueKey[i])
    {
      keyIndex = i;
      int valueIndex = -1;
      // Search array of values associated with key
      //for (int j = 0; j < Value[i].valueCount; j++)
      //for (int j = 0; j < valueTable[i].valueCount; j++)
      for (int j = 0; j < valueCount[i]; j++)
      {
        //if (value == Value[i].value[j])
        //if (value == valueTable[i].values[j])
        if (value == valueValues[i, j])
        { // Increment number of references to the value associated with key
          valueIndex = j;
          valueSums[i, j]++;
          return; // value added
        }
      }
      if (valueIndex < 0)
      { // value not found -- add new value to list
        //         Value[i].value[Value[i].valueCount] = value;
        //         Value[i].sum[Value[i].valueCount] = 1; // first instance
        //         Value[i].valueCount++;
        //valueTable[i].values[valueTable[i].valueCount] = value;
        valueValues[i, valueCount[i]] = value;
        //valueTable[i].sums[valueTable[i].valueCount] = 1; // first instance
        valueSums[i, valueCount[i]] = 1; // first instance
        valueCount[i]++;
        return; // value added
      }
    }
  }

  if (keyIndex < 0) // key not found
  {
    try
    {// add new key with its value to the tables
     //       keyTable.keyTable1.key = key;
     //       keyTable.keyTable1.values[0] = value;

     //valueTable[keyTable.keyCount].key = key;
     //valueTable[keyTable.keyCount].values[0] = value;
     //valueTable[keyTable.keyCount].sums[0] = 1; // first instance of value for key
     //valueTable[keyTable.keyCount].valueCount = 1; // first array entry
     valueKey[keyTable.keyCount] = key;
     valueValues[keyTable.keyCount, 0] = value;
     valueSums[keyTable.keyCount, 0] = 1; // first instance of value for key
     valueCount[keyTable.keyCount] = 1; // first array entry
     keyTable.keyCount++;
    }
    catch (Exception ex)
    {
      MessageBox.Show("Error: Could not add to keyTable " + ex.Message);
    }
  }
} // end Update

The inadequate file used follows.
2017 5
2016 4
2017 1
2017 5
where CR and NL end each record (since created as a DOS compliant file) but aren’t visible.
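To show what the tables should hold after these four records, here is a sketch of the parallel-array bookkeeping in Python.  It follows the approach of the C# Update method (separate key, count, values and sums tables with the same 100 and 30 limits), but the Python names are my own for this illustration and no bounds checking is attempted.

```python
MAX_KEYS, MAX_VALUES = 100, 30

value_key = [0] * MAX_KEYS                                  # key of each row
value_count = [0] * MAX_KEYS                                # values used per row
value_values = [[0] * MAX_VALUES for _ in range(MAX_KEYS)]  # unique values
value_sums = [[0] * MAX_VALUES for _ in range(MAX_KEYS)]    # instances of each
key_count = 0                                               # rows in use


def update(key, value):
    """Record one more instance of value for key."""
    global key_count
    for i in range(key_count):
        if value_key[i] == key:
            for j in range(value_count[i]):
                if value_values[i][j] == value:
                    value_sums[i][j] += 1       # value seen before for this key
                    return
            j = value_count[i]                  # new value for an existing key
            value_values[i][j] = value
            value_sums[i][j] = 1
            value_count[i] += 1
            return
    i = key_count                               # new key
    value_key[i] = key
    value_values[i][0] = value
    value_sums[i][0] = 1
    value_count[i] = 1
    key_count += 1


for key, value in [(2017, 5), (2016, 4), (2017, 1), (2017, 5)]:
    update(key, value)

for i in range(key_count):
    n = value_count[i]
    print(value_key[i], list(zip(value_values[i][:n], value_sums[i][:n])))
```

For the four-record file this leaves two keys: 2017 with the value 5 seen twice and the value 1 seen once, and 2016 with the value 4 seen once.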

Following the discussion of the GNAT Ada application, there will be another discussion of a C# application where the sample file provided by Jeffrey Fuller will be used and arrays will be used from the beginning to keep track of the data.

Discussion of the GNAT Ada Application

Ada applications consist of packages, which are containers for code and static variables, and subroutines.  Subroutines are either procedures, which have input and output parameters, or functions, which have only input parameters and return a single result (corresponding to a non-void C or C# method).

GNAT provides various libraries that can be used, some of which interface to Windows (or Linux, depending on the operating system being used) to read files and the like.  I haven’t, as yet, come across a library that would allow a Windows interface like what is available with C# (or a Linux interface like that available with Mono, the Linux variation of C#).  Therefore, the Ada application just has the filename with its path coded into the application.

This particular Ada application reads the file supplied by Jeffrey Fuller.  After getting it I found that it had extraneous text prior to the Key, and two Value fields (or what I assume are Value fields) rather than one.  Examining the data I found that each record had only a trailing NL (new line), so it is not a DOS formatted file, and that the final record was without the trailing NL.  Preceding the Key and each of the two Value fields was an HT (horizontal tab).  That is, the first two records look like
49 44 57 49 50 95 78 85 77  9 49 48 48 48  9 49  9 49 10
49 44 57 49 50 95 78 85 77  9 49 48 48 48  9 50  9 49 10
which translate to ASCII as below.
 1  ,  9  1  2  _  N  U  M HT  1  0  0  0 HT  1 HT  1 NL
 1  ,  9  1  2  _  N  U  M HT  1  0  0  0 HT  2 HT  1 NL
where the numbers below are the byte positions.
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19

Therefore I decided to find the Key with the greatest number of instances of a particular Value, counting both value fields together.  In the two record sample this would be the Key of 1000 with a Value of 1, since for that Key there are three instances of 1 and only one instance of 2.
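The goal can be checked against the two sample records with a short Python sketch.  It decodes the byte values listed above, splits each record at the tabs, and counts the values of both value fields together for each key.  The function names tally and best are made up for this illustration, and the sketch assumes every record has the four tab-separated fields of the sample.

```python
from collections import Counter, defaultdict

# The two sample records, as the byte values listed above.
RECORDS = [
    bytes([49, 44, 57, 49, 50, 95, 78, 85, 77, 9, 49, 48, 48, 48, 9, 49, 9, 49, 10]),
    bytes([49, 44, 57, 49, 50, 95, 78, 85, 77, 9, 49, 48, 48, 48, 9, 50, 9, 49, 10]),
]


def tally(records):
    """Map each key to a Counter of its values (both value fields combined)."""
    counts = defaultdict(Counter)
    for record in records:
        # Fields: leading text, key, first value, second value.
        fields = record.decode("ascii").rstrip("\n").split("\t")
        key = int(fields[1])
        for text in fields[2:4]:
            counts[key][int(text)] += 1
    return counts


def best(counts):
    """Return (key, value, instances) for the most frequent value of any key."""
    candidates = (
        (key, value, n)
        for key, values in counts.items()
        for value, n in values.items()
    )
    return max(candidates, key=lambda item: item[2])


counts = tally(RECORDS)
winner = best(counts)
print(winner)   # (1000, 1, 3)
```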

The Ada Main procedure is the entry point into the application when it is executed.  Since it is only a procedure, variable objects declared in it are on the stack rather than in static memory.  Therefore, as is the normal practice, it is a minimal procedure that invokes another procedure or function in a package of the application.  In this instance, the Main procedure is as follows.

with Max_Col_Sum_by_Key;

procedure Main is

begin -- Main

  -- Execute the project from the beginning.
  Max_Col_Sum_by_Key.Open;

end Main;
where the “with” statement informs the compiler of the package where the Open procedure can be found.

Ada packages have a specification that provides the declarations that are to be visible to other parts of the application and a body where the implementation is encoded as well as declarations and variables that are to be visible only within the particular package.  (Note: Ada packages can somewhat be thought of as similar to namespaces in C#.)  The specification for this package is
package Max_Col_Sum_by_Key is

  procedure Open;
  -- Main entry point to package

end Max_Col_Sum_by_Key;

That is, the declaration of the Open procedure is the only visible construct of the package.  The body of the package is
with Windows_Itf;

package body Max_Col_Sum_by_Key is

  type Unsigned_Byte
  --| Unsigned 8-bit byte
  is mod 2**8;
  for Unsigned_Byte'Size use 8;

  type Unsigned_Byte_Array
  --| Unconstrained array of unsigned bytes
  is array (Integer range <>) of Unsigned_Byte;

  type Data_List_Type
  -- Structure defining data to be passed to Update procedure
  is record
    Count : Integer;
    -- Number of data bytes in List
    List  : Unsigned_Byte_Array(1..10); -- with spare bytes
    -- Captured data
  end record;

  type Key_Count_Type
  -- Maximum of 100 unique keys allowed
  is new Integer range 0..100;

  type Value_Count_Type
  -- Maximum of 30 unique values allowed
  is new Integer range 0..30;

  type Value_Data_Type
  is record
    Value : Integer;
    -- Unique value
    Sum   : Integer;
    -- Sum of the instances of the Value
  end record;

  type Value_Array_Type
  is array (1..Value_Count_Type'last) of Value_Data_Type;

  type Value_List_Type
  is record
    Count : Value_Count_Type;
    -- Number of unique values in the List
    List  : Value_Array_Type;
    -- List of unique values associated with a key
  end record;

  type Value_Pair_Type
  -- Unique values for each of the two value fields of a particular key
  is array (1..2) of Value_List_Type;

  type Key_Data_Type
  -- Data to be retained for each unique key
  is record
    Key    : Integer;
    -- Unique input key as converted to a numeric value
    Values : Value_Pair_Type;
    -- Lists of unique values for each of the two values assigned to a key
  end record;

  type Key_Array_Type
  is array (1..Key_Count_Type'last) of Key_Data_Type;

  type Key_List_Type
  is record
    Count    : Key_Count_Type;
    -- Number of unique keys in the list
    Key_Data : Key_Array_Type;
    -- List of unique keys with their associated values
  end record;

  Key_Table
  -- Captured data from input file
  : Key_List_Type;

  Bytes_Read -- (name assumed; the identifier is missing from this listing)
  -- Number of bytes read from file
  : Integer;

  Buffer_Size
  -- Size of buffer
  : constant Windows_Itf.DWORD := 5000;

  Buffer
  -- Data read from file
  : Unsigned_Byte_Array( 1..Integer(Buffer_Size) );

  Update_Debug -- (name assumed; the identifier is missing from this listing)
  -- For debugging Update
  : Integer := 0;

  -- Procedure declarations

  procedure Parse;
  -- Parse the data of the file and then continue to produce the result.

  procedure Read
  -- Read and save the data of the file.
  ( Handle  : in Windows_Itf.File_Handle;
    -- Handle of file
    Success : out Boolean
    -- True if Read was successful
  );

  procedure Report;
  -- Find key with most different values and report

  procedure Update
  ( Key    : in Data_List_Type;
    -- Key extracted from file buffer
    Value1 : in Data_List_Type;
    -- First value extracted from file buffer
    Value2 : in Data_List_Type
    -- Second value extracted from file buffer
  );

  -- Procedure implementations

  procedure Open is separate;

  procedure Parse is separate;

  procedure Read
  ( Handle  : in Windows_Itf.File_Handle;
    Success : out Boolean
  ) is separate;

  procedure Report is separate;

  procedure Update
  ( Key    : in Data_List_Type;
    Value1 : in Data_List_Type;
    Value2 : in Data_List_Type
  ) is separate;

end Max_Col_Sum_by_Key;

The Windows_Itf package is a special package to provide types, variables, and procedure and function declarations to interface to GNAT library supplied routines that support Windows.  These libraries are provided when the publicly available GNAT Ada and C compilers are installed via the internet.  I’ve created a much more extensive interface package over the years to support the use of Windows and Linux invocations and have selected certain variables and procedure/function declarations for use by the Max_Col_Sum_by_Key application to include in the Windows_Itf package although not all were used by this application.

This package body contains the declarations to contain the file data as opened by the Open procedure and input by the Read procedure.  The Parse, Update and Report procedures are similar in purpose to those of the C# application.  The code of each of these procedures could be provided within this package body but to keep the amount of material that must be scrutinized at a time to a minimum, it is normal practice to provide the implementation in separate files. 

As per my normal practice I have declared record structures to associate an array with the variable that keeps track of the number of array items that actually contain data.  Note that Ada arrays are usually declared to begin with an index of 1 rather than 0 as in C and C# although this isn’t necessary.  That is, an array type could be declared to range from -10 to -1 if this was desirable to mimic that of a piece of equipment, for instance.

The Key_Table static memory object has been declared the way I wanted to do it in the C# implementation, by building up a complex record type.  That is, the Key_Table that keeps track of the parsed data is declared to be of the Key_List_Type, which is a record containing the Count of the number of unique keys that have been parsed and an array sized large enough to hopefully contain all the different keys of the file.  Note that the Count and the Key_Array_Type have been sized using a Key_Count_Type rather than just an Integer.  This is because Ada is strongly typed: when the application is run, an exception will be raised (unless the check has been turned off) if the range is exceeded.  This prevents storing data beyond the limits of the object and thus overwriting other data.  It is also useful to the coder since it prevents, for instance, confusing the index variable for one array with that of another.  That is, the Ada compiler will refuse to allow a variable of one type to be used where the array has been declared to use another type.  I combine the array and the count of the number of array positions used into their own record type to avoid confusion about which value indicates the number of used array elements.

The Key_Data_Type has been declared to contain the value of the Key and a pair of Values for the two value fields of each file record.  The array elements of each of these are sized to the Value_Count_Type and consist of the unique Value from the file record for a particular Key and the Sum of the number of times that particular Value is contained in the file. 

Thus the structure is
      +--> for each key -- Value 1 and Value 2
                              |           |
                              v           v
                          Value Sum  Value  Sum
                                       |     |
                                       v     v
                                    unique  # of instances
                                    values  of the value
Where the structure to the right is repeated for each different Key.  The Value 1 and Value 2 structures are identical and are associated with the first and second Values in a record of the file.  The Value array and the Sum array will attain the same length (e.g., Count) for any particular Key with each instance of a Value containing the particular value extracted from the file data and the associated Sum the running count of the number of times the Value was specified in the file for the particular Key.
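The same nesting can be written down as a rough Python sketch using dataclasses.  The class names are loose translations of the Ada type names, the field total stands in for the Ada Sum component, and the add and update methods are invented for the illustration.  One difference is an assumption of the sketch: Python lists grow as needed, so there is no separate Count and no fixed limit of 100 keys or 30 values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ValueData:
    value: int          # unique value
    total: int = 1      # instances of the value (Sum in the Ada record)


@dataclass
class ValueList:
    items: List[ValueData] = field(default_factory=list)

    def add(self, value):
        """Count one more instance of value, adding it if it is new."""
        for item in self.items:
            if item.value == value:
                item.total += 1
                return
        self.items.append(ValueData(value))


@dataclass
class KeyData:
    key: int
    # One ValueList for each of the two value fields of a record.
    values: List[ValueList] = field(
        default_factory=lambda: [ValueList(), ValueList()]
    )


@dataclass
class KeyList:
    key_data: List[KeyData] = field(default_factory=list)

    def update(self, key, value1, value2):
        """Record the two values of one file record under key."""
        for entry in self.key_data:
            if entry.key == key:
                break
        else:                       # key not seen before
            entry = KeyData(key)
            self.key_data.append(entry)
        entry.values[0].add(value1)
        entry.values[1].add(value2)


table = KeyList()
table.update(1000, 1, 1)    # first sample record
table.update(1000, 2, 1)    # second sample record
print(table)
```

After the two sample records the single key 1000 holds the values 1 and 2 (one instance each) for the first field and the value 1 (two instances) for the second field.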

The Key_Table is declared in the package body so as to be static.  That is, to remain from one call to Update to the next.  If declared in the Update procedure the memory used would be that of the stack and hence the table would be freed upon return from the procedure so it wouldn't accumulate.  The data contained in the table will be supplied by the Update procedure and perused by the Report procedure to obtain the Key with the most references to a particular Value.

The Open procedure is
with Text_IO;

separate( Max_Col_Sum_by_Key )

procedure Open is

  Done
  -- Result of the Close
  : Boolean;

  File
  -- Handle of file
  : Windows_Itf.File_Handle;

  Success
  -- Result of the Read
  : Boolean;

  CName
  : String(1..51) := "C:/Source/LearnToCodeGR-Ada/max-col-sum-by-key.tsv ";

  use type Windows_Itf.File_Handle;

begin -- Open

  -- Make a C terminated string.
  CName(51) := ASCII.NUL;

  File := Windows_Itf.Open_Read( Name => CName );
  if File = Windows_Itf.Invalid_File_Handle then
    Text_IO.Put_Line( "File not found" );
    return; -- nothing to read (early exit assumed; not visible in this listing)
  end if;

  -- Read and save the data of the file.
  Read( Handle  => File,
        Success => Success );

  -- Close
  Done := Windows_Itf.Close_File( Handle => File );

  -- Parse the file and report the results.
  if Success then
    Parse;
    Report;
  end if;

end Open;
Since the Windows-based functions available here are much more limited than in Visual C#, the location of the file to be opened is supplied via the CName variable and, due to the nature of the GNAT-supplied C function, the string has to be NUL terminated.  Text_IO is an Ada-supplied package.  Note: I provided the NUL termination before I changed the Windows_Itf Open_Read function to also provide a terminating NUL, so this C string pathname ends up doubly NUL terminated.  Since the Open procedure shouldn't have to consider the need for a NUL-terminated string, I should have made the change to the Open_Read function first, so that only the Windows_Itf package would have needed to know what the GNAT-provided routine needed.  The Windows_Itf package function (after the modification) is
  function Open_Read
  ( Name : String;
    Mode : Mode_Type := Text
  ) return File_Handle is

    FileDesc : GNAT.OS_Lib.File_Descriptor;

    NameWithNULTerminator
    : String(1..Name'Length+1);

    function File_Descriptor_to_Handle
    is new Unchecked_Conversion( Source => GNAT.OS_Lib.File_Descriptor,
                                 Target => File_Handle );

    function to_Mode is new Unchecked_Conversion( Source => Mode_Type,
                                                  Target => GNAT.OS_Lib.Mode );

  begin -- Open_Read

    NameWithNULTerminator(1..Name'Length) := Name;
    NameWithNULTerminator(Name'Length+1) := ASCII.NUL;
    FileDesc := GNAT.OS_Lib.Open_Read( Name  => NameWithNULTerminator'address,
                                       FMode => to_Mode(Mode) );
    return File_Descriptor_to_Handle( FileDesc );

  end Open_Read;
where GNAT.OS_Lib is a GNAT supplied Ada package.  Its Open_Read needs to be passed a pointer to the path Name, which is why the 'Address attribute is used to pass the address of the NUL terminated path object rather than the object itself.
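
For readers who don't know Ada, the NUL termination can be sketched in Python (the third language of this series).  This is only an illustration under my own assumptions: the helper name to_c_string is hypothetical, and unlike Open_Read it also strips a NUL that the caller already appended, so the double termination described above cannot happen.

```python
def to_c_string(name: str) -> bytes:
    """Return the ASCII bytes of name followed by exactly one NUL terminator."""
    raw = name.encode("ascii")
    # Assumption for this sketch: drop any NUL the caller already appended,
    # so the result is never doubly terminated.
    raw = raw.rstrip(b"\x00")
    return raw + b"\x00"


path = "C:/Source/LearnToCodeGR-Ada/max-col-sum-by-key.tsv"
c_path = to_c_string(path)
print(len(path), len(c_path))
```

With a helper of this kind only the interface layer has to know that the underlying C routine wants a terminator.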

After the file has been opened (and verified that a file was found to be opened), the Read procedure is invoked to read the contents of the file into the Buffer declared in the package body.  As with the C# application, for a bigger file this and Parse would need to be coordinated to partially read and parse the file until the complete file had been processed.  The Read procedure has been declared to return the Success boolean to indicate whether the file was successfully read.  If it was, then the Parse followed by Report procedures are called.
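
The coordination of partial reads and parsing mentioned above for a bigger file can be sketched as follows.  This is a minimal Python illustration, not code from either application; the function name read_records, the chunk size and the sample bytes are all made up for the example.  The bytes after the last new line of each chunk are carried over and joined to the front of the next chunk.

```python
import io


def read_records(stream, chunk_size=64):
    """Yield complete records, reading only chunk_size bytes at a time."""
    leftover = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        data = leftover + chunk
        # Everything before the last new line is complete; the remainder
        # is an unfinished record that is carried over to the next read.
        *complete, leftover = data.split(b"\n")
        for record in complete:
            yield record
    if leftover:  # final record without a trailing new line
        yield leftover


# Made-up sample in the layout of the tsv file.
sample = b"1,912_NUM\t1000\t1\t1\n2,912_NUM\t2000\t1\t2\n3,912_NUM\t3000\t1\t1"
records = list(read_records(io.BytesIO(sample), chunk_size=8))
```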

In the code of the Read procedure that follows, four different Ada packages are referenced.  The Windows_Itf doesn’t need a “with” statement since it was withed for the package body so the Ada compiler already knows about it.

The Unchecked_Conversion instantiations type cast an object of the Source type to the Target type.  It is assumed that the coder knows what they are doing when an Unchecked_Conversion is used and that the size (width) of the Source and Target are the same, since Unchecked_Conversion only overlays the object of the Source type onto that of the Target type and doesn't convert one type to the other.  These can be needed at times due to the strong typing of Ada.  That is, unlike C, two variables of different types that are really variations of an integer cannot be used in place of one another.  For instance, in the arrays that were declared in the package body, a different type was specified for the Key array versus the Value array.  Therefore, Ada won't allow an index mix-up of specifying an index for the Key array when referencing the Value array.  This is why it is good practice to declare unique types for the two rather than just using Integer for both.  These particular Unchecked_Conversion functions change the type from that used by the GNAT C code to what I am using in the Ada application.
with Interfaces.C;
with System;
with Text_IO;
with Unchecked_Conversion;

separate( Max_Col_Sum_by_Key )

procedure Read
( Handle  : in Windows_Itf.File_Handle;
  Success : out Boolean
) is

  -- Result returned from read file
  Result : Windows_Itf.BOOL;

  function to_PVOID is new Unchecked_Conversion
                           ( Source => Windows_Itf.File_Handle,
                             Target => Windows_Itf.PVOID );
  function to_LPCVOID is new Unchecked_Conversion
                             ( Source => System.Address,
                               Target => Windows_Itf.LPCVOID );
  function to_LPDWORD is new Unchecked_Conversion
                             ( Source => System.Address,
                               Target => Windows_Itf.LPDWORD );

  use type Interfaces.C.unsigned_long;
  use type Windows_Itf.BOOL;

begin -- Read

  Result := Windows_Itf.ReadFile
            ( File                => to_PVOID(Handle),
              Buffer              => to_LPCVOID(Buffer'address),
              NumberOfBytesToRead => Buffer_Size, -- size of buffer
              NumberOfBytesRead   => to_LPDWORD(Buffer_Length'address),
              Overlapped          => null ); -- not overlapped IO
  if Buffer_Length <= 0 or else Result = 0 then
    Text_IO.Put_Line("Read Failed ");
    Success := False;
  else
    Success := True;
  end if;

  declare
    Count : Integer := 0;
    Data  : String(1..19);
    Data_Hex : Unsigned_Byte_Array(1..19);
    for Data_Hex'Address use Data'address;
    J : Integer := 0;
    L : Integer := 0;
    type StringType is new String(1..4);
    function ByteToString is new Unchecked_Conversion( Source => Unsigned_Byte,
                                                       Target => Character );
    function IntToString is new Unchecked_Conversion( Source => Integer,
                                                      Target => StringType );
  begin
    for I in 1..Buffer_Length loop
      J := J + 1;
      Data(J) := ByteToString(Buffer(I));

      --   Text_IO.Put(bytetostring(Data(j));
      if (J = 19) or else (I = Buffer_Length) then
        Count := Count + 1;
        if Count = 49 then
          L := 49; -- line to set break on
        end if;
        for K in 1..19 loop
          Data(K) := ASCII.NUL;
        end loop;
        J := 0;
      end if;

    end loop;
  end;

end Read;
The code uses the Windows_Itf ReadFile function to read the file into the Buffer.  The “to” conversions are used to pass the needed types to the Windows_Itf function or, in the case of the NumberOfBytesRead parameter, to get the value returned.  Note that an address is supplied for this.  Prior to Ada 2012 a function could not have an “out” parameter, so NumberOfBytesRead could not be declared such as
  NumberOfBytesRead : out Integer;
But, since an address is being passed in, this restriction is avoided.  Of course it could also have been avoided by supplying a record type for the function return that contained both the Result BOOL and the number of bytes read as fields of the record.  But, since the GNAT library function is being referenced, this option isn’t considered.
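
The alternative of returning a record that holds both the result and the number of bytes read can be sketched in Python.  This is only an illustration of the idea; the names ReadResult and read_into are hypothetical and have nothing to do with the GNAT or Windows routines.

```python
import io
from typing import NamedTuple


class ReadResult(NamedTuple):
    ok: bool         # plays the role of the BOOL Result
    bytes_read: int  # plays the role of NumberOfBytesRead


def read_into(stream, buffer: bytearray) -> ReadResult:
    """Fill buffer from stream; return success and byte count together."""
    try:
        count = stream.readinto(buffer)
    except OSError:
        return ReadResult(ok=False, bytes_read=0)
    count = count or 0
    return ReadResult(ok=count > 0, bytes_read=count)


buffer = bytearray(32)
result = read_into(io.BytesIO(b"1,912_NUM\t1000\t1\t1\n"), buffer)
```

The caller then looks at result.ok and result.bytes_read instead of passing in an address to be written through.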

The code in the declare block is only to output what the file records look like as characters and isn’t really needed.  That is, as a string special characters such as Horizontal Tab won’t show up – only printable ASCII characters show.  This was to get an idea of what the file looked like and why 19 ended up as the size of the array.  This ended up with output such as
0000,912_NUM   1000    1       1
displayed in the GNAT GPS debugger window.  This looks longer than 19 characters since the HT characters cause the next displayable character to be moved to the right to the next tab position.  As mentioned before, the bytes of data were
49 44 57 49 50 95 78 85 77 9 49 48 48 48 9 49 9 49 10
where the 9s are the HTs and the 10 is the NL such that 49 48 48 48 is the Key (1000 as a numeric value rather than a series of ASCII characters), 49 (1) is the first Value and 49 (that is, 1) is also the second value. 
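
The byte values listed above can be checked with a few lines of Python.  This is just a sketch to make the layout visible; the variable names are mine.

```python
# The 19 bytes of one record as listed above.
record_bytes = bytes([49, 44, 57, 49, 50, 95, 78, 85, 77, 9,
                      49, 48, 48, 48, 9, 49, 9, 49, 10])

text = record_bytes.decode("ascii")
# Drop the trailing new line and split on the horizontal tabs.
label, key, value1, value2 = text.rstrip("\n").split("\t")
print(repr(text))
print(len(record_bytes), label, int(key), int(value1), int(value2))
```

Printing the repr shows the tabs and the new line as \t and \n rather than as white space, which is what the debugger display hides.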

The Parse procedure is
with System;

separate( Max_Col_Sum_by_Key )

procedure Parse is

-- Notes:
--   Unlike the C# version, Ada has the ability to declare record structures.
--   And, since the format of the file is known and has data in fixed columns,
--   the data can easily be separated into fields.

-- Each record in the Max-Col-by-Key.tsv file has the format
-- 0,912_NUM 1000   1      1
-- That is, 10 characters to be ignored including a TAB, then a Key of 5
-- characters including a trailing TAB, then the two 1 character digits
-- with a trailing TAB after the first and a NEW LINE after the second
-- except for the last record which doesn't have the trailing NEW LINE.

-- Therefore, if it was known that the non-truncated file never had values
-- in either of the last two data fields that exceeded one digit then the
-- file could be parsed by overlaying each 19 byte slice of the data buffer
-- with an object of this record type and then selecting the Key, Digit1,
-- and Digit2 fields to build a data structure to use to be able to answer
-- which Key has the greatest sum of Digit1 or Digit2 values.  (Or whatever
-- the question was that the class exercise was to answer.)

-- And, of course, if it were known in advance that the file was made up of
-- 19 byte records, each record could be separately read into a buffer of
-- the following format without the need to input the contents of the entire
-- file.  Also, of course, if the file was too large to be read all at once
-- a buffer of much smaller size could be used and the bytes could be parsed
-- until remaining bytes were insufficient to represent the next record.  Then
-- the remaining bytes could be copied to the beginning and additional bytes
-- from the file could be read from that point on to again fill the buffer
-- and the decoding continued.

  type Data_Record_Type
  is record
    Ignore : String(1..10); -- includes trailing horizontal tab
    Key    : String(1..4);  -- 4 digits of the key
    Tab1   : Character;     -- horizontal tab
    Digit1 : Character;     -- whatever this digit means
    Tab2   : Character;     -- horizontal tab
    Digit2 : Character;     -- whatever this digit means
    NL     : Character;     -- new line to end each record except last
  end record;
  for Data_Record_Type'size use 19*8; -- 19 bytes of 8 bits

  : constant Integer := 19; -- bytes

-- Since it isn't known that the file will never have records that are longer
-- than 19 bytes, the record will be parsed by locating the non-digit markers
-- to separate data fields as was done in the C# version and the above
-- record structure will not be used.

  Key_Bytes : Data_List_Type;

  Value1_Bytes : Data_List_Type;

  Value2_Bytes : Data_List_Type;

  -- Offset into data buffer read from file
  Offset : Integer := 1;

  -- Index into Data array
  Index : Integer := 0;

  type Scan_Phase_Type
  is ( Ignore,     -- beginning of record to be ignored
       Key,        -- obtain key
       Value1,     -- obtain first value of record
       Value2 );   -- obtain second value of record

  -- Keep track of portion of record being parsed
  Scan_Phase : Scan_Phase_Type := Ignore;

  xxx : Unsigned_Byte; -- to see char in debugger

  -- Horizontal Tab
  HT : constant Unsigned_Byte := 16#09#;

  -- New Line
  NL : constant Unsigned_Byte := 16#0A#;

begin -- Parse

  loop -- until end of Buffer

    xxx := Buffer(Offset); -- to use debugger to see next value
    -- Scan for NL that ends record while extracting data fields
    case Scan_Phase is
      -- Ignore bytes until after first HT found
      when Ignore    =>
        if Buffer(Offset) = HT then
          Scan_Phase := Key;
          -- Initialize for next set of data
          Key_Bytes.Count := 0;
          Value1_Bytes.Count := 0;
          Value2_Bytes.Count := 0;
        end if;
      -- Capture the key
      when Key       =>
        if Buffer(Offset) /= HT then
          Index := Index + 1;
          Key_Bytes.List(Index) := Buffer(Offset);
        else -- HT ends the key
          Key_Bytes.Count := Index;
          Scan_Phase := Value1; -- Value immediately follows HT
          Index := 0;
        end if;
      -- Capture the first value
      when Value1    =>
        if Buffer(Offset) /= HT then
          Index := Index + 1;
          Value1_Bytes.List(Index) := Buffer(Offset);
        else -- HT ends the first value
          Value1_Bytes.Count := Index;
          Scan_Phase := Value2; -- Value immediately follows HT
          Index := 0;
        end if;
      -- Capture the second value
      when Value2    =>
        if Buffer(Offset) /= HT and then Buffer(Offset) /= NL then
          Index := Index + 1;
          Value2_Bytes.List(Index) := Buffer(Offset);
        else -- HT or NL ends the second value and the record
          Value2_Bytes.Count := Index;
          Index := 0;
          -- Update tables with the data from the record.
          Update( Key_Bytes, Value1_Bytes, Value2_Bytes );

          -- Initialize for next record
          Key_Bytes.Count := 0;
          Value1_Bytes.Count := 0;
          Value2_Bytes.Count := 0;
          Scan_Phase := Ignore;

        end if;
    end case;

    Offset := Offset + 1; -- increment to next Buffer position
    if Offset > Buffer_Length then -- no more data
      if Value2_Bytes.Count > 0 then -- last record fully parsed without trailing NL
        Update(Key_Bytes, Value1_Bytes, Value2_Bytes);
      elsif Scan_Phase = Value2 and then
            Index > 0
      then -- last record stopped parsing the value w/o trailing NL
        Value2_Bytes.Count := Index;
        Update(Key_Bytes, Value1_Bytes, Value2_Bytes);
      end if;
      exit; -- loop
    end if;
  end loop;
end Parse;

This routine would be similar to a C# one parsing the same file.  Except with Ada the enumerated type can be declared and used to keep track of the current Scan_Phase (although the C# compiler should have supported it).  The xxx variable was declared for use in the debugger where one can hover over to see the current value while getting the code correct.
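
The same scan-phase idea can be sketched in Python with an enumeration.  This is a simplified illustration rather than a translation of the Ada: the names Phase and parse are mine, the sample bytes in the test are made up, and the fields are converted to integers directly instead of being handed to an Update routine.

```python
from enum import Enum, auto


class Phase(Enum):
    IGNORE = auto()  # beginning of record to be ignored
    KEY = auto()     # obtain key
    VALUE1 = auto()  # obtain first value
    VALUE2 = auto()  # obtain second value


HT, NL = 9, 10  # horizontal tab, new line


def parse(buffer: bytes):
    """Scan buffer byte by byte; return a list of (key, value1, value2)."""
    order = (Phase.KEY, Phase.VALUE1, Phase.VALUE2)
    fields = {phase: bytearray() for phase in order}
    records = []
    phase = Phase.IGNORE

    def finish_record():
        records.append(tuple(int(fields[p]) for p in order))
        for field in fields.values():
            field.clear()

    for byte in buffer:
        if phase is Phase.IGNORE:
            if byte == HT:  # first tab ends the ignored text
                phase = Phase.KEY
        elif phase is Phase.KEY:
            if byte == HT:
                phase = Phase.VALUE1
            else:
                fields[Phase.KEY].append(byte)
        elif phase is Phase.VALUE1:
            if byte == HT:
                phase = Phase.VALUE2
            else:
                fields[Phase.VALUE1].append(byte)
        else:  # Phase.VALUE2
            if byte in (HT, NL):
                finish_record()
                phase = Phase.IGNORE
            else:
                fields[Phase.VALUE2].append(byte)

    # The last record may lack the trailing new line.
    if phase is Phase.VALUE2 and fields[Phase.VALUE2]:
        finish_record()
    return records
```

As in the Ada version, the check after the loop picks up a final record that was not terminated by a new line.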

The Report procedure follows.  The
  package Int_IO is new Text_IO.Integer_IO( Integer );
statement instantiates an instance of the Text_IO.Integer_IO package to output integer values.

with Text_IO;

separate( Max_Col_Sum_by_Key )

procedure Report is
-- Create a Max Combined structure with a Key, a Value and a Sum.
-- For each Key in the Key Table
--   For each Value in the first paired array
--     Search the second paired array for the Value
--       If found, add its Sum to that of the first paired array and
--        compare the total to the current Sum in the Max Combined.
--         If greater, replace the Key, Value and Sum in the Max Combined
--          with the new Key, Value and Sum
--         Otherwise, compare the second paired array Sum to that of Max
--          Combined and, if greater do the replacement.
-- Report the result.

  -- True if Value found in second field's data for Key
  Found : Boolean;

  -- Current key from table
  Key : Integer;

  -- Current number of instances of Value for the Key
  Sum : Integer;

  -- Current Value for the Key
  Value : Integer;

  type Max_Combined_Type
  is record
    Key   : Integer;
    -- Key with most different values associated with it
    Value : Integer;
    -- First or second value
    Sum   : Integer;
    -- Number of instances of Value associated with Key in combined first and
    -- second fields of records identified with the Key
  end record;

  -- Key with most instances of a particular Value in the combination of the
  -- first and second fields
  Max_Combined : Max_Combined_Type
  := ( Key   => 0,
       Value => 0,
       Sum   => 0 );

  package Int_IO is new Text_IO.Integer_IO( Integer );

begin -- Report

  for I in 1..Key_Table.Count loop
    Key := Key_Table.Key_Data(I).Key;
    for J in 1..Key_Table.Key_Data(I).Values(1).Count loop
      Value := Key_Table.Key_Data(I).Values(1).List(J).Value;
      Sum   := Key_Table.Key_Data(I).Values(1).List(J).Sum;
      Found := False;
      for K in 1..Key_Table.Key_Data(I).Values(2).Count loop
        if Value = Key_Table.Key_Data(I).Values(2).List(K).Value then
          Sum   := Sum + Key_Table.Key_Data(I).Values(2).List(K).Sum;
          Found := True;
          exit; -- inner loop
        end if;
      end loop;
      if Found then
        if Sum >  Max_Combined.Sum then -- save new maximum sum for a Value
          Max_Combined := ( Key   => Key,
                            Value => Value,
                            Sum   => Sum );
        end if;
      end if;
    end loop;

    -- Value of first field for key may not be in second field but second
    -- field may have a Value with references that exceeds that of the first
    -- or of the combination of the first and second.
    -- Check if any of its unique value's references exceed the Max Combined.
    -- Note: It does no harm to check Values that were combined with those of
    --       the first field since they cannot exceed an already selected Value.
    for K in 1..Key_Table.Key_Data(I).Values(2).Count loop
      Sum := Key_Table.Key_Data(I).Values(2).List(K).Sum;
      if Sum >  Max_Combined.Sum then -- save new maximum sum for a Value
        Max_Combined := ( Key   => Key,
                          Value => Key_Table.Key_Data(I).Values(2).List(K).Value,
                          Sum   => Sum );
      end if;
    end loop;
  end loop;

  -- Report the result.
  Text_IO.Put( "Key " );
  Int_IO.Put( Max_Combined.Key, Width => 0 ); -- Width of 0 for no leading spaces
  Text_IO.Put( " with the maximum number of instances " );
  Int_IO.Put( Max_Combined.Sum, Width => 0 );
  Text_IO.Put( " of Value " );
  Int_IO.Put( Max_Combined.Value, Width => 0 );
  Text_IO.Put_Line( " " );

end Report;

The Width => 0 parameter supplied with the invocation of Int_IO Put causes leading blanks/spaces to be discarded.

The result of executing the program is
Key 3000 with the maximum number of instances 20 of Value 1

That is, key 3000 has all 10 first value fields and all 10 second value fields with a value of 1.
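
The question that Report answers can be sketched compactly in Python.  This is a simplification under my own assumptions, not the Report algorithm itself: it counts every value of both fields together per key with a Counter, and the sample records are made up to mirror the result described above rather than taken from the tsv file.

```python
from collections import Counter


def max_value_count(records):
    """Return (key, value, count) for the value seen most often under one
    key, counting the first and second value fields together."""
    counts = Counter()
    for key, value1, value2 in records:
        counts[(key, value1)] += 1
        counts[(key, value2)] += 1
    (key, value), count = counts.most_common(1)[0]
    return key, value, count


# Made-up sample: ten records of key 3000 with both values 1,
# plus a few records of another key.
records = [(3000, 1, 1)] * 10 + [(1000, 1, 2)] * 4
print(max_value_count(records))
```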

Discussion of the second C# Application

This application is a redo of the first to enable it to read the supplied max-col-sum-by-key.tsv file.  [Note:  Sometime while doing the code I looked at the tsv file with the UltraEdit editor and allowed it to convert the file to DOS.  Therefore, that changed the end of record character from NL (new line) to the CR LF (carriage return; line feed) pair of characters (where LF is the same character as NL portrayed by a different name).  The Parse routine has been written to treat the end of record either way.]

Knowing that C# was not going to let me implement the record structures, I replaced those of the Ada application with single, double and triple indexed arrays, all starting with the same prefix (as I ended up doing in the first implementation) to indicate that each is a part of the same "database".  This eliminates the other classes of the first version.

The Form1.cs [Design] panel was created from the C# Toolbox as shown below.  A click on the File icon shows a drop down with an Open option.  Selecting it results in the openToolStripMenuItem_Click event handler being entered via the Visual C# supplied interface to Windows.  As supplied below, this routine is the same as in the first version and the user will navigate to the correct folder and select the tsv file to be opened via the C# supplied methods.

--> insert picture

The beginning of Form1.cs is as follows (where, of the supplied using statements, only System, System.IO, and System.Windows.Forms are really needed).
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace MaxColSumbyKey
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // Objects to contain the data of the Ada Key_Table structure since C#
        // cannot handle an array of instances of a class. 
        // Each object begins with the same prefix of "table" to associate the
        // objects with each other.
        int tableKeyCount; // Number of different keys in tableKeys array
        int[] tableKeys = new int[100]; // Unique keys found in the input file
        int[,] tableValuesCount = new int[100,2]; // Number of unique values for each
                                                  //  value field for a key
        int[,,] tableValue = new int[100,2,30]; // Unique values associated with a key
                                                //  (first index) of a value field
                                                //  (second index) with each unique
                                                //  value up to tableValuesCount using
                                                //  the third index
        int[,,] tableSum = new int[100,2,30];   // Sum of instances of a value associated
                                                //  with the same set of indexes as tableValue

The Form1() constructor is provided by the C# compiler.  The static variables beginning with "table" are provided to keep track of the parsed data from the file.  tableKeyCount is to keep track of the number of different keys in the tableKeys array.  The double indexed tableValuesCount array is to keep track of the number of unique values of each of the two value fields of each key that are in the tableValue array while the tableSum array is to keep track of the number of references to a particular value for a particular key.  Note that this could have been done via a fourth array index ranging from 0 to 1 by replacing tableValue and tableSum with
        int[,,,] tableValue = new int[100,2,30,2];  
where the first index is that of the particular key, the second that of the value field of the file record, the third that of the particular unique value, and the fourth whether the bucket contains the parsed value or the accumulated sum of the instances of the value.

The end of the namespace with the event handler is
        private void openToolStripMenuItem_Click(object sender, EventArgs e)
        { // to open the file
            Stream myStream = null;

            // Get an instance of a FileDialog.
            OpenFileDialog openFileDialog = new OpenFileDialog();

            // Use a filter to allow only certain file extensions.
            openFileDialog.InitialDirectory = "c:\\";
            openFileDialog.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
            openFileDialog.FilterIndex = 2;
            openFileDialog.RestoreDirectory = true;

            if (openFileDialog.ShowDialog() == DialogResult.OK)
            {
                try
                {
                    if ((myStream = openFileDialog.OpenFile()) != null)
                    {
                        using (myStream)
                        {
                            // Initialize
                            tableKeyCount = 0;
                            for (int i = 0; i < 30; i++)
                            {
                                tableValuesCount[i, 0] = 0;
                                tableValuesCount[i, 1] = 0;
                            }

                            // Read the file and build tables from the data.
                            ReadAndParse(myStream);
                        }
                    }
                }
                catch (Exception ex)
                {
                    MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
                }
            }
        } // openToolStripMenuItem_Click

    } // class Form1

} // namespace MaxColSumbyKey

This method opens the file as before using the File|Open widget of the form panel.  It also initializes the various array counts to 0 to prepare for parsing the file data.

ReadAndParse is as before and is
        private void ReadAndParse(Stream file)
        {
            System.Byte[] buffer = new byte[file.Length]; // buffer to contain file

            int numBytesRead = 0;
            try
            {
                // Read everything in file.
                numBytesRead = file.Read(buffer, 0, (int)file.Length);
            }
            catch (Exception ex)
            {
                MessageBox.Show("Error reading file");
            }

            Parse(numBytesRead, buffer); // Parse the bytes read

            ReportResults(); // Report once the tables have been built
        } // ReadAndParse
where the buffer is sized to hold everything in the opened file.  The size and the buffer are then passed to Parse.

Parse has been modified to treat the format of the tsv file.
        private void Parse(int count, Byte[] data)
        {
            // Build a table of keys (second field of a record) and, for each key,
            // the number of value items (third and fourth fields) of the key and 
            // the number of instances of the particular value.

            // The first field ends with a TAB.  The second field (Key) ends the
            // same way as does both the Value fields or at the end of record
            // following the second Value field (in this case NL but for a different
            // file - a DOS file - in the CR LF pair).
            // All other data is ignored until end-of-file (EOF) or end of record.
            const Byte Space = 32;
            const Byte HT = 9; // horizontal tab
            const Byte CR = 13;
            const Byte LF = 10;
            const Byte NL = 10; // new line
            //enum Fields { Bypass, Key, Value1, Value2, RecEnd }; // enum didn't work with this compiler
            //int scanPhase = (int)Fields.Bypass;
            const int Bypass = 0;
            const int Key = 1;
            const int Value1 = 2;
            const int Value2 = 3;
            const int RecEnd = 4;
            int scanPhase = Bypass;

            int keyCount = 0; // number of actual bytes in keyField
            Byte[] keyField = new Byte[16]; // max of 16 bytes for a key
            int value1Count = 0;
            Byte[] value1Field = new Byte[10];
            int value2Count = 0;
            Byte[] value2Field = new Byte[10];

            int index = 0; // index into keyField, etc for next Byte

            int keyValue = 0; // keyField as converted
            int[] valueValues = new int[2];
            valueValues[0] = 0;
            valueValues[1] = 0;

            for (int i = 0; i < count; i++) // examine each byte of data
            {
                Byte debugxx = data[i]; // examine byte with debugger

                // Parse key
                switch (scanPhase)
                {
                    case Bypass: // ignore text until after first HT found
                        if (data[i] == HT)
                        {
                            scanPhase = Key;
                            // Initialize for next set of data
                            keyCount = 0;
                            value1Count = 0;
                            value2Count = 0;
                        }
                        break;
                    case Key: // capture the Key
                        if (data[i] != HT)
                        {
                            keyField[index] = data[i];
                            index++;
                        }
                        else
                        {   keyCount = index;
                            keyValue = ConvertToNumeric(keyField, keyCount);
                            scanPhase = Value1; // Value immediately follows HT
                            index = 0;
                        }
                        break;
                    case Value1: // capture first value
                        if (data[i] != HT)
                        {
                            value1Field[index] = data[i];
                            index++;
                        }
                        else
                        {
                            value1Count = index;
                            valueValues[0] = ConvertToNumeric(value1Field, value1Count);
                            scanPhase = Value2; // Value immediately follows HT
                            index = 0;
                        }
                        break;
                    case Value2: // capture 2nd value
                        if ((data[i] != HT) && (data[i] != NL) && (data[i] != CR) && (data[i] != LF))
                        {
                            value2Field[index] = data[i];
                            index++;
                        }
                        else
                        {
                            value2Count = index;
                            valueValues[1] = ConvertToNumeric(value2Field, value2Count);
                            index = 0;

                            // Update tables with the data from the record.
                            Update(keyValue, valueValues);

                            // Initialize for next record
                            keyCount = 0;
                            value1Count = 0;
                            value2Count = 0;
                            scanPhase = Bypass;
                        }
                        break;
                }
                if (i >= count)
                {
                    break;
                }
            } // end for
        } // Parse
The above code allows the second value field to be terminated by either a horizontal tab, a new line, a carriage return or a line feed.  Since all the data has been read into the buffer, if there is a double terminating character the code to bypass the initial characters of the next record will just bypass the second terminating character first.
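
That both kinds of record ending come out the same can be illustrated with a small Python sketch.  It is not how the C# Parse works internally (Parse scans byte by byte); it only shows the effect of treating CR as part of the terminator.  The function name split_records and the sample bytes are made up.

```python
def split_records(data: bytes):
    """Split on LF and drop a CR before it, so that files with NL endings
    and files with CR LF endings yield the same records."""
    return [line.rstrip(b"\r") for line in data.split(b"\n") if line.strip()]


unix = b"1,912_NUM\t1000\t1\t1\n2,912_NUM\t3000\t1\t2\n"
dos = unix.replace(b"\n", b"\r\n")  # what a conversion to DOS format does
```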

Looking at the above, I notice that nothing has been included for the case, handled in the Ada code, where the last record doesn't have a terminating character.  There, special code had to be added to check at the end of the buffer whether the final second value had yet to be completed and, if so, to do the Update for the last record.  Since the file had been converted to the DOS format, a trailing CR LF was likely added.

ConvertToNumeric is as before.  That is,
        private int ConvertToNumeric(byte[] keyField, int count)
        {
            // Convert the byte string to an integer.
            // Check the StructClasses to find the Key or add a new one.
            // Remember the index into the KeyValueRec to use for the field Value.
            const int Zero = 48;
            const int Nine = 57;
            int start;
            start = 0;
            int finish;
            finish = count - 1;
            if (finish < 0) // debug
            { MessageBox.Show("ConvertToNumeric error 1"); }
            int keyInt = 0;
            if ((keyField[finish] >= Zero) & (keyField[finish] <= Nine))
            {
                keyInt = keyField[finish] - Zero; // convert ASCII to digit
            }
            finish--; // units digit done; move left to the tens digit
            int m = 10;
            while (finish >= start)
            {
                if (finish < 0) // debug
                { MessageBox.Show("ConvertToNumeric error 2"); }
                if ((keyField[finish] >= Zero) & (keyField[finish] <= Nine))
                {
                    keyInt = keyInt + (m * (keyField[finish] - Zero));
                }
                m = m * 10;
                finish--;
            }
            return keyInt;
        } // ConvertToNumeric
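
The conversion that ConvertToNumeric performs can be sketched in Python as follows.  This is an illustration only: it works left to right, multiplying the running result by 10 for each digit, rather than right to left with a growing multiplier as the C# does, and it simply skips any byte that isn't a digit.

```python
def convert_to_numeric(field: bytes) -> int:
    """Convert a string of ASCII digit bytes to an integer."""
    result = 0
    for byte in field:
        if 48 <= byte <= 57:  # ASCII '0' through '9'
            result = result * 10 + (byte - 48)
    return result


print(convert_to_numeric(b"1000"))
```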

The new Update method is
        // Update tables with 'key' and 'value'
        public void Update(int key, int[] value)
        {
            int value1 = value[0]; // to look at
            int value2 = value[1]; //  with debugger

            // Search keyTable
            bool value1Found = false;
            bool value2Found = false;
            int keyIndex = -1;
            for (int i = 0; i < tableKeyCount; i++)
            {
                if (key == tableKeys[i])
                {
                    keyIndex = i;
                    break; // exit loop
                }
            }

            // Add new key to the table
            if (keyIndex < 0) // key not found
            {
                try
                { // add new key with its value to the tables
                    tableKeys[tableKeyCount] = key;
                    tableValue[tableKeyCount, 0, 0] = value[0];
                    tableValue[tableKeyCount, 1, 0] = value[1];
                    tableSum[tableKeyCount, 0, 0] = 1; // first instance of value for key
                    tableSum[tableKeyCount, 1, 0] = 1; // first array entry
                    tableValuesCount[tableKeyCount, 0] = 1; // one pair of values
                    tableValuesCount[tableKeyCount, 1] = 1; //  for key
                    tableKeyCount++;
                }
                catch (Exception ex)
                {
                    MessageBox.Show("Error: Could not add to keyTable " + ex.Message);
                }
            }
            else // key already in table
            {
                // Find whether first Value is already in the table.
                value1Found = false;
                int valueIndex = -1;
                for (int k = 0; k < tableValuesCount[keyIndex, 0]; k++)
                {
                    if (value[0] == tableValue[keyIndex, 0, k])
                    {
                        value1Found = true;
                        valueIndex = k;
                        break; // exit loop
                    }
                }

                // Add new first value to the table for the key.
                if (valueIndex < 0)
                {
                    if (tableValuesCount[keyIndex, 0] < 30)
                    {   valueIndex = tableValuesCount[keyIndex, 0];
                        tableValue[keyIndex, 0, valueIndex] = value[0];
                        tableSum[keyIndex, 0, valueIndex] = 1;
                        tableValuesCount[keyIndex, 0]++;
                    }
                    else
                    {
                        MessageBox.Show("More different first values than app can handle");
                    }
                }
                else // first value already in table
                {    // add to number of instances
                    tableSum[keyIndex, 0, valueIndex]++;
                }

                // Find whether second value already in table
                value2Found = false;
                valueIndex = -1;
                for (int k = 0; k < tableValuesCount[keyIndex, 1]; k++)
                {
                    if (value[1] == tableValue[keyIndex, 1, k])
                    {
                        value2Found = true;
                        valueIndex = k;
                        break; // exit loop
                    }
                }

                // Add new second value to the table for the key.
                if (valueIndex < 0)
                {
                    if (tableValuesCount[keyIndex, 1] < 30)
                    {
                        valueIndex = tableValuesCount[keyIndex, 1];
                        tableValue[keyIndex, 1, valueIndex] = value[1];
                        tableSum[keyIndex, 1, valueIndex] = 1;
                        tableValuesCount[keyIndex, 1]++;
                    }
                    else
                    {
                        MessageBox.Show("More different second values than app can handle");
                    }
                }
                else // second value already in table
                {    // add to number of instances
                    tableSum[keyIndex, 1, valueIndex]++;
                }
            }

        } // end Update

Finally, ReportResults is
        private void ReportResults()
        {
            int key, value, sum;
            bool found;

            int[] most = new int[3]; // most[0] is key, most[1] is value, most[2] is sum

            // Search the keys
            for (int keyIndex = 0; keyIndex < tableKeyCount; keyIndex++)
            {
                key = tableKeys[keyIndex];
                int values1Count = tableValuesCount[keyIndex, 0];
                int values2Count = tableValuesCount[keyIndex, 1];
                // search for greatest number (sum) of values
                for (int valueIndex = 0; valueIndex < values1Count; valueIndex++)
                {
                    value = tableValue[keyIndex, 0, valueIndex]; // value of first field
                    sum = tableSum[keyIndex, 0, valueIndex];
                    found = false;
                    for (int sumIndex = 0; sumIndex < values2Count; sumIndex++)
                    {
                        if (value == tableValue[keyIndex, 1, sumIndex])
                        {
                            sum = sum + tableSum[keyIndex, 1, sumIndex];
                            found = true;
                            break; // inner loop
                        }
                    }
                    if (found)
                    {
                        if (sum > most[2]) // save new maximum sum for a Value
                        {
                            most[0] = key;
                            most[1] = value;
                            most[2] = sum;
                        }
                    }
                } // end for loop

                // Value of first field for key may not be in second field but second
                // field may have a Value with references that exceeds that of the first
                // or of the combination of the first and second.
                // Check if any of its unique value's references exceed the Max Combined.
                // Note: It does no harm to check Values that were combined with those of
                //       the first field since they cannot exceed an already selected Value.
                for (int sumIndex = 0; sumIndex < values2Count; sumIndex++)
                {
                    sum = tableSum[keyIndex, 1, sumIndex];
                    if (sum > most[2])
                    {
                        most[0] = key;
                        most[1] = tableValue[keyIndex, 1, sumIndex]; // value and sum arrays
                        most[2] = sum;                               //  same range
                    }
                }
            } // end outer for loop

            // Output the key, value, and sum to the text boxes.
            keyTextBox.Text = most[0].ToString();
            valueTextBox.Text = most[1].ToString();
            sumTextBox.Text = most[2].ToString();
            this.Refresh(); // cause the panel to refresh

        } // end ReportResults

The results are shown as below.

Discussion of the Python Application

What I learned about Python came from the results of online searches.  First, of course, was to download Python.  Somehow, with the first attempt I downloaded the Linux version and so I had to retry and be sure I selected the Windows version.  (I'm writing this on the eighth calendar day since I started and the seventh since I started the application.  Since it was working yesterday, it took six part-time days to learn enough about Python to produce the application.)  I didn't find out about a debugger so I used print statements to determine what was going on as I added code.  Having since done an internet search, I find that there is a debugger that I could have used.

I didn't try to determine if there was a way to execute the code in a Windows panel so used what once upon a time was called a DOS window – now Command Prompt.  (Type Command Prompt into the box when Start is selected and then click on the program that appears.)  With the Command Prompt window opened, use the DOS cd (change directory) command to switch to the Windows folder where the Python code resides.  The code (such as MaxColSumbyKey.py, where the extension py is standard for Python code) can then be run by just entering the file name in the DOS/Command Prompt window followed by Enter.  Entering >textname after the Python file name redirects the output to the named file.

Python uses a colon (:) at the end of the line that begins a function/method definition, an if statement, any else, and the like.  Rather than have {} brackets as in C# or "end if" as in Ada, the code of the block is indented by 4 columns (as is standard practice) and the end of the function definition, if block, etc is indicated by where the indentation returns to the previous column.  The indentation has to remain constant to the end of the block of code.  This is made easy enough via the UltraEdit editor which will start a new line at the indentation of the previous line until the user changes it.
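As a small illustration of the colon and indentation rules just described, here is a minimal sketch.  The function name classify and its strings are made up for this example and are not part of the application.

```python
# Hypothetical example: the colon opens a block, the indentation defines it.
def classify(count):
    if count > 10:
        result = "many"      # inside the if block
    else:
        result = "few"       # inside the else block
    # Back at the function's indentation: both blocks have ended.
    return result

print(classify(3))
print(classify(30))
```

The return statement belongs to the function but not to the if or the else, purely because of the column where it starts.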

I took the development of the application a step at a time.  This involved first determining how to open the .tsv file.  This involved doing internet searches such as "Python file open".  Using the help provided I was able to do the code (where # is used before a comment)
# Open file, Read it, and then extract each Key and pair of Values
print("This line will be printed.")
import os
count = os.path.getsize('C:/Source/LearnToCodeGR/max-col-sum-by-key.tsv')
# Note: This is the size of the file on disk.  It results in 998 whereas the
#       len(read_data) below is 949 so only read_data[0] thru [948] are valid.
print (count)

with open('C:/Source/LearnToCodeGR/max-col-sum-by-key.tsv') as f:
    read_data = f.read()
# print(read_data)

count = len(read_data) # get number of characters read from the file

# Separate Key and pair of Values from extraneous contents of the file and
# build list of namedtuples with number of instances of each particular value
print("call parse")
parse(read_data,count)

# Report the results
keysx = KeyList();
keysx.report()


As indicated by the Note, the getsize function didn't return the same number as the length of the data that was read.  This I didn't find out until I had written most of the code, since I initially parsed only a limited number of bytes (38 to get the first two file records) while working out how to keep track of the keys, their associated values, and the number of references to a particular value for a particular key.  After I had done so, and had increased the number of bytes to cover the data for the first two keys to further learn how to do the code, I opened up the parse to do the number of bytes indicated by the getsize result.  This caused the application to fail for trying to read beyond the extent of the read_data buffer.  That's when I discovered the len function to return the size of the read_data buffer.  The likely explanation for the difference is that getsize reports the size of the file on disk, while a file opened in text mode has each carriage return/line feed pair converted to a single new line character as it is read, so the buffer is shorter by one for each line ending.
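The difference between the size on disk and the length of what read() returns can be reproduced with a small experiment.  This is only a sketch under the assumption that the .tsv file uses Windows-style CR LF line endings; the two records written here are made up and the temporary file stands in for the real one.

```python
import os
import tempfile

# Hypothetical two-record file written with CR LF line endings.
raw = b"x\t1000\t1\t2\r\nx\t1000\t2\t2\r\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

size_on_disk = os.path.getsize(path)   # counts every byte, including each CR

with open(path) as f:                  # text mode: CR LF is read as "\n"
    text = f.read()

with open(path, "rb") as f:            # binary mode: bytes are left untouched
    data = f.read()

os.remove(path)

print(size_on_disk, len(text), len(data))
```

Under that assumption, the text-mode length falls short of getsize by one character per line ending, while a binary read matches getsize exactly.  Either way, len of the buffer actually read is the safe limit for the parse loop.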

Note that the code to call parse and report were done later as I got to them although the initial code for parse was started immediately after being able to open and read the file.

Since I already knew what the file looked like from when I wrote the Ada application, the
# print(read_data)
line (now commented out via the leading # character) wasn't necessary but was done to check that the open and read via
with open('C:/Source/LearnToCodeGR/max-col-sum-by-key.tsv') as f:
    read_data = f.read()
had occurred as I expected.  Also note the indentation by 4 columns following the : ending the previous line.

Also note that I sometimes end lines with a semi-colon (;) and sometimes not.  With Python it’s a "don't care".  The semi-colon is only needed to separate two statements written on the same line; otherwise the new line ends the statement.

I'll get into KeyList later.  It is a class where
keysx = KeyList();
instantiates an instance of the class and then
keysx.report()
invokes the report function declared in the class.  This code wasn't added until the very end when the parse of the data in the read_data buffer had been worked out.

The Parse function ended up as
# Parse array of data as previously read from file to
# obtain Key and two Values from each record of file
def parse(data,count):

    Key_Bytes = []
    Value1_Bytes = []
    Value2_Bytes = []
    offset = 0;
    scan_phase = 0 #bypass

    # Parse all bytes previously read from the file
    while offset < count:
        if scan_phase == 0: # bypass
            if ord(data[offset]) == 9:
                scan_phase = 1; # key
                print("new phase of key")
                # Initialize for next set of data
                Key_Bytes.clear()
                Value1_Bytes.clear()
                Value2_Bytes.clear()
            # end if;
        elif scan_phase == 1: # key
            if ord(data[offset]) != 9:
                Key_Bytes.append(data[offset])
            else:
                scan_phase = 2; # value1
                print("new phase of value1")
            # end if;
        elif scan_phase == 2: # value1
            if ord(data[offset]) != 9:
                Value1_Bytes.append(data[offset])
            else:
                scan_phase = 3; # value2
                print("new phase of value2")
            # end if;
        elif scan_phase == 3: # value2
            if ord(data[offset]) == 10 or ord(data[offset]) == 13:
                # Update tables with the data from the record.
                Update( Key_Bytes, Value1_Bytes, Value2_Bytes );

                # Initialize for next record
                scan_phase = 0; # bypass
                print("new phase of bypass")
            else:
                Value2_Bytes.append(data[offset])
            # end if;
        # end if;

        # Complete processing of final record when not terminated by New Line
        offset = offset + 1 # increment to next data buffer position
        if offset >= count: # no more data
          if scan_phase == 3 and len(Value2_Bytes) > 0: # last record parsed without trailing NL
            Update( Key_Bytes, Value1_Bytes, Value2_Bytes );
          # end if
          break # exit loop
        # end if

    return # from parse

# end parse

Notice, I have added "# end parse" to indicate the end of the function, "# end if" to indicate the end of an if statement sequence, and the like to better document the code so the reader doesn't need to completely follow the indentation changes.  Also, note that close to the end the call to Update and the preceding if statement are only indented by 2 columns rather than the usual 4, illustrating that an indentation of 4 columns isn't required as long as the same indentation is used throughout the block following a : terminator.  If the indentation within a block isn't maintained, the run of the program (which, Python being an interpreter, also serves as the "compile") will fail.  I suspect that 4 columns is said to be the usual amount since, without terminators (such as } in C), it makes recognizing the end of a block of code easier than a smaller indentation would.
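The indentation rule can be checked without running any real code by handing source text to Python's built-in compile function.  This is a minimal sketch; the functions f and g inside the strings are made up for the illustration.

```python
# Blocks may use different indentation widths, as long as each block is
# consistent with itself.
mixed_ok = """
def f(x):
    if x > 0:
      return "positive"   # 2 columns inside this block
    return "not positive"
"""

# Here the last line steps back to a column that matches no enclosing block.
inconsistent = """
def g(x):
    if x > 0:
        y = x
      return y
"""

namespace = {}
exec(compile(mixed_ok, "<mixed_ok>", "exec"), namespace)
print(namespace["f"](5))

try:
    compile(inconsistent, "<inconsistent>", "exec")
    outcome = "compiled"
except IndentationError:
    outcome = "IndentationError"
print(outcome)
```

The second string is rejected before any of its statements run, which matches the observation that the failure shows up as soon as the program is started.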

As with C, C#, etc case is important in names.  Update and update would be two different constructs, whereas in Ada case doesn't matter and Update and update would refer to the same thing.

The first problem to overcome with the parse was how to keep track of the data being parsed.  This took me a while to determine.  I soon found that there was a "list" structure that was implemented by Python but I couldn't decide how to use it.  Then I found out that there was a concept known as namedtuple where the tuple fields could be named.  (Rather than regular tuples which were separated by commas like arrays.)  This seemed like it might be similar to an Ada record structure or a C (not C#) struct.

So after messing around I came up with

import collections

# Namedtuple declaration
Dat = collections.namedtuple('Dat', 'pkey pvalue psum')

that I put at the beginning of the application outside of the class and the functions.  This declares the namedtuple (which isn't in earlier versions of Python) from the Python supplied collections giving it the name Dat and declaring that it has the tuple names pkey, pvalue, and psum where I preceded the key, value, and sum names with 'p' to be sure the tuple names weren't mixed up with the variables.
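A short sketch of how such a namedtuple behaves may help.  The Dat declaration is the one above; the sample numbers are made up.

```python
import collections

Dat = collections.namedtuple('Dat', 'pkey pvalue psum')

d = Dat(pkey=1000, pvalue=1, psum=1)
print(d.pkey, d.pvalue, d.psum)   # fields read by name
print(d[0])                       # or by position, like a plain tuple

# A namedtuple cannot be changed in place; assigning to a field fails.
try:
    d.psum = 2
    changed_in_place = True
except AttributeError:
    changed_in_place = False

# Build a new tuple instead, either from scratch or with _replace.
d2 = d._replace(psum=d.psum + 1)
print(changed_in_place, d2)
```

This is why updating a count means creating a new Dat and storing it back into the list at the same index, rather than modifying the existing entry.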

Notice in parse that
data[offset]
indicates an index into the data array similar to C and C#.

Since Python doesn't have a const construct, I didn't bother to do names such as I did in Ada for scan phases or to declare
  -- Horizontal Tab
  : constant Unsigned_Byte := 16#09#;

  -- New Line
  : constant Unsigned_Byte := 16#0A#;
and just used the numeric values.  That is, 0, 1, 2, and 3 for the scan phases (scanning thru the portion of the record to be ignored, the portion that is the key, and the two values) and 9, 10, and 13 as the numeric values of the horizontal tab, new line, and carriage return characters.  The ord function converts a character to its numeric (ASCII) value and is used when parse is checking where a particular parse/scan phase ends.
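Although Python has no const construct, ordinary module-level names can still stand in for the numeric values.  The names below are hypothetical and chosen only for this sketch; they are not taken from the application.

```python
# Hypothetical names.  Python treats these as ordinary variables; the capital
# letters are only a convention meaning "do not reassign".
HT = 9    # horizontal tab
NL = 10   # new line (line feed)
CR = 13   # carriage return

BYPASS, KEY, VALUE1, VALUE2 = range(4)   # scan phases 0 through 3

print(ord('\t') == HT, ord('\n') == NL, ord('\r') == CR)
print(chr(HT) == '\t')    # chr is the inverse of ord
print(KEY, VALUE2)
```

With names like these, a test such as ord(data[offset]) == HT reads much like the Ada version, at the cost of nothing stopping a later assignment to HT.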

    Key_Bytes = []
    Value1_Bytes = []
    Value2_Bytes = []
are lists.  Therefore append adds another character to the list.  When the second value has been terminated, the Update function is called passing it the key and the two values.  Since the final second value of the file doesn't have a terminating new line character, parse has the special code at the end to detect this and invoke Update for the final key and values.

The list manipulation in Python must mean that it grabs memory from the heap as new items are appended.  Python does have garbage collection of items that are no longer referenced.  In any case, in addition to Python being an interpreter, this use of the heap makes it unsuitable for critical applications such as onboard an aircraft, where it has to be known in advance that, no matter what paths are taken while the program executes, the application won't run out of available memory.

In any case, the parse code is quite similar to that of the Ada and C# applications except that it uses the Python provided list construct.
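For comparison, the same records could be pulled apart with the string methods Python provides rather than a character-by-character scan.  This is a sketch only.  It assumes each record is a leading field to be ignored followed by the key and the two values, all separated by tabs, which is what the scan phases of parse imply.  The function name parse_records and the sample text are made up.

```python
def parse_records(text):
    """Return a list of (key, value1, value2) integer tuples."""
    records = []
    for line in text.splitlines():          # copes with a missing final new line
        fields = line.split('\t')
        if len(fields) < 4:
            continue                        # skip blank or short lines
        key, value1, value2 = (int(f) for f in fields[1:4])
        records.append((key, value1, value2))
    return records

sample = "a\t1000\t1\t2\nb\t1000\t2\t2\nc\t2000\t1\t1"   # made-up data
print(parse_records(sample))
```

A line whose fields are not numeric (a header line, say) would raise a ValueError from int, so real data might need an extra check.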

The Update function is
def Update(Key, Value1, Value2):

    N_Key    = 0;
    N_Value1 = 0;
    N_Value2 = 0;
    # Convert the Key and Values bytes to numeric.
    N_Key    = Bytes_to_Integer( Key );
    N_Value1 = Bytes_to_Integer( Value1 );
    N_Value2 = Bytes_to_Integer( Value2 );
    print(N_Key, N_Value1, N_Value2)

    y = KeyList(); # (N_Key);
    y.Update(N_Key, N_Value1, N_Value2)


# end of Update

As I should have noted before, "def" defines a function.  It can also be noted that unlike C# and Ada, the type of the function parameters is not provided.  Python determines the type for itself from the type used in the call.  So it just uses list for the three parameters.

This Update is small since I've put most of the code into the KeyList class.  It just calls the Bytes_to_Integer function to convert each of the three lists to an integer.  It then instantiates an instance of the KeyList class and invokes its Update passing the three integers.

The Bytes_to_Integer function is similar to that of the other two applications except that I remove any non-digits from the list that was passed (although there shouldn't be any).  In this function I did name the characters that are the limits of the ASCII digits.
# Convert Data to integer
def Bytes_to_Integer(Data):

    Digit = 0  # Numeric digit

    Number = 0 # Numeric result

    Start = 1  # String position of first numeric

    Nine = 57  # ASCII character for digit 9

    Zero = 48  # ASCII character for digit 0

    # Ignore all non-digits.
    for item in list(Data): # loop over a copy so a removal doesn't skip items
        if ord(item) < Zero or ord(item) > Nine:
            Data.remove(item)
        # end if

    for item in Data:
        Digit = ord(item) - Zero;

        Number = (Number * 10) + Digit;
    # end for loop

    return Number;

# end of Bytes_to_Integer
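One thing to watch when dropping non-digits is that removing items from a list while looping over that same list can skip elements.  The sketch below builds a filtered list instead; digits_to_int is a hypothetical name used only here.

```python
def digits_to_int(chars):
    """Convert a list of characters to an int, ignoring non-digits."""
    digits = [c for c in chars if '0' <= c <= '9']   # new list; input untouched
    number = 0
    for c in digits:
        number = number * 10 + (ord(c) - ord('0'))
    return number

print(digits_to_int(['1', '0', '0', '0']))
print(digits_to_int(['4', ' ', 'x', '2']))

# The built-in int() does the same for a clean string of digits.
print(int(''.join(['1', '0', '0', '0'])))

# Removing from a list while looping over that same list can skip items.
leftover = ['a', 'b', '1']
for item in leftover:
    if not item.isdigit():
        leftover.remove(item)
print(leftover)
```

In the last loop the 'b' survives because removing 'a' shifts the remaining items down while the loop position moves on.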

The KeyList class contains the Ndata list, which is a Python list of namedtuples.  The code first determines whether the passed key is already in the list of namedtuples and then proceeds depending upon whether it is and whether the values are already in the list for the key.  For a key and for the values it either adds (that is, appends) them to the list if new or, if already in the list, increments the sum of the instances of the value, using the index to where the match occurred as in
                        self.Ndata[i] = d
to update the list.  In this application, unlike the previous two, value1 and value2 are combined into the sum of instances rather than maintaining two separate fields in the namedtuple.  (Keeping them separate would have required a namedtuple of five fields: the key, the first value with its sum of instances, and the second value with its sum.)

# Class to Update list of namedtuples and Report results
class KeyList():

    Ndata = [] # A list of namedtuples of collection Dat

    def Update(self, key, value1, value2):
        print("KeyList values",len(self.Ndata),key,value1,value2)

        d = Dat(pkey=key, pvalue=value1, psum=1) # assume first instance

        updateDone = False

        # Find if the key is already in the list
        dCount = 0;
        for i in range(len(self.Ndata)):
            obj = self.Ndata[i]
            if obj.pkey == key:
                dCount += 1
            # end if
        # end for loop

        # Add the new key with its values
        if dCount == 0:
            # Check if value2 is the same as value1 included in object d
            if value1 == value2:
                d = Dat(pkey=key, pvalue=value1, psum=2)
                print("new key",d)
                self.Ndata.append(d) # d contains two instances of value
                updateDone = True
            # Add two instances of new key for different value2
            else:
                print("new key",d)
                self.Ndata.append(d) # d contains first instance of value
                d = Dat(pkey=key, pvalue=value2, psum=1) # namedtuple can't be modified
                print("new key with new value2",d)
                self.Ndata.append(d) # d contains first instance of value2
                updateDone = True
            # end if
        # Add new instance of existing key with its values
        else:
            # Are either of the values already in the list for the key?
            vCount = 0
            matches = 0
            match1 = False
            match2 = False
            for i in range(len(self.Ndata)):
                obj = self.Ndata[i]
                sum = obj.psum
                if not match1 and obj.pkey == key and obj.pvalue == value1:
                    matches += 1
                    match1 = True
                    sum += 1
                    d = Dat(pkey=key, pvalue=value1, psum=sum)
                    print("existing key with existing first value",matches,d)
                # end if
                if not match2 and obj.pkey == key and obj.pvalue == value2:
                    matches += 1
                    match2 = True
                    sum += 1
                    d = Dat(pkey=key, pvalue=value2, psum=sum)
                    print("existing key with existing second value",matches,d)
                # end if
                if matches > 0: # key and at least one value already in list
                    print("existing key with existing value",matches,d)
                    self.Ndata[i] = d # replace entry with new sum
                    if matches == 2:
                        print("existing key with existing value",d)
                        self.Ndata[i] = d
                        updateDone = True
                        break # exit loop since update done
                    else: # only one value matched; other assumed not yet in list
                        obj = self.Ndata[i]
                        if match1:
                            # Add new instance of existing key with new 2nd value
                            sum = 1
                            d = Dat(pkey=key, pvalue=value2, psum=sum)
                            print("existing key with new 2nd value",d)
                            self.Ndata.append(d)
                            updateDone = True
                            break # exit loop since update done
                        else: # match2 is True
                            # Add new instance of existing key with new 1st value
                            sum = 1
                            d = Dat(pkey=key, pvalue=value1, psum=sum)
                            print("existing key with new 1st value",d)
                            self.Ndata.append(d)
                            updateDone = True
                            break # exit loop since update done
                        # end if
                    # end if

                else: # key but neither value in the list
                    print("no match for loop index",i)
                # end if
            # end of for loop

            # Add new tuple to list when neither value already in the list
            if not updateDone:
                # Check if value2 is the same as value1 included in object d
                if value1 == value2:
                    d = Dat(pkey=key, pvalue=value1, psum=2)
                    print("new paired value",d)
                    self.Ndata.append(d) # d contains two instances of value
                # Add two instances of new key for different value2
                else:
                    d = Dat(pkey=key, pvalue=value1, psum=1)
                    print("new first value",d)
                    self.Ndata.append(d) # d contains first instance of value
                    d = Dat(pkey=key, pvalue=value2, psum=1) # namedtuple can't be modified
                    print("new second value",d)
                    self.Ndata.append(d) # d contains first instance of value2
                # end if
            # end if

        # end if


    # end of Update

    # Report Key and Value with maximum number of references
    def report(self):

        Key = 0   # Key with most instances of a given value
        Value = 0 # Value of Key with most instances
        Sum = 0   # Number of instances of Value for the Key

        for i in range(len(self.Ndata)):
            obj = self.Ndata[i]
            print( i, obj.pkey, obj.pvalue, obj.psum );
            if Key == 0: # initialize
                Key = obj.pkey
                Value = obj.pvalue
                Sum = obj.psum
            else: # check if namedtuple has a value with a greater sum
                if obj.psum > Sum: # save key and value with greater sum
                    Key = obj.pkey
                    Value = obj.pvalue
                    Sum = obj.psum
                # end if
            # end if
        # end loop                  

        print( "Key and Value with greatest Sum", Key, Value, Sum )


    # end report

# end of class    

For some reason, Python doesn't know which instance of the class is being referred to unless the "self" parameter is used to indicate "this" one.  Therefore, self is specified as the first parameter in the function parameter list and is used in such references as
       for i in range(len(self.Ndata)):
            obj = self.Ndata[i]
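where the Ndata list declared immediately after the declaration of the class is being referenced.  It took me a while to get these matters sorted out, such as the need to supply the self prefix rather than just reference Ndata which is declared within the class.  Because Ndata is declared at the class level rather than assigned within a function, it is a single list shared by every instance of KeyList, which is why the instance created in each call to the outer Update and the keysx instance created for the report all see the same data.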
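The difference between a list declared at the class level and one created per instance can be seen in a small sketch.  The class names Shared and Separate are made up for the illustration.

```python
class Shared:
    items = []                # class attribute: one list shared by all instances

    def add(self, x):
        self.items.append(x)  # self.items finds the class-level list

class Separate:
    def __init__(self):
        self.items = []       # instance attribute: a new list per instance

    def add(self, x):
        self.items.append(x)

a, b = Shared(), Shared()
a.add(1)
b.add(2)
print(a.items, b.items, a.items is b.items)

c, d = Separate(), Separate()
c.add(1)
d.add(2)
print(c.items, d.items)
```

With the class-level list both instances report the same contents, which is the behavior KeyList depends on.  With a list created in __init__ each instance would start empty.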

After finally getting the code to build the list correctly I could have removed all the extra print statements that showed me which paths were being taken.  I have left them in as an illustration of what can be done when there isn't a debugger to use while verifying that code correctly solves a problem.

The report function of the KeyList class just has to look for the largest sum so it is simpler than in the other two applications.  The print statement towards the top of the for loop in the function displays what was built in the class's Update function.  It results in
0 1000 1 13
1 1000 2 7
2 2000 1 16
3 2000 2 4
4 3000 1 20
5 4000 1 15
6 4000 3 2
7 4000 2 2
8 4000 4 1
9 5000 2 8
10 5000 1 7
11 5000 3 4
12 5000 4 1

The final print statement displays
Key and Value with greatest Sum 3000 1 20

It has occurred to me that, what with the way the file is arranged, the largest sum could have been determined in the Update function by keeping only the counts for the current key and the best key, value, and sum found so far.  However, this wouldn't work if the file wasn't completely ordered by key.  If a key appeared again later in the file, after a change to another key, new counts would be started for it since its earlier counts hadn't been retained.  There would then be no way of knowing whether the new counts added to the previous counts would have exceeded those of the intervening key and value that appeared to have the greatest number of instances.  The way the application has been implemented (as well as the previous two applications) prevents that by maintaining all the possible combinations and then finding the one with the greatest sum when the file has been completely parsed.
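The point about retaining every combination can be shown with a compact sketch that uses the Counter class from Python's collections module in place of the list of namedtuples.  This is an alternative arrangement, not the application's code.  The function name max_value_count and the records are made up, and each (key, value) pair is counted independently so the order in which values first appear doesn't matter.

```python
from collections import Counter

def max_value_count(records):
    """Return (key, value, count) for the most frequent (key, value) pair."""
    counts = Counter()
    for key, value1, value2 in records:
        counts[(key, value1)] += 1
        counts[(key, value2)] += 1
    (key, value), total = max(counts.items(), key=lambda kv: kv[1])
    return key, value, total

# Made-up records in which key 1000 reappears after key 2000.
records = [(1000, 1, 1), (2000, 1, 1), (2000, 1, 2), (1000, 1, 1)]
print(max_value_count(records))
```

Here key 2000 reaches a count of 3 for value 1 before key 1000 returns, yet key 1000 ends with 4.  A running maximum that discarded the earlier counts for key 1000 would have reported key 2000.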
