In regards to the Learn to Code Grand Rapids session “Building a Real World App in VS 2017, Part I”:
following the meeting, I decided to write my own code in Visual Studio C# and
then to create the application once again in Ada to illustrate
the same application in two different languages. This was followed by a third version of the application in Python
as a way to learn something about that language.
Since I didn’t completely follow what the goal of the
class was, I had to decide on a modified goal to solve. (I had the disadvantage of not being able to
see the display and my notebook at the same time since I wear different glasses
for distance and close-up work. Therefore, I
was continually switching back and forth, missing the changes on
the display as well as losing my concentration.)
A large share of the reason for writing my own applications
was just to have something to do. A
second reason was to show actual code that involved more than supplying a
wrapper to execute a solution that had already been created and wasn’t visible
to the class, so the class could see what might have been required to code the solution.
Below is a discussion of what I did. After doing an initial C# application
without knowing what the disk file used by the class contained, I obtained a
truncated copy of that file from Jeffrey Fuller and did an Ada application
using that file. I then redid the C#
application to use the supplied file.
So you may want to just skip the description of the initial C#
application – although it does illustrate a problem with C# (or at least the
version we were able to use).
After that, thinking about what else to do, I decided to see
what the Python language was all about.
So I rewrote the application in Python as a way of learning something
about it.
The sections below discuss each of these attempts and
provide the code.
Discussion of the Visual Studio C# Application
When I did the VS C# project I wanted to produce what in Ada
would be a record structure to contain the data parsed from the
supplied file: a structure of records identifying each Key from the records
of the file and, for each Key, arrays of other record structures that would
identify the Values from the file associated with the Key and the number of
instances each particular Value occurred for the Key.
Since I didn’t have access to the file to begin with I
created a small text file of my own from my assumptions of what it was supposed
to look like.
I then created a C# Windows application where a C#-provided class was
used to allow the user to select the file via a File menu with an Open
drop-down. This class
allows the user to search through the folders for the one that contains the file
and then select the file to be opened.
When doing a Windows application, Visual C# allows widgets to be
selected from a toolbox panel and dragged onto a form panel of the application (in
this case, Form1 [Design]) to build what the form is to look like when the
application is run. Then, when a
widget – in this case the Open item – is double clicked, C# inserts the outline
of an event handler that will be executed when Open is clicked while the
application is running. Within this
outline, the coder/programmer, in this case me, adds the code to be
executed. C# names these handlers, and
they can then be renamed as desired, which I have done with the event handler below,
which shows the code that I added.
private void openToolStripMenuItem_Click(object sender, EventArgs e)
{   // to open the file
    Stream myStream = null;

    // Get an instance of a FileDialog.
    OpenFileDialog openFileDialog = new OpenFileDialog();

    // Use a filter to allow only certain file extensions.
    openFileDialog.InitialDirectory = "c:\\";
    openFileDialog.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
    openFileDialog.FilterIndex = 2;
    openFileDialog.RestoreDirectory = true;

    if (openFileDialog.ShowDialog() == DialogResult.OK)
    {
        try
        {
            if ((myStream = openFileDialog.OpenFile()) != null)
            {
                using (myStream)
                {
                    // Read the file and build tables from the data.
                    ReadAndParse(myStream);
                    ReportResults();
                    return;
                }
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
        }
    }
} // openToolStripMenuItem_Click
where everything between the opening and closing {}
brackets was added by me.
Stream is a C# class.
OpenFileDialog is also provided by C#, while setting the starting point
(C:\\), the Filter, when to invoke the C#-provided ShowDialog, checking that
the file was opened, etc. is provided by the coder. In addition, ReadAndParse and ReportResults are methods provided
by me for this particular application.
private void ReadAndParse(Stream file)
{
    System.Byte[] buffer = new byte[file.Length]; // buffer to contain file
    int numBytesRead = 0;
    try
    {
        // Read everything in file.
        numBytesRead = file.Read(buffer, 0, (int)file.Length);
        file.Close();
    }
    catch (Exception ex)
    {
        MessageBox.Show("Error reading file");
    }
    Parse(numBytesRead, buffer); // Parse the bytes read
    return;
} // ReadAndParse
where this method expects that there is sufficient memory
to contain the contents of the entire file. Of course, if the expected file is too large this code would need
to be different, with portions read a piece at a time and the Parse method
invoked to examine those portions as they were read, with special code to move any
incomplete data remaining at the end of the buffer to the beginning and to read the
next portion of the file into the buffer from that location onward. (Or else use a pair of buffers where the next bytes of the file
can be read into the second buffer when running out of the first, with code
that reads up to the end of the first buffer and then switches to the second to continue.)
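The chunked approach described above can be sketched in Python (my own sketch, not the C# code; parse_records is a stand-in for the Parse method, and the record format is the simple "key value" one assumed below):

```python
import io

def parse_records(data):
    # Stand-in for the Parse method: return a list of (key, value) pairs
    # from whole "key value" records.
    pairs = []
    for line in data.split(b"\n"):
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs

def read_in_chunks(f, chunk_size=4096):
    # Read the file a chunk at a time, carrying any partial record at the
    # end of a chunk over to the front of the next read, so parse_records
    # only ever sees complete records.
    pairs = []
    carry = b""                      # partial record left from the last chunk
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        data = carry + chunk
        cut = data.rfind(b"\n")      # last complete record boundary
        if cut < 0:
            carry = data             # no complete record yet
            continue
        pairs.extend(parse_records(data[:cut]))
        carry = data[cut + 1:]
    if carry:                        # final record without a trailing newline
        pairs.extend(parse_records(carry))
    return pairs

# usage with an in-memory file and a tiny chunk size to force carry-over
sample = io.BytesIO(b"2017 5\r\n2016 4\r\n2017 1\r\n2017 5\r\n")
result = read_in_chunks(sample, chunk_size=5)
```

The small chunk size in the usage line is only to exercise the carry-over path; a real application would use a few kilobytes.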
The Parse method assumed that the Key began at the beginning
of the file record and then had a separator followed by a single Value.
private void Parse(int count, Byte[] data)
{
    // Build a table of keys (first field of a record) and, for each key,
    // the number of items of the key.  Also, for each key, a table that
    // contains each of the unique values (second field of a record) with
    // the number of times the value is contained in the data.
    //
    // The first field ends with a space or a TAB.  The second field ends
    // the same way or at the end of record (in this case the CR LF pair).
    // All other data is ignored until end-of-file (EOF) or end of record.

    const Byte Space = 32;
    const Byte Tab = 9;
    const Byte CR = 13;
    const Byte LF = 10;

    //enum Fields { Key, Value, RecEnd };
    //int pos = (int)Fields.Key;
    const int Key = 0;
    const int Value = 1;
    const int Bypass = 2;
    const int RecEnd = 3;

    int pos = Key;
    Byte[] keyField = new byte[16]; // max of 16 bytes for a key
    int index = 0;                  // index into keyField for next Byte
    int keyValue = 0;
    int valueValue = 0;

    for (int i = 0; i < count; i++) // examine each byte of data
    {
        byte debugxx = data[i];
        // Parse key
        switch (pos)
        {
            case Key:
                if ((data[i] != Space) & (data[i] != Tab) & (data[i] != CR) & (data[i] != LF))
                {
                    keyField[index] = data[i];
                    index += 1;
                }
                else // at end of field
                {
                    // Convert the byte string to an integer.
                    // Check the StructClasses to find the Key or add a new one.
                    // Remember the index into the KeyValueRec to use for the field Value.
                    keyValue = ConvertToNumeric(keyField, index);
                    index = 0; // reinitialize to capture value associated with key
                    pos = Bypass;
                    if ((i + 1) < count)
                    {
                        if ((data[i + 1] >= 48) & (data[i + 1] <= 57))
                        { pos = Value; }
                    }
                }
                break;

            case Bypass: // ignore text until a numeric is next
                if ((i + 1) < count)
                {
                    if ((data[i + 1] >= 48) & (data[i + 1] <= 57))
                    { pos = Value; }
                }
                break;

            case Value:
                if ((data[i] != Space) & (data[i] != Tab) & (data[i] != CR) & (data[i] != LF))
                {
                    keyField[index] = data[i];
                    index += 1;
                }
                else // at end of field
                {
                    // Convert the byte string to an integer.
                    // Check the StructClasses to find the Key or add a new one.
                    // Remember the index into the KeyValueRec to use for the field Value.
                    valueValue = ConvertToNumeric(keyField, index);
                    index = 0;    // reinitialize to capture next key
                    pos = RecEnd; // expecting no more numeric fields before next Key
                    if ((i + 1) < count)
                    {
                        if ((data[i + 1] >= 48) & (data[i + 1] <= 57))
                        { pos = Key; }
                    }
                    // Update Value for Key in table
                    Update(keyValue, valueValue);
                }
                break;

            case RecEnd:
                if ((i + 1) < count)
                {
                    if ((data[i + 1] >= 48) & (data[i + 1] <= 57))
                    { pos = Key; }
                }
                index = 0;
                break;
        } // end of switch

        if (i >= count)
        {
            return;
        }
    } // end for
    return;
} // Parse
This method invokes two more methods, ConvertToNumeric
and Update, the latter of which is to add the new Key, Value pair into a structure that can be examined after
the file data has been completely parsed.
Note that the Key and Value fields are the only ones assumed to contain
digits (the ASCII values of 48 through 57).
Constants of Zero and Nine should have been assigned to these
values. Also note that the enum should
have been recognized by C# as a valid construct but was not by this particular
version of the compiler, so the constants Key, Value, Bypass and RecEnd were
created instead.
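For comparison, here is what the intended enum (and the Zero/Nine digit constants mentioned above) might look like in Python – my own sketch, not the author's code; the names mirror the C# constants:

```python
from enum import IntEnum

# The parser states the C# enum was meant to provide before it had to be
# replaced with plain int constants.
class Fields(IntEnum):
    Key = 0      # collecting the digits of a key field
    Value = 1    # collecting the digits of a value field
    Bypass = 2   # skipping non-numeric text until a digit is next
    RecEnd = 3   # no more numeric fields expected before the next key

# The digit-range constants that should have replaced the bare 48 and 57
# comparisons in the C# code.
Zero = ord("0")  # 48
Nine = ord("9")  # 57
```

An IntEnum member compares equal to its integer value, so switching on it works exactly like switching on the int constants.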
private int ConvertToNumeric(byte[] keyField, int index)
{
    // Convert the byte string to an integer.
    // Check the StructClasses to find the Key or add a new one.
    // Remember the index into the KeyValueRec to use for the field Value.
    int start;
    start = 0;
    int finish;
    finish = index - 1;
    if (finish < 0) // debug
    { MessageBox.Show("ConvertToNumeric error 1"); }
    int keyInt = 0;
    if ((keyField[finish] >= 48) & (keyField[finish] <= 57))
    {
        keyInt = keyField[finish] - 48; // convert ASCII to digit
    }
    finish--;
    int m = 10;
    while (finish >= start)
    {
        if (finish < 0) // debug
        { MessageBox.Show("ConvertToNumeric error 2"); }
        if ((keyField[finish] >= 48) & (keyField[finish] <= 57))
        {
            keyInt = keyInt + (m * (keyField[finish] - 48));
        }
        m = m * 10;
        finish--;
    }
    return keyInt;
} // ConvertToNumeric
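The same digit-by-digit conversion can be sketched in Python (mine, not the author's code): walk from the last captured byte back to the first, multiplying by a growing power of ten, and – as in the C# version – skip non-digit bytes while still advancing the power of ten:

```python
def convert_to_numeric(key_field, index):
    """Convert the first `index` ASCII-digit bytes of `key_field` to an
    integer, mirroring the C# ConvertToNumeric."""
    key_int = 0
    m = 1                            # power of ten for the current digit
    for finish in range(index - 1, -1, -1):
        b = key_field[finish]
        if 48 <= b <= 57:            # ASCII '0'..'9'
            key_int += m * (b - 48)
        m *= 10
    return key_int
```

In practice Python's built-in int() would do this directly, but the loop shows the arithmetic the byte-level C# code performs.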
The Update method that follows is not what I wanted and,
since I only put a few records in the file that I created, I don’t know if it
is correct. The Ada application does
parse the actual file, which I obtained later.
With C# I wanted to use a class to implement the structure that I
wanted. There is a struct keyword in C#,
but it is really a kind of class and not the same as a struct in C. Therefore I created a KeyValueRec class and
a Keys class, where KeyValueRec is the class shown below.
Don't take this code seriously since at this time I don’t
remember what code was commented out while trying to find
something that would execute and what was commented out after I found that the
version of C# provided just wouldn’t execute no matter what. This turned out to be because C# would not allow
arrays of the KeyValueRec class even though they would compile without error.
public class KeyValueRec
{   // to be used as a record structure
    public int key;        // key associated with values
    public int valueCount; // number of different values
    public int[] values = new int[30];
    public int[] sums = new int[30];
    // public KeyValueRec() // constructor
    // {
    //     key = -1;
    //     valueCount = 0;
    // } // end constructor
} // end class KeyValueRec
It identifies the key, and valueCount specifies the number
of values added to the values and sums arrays.
Keys had to be mostly commented out because C# compiled it OK
but any use of an object of the class resulted in an exception. (Not an ideal thing for a compiler to
do. If it isn't going to process a
structure, a compiler should disallow it when attempting to build the
solution.)
public class Keys
{   // to be used as a record structure
    public int keyCount; // number of different keys
    // public static int[] keyTable = new int[100]; // space for 100 different keys
    // public static KeyValueRec[] Value = new KeyValueRec[100]; // values associated with each key
    // public KeyValueRec[] keyTable = new KeyValueRec[100]; // space for 100 different keys with values
    // public KeyValueRec keyTable1 = new KeyValueRec(); // space for 1 key with values
    // public static KeyValueRec[] Value = new KeyValueRec[100]; // values associated with each key

    public Keys() // constructor
    {   // Initialize to no keys and no values for a key -- do when add a key
        keyCount = 0;
        // keyTable1.key = 0;
        // keyTable1.valueCount = 0;
        // keyTable1.values[0] = 0;
        // keyTable1.sums[0] = 0;
        for (int i = 0; i < 100; i++)
        {
            //keyTable[i].valueCount = 0;
            //keyTable[i].key = 0;
            //keyTable[i].values[0] = 0;
            //keyTable[i].sums[0] = 0;
            //// Value[i].valueCount = 0;
            //// keyTable[i].
        }
    } // end constructor

    // Update tables with 'key' and 'value'
    // public void Update(int key, int value)
    // {
    //     Search keyTable
    //     int keyIndex = 0;
    //     for (int i = 0; i < keyCount; i++)
    //     {
    //         if (key == keyTable[i].key)
    //         {
    //             keyIndex = i;
    //             int valueIndex = 0;
    //             Search array of values associated with key
    ////             for (int j = 0; j < Value[i].valueCount; j++)
    //             for (int j = 0; j < keyTable[i].valueCount; j++)
    //             {
    ////                 if (value == Value[i].value[j])
    //                 if (value == keyTable[i].values[j])
    //                 { // Increment number of references to the value associated with key
    //                     valueIndex = j;
    //                     Value[i].sum[j]++;
    //                     keyTable[i].sums[j]++;
    //                 }
    //             }
    //             if (valueIndex == 0)
    //             { // value not found -- add new value to list
    //                 Value[i].value[Value[i].valueCount] = value;
    //                 Value[i].sum[Value[i].valueCount] = 1; // first instance
    //                 Value[i].valueCount++;
    //                 keyTable[i].values[keyTable[i].valueCount] = value;
    //                 keyTable[i].sums[keyTable[i].valueCount] = 1; // first instance
    //                 keyTable[i].valueCount++;
    //             }
    //         }
    //     }
    //     if (keyIndex == 0) // key not found
    //     {   try
    //         { // add new key with its value to the tables
    //             keyTable1.key = key;
    //             keyTable1.values[0] = value;
    //             keyTable[keyCount].key = key;
    //             keyTable[keyCount].values[0] = value;
    //             keyTable[keyCount].sums[0] = 1; // first instance of value for key
    //             keyTable[keyCount].valueCount = 1; // first array entry
    //             keyCount++;
    //         }
    //         catch (Exception ex)
    //         {
    //             MessageBox.Show("Error: Could not add to keyTable " + ex.Message);
    //         }
    //     }
    // } // end Update
} // end class Keys
The Update method below has commented-out code from when these
classes were attempted to be used.
Instead, the following objects (keyTable, valueKey, etc.) were added at
the beginning of the Form1 class that Visual C# provided.
namespace MaxColSumbyKey
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        Keys keyTable = new Keys(); // instantiation of table to contain the keys and their values

        // Note: Each object begins with "value" to associate all the objects as part of the same set
        int[]  valueKey = new int[100];          // key associated with values
        int[]  valueCount = new int[100];        // number of different values
        int[,] valueValues = new int[100, 30];   // different values of the key
        int[,] valueSums = new int[100, 30];     // sum of values of particular key
Here, following the lines that C# provided to begin the
Form1 class in the MaxColSumbyKey namespace, is an instantiation of the Keys
class as it ended up, without instantiations of arrays of the KeyValueRec
class. In place of the KeyValueRec
class, the separate arrays valueKey and valueCount and the doubly indexed arrays
valueValues and valueSums are declared, where the prefix “value” was used
with each in an attempt to associate them.
The Update method below then references them, although it most likely
needs some work.
// Update tables with 'key' and 'value'
public void Update(int key, int value)
{
    // Search keyTable
    int keyIndex = -1;
    for (int i = 0; i < keyTable.keyCount; i++)
    {
        //if (key == keyTable.keyTable[i].key)
        //if (key == valueTable[i].key)
        if (key == valueKey[i])
        {
            keyIndex = i;
            int valueIndex = -1;
            // Search array of values associated with key
            //for (int j = 0; j < Value[i].valueCount; j++)
            //for (int j = 0; j < valueTable[i].valueCount; j++)
            for (int j = 0; j < valueCount[i]; j++)
            {
                //if (value == Value[i].value[j])
                //if (value == valueTable[i].values[j])
                if (value == valueValues[i, j])
                { // Increment number of references to the value associated with key
                    valueIndex = j;
                    //Value[i].sum[j]++;
                    //valueTable[i].sums[j]++;
                    valueSums[i, j]++;
                    return; // value added
                }
            }
            if (valueIndex < 0)
            { // value not found -- add new value to list
                // Value[i].value[Value[i].valueCount] = value;
                // Value[i].sum[Value[i].valueCount] = 1; // first instance
                // Value[i].valueCount++;
                //valueTable[i].values[valueTable[i].valueCount] = value;
                valueValues[i, valueCount[i]] = value;
                //valueTable[i].sums[valueTable[i].valueCount] = 1; // first instance
                valueSums[i, valueCount[i]] = 1; // first instance
                //keyTable[i].valueCount++;
                valueCount[i]++;
                return; // value added
            }
        }
    }
    if (keyIndex < 0) // key not found
    {
        try
        {   // add new key with its value to the tables
            // keyTable.keyTable1.key = key;
            // keyTable.keyTable1.values[0] = value;
            //valueTable[keyTable.keyCount].key = key;
            //valueTable[keyTable.keyCount].values[0] = value;
            //valueTable[keyTable.keyCount].sums[0] = 1; // first instance of value for key
            //valueTable[keyTable.keyCount].valueCount = 1; // first array entry
            valueKey[keyTable.keyCount] = key;
            valueValues[keyTable.keyCount, 0] = value;
            valueSums[keyTable.keyCount, 0] = 1; // first instance of value for key
            valueCount[keyTable.keyCount] = 1;   // first array entry
            keyTable.keyCount++;
        }
        catch (Exception ex)
        {
            MessageBox.Show("Error: Could not add to keyTable " + ex.Message);
        }
    }
} // end Update
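The bookkeeping that Update performs with its parallel arrays (keys, counts of values per key, the values themselves, and their instance counts) collapses to a dictionary of Counters in Python – my sketch, not the author's code:

```python
from collections import Counter, defaultdict

# key -> Counter of how many times each value was seen for that key.
# New keys and new values are added automatically on first sight.
table = defaultdict(Counter)

def update(key, value):
    """Record one more instance of `value` for `key` (the equivalent of
    the C# Update's search-then-increment-or-append logic)."""
    table[key][value] += 1

# the records of the small hand-made test file
for k, v in [(2017, 5), (2016, 4), (2017, 1), (2017, 5)]:
    update(k, v)
```

The defaultdict removes the "key not found" branch entirely, and the Counter removes the "value not found" branch – the structure the parallel arrays emulate.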
The inadequate file used follows.
2017 5
2016 4
2017 1
2017 5
where CR and LF end each record (since it was created as a DOS
compliant file) but aren’t visible.
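The invisible CR LF pair (bytes 13 and 10) can be made explicit by looking at the raw bytes of a record, as in this small Python check (mine, for illustration):

```python
# A DOS-style record ends in CR LF, invisible in an editor but present
# in the raw bytes.
record = "2017 5\r\n".encode("ascii")
terminator = list(record[-2:])   # the last two bytes of the record
```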
Following the discussion of the GNAT Ada application, there
will be another discussion of a C# application where the sample file provided
by Jeffrey Fuller will be used and arrays will be used from the beginning to
keep track of the data.
Discussion of the GNAT Ada Application
Ada applications consist of packages, which are containers for
code and static variables, and of subprograms, which are either procedures that can have
input and output parameters or functions that only have input parameters and return
a single result (corresponding to a non-void C or C# method).
GNAT provides various libraries that can be used – some of
which interface to Windows (or Linux, depending on the operating system being
used) to read files and the like. I
haven’t, as yet, come across a library that would allow a Windows interface
like what is available with C# (or a Linux interface like that available with
Mono, the Linux variant of C#).
Therefore, the Ada application just has the filename, with its path,
encoded into the application.
This particular Ada application reads the file supplied by
Jeffrey Fuller. After getting it I
found that it had extraneous text prior to the Key, and two Value fields (or
what I assume are Value fields) rather than one. Examining the data I found that each record only had a trailing
NL (new line), so it was not a DOS formatted file,
and that the final record was without the trailing NL. Preceding the Key and each of the two Value
fields was an HT (horizontal tab). That
is, the first two records look like
49 44 57 49 50 95 78 85 77 9 49 48 48 48 9 49 9 49 10
49 44 57 49 50 95 78 85 77 9 49 48 48 48 9 50 9 49 10
which translate to ASCII as below.
 1  ,  9  1  2  _  N  U  M  HT  1   0   0   0   HT  1   HT  1   NL
 1  ,  9  1  2  _  N  U  M  HT  1   0   0   0   HT  2   HT  1   NL
where the numbers below are the byte positions.
 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19
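The byte listing above can be checked in Python (my own illustration, assuming the bytes are exactly as shown): splitting the record on the horizontal tab separates the extraneous leading text, the key, and the two value fields.

```python
# The first record of the supplied file, byte for byte as listed above.
record = bytes([49, 44, 57, 49, 50, 95, 78, 85, 77,   # "1,912_NUM"
                9, 49, 48, 48, 48,                    # HT "1000"
                9, 49,                                # HT "1"
                9, 49,                                # HT "1"
                10])                                  # NL
# Strip the trailing NL and split on HT (tab) to recover the fields.
fields = record.rstrip(b"\n").split(b"\t")
```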
Therefore I decided to find the Key with the greatest number
of combined value fields of a particular Value. In the two record sample this would be a value of 1, since there
are three instances of 1 and only one instance of 2 for the Key of 1000.
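This chosen goal can be sketched in Python (my sketch, not the Ada code; records are assumed to arrive as (key, value1, value2) tuples after parsing):

```python
from collections import Counter, defaultdict

def best_key(records):
    """For records of (key, value1, value2), return (key, value, count)
    for the key whose two value fields, combined, contain the most
    instances of one particular value."""
    per_key = defaultdict(Counter)
    for key, v1, v2 in records:
        per_key[key][v1] += 1        # count value 1 instances
        per_key[key][v2] += 1        # count value 2 instances
    best = None
    for key, counts in per_key.items():
        value, n = counts.most_common(1)[0]   # most frequent value for key
        if best is None or n > best[2]:
            best = (key, value, n)
    return best

# the two-record sample above: key 1000 with value pairs (1, 1) and (2, 1)
sample = [(1000, 1, 1), (1000, 2, 1)]
```

On the sample, best_key reports key 1000 with value 1 seen three times, matching the count worked out in the text.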
The Ada Main procedure is the entry point into the
application when it is executed. Since
it is only a procedure, variable objects declared in it are on the stack rather
than in static memory. Therefore, as is
the normal practice, it is a minimal procedure that invokes another procedure
or function in a package of the application.
In this instance, the Main procedure is as follows.
with Max_Col_Sum_by_Key;

procedure Main is

begin -- Main

  -- Execute the project from the beginning.
  Max_Col_Sum_by_Key.Open;

end Main;
where the “with” statement informs the compiler of the
package where the Open procedure can be found.
Ada packages have a specification that provides the
declarations that are to be visible to other parts of the application and a
body where the implementation is encoded as well as declarations and variables
that are to be visible only within the particular package. (Note: Ada packages can somewhat be thought
of as similar to namespaces in C#.) The
specification for this package is
package Max_Col_Sum_by_Key is

  procedure Open;
  -- Main entry point to package

end Max_Col_Sum_by_Key;
That is, the declaration of the Open procedure is the only
visible construct of the package. The
body of the package is
with Windows_Itf;

package body Max_Col_Sum_by_Key is

  type Unsigned_Byte
  --| Unsigned 8-bit byte
  is mod 2**8;
  for Unsigned_Byte'Size use 8;

  type Unsigned_Byte_Array
  --| Unconstrained array of unsigned bytes
  is array (Integer range <>) of Unsigned_Byte;

  type Data_List_Type
  -- Structure defining data to be passed to Update procedure
  is record
    Count : Integer;
    -- Number of data bytes in List
    List : Unsigned_Byte_Array(1..10); -- with spare bytes
    -- Captured data
  end record;

  type Key_Count_Type
  -- Maximum of 100 unique keys allowed
  is new Integer range 0..100;

  type Value_Count_Type
  -- Maximum of 30 unique values allowed
  is new Integer range 0..30;

  type Value_Data_Type
  is record
    Value : Integer;
    -- Unique value
    Sum : Integer;
    -- Sum of the instances of the Value
  end record;

  type Value_Array_Type
  is array (1..Value_Count_Type'last) of Value_Data_Type;

  type Value_List_Type
  is record
    Count : Value_Count_Type;
    -- Number of unique values in the List
    List : Value_Array_Type;
    -- List of unique values associated with a key
  end record;

  type Value_Pair_Type
  -- Unique values for each of the two value fields of a particular key
  is array (1..2) of Value_List_Type;

  type Key_Data_Type
  -- Data to be retained for each unique key
  is record
    Key : Integer;
    -- Unique input key as converted to a numeric value
    Values : Value_Pair_Type;
    -- Lists of unique values for each of the two values assigned to a key
  end record;

  type Key_Array_Type
  is array (1..Key_Count_Type'last) of Key_Data_Type;

  type Key_List_Type
  is record
    Count : Key_Count_Type;
    -- Number of unique keys in the list
    Key_Data : Key_Array_Type;
    -- List of unique keys with their associated values
  end record;

  Key_Table
  -- Captured data from input file
  : Key_List_Type;

  Buffer_Length
  -- Number of bytes read from file
  : Integer;

  Buffer_Size
  -- Size of buffer
  : constant Windows_Itf.DWORD := 5000;

  Buffer
  -- Data read from file
  : Unsigned_Byte_Array( 1..Integer(Buffer_Size) );

  Update_Count
  -- For debugging Update
  : Integer := 0;

  ------------------------------------------------------------
  -- Procedure declarations

  procedure Parse;
  -- Parse the data of the file and then continue to produce the result.

  procedure Read
  -- Read and save the data of the file.
  ( Handle  : in Windows_Itf.File_Handle;
    -- Handle of file
    Success : out Boolean
    -- True if Read was successful
  );

  procedure Report;
  -- Find key with most different values and report

  procedure Update
  ( Key    : in Data_List_Type;
    -- Key extracted from file buffer
    Value1 : in Data_List_Type;
    -- First value extracted from file buffer
    Value2 : in Data_List_Type
    -- Second value extracted from file buffer
  );

  ------------------------------------------------------------
  -- Procedure implementations

  procedure Open is separate;

  procedure Parse is separate;

  procedure Read
  ( Handle  : in Windows_Itf.File_Handle;
    Success : out Boolean
  ) is separate;

  procedure Report is separate;

  procedure Update
  ( Key    : in Data_List_Type;
    Value1 : in Data_List_Type;
    Value2 : in Data_List_Type
  ) is separate;

end Max_Col_Sum_by_Key;
The Windows_Itf package is a special package to provide
types, variables, and procedure and function declarations to interface to GNAT
library supplied routines that support Windows. These libraries are provided when the publicly available GNAT Ada
and C compilers are installed via the internet. Over the
years I’ve created a much more extensive interface package to support the use
of Windows and Linux invocations, and I selected certain of its variables and
procedure/function declarations to include in the Windows_Itf package for use
by the Max_Col_Sum_by_Key application, although not all of them were used.
This package body contains the declarations to contain the
file data as opened by the Open procedure and input by the Read procedure. The Parse, Update and Report procedures are
similar in purpose to those of the C# application. The code of each of these procedures could be provided within
this package body but to keep the amount of material that must be scrutinized
at a time to a minimum, it is normal practice to provide the implementation in
separate files.
As per my normal practice I have declared record structures
to associate an array with the variable that keeps track of the number of array
items that actually contain data. Note
that Ada arrays are usually declared to begin with an index of 1 rather than 0
as in C and C# although this isn’t necessary.
That is, an array type could be declared to range from -10 to -1 if this
was desirable to mimic that of a piece of equipment, for instance.
The Key_Table static memory object has been declared as I
wanted to do in the C# implementation by building up a complex record
type. That is, the Key_Table to keep
track of the parsed data is declared to be of the Key_List_Type which is a
record containing the Count of the number of unique keys that have been parsed
and an array that has been sized large enough to hopefully contain all the
different keys of the file. Note that
the Count and the Key_Array_Type have been sized using a Key_Count_Type rather
than just using an Integer. This is
because Ada is strongly typed and by using such types when the application is
run an exception (unless the feature has been turned off) will be thrown if the
range is exceeded. This prevents
storage of data beyond the limits of the object and thus overwriting other
code. It is also useful to the coder
since it prevents, for instance, confusing the index variable for one array
with that of another. That is, the Ada
compiler will refuse to allow a variable of one type from being used where the
array has been declared to use another type.
I associate the count of the number of array positions used and the
array into their own record type to avoid confusion of what value indicates the
number of used array elements.
The Key_Data_Type has been declared to contain the value of
the Key and a pair of Values for the two value fields of each file record. The array elements of each of these are
sized to the Value_Count_Type and consist of the unique Value from the file
record for a particular Key and the Sum of the number of times that particular
Value is contained in the file.
Thus the structure is

Keys
  |
  +--> for each key -- Value 1 and Value 2
                          |             |
                          v             v
                     Value  Sum    Value  Sum
                       |     |
                       v     v
                  unique    # of instances
                  values    of the value
Where the structure to the right is repeated for each
different Key. The Value 1 and Value 2
structures are identical and are associated with the first and second Values in
a record of the file. The Value array
and the Sum array will attain the same length (e.g., Count) for any particular
Key with each instance of a Value containing the particular value extracted
from the file data and the associated Sum the running count of the number of
times the Value was specified in the file for the particular Key.
The Key_Table is declared in the package body so as to be
static. That is, to remain from one
call to Update to the next. If declared
in the Update procedure the memory used would be that of the stack and hence
the table would be freed upon return from the procedure so it wouldn't
accumulate. The data contained in the
table will be supplied by the Update procedure and perused by the Report
procedure to obtain the Key with the most references to a particular Value.
The Open procedure is
with Text_IO;
separate( Max_Col_Sum_by_Key )
procedure Open is
Done
-- Result
of the Close
: Boolean;
File
-- Handle
of file
: Windows_Itf.File_Handle;
Success
-- Result
of the Read
: Boolean;
CName
:
String(1..51) := "C:/Source/LearnToCodeGR-Ada/max-col-sum-by-key.tsv
";
use type
Windows_Itf.File_Handle;
begin -- Open
-- Make a C
terminated string.
CName(51)
:= ASCII.NUL;
File :=
Windows_Itf.Open_Read( Name => CName );
if File =
Windows_Itf.Invalid_File_Handle then
Text_IO.Put_Line( "File not found" );
return;
end if;
-- Read and
save the data of the file.
Read(
Handle => File,
Success => Success );
-- Close
Done :=
Windows_Itf.Close_File( Handle => File );
-- Parse
the file and report the results.
if Success
then
Parse;
Report;
end if;
end Open;
Since Windows-based functions are much more limited than in
Visual C#, the location of the file to be opened is supplied via the CName
variable and, due to the nature of the GNAT supplied C function, the string has
to be NUL terminated. Text_IO is an Ada
supplied package. Note: I provided the
NUL termination before I changed the Windows_Itf Open_Read function to also
provide a terminating NUL, so this C string pathname ends up doubly NUL
terminated. Since the need for a NUL
terminated string shouldn't have to be considered by the Open procedure, I
should have made the change to the Open_Read function first so that only the
Windows_Itf package would have needed to know what the GNAT provided routine
needed. The Windows_Itf package
function (after the modification) is
function Open_Read
( Name : String;
  Mode : Mode_Type := Text
) return File_Handle is

  FileDesc : GNAT.OS_Lib.File_Descriptor;

  NameWithNULTerminator : String(1..Name'Length+1);

  function File_Descriptor_to_Handle
  is new Unchecked_Conversion( Source => GNAT.OS_Lib.File_Descriptor,
                               Target => File_Handle );

  function to_Mode is new Unchecked_Conversion( Source => Mode_Type,
                                                Target => GNAT.OS_Lib.Mode );

begin -- Open_Read

  NameWithNULTerminator(1..Name'Length) := Name;
  NameWithNULTerminator(Name'Length+1) := ASCII.NUL;

  FileDesc := GNAT.OS_Lib.Open_Read( Name  => NameWithNULTerminator'address,
                                     FMode => to_Mode(Mode) );

  return File_Descriptor_to_Handle( FileDesc );

end Open_Read;
where GNAT.OS_Lib is a GNAT Ada supplied package. It needs to be passed the pointer to the
path Name, which is why the 'address attribute is used to pass the address of the
NUL terminated path object rather than the object itself.
After the file has been opened (and verified that a file was
found to be opened), the Read procedure is invoked to read the contents of the
file into the Buffer declared in the package body. As with the C# application, for a bigger file this and Parse
would need to be coordinated to partially read and parse the file until the
complete file had been processed. The
Read procedure has been declared to return the Success boolean to indicate
whether the file was successfully read.
If it was, then the Parse followed by Report procedures are called.
In the code of the Read procedure that follows, four
different Ada packages are referenced.
The Windows_Itf doesn’t need a “with” statement since it was withed for
the package body so the Ada compiler already knows about it.
The Unchecked_Conversions type cast the Source
type to the Target type. It
is assumed that the coder knows what they are doing when an
Unchecked_Conversion is used, and that the size (width) of the Source and Target
are the same, since Unchecked_Conversion only overlays the object of the Source
type onto that of the Target type and doesn't convert one type to the
other. These can be needed at times due
to the strong typing of Ada. That is,
unlike C, two variables of different types that are really variations of an
integer cannot be used in place of one another. For instance, in the arrays that were declared in the package
body, a different type was specified for the Key array versus the Value
array. Therefore, Ada won’t allow an
index mix-up of specifying an index for the Key array when referencing the
Value array. This is why it is good
practice to declare unique types for the two rather than just using Integer for
both. These particular
Unchecked_Conversion functions are to change the type from that used by the
GNAT C code to what I am using in the Ada application.
with Interfaces.C;
with System;
with Text_IO;
with Unchecked_Conversion;

separate( Max_Col_Sum_by_Key )

procedure Read
( Handle  : in Windows_Itf.File_Handle;
  Success : out Boolean
) is

  Result -- Result returned from read file
  : Windows_Itf.BOOL;

  function to_PVOID is new Unchecked_Conversion
  ( Source => Windows_Itf.File_Handle,
    Target => Windows_Itf.PVOID );

  function to_LPCVOID is new Unchecked_Conversion
  ( Source => System.Address,
    Target => Windows_Itf.LPCVOID );

  function to_LPDWORD is new Unchecked_Conversion
  ( Source => System.Address,
    Target => Windows_Itf.LPDWORD );

  use type Interfaces.C.unsigned_long;
  use type Windows_Itf.BOOL;

begin -- Read

  Result := Windows_Itf.ReadFile
            ( File                => to_PVOID(Handle),
              Buffer              => to_LPCVOID(Buffer'address),
              NumberOfBytesToRead => Buffer_Size, -- size of buffer
              NumberOfBytesRead   => to_LPDWORD(Buffer_Length'address),
              Overlapped          => null ); -- not overlapped IO

  if Buffer_Length <= 0 or else Result = 0 then
    Text_IO.Put_Line("Read Failed ");
    Success := False;
  else
    Success := True;
  end if;

  declare

    Count    : Integer := 0;
    Data     : String(1..19);
    Data_Hex : Unsigned_Byte_Array(1..19);
    for Data_Hex'Address use Data'address;

    J : Integer := 0;
    L : Integer := 0;

    type StringType is new String(1..4);

    function ByteToString is new Unchecked_Conversion( Source => Unsigned_Byte,
                                                       Target => Character );
    function IntToString is new Unchecked_Conversion( Source => Integer,
                                                      Target => StringType );

  begin

    for I in 1..Buffer_Length loop
      J := J + 1;
      Data(J) := ByteToString(Buffer(I));
      -- Text_IO.Put(bytetostring(Data(j));
      if (J = 19) or else (I = Buffer_Length) then
        Text_IO.Put_Line(String(IntToString(J)));
        Text_IO.Put_Line(Data);
        Count := Count + 1;
        if Count = 49 then
          L := 49; -- line to set break on
        end if;
        for K in 1..19 loop
          Data(K) := ASCII.NUL;
        end loop;
        J := 0;
      end if;
    end loop;

  end;

end Read;
The code uses the Windows_Itf ReadFile function to read the file into the
Buffer.  The "to" conversions are used to pass the needed types to the
Windows_Itf function or, in the case of the NumberOfBytesRead parameter, to
get the value returned.  Note that an address is supplied for this.  An Ada
function cannot have an "out" parameter, so NumberOfBytesRead cannot be
declared as
NumberOfBytesRead : out Integer;
But, since an address is being passed in, this restriction is avoided.  Of
course it could also have been avoided by supplying a record type for the
function return that contained both the Result BOOL and the number of bytes
read as fields of the record.  But, since the GNAT library function is being
referenced, this option isn't considered.
The code in the declare block is only there to output what the file records
look like as characters and isn't really needed.  That is, as a string,
special characters such as Horizontal Tab won't show up; only printable ASCII
characters show.  This was to get an idea of what the file looked like and
why 19 ended up as the size of the array.  It produced output such as
0000,912_NUM 1000 1 1
displayed in the GNAT GPS debugger window.  This looks longer than 19
characters since the HT characters cause the next displayable character to be
moved right to the next tab position.  As mentioned before, the bytes of data
were
49 44 57 49 50 95 78 85 77 9 49 48 48 48 9 49 9 49 10
where the 9s are the HTs and the 10 is the NL, such that 49 48 48 48 is the
Key (the ASCII characters of "1000", to be converted to the numeric value
1000), 49 ("1") is the first Value, and the final 49 (also "1") is the second
Value.
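Those bytes can be checked with a short Python sketch (the byte list is copied from above; splitting on HT and stripping the trailing NL is my illustration, not code from the application):

```python
# The 19 bytes listed above: HT (9) separates fields, NL (10) ends the record.
record = bytes([49, 44, 57, 49, 50, 95, 78, 85, 77, 9,
                49, 48, 48, 48, 9, 49, 9, 49, 10])

fields = record.rstrip(b'\n').split(b'\t')  # drop trailing NL, split on HT
ignored, key, value1, value2 = fields
print(key, int(key), int(value1), int(value2))  # b'1000' 1000 1 1
```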
The Parse procedure is
with System;

separate( Max_Col_Sum_by_Key )

procedure Parse is

  -- Notes:
  -- Unlike the C# version, Ada has the ability to declare record structures.
  -- And, since the format of the file is known and has data in fixed columns,
  -- the data can easily be separated into fields.
  -- Each record in the Max-Col-by-Key.tsv file has the format
  --   0,912_NUM   1000   1   1
  -- That is, 10 characters to be ignored including a TAB, then a Key of 5
  -- characters including a trailing TAB, then the two 1 character digits
  -- with a trailing TAB after the first and a NEW LINE after the second
  -- except for the last record which doesn't have the trailing NEW LINE.
  -- Therefore, if it was known that the non-truncated file never had values
  -- in either of the last two data fields that exceeded one digit then the
  -- file could be parsed by overlaying each 19 byte slice of the data buffer
  -- with an object of this record type and then selecting the Key, Digit1,
  -- and Digit2 fields to build a data structure to use to be able to answer
  -- which Key has the greatest sum of Digit1 or Digit2 values.  (Or whatever
  -- the question was that the class exercise was to answer.)
  -- And, of course, if it were known in advance that the file was made up of
  -- 19 byte records, each record could be separately read into a buffer of
  -- the following format without the need to input the contents of the entire
  -- file.  Also, of course, if the file was too large to be read all at once
  -- a buffer of much smaller size could be used and the bytes could be parsed
  -- until the remaining bytes were insufficient to represent the next record.
  -- Then the remaining bytes could be copied to the beginning and additional
  -- bytes from the file could be read from that point on to again fill the
  -- buffer and the decoding continued.
  type Data_Record_Type
  is record
    Ignore : String(1..10); -- includes trailing horizontal tab
    Key    : String(1..4);  -- 4 digits of the key
    Tab1   : Character;     -- horizontal tab
    Digit1 : Character;     -- whatever this digit means
    Tab2   : Character;     -- horizontal tab
    Digit2 : Character;     -- whatever this digit means
    NL     : Character;     -- new line to end each record except last
  end record;
  for Data_Record_Type'size use 19*8; -- 19 bytes of 8 bits

  Data_Record_Size : constant Integer := 19; -- bytes

  -- Since it isn't known that the file will never have records that are longer
  -- than 19 bytes, the record will be parsed by locating the non-digit markers
  -- to separate data fields as was done in the C# version and the above
  -- record structure will not be used.

  Key_Bytes    : Data_List_Type;
  Value1_Bytes : Data_List_Type;
  Value2_Bytes : Data_List_Type;

  Offset -- Offset into data buffer read from file
  : Integer := 1;

  Index -- Index into Data array
  : Integer := 0;

  type Scan_Phase_Type is
    ( Ignore,   -- beginning of record to be ignored
      Key,      -- obtain key
      Value1,   -- obtain first value of record
      Value2 ); -- obtain second value of record

  Scan_Phase -- Keep track of portion of record being parsed
  : Scan_Phase_Type := Ignore;

  xxx : Unsigned_Byte; -- to see char in debugger

  HT -- Horizontal Tab
  : constant Unsigned_Byte := 16#09#;

  NL -- New Line
  : constant Unsigned_Byte := 16#0A#;

begin -- Parse

  loop -- until end of Buffer

    xxx := Buffer(Offset); -- to use debugger to see next value

    -- Scan for NL that ends record while extracting data fields
    case Scan_Phase is

      -- Ignore bytes until after first HT found
      when Ignore =>
        if Buffer(Offset) = HT then
          Scan_Phase := Key;
          -- Initialize for next set of data
          Key_Bytes.Count := 0;
          Value1_Bytes.Count := 0;
          Value2_Bytes.Count := 0;
        end if;

      -- Capture the key
      when Key =>
        if Buffer(Offset) /= HT then
          Index := Index + 1;
          Key_Bytes.List(Index) := Buffer(Offset);
        else
          Key_Bytes.Count := Index;
          Scan_Phase := Value1; -- Value immediately follows HT
          Index := 0;
        end if;

      -- Capture the first value
      when Value1 =>
        if Buffer(Offset) /= HT then
          Index := Index + 1;
          Value1_Bytes.List(Index) := Buffer(Offset);
        else
          Value1_Bytes.Count := Index;
          Scan_Phase := Value2; -- Value immediately follows HT
          Index := 0;
        end if;

      -- Capture the second value
      when Value2 =>
        if Buffer(Offset) /= HT and then Buffer(Offset) /= NL then
          Index := Index + 1;
          Value2_Bytes.List(Index) := Buffer(Offset);
        else
          Value2_Bytes.Count := Index;
          Index := 0;
          -- Update tables with the data from the record.
          Update( Key_Bytes, Value1_Bytes, Value2_Bytes );
          -- Initialize for next record
          Key_Bytes.Count := 0;
          Value1_Bytes.Count := 0;
          Value2_Bytes.Count := 0;
          Scan_Phase := Ignore;
        end if;

    end case;

    Offset := Offset + 1; -- increment to next Buffer position
    if Offset > Buffer_Length then -- no more data
      if Value2_Bytes.Count > 0 then -- last record fully parsed without trailing NL
        Update( Key_Bytes, Value1_Bytes, Value2_Bytes );
      elsif Scan_Phase = Value2 and then Index > 0
      then -- last record stopped parsing the value w/o trailing NL
        Value2_Bytes.Count := Index;
        Update( Key_Bytes, Value1_Bytes, Value2_Bytes );
      end if;
      exit; -- loop
    end if;

  end loop;

end Parse;
This routine would be similar to a C# one parsing the same file, except that
with Ada the enumerated type can be declared and used to keep track of the
current Scan_Phase (although the C# compiler should have supported an enum as
well).  The xxx variable was declared for use in the debugger, where one can
hover over it to see the current value while getting the code correct.
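For comparison, Python's standard library does provide an enumerated type; a sketch of the same scan phases (the names mirror the Ada declaration, but the sketch itself is mine):

```python
from enum import Enum

class ScanPhase(Enum):
    IGNORE = 0  # beginning of record to be ignored
    KEY = 1     # obtain key
    VALUE1 = 2  # obtain first value of record
    VALUE2 = 3  # obtain second value of record

phase = ScanPhase.IGNORE
HT = 9
byte = 9                   # pretend the current buffer byte is an HT
if phase is ScanPhase.IGNORE and byte == HT:
    phase = ScanPhase.KEY  # advance, just as the Ada case statement does
print(phase.name)          # KEY
```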
The Report procedure follows.  The
package Int_IO is new Text_IO.Integer_IO( Integer );
statement instantiates an instance of the Text_IO.Integer_IO package to
output integer values.
with Text_IO;

separate( Max_Col_Sum_by_Key )

procedure Report is

  -- Create a Max Combined structure with a Key, a Value and a Sum.
  -- For each Key in the Key Table
  --   For each Value in the first paired array
  --     Search the second paired array for the Value
  --       If found, add its Sum to that of the first paired array and
  --       compare the total to the current Sum in the Max Combined.
  --       If greater, replace the Key, Value and Sum in the Max Combined
  --       with the new Key, Value and Sum
  --       Otherwise, compare the second paired array Sum to that of Max
  --       Combined and, if greater do the replacement.
  -- Report the result.

  Found -- True if Value found in second field's data for Key
  : Boolean;

  Key -- Current key from table
  : Integer;

  Sum -- Current number of instances of Value for the Key
  : Integer;

  Value -- Current Value for the Key
  : Integer;

  type Max_Combined_Type
  is record
    Key : Integer;
    -- Key with most different values associated with it
    Value : Integer;
    -- First or second value
    Sum : Integer;
    -- Number of instances of Value associated with Key in combined first and
    -- second fields of records identified with the Key
  end record;

  Max_Combined
  -- Key with most instances of a particular Value in the combination of the
  -- first and second fields
  : Max_Combined_Type
  := ( Key   => 0,
       Value => 0,
       Sum   => 0 );

  package Int_IO is new Text_IO.Integer_IO( Integer );

begin -- Report

  for I in 1..Key_Table.Count loop

    Key := Key_Table.Key_Data(I).Key;

    for J in 1..Key_Table.Key_Data(I).Values(1).Count loop

      Value := Key_Table.Key_Data(I).Values(1).List(J).Value;
      Sum   := Key_Table.Key_Data(I).Values(1).List(J).Sum;
      Found := False;

      for K in 1..Key_Table.Key_Data(I).Values(2).Count loop
        if Value = Key_Table.Key_Data(I).Values(2).List(K).Value then
          Sum := Sum + Key_Table.Key_Data(I).Values(2).List(K).Sum;
          Found := True;
          exit; -- inner loop
        end if;
      end loop;

      if Found then
        if Sum > Max_Combined.Sum then -- save new maximum sum for a Value
          Max_Combined := ( Key   => Key,
                            Value => Value,
                            Sum   => Sum );
        end if;
      end if;

    end loop;

    -- Value of first field for key may not be in second field but second
    -- field may have a Value with references that exceeds that of the first
    -- or of the combination of the first and second.
    -- Check if any of its unique value's references exceed the Max Combined.
    -- Note: It does no harm to check Values that were combined with those of
    --       the first field since they cannot exceed an already selected Value.
    for K in 1..Key_Table.Key_Data(I).Values(2).Count loop
      Sum := Key_Table.Key_Data(I).Values(2).List(K).Sum;
      if Sum > Max_Combined.Sum then -- save new maximum sum for a Value
        Max_Combined := ( Key   => Key,
                          Value => Key_Table.Key_Data(I).Values(1).List(K).Value,
                          Sum   => Sum );
      end if;
    end loop;

  end loop;

  -- Report the result.
  Text_IO.Put( "Key " );
  Int_IO.Put( Max_Combined.Key, Width => 0 ); -- Width of 0 for no leading spaces
  Text_IO.Put( " with the maximum number of instances " );
  Int_IO.Put( Max_Combined.Sum, Width => 0 );
  Text_IO.Put( " of Value " );
  Int_IO.Put( Max_Combined.Value, Width => 0 );
  Text_IO.Put_Line( " " );

end Report;
The Width => 0 parameter supplied with the invocation of Int_IO.Put causes
leading blanks/spaces to be discarded.
The result of executing the program is
Key 3000 with the maximum number of instances 20 of Value 1
That is, key 3000 has all 10 first value fields and all 10
second value fields with a value of 1.
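The selection the Report procedure makes can be cross-checked with a few lines of Python (the records list here is hypothetical, shaped like the truncated file's data for key 3000):

```python
from collections import Counter

# Hypothetical (key, value1, value2) records: key 3000 has ten records whose
# value fields are all 1, plus a couple of records for other keys.
records = [(3000, 1, 1)] * 10 + [(1000, 1, 2), (2000, 3, 1)]

# Count instances of each value per key across BOTH value fields.
counts = Counter()
for key, v1, v2 in records:
    counts[(key, v1)] += 1
    counts[(key, v2)] += 1

(key, value), total = counts.most_common(1)[0]
print(key, value, total)  # 3000 1 20
```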
Discussion of the second C# Application
This application is a redo of the first to enable it to read
the supplied max-col-sum-by-key.tsv file.
[Note: Sometime while doing the
code I looked at the tsv file with the UltraEdit editor and allowed it to
convert the file to DOS. Therefore,
that changed the end of record character from NL (new line) to the CR LF
(carriage return; line feed) pair of characters (where LF is the same character
as NL portrayed by a different name).
The Parse routine has been written to treat the end of record either
way.]
Not having found a way to implement record structures in C# (at least in the
version we were able to use), I replaced those of the Ada application with
single, double and triple indexed arrays, all starting with the same prefix
to indicate that each is a part of the same "database", as I had ended up
doing in the first implementation.  This eliminates the other classes of the
first version.
The Form1.cs [Design] panel was created from the C# Toolbox
as shown below. A click on the File
icon shows a drop down with an Open option.
Selecting it results in the openToolStripMenuItem_Click event handler being entered via the
Visual C# supplied interface to Windows.
As supplied below, this routine is the same as in the first version and
the user will navigate to the correct folder and select the tsv file to be
opened via the C# supplied methods.
--> insert picture
The beginning of Form1.cs is as follows (where, of the
supplied using statements, only System, System.IO, and System.Windows.Forms are
really needed).
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace MaxColSumbyKey
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // Objects to contain the data of the Ada Key_Table structure since C#
        // cannot handle an array of instances of a class.
        // Each object begins with the same prefix of "table" to associate the
        // objects with each other.
        int tableKeyCount;                        // Number of different keys in tableKeys array
        int[] tableKeys = new int[100];           // Unique keys found in the input file
        int[,] tableValuesCount = new int[100,2]; // Number of unique values for each
                                                  // value field for a key
        int[,,] tableValue = new int[100,2,30];   // Unique values associated with a key
                                                  // (first index) of a value field
                                                  // (second index) with each unique
                                                  // value up to tableValuesCount using
                                                  // the third index
        int[,,] tableSum = new int[100,2,30];     // Sum of instances of a value associated
                                                  // with the same set of indexes as tableValue
The Form1() constructor is provided by the C# compiler. The static variables beginning with
"table" are provided to keep track of the parsed data from the
file. tableKeyCount is to keep track of
the number of different keys in the tableKeys array. The double indexed tableValuesCount array is to keep track of the
number of unique values of each of the two value fields of each key that are in
the tableValue array while the tableSum array is to keep track of the number of
references to a particular value for a particular key. Note that this could have been done via a
fourth array index ranging from 0 to 1 by replacing tableValue and tableSum
with
int[,,,] tableValue = new int[100,2,30,2];
where the first index is that of the particular key, the
second that of the value field of the file record, the third that of the
particular unique value, and the fourth whether the bucket contains the parsed
value or the accumulated sum of the instances of the value.
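In a language with built-in dictionaries, the four parallel "table" arrays collapse into one nested mapping; a Python sketch (the names are mine, not from the C# code) of the same bookkeeping:

```python
# key -> [ {value: sum} for the first field, {value: sum} for the second ]
table = {}

def update(key, values):
    """Record one parsed (value1, value2) pair for a key."""
    field_maps = table.setdefault(key, [{}, {}])
    for field, value in enumerate(values):
        field_maps[field][value] = field_maps[field].get(value, 0) + 1

update(1000, (1, 1))
update(1000, (1, 2))
print(table)  # {1000: [{1: 2}, {1: 1, 2: 1}]}
```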
The end of the namespace with the event handler is
        private void openToolStripMenuItem_Click(object sender, EventArgs e)
        { // to open the file
            Stream myStream = null;

            // Get an instance of a FileDialog.
            OpenFileDialog openFileDialog = new OpenFileDialog();

            // Use a filter to allow only certain file extensions.
            openFileDialog.InitialDirectory = "c:\\";
            openFileDialog.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*";
            openFileDialog.FilterIndex = 2;
            openFileDialog.RestoreDirectory = true;

            if (openFileDialog.ShowDialog() == DialogResult.OK)
            {
                try
                {
                    if ((myStream = openFileDialog.OpenFile()) != null)
                    {
                        using (myStream)
                        {
                            // Initialize
                            tableKeyCount = 0;
                            for (int i = 0; i < 30; i++)
                            {
                                tableValuesCount[i, 0] = 0;
                                tableValuesCount[i, 1] = 0;
                            }
                            // Read the file and build tables from the data.
                            ReadAndParse(myStream);
                            ReportResults();
                            return;
                        }
                    }
                }
                catch (Exception ex)
                {
                    MessageBox.Show("Error: Could not read file from disk. Original error: " + ex.Message);
                }
            }
        } // openToolStripMenuItem_Click
    } // class Form1
} // namespace MaxColSumbyKey
This method opens the file as before using the File|Open
widget of the form panel. It also
initializes the various array counts to 0 to prepare for parsing the file data.
ReadAndParse is as before and is
        private void ReadAndParse(Stream file)
        {
            System.Byte[] buffer = new byte[file.Length]; // buffer to contain file
            int numBytesRead = 0;
            try
            {
                // Read everything in file.
                numBytesRead = file.Read(buffer, 0, (int)file.Length);
                file.Close();
            }
            catch (Exception ex)
            {
                MessageBox.Show("Error reading file");
            }
            Parse(numBytesRead, buffer); // Parse the bytes read
            return;
        } // ReadAndParse
where the buffer is
sized to hold everything in the opened file.
The size and the buffer are then passed to Parse.
Parse has been
modified to treat the format of the tsv file.
        private void Parse(int count, Byte[] data)
        {
            // Build a table of keys (second field of a record) and, for each key,
            // the number of value items (third and fourth fields) of the key and
            // the number of instances of the particular value.
            // The first field ends with a TAB.  The second field (Key) ends the
            // same way as does both the Value fields or at the end of record
            // following the second Value field (in this case NL but for a different
            // file - a DOS file - in the CR LF pair).
            // All other data is ignored until end-of-file (EOF) or end of record.
            const Byte HT = 9;  // horizontal tab
            const Byte CR = 13;
            const Byte LF = 10;
            const Byte NL = 10; // new line

            //enum Fields { Bypass, Key, Value1, Value2, RecEnd }; // enum didn't work with this compiler
            //int scanPhase = (int)Fields.Bypass;
            const int Bypass = 0;
            const int Key = 1;
            const int Value1 = 2;
            const int Value2 = 3;
            const int RecEnd = 4;
            int scanPhase = Bypass;

            int keyCount = 0;               // number of actual bytes in keyField
            Byte[] keyField = new Byte[16]; // max of 16 bytes for a key
            int value1Count = 0;
            Byte[] value1Field = new Byte[10];
            int value2Count = 0;
            Byte[] value2Field = new Byte[10];
            int index = 0;    // index into keyField, etc for next Byte
            int keyValue = 0; // keyField as converted
            int[] valueValues = new int[2];
            valueValues[0] = 0;
            valueValues[1] = 0;

            for (int i = 0; i < count; i++)
            {
                Byte debugxx = data[i]; // examine byte with debugger
                // Parse key
                switch (scanPhase)
                {
                    case Bypass: // ignore text until after first HT found
                        if (data[i] == HT)
                        {
                            scanPhase = Key;
                            // Initialize for next set of data
                            keyCount = 0;
                            value1Count = 0;
                            value2Count = 0;
                        }
                        break;

                    case Key: // capture the Key
                        if (data[i] != HT)
                        {
                            keyField[index] = data[i];
                            index++;
                        }
                        else
                        {
                            keyCount = index;
                            keyValue = ConvertToNumeric(keyField, keyCount);
                            scanPhase = Value1; // Value immediately follows HT
                            index = 0;
                        }
                        break;

                    case Value1: // capture first value
                        if (data[i] != HT)
                        {
                            value1Field[index] = data[i];
                            index++;
                        }
                        else
                        {
                            value1Count = index;
                            valueValues[0] = ConvertToNumeric(value1Field, value1Count);
                            scanPhase = Value2; // Value immediately follows HT
                            index = 0;
                        }
                        break;

                    case Value2: // capture 2nd value
                        if ((data[i] != HT) && (data[i] != NL) && (data[i] != CR) && (data[i] != LF))
                        {
                            value2Field[index] = data[i];
                            index++;
                        }
                        else
                        {
                            value2Count = index;
                            valueValues[1] = ConvertToNumeric(value2Field, value2Count);
                            index = 0;
                            // Update tables with the data from the record.
                            Update(keyValue, valueValues);
                            // Initialize for next record
                            keyCount = 0;
                            value1Count = 0;
                            value2Count = 0;
                            scanPhase = Bypass;
                        }
                        break;
                }
                if (i >= count)
                {
                    return;
                }
            } // end for
            return;
        } // Parse
The above code allows the second value field to be terminated by either a
horizontal tab, a new line, a carriage return or a line feed.  Since all the
data has been read into the buffer, if there is a two-character terminator
the code that bypasses the initial characters of the next record will just
skip over the second terminating character as well.
In writing the above I notice that nothing has been included for the
situation, handled in the Ada code, where the last record lacks a terminating
character, so that special code would be needed to check at the end of the
buffer whether the final second value had yet to be completed and the Update
done for the last record.  Since the file had been converted to the DOS
format, a trailing CR LF was likely added.
ConvertToNumeric is
as before. That is,
        private int ConvertToNumeric(byte[] keyField, int count)
        {
            // Convert the byte string to an integer.
            // Check the StructClasses to find the Key or add a new one.
            // Remember the index into the KeyValueRec to use for the field Value.
            const int Zero = 48;
            const int Nine = 57;
            int start;
            start = 0;
            int finish;
            finish = count - 1;
            if (finish < 0) // debug
            { MessageBox.Show("ConvertToNumeric error 1"); }
            int keyInt = 0;
            if ((keyField[finish] >= Zero) & (keyField[finish] <= Nine))
            {
                keyInt = keyField[finish] - Zero; // convert ASCII to digit
            }
            finish--;
            int m = 10;
            while (finish >= start)
            {
                if (finish < 0) // debug
                { MessageBox.Show("ConvertToNumeric error 2"); }
                if ((keyField[finish] >= Zero) & (keyField[finish] <= Nine))
                {
                    keyInt = keyInt + (m * (keyField[finish] - Zero));
                }
                m = m * 10;
                finish--;
            }
            return keyInt;
        } // ConvertToNumeric
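The same conversion can be sketched in Python for comparison; this left-to-right variant assumes, like the C# code, that bytes outside ASCII '0' through '9' are simply skipped:

```python
ZERO, NINE = 48, 57  # ASCII codes of '0' and '9'

def convert_to_numeric(field):
    """Convert a sequence of ASCII digit bytes to an integer."""
    result = 0
    for b in field:
        if ZERO <= b <= NINE:
            result = result * 10 + (b - ZERO)  # shift digits left, add new one
    return result

print(convert_to_numeric(b'1000'))  # 1000
```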
The new Update method
is
        // Update tables with 'key' and 'value'
        public void Update(int key, int[] value)
        {
            int value1 = value[0]; // to look at
            int value2 = value[1]; // with debugger

            // Search keyTable
            bool value1Found = false;
            bool value2Found = false;
            int keyIndex = -1;
            for (int i = 0; i < tableKeyCount; i++)
            {
                if (key == tableKeys[i])
                {
                    keyIndex = i;
                    break; // exit loop
                }
            }

            // Add new key to the table
            if (keyIndex < 0) // key not found
            {
                try
                { // add new key with its value to the tables
                    tableKeys[tableKeyCount] = key;
                    tableValue[tableKeyCount, 0, 0] = value[0];
                    tableValue[tableKeyCount, 1, 0] = value[1];
                    tableSum[tableKeyCount, 0, 0] = 1;      // first instance of value for key
                    tableSum[tableKeyCount, 1, 0] = 1;      // first array entry
                    tableValuesCount[tableKeyCount, 0] = 1; // one pair of values
                    tableValuesCount[tableKeyCount, 1] = 1; // for key
                    tableKeyCount++;
                }
                catch (Exception ex)
                {
                    MessageBox.Show("Error: Could not add to keyTable " + ex.Message);
                }
            }
            else
            {
                // Find whether first Value is already in the table.
                value1Found = false;
                int valueIndex = -1;
                for (int k = 0; k < tableValuesCount[keyIndex, 0]; k++)
                {
                    if (value[0] == tableValue[keyIndex, 0, k])
                    {
                        value1Found = true;
                        valueIndex = k;
                        break; // exit loop
                    }
                }
                // Add new first value to the table for the key.
                if (valueIndex < 0)
                {
                    if (tableValuesCount[keyIndex, 0] < 30)
                    {
                        valueIndex = tableValuesCount[keyIndex, 0];
                        tableValue[keyIndex, 0, valueIndex] = value[0];
                        tableSum[keyIndex, 0, valueIndex] = 1;
                        tableValuesCount[keyIndex, 0]++;
                    }
                    else
                    {
                        MessageBox.Show("More different first values than app can handle");
                    }
                }
                else // first value already in table
                { // add to number of instances
                    tableSum[keyIndex, 0, valueIndex]++;
                }

                // Find whether second value already in table
                value2Found = false;
                valueIndex = -1;
                for (int k = 0; k < tableValuesCount[keyIndex, 1]; k++)
                {
                    if (value[1] == tableValue[keyIndex, 1, k])
                    {
                        value2Found = true;
                        valueIndex = k;
                        break; // exit loop
                    }
                }
                // Add new second value to the table for the key.
                if (valueIndex < 0)
                {
                    if (tableValuesCount[keyIndex, 1] < 30)
                    {
                        valueIndex = tableValuesCount[keyIndex, 1];
                        tableValue[keyIndex, 1, valueIndex] = value[1];
                        tableSum[keyIndex, 1, valueIndex] = 1;
                        tableValuesCount[keyIndex, 1]++;
                    }
                    else
                    {
                        MessageBox.Show("More different second values than app can handle");
                    }
                }
                else // second value already in table
                { // add to number of instances
                    tableSum[keyIndex, 1, valueIndex]++;
                }
            }
        } // end Update
Finally,
ReportResults is
        private void ReportResults()
        {
            int key, value, sum;
            bool found;
            int[] most = new int[3]; // most[0] is key, most[1] is value, most[2] is sum

            // Search the keys
            for (int keyIndex = 0; keyIndex < tableKeyCount; keyIndex++)
            {
                key = tableKeys[keyIndex];
                int values1Count = tableValuesCount[keyIndex, 0];
                int values2Count = tableValuesCount[keyIndex, 1];

                // search for greatest number (sum) of values
                for (int valueIndex = 0; valueIndex < values1Count; valueIndex++)
                {
                    value = tableValue[keyIndex, 0, valueIndex]; // value of first field
                    sum = tableSum[keyIndex, 0, valueIndex];
                    found = false;
                    for (int sumIndex = 0; sumIndex < values2Count; sumIndex++)
                    {
                        if (value == tableValue[keyIndex, 1, sumIndex])
                        {
                            sum = sum + tableSum[keyIndex, 1, sumIndex];
                            found = true;
                            break; // inner loop
                        }
                    }
                    if (found)
                    {
                        if (sum > most[2]) // save new maximum sum for a Value
                        {
                            most[0] = key;
                            most[1] = value;
                            most[2] = sum;
                        }
                    }
                } // end for loop

                // Value of first field for key may not be in second field but second
                // field may have a Value with references that exceeds that of the first
                // or of the combination of the first and second.
                // Check if any of its unique value's references exceed the Max Combined.
                // Note: It does no harm to check Values that were combined with those of
                //       the first field since they cannot exceed an already selected Value.
                for (int sumIndex = 0; sumIndex < values2Count; sumIndex++)
                {
                    sum = tableSum[keyIndex, 1, sumIndex];
                    if (sum > most[2])
                    {
                        most[0] = key;
                        most[1] = tableValue[keyIndex, 0, sumIndex]; // value and sum arrays
                        most[2] = sum;                               // same range
                    }
                }
            } // end outer for loop

            // Output the key, value, and sum to the text boxes.
            keyTextBox.Text = most[0].ToString();
            valueTextBox.Text = most[1].ToString();
            sumTextBox.Text = most[2].ToString();
        } // end ReportResults
The results are shown
as below.
Discussion of the Python Application
What I learned about Python was from what appeared as the
result of online searches. First, of
course, was to download Python.
Somehow, with the first attempt I downloaded the Linux version and so I
had to retry and be sure I selected the Windows version. (I'm writing this on the eighth calendar day
since I started and the seventh since I started the application. Since it was working yesterday, it took six
part-time days to learn enough about Python to produce the application.)  I
didn't find out about a debugger, so I used print statements to determine
what was going on as I added code.  An internet search just now shows there
is a debugger that I could have used.
I didn't try to determine whether there was a way to execute the code in a
Windows panel, so I used what once upon a time was called a DOS window and is
now the Command Prompt.  (Type Command Prompt into the box when Start is
selected and then click on the program that appears.)  With the Command
Prompt window opened, use the DOS cd (change directory) command to switch to
the Windows folder where the Python code resides.  The code as written (such
as in MaxColSumbyKey.py, where the extension py is standard for Python code)
can then be run by just entering the name in the DOS/Command Prompt window
followed by Enter, or by entering >textname after the Python file name to
redirect the output to the named file.
Python uses a colon (:) at the end of a line to introduce the body of a
function/method definition, an if statement, any else, and the like.  Rather
than having {} brackets as in C# or "end if" as in Ada, the code is indented
by 4 columns (as is standard practice) and the end of the function
definition, if block, etc. is indicated by where the indentation returns to
the previous column.  The indentation has to remain constant to the end of
the block of code.  This is made easy enough via the UltraEdit editor, which
will start a new line at the indentation of the previous line until the user
changes it.
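A minimal sketch of that rule (my own example, not code from the application):

```python
def classify(n):              # the colon introduces an indented block
    if n > 0:                 # nested block: indented one more level
        return "positive"
    else:
        return "non-positive"
# end classify - the dedent back to column 0 ended the function body

print(classify(5))            # positive
```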
I took the development of the application a step at a time.  The first step
was determining how to open the .tsv file, which involved internet searches
such as "Python file open".  Using the help provided I was able to write the
code (where # introduces a comment)
# Open file, Read it, and then extract each Key and pair of Values
print("This line will be printed.")

import os
count = os.path.getsize('C:/Source/LearnToCodeGR/max-col-sum-by-key.tsv')
# Note: This doesn't provide the correct answer.  It results in 998 whereas the
#       len(read_data) below is 949 so only read_data[0] thru [948] are valid.
print(count)

with open('C:/Source/LearnToCodeGR/max-col-sum-by-key.tsv') as f:
    read_data = f.read()
f.closed

count = len(read_data) # get number of bytes read from the file
print(count)

# Separate Key and pair of Values from extraneous contents of the file and
# build list of namedtuples with number of instances of each particular value
print("call parse")
parse(read_data,count)

# Report the results
keysx = KeyList();
keysx.report();
print("done")
As indicated by the Note, the getsize function didn't return the number of
bytes that would be read from the file.  I didn't find this out until I had
written most of the code, since I initially parsed only a limited number of
bytes (38, to get the first two file records) while working out how to keep
track of the keys, their associated values, and the number of references to a
particular value for a particular key.  After I had done so, and had then
increased the number of bytes read to cover the data for the first two keys
to further learn how to do the code, I opened the parse up to the number of
bytes indicated by the getsize result.  This caused the application to fail
by trying to read beyond the extent of the read_data buffer.  That's when I
discovered the len function to return the size of the read_data buffer.
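The mismatch is most likely the CR LF conversion mentioned earlier: os.path.getsize reports the bytes on disk (two per line ending after the DOS conversion), while read() in text mode folds each CR LF into a single '\n'. A sketch with a hypothetical file demonstrates the effect:

```python
import os
import tempfile

# Write a hypothetical three-record file with DOS (CR LF) line endings.
path = os.path.join(tempfile.mkdtemp(), 'sample.tsv')
with open(path, 'wb') as f:
    f.write(b'0,912_NUM\t1000\t1\t1\r\n' * 3)

size_on_disk = os.path.getsize(path)  # CR LF counts as 2 bytes per record
with open(path) as f:                 # text mode: each CR LF becomes '\n'
    read_data = f.read()

print(size_on_disk, len(read_data))   # 60 57 (one byte fewer per record)
```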
Note that the code to call parse and report were done later
as I got to them although the initial code for parse was started immediately
after being able to open and read the file.
Since I already knew what the file looked like from when I wrote the Ada
application, the
# print(read_data)
line (now commented out via the leading # character) wasn't necessary, but
was done to check that the open and read via
with open('C:/Source/LearnToCodeGR/max-col-sum-by-key.tsv') as f:
    read_data = f.read()
had occurred as I expected.  Also note the indention by 4 columns following
the : that ends the previous line.
Also note that I sometimes end lines with a semi-colon (;) and sometimes not.
With Python it's a "don't care": the new line (or a comment introducer, if
one comes first) terminates the statement, so the semi-colon is redundant.
I'll get into KeyList later. It is a class where
keysx = KeyList();
instantiates an instance of the class and then
keysx.report();
invokes the report function declared in the class. This code wasn't added until the very end
when the parse of the data in the read_data buffer had been worked out.
The Parse function ended up as
# Parse array of data as previously read from file
to
# obtain Key and two Values from each record of file
def parse(data,count):
Key_Bytes
= []
Value1_Bytes = []
Value2_Bytes = []
print(data)
print(count)
offset =
0;
scan_phase = 0 #bypass
# Parse
all bytes previously read from the file
while
offset < count:
print(offset,data[offset],ord(data[offset]));
if
scan_phase == 0: # bypass
print("bypass");
if ord(data[offset]) == 9:
scan_phase = 1; # key
print("new phase of key")
# Initialize for next set of data
Key_Bytes.clear();
Value1_Bytes.clear();
Value2_Bytes.clear();
elif
scan_phase == 1: # key
print("key");
if ord(data[offset]) != 9:
Key_Bytes.append(data[offset]);
else:
scan_phase = 2; # value1
print("new phase of value1")
#
end if;
elif
scan_phase == 2: # value1
print("value1");
if ord(data[offset]) != 9:
Value1_Bytes.append(data[offset]);
else:
scan_phase = 3; # value2
print("new phase of value2")
#
end if;
elif
scan_phase == 3: # value2
print("value2");
if ord(data[offset]) == 10 or ord(data[offset]) == 13:
# Update tables with the data from the record.
Update( Key_Bytes, Value1_Bytes, Value2_Bytes );
# Initialize for next record
Key_Bytes.clear();
Value1_Bytes.clear();
Value2_Bytes.clear();
scan_phase = 0; # bypass
print("new phase of bypass")
else:
Value2_Bytes.append(data[offset]);
#
end if;
# end
if;
#
Complete processing of final record when not terminated by New Line
offset = offset + 1 # increment to next data buffer position
if
offset >= count: # no more data
if
len(Value2_Bytes) > 0: # last record fully parsed without trailing NL
Update( Key_Bytes, Value1_Bytes, Value2_Bytes );
#
end if
break # exit loop
# end
if
return #
from Update
# end parse
Notice, I have added "# end parse" to indicate the
end of the function, "# end if" to indicate the end of an if
statement sequence, and the like to better document the code so the reader
doesn't need to completely follow the indentation changes. Also, note that close to the end the call to
Update and the preceding if statement are only indented by 2 columns rather
than the usual 4, illustrating that an indentation of 4 columns isn't necessary
as long as a fixed indentation is used following each : terminator. If whatever indentation is used isn't
maintained, the execution of the program (which, with an interpreter, is also the
"compile") will fail. I
suspect that 4 columns is said to be the usual amount since, without
terminators (such as } in C), it makes recognizing the end of a block of code
easier than a smaller indentation would.
As with C, C#, etc., case is significant in names. Update and update would be two different
constructs. Case doesn't matter in Ada, however,
where Update and update would refer to the same thing.
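For instance (illustrative names, not from the application), the two names below coexist as separate variables:

```python
# Python names are case-sensitive: these are two distinct variables.
Update = "capitalized"
update = "lower case"
print(Update, update)
```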
The first problem to overcome with the parse was how to keep
track of the data being parsed. This
took me a while to determine. I soon
found that there was a "list" structure implemented by
Python, but I couldn't decide how to use it.
Then I found out that there was a concept known as a namedtuple, where the
tuple fields could be named (rather
than regular tuples, whose fields are separated by commas like arrays). This seemed like it might be similar to an
Ada record structure or a C (not C#) struct.
So after messing around I came up with
import collections
# Namedtuple declaration
Dat = collections.namedtuple('Dat', 'pkey pvalue psum')
that I put at the beginning of the application outside of
the class and the functions. This
declares the namedtuple (which isn't in earlier versions of Python) from the
Python-supplied collections module, giving it the name Dat and declaring that it has
the tuple names pkey, pvalue, and psum, where I preceded the key, value, and sum
names with 'p' to be sure the tuple names weren't mixed up with the variables.
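A minimal sketch of how such a namedtuple behaves (using the Dat declaration above with sample data values of my own):

```python
import collections

# Same declaration as in the application
Dat = collections.namedtuple('Dat', 'pkey pvalue psum')

d = Dat(pkey=1000, pvalue=1, psum=1)
print(d.pkey, d.pvalue, d.psum)   # fields accessed by name, like an Ada record
print(d[0])                       # or by position, like a regular tuple
```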
Notice in parse that
data[offset]
indicates an index into the data array similar to C and
C#.
Since Python doesn't have a const construct, I didn't bother
to do names such as I did in Ada for scan phases or to declare
HT -- Horizontal Tab
   : constant Unsigned_Byte := 16#09#;
NL -- New Line
   : constant Unsigned_Byte := 16#0A#;
and just used the numeric values. That is, 0, 1, 2, and 3 for when scanning through the portion of the
record to be ignored, the portion that is the key, and the two values, and just
using 9, 10, and 13 as the numeric values of the horizontal tab, new line, and
carriage return characters. The ord
function is the typecast of an ASCII character to its numeric value and is used
when parse is checking where a particular parse/scan phase ends.
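For example, ord maps each one-character string to the numeric code used in the scan-phase tests:

```python
# ord returns the numeric character code compared against 9, 10, and 13 in parse.
print(ord('\t'))   # horizontal tab -> 9
print(ord('\n'))   # new line -> 10
print(ord('\r'))   # carriage return -> 13
print(ord('0'))    # digit zero -> 48
```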
Key_Bytes = []
Value1_Bytes = []
Value2_Bytes = []
are lists. Therefore
append adds another character to the list.
When the second value has been terminated, the Update function is called
passing it the key and the two values.
Since the final second value of the file doesn't have a terminating new
line character, parse has the special code at the end to detect this and invoke
Update for the final key and values.
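A condensed sketch of that accumulate-then-reset pattern (the variable name follows the application; the sample characters are my own):

```python
Value_Bytes = []            # an empty list, as declared in parse
for ch in '20':             # append one character at a time, as parse does
    Value_Bytes.append(ch)
print(Value_Bytes)          # ['2', '0']
Value_Bytes.clear()         # reset for the next record
print(len(Value_Bytes))     # 0
```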
The list manipulation in Python must mean that it grabs
memory from the heap whenever a new item is appended. I don't know whether it has garbage collection of removed
items. In any case, in addition to
being interpreted, this use of the heap makes it unsuitable for critical
applications such as onboard an aircraft, where it has to be known in advance
that, no matter what paths are taken while the program executes, the
application won't run out of available memory.
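As a rough illustration of that heap use (my own experiment, not from the class), sys.getsizeof shows a list's allocation growing in steps as items are appended; the exact sizes vary by Python version and platform:

```python
import sys

lst = []
previous = sys.getsizeof(lst)
for i in range(100):
    lst.append(i)
    current = sys.getsizeof(lst)
    if current != previous:
        # the list periodically grabs a larger block from the heap
        print(len(lst), "items, allocation now", current, "bytes")
        previous = current
```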
In any case, the parse code is quite similar to that of the
Ada and C# applications except that it uses the Python provided list construct.
The Update function is
def Update(Key, Value1, Value2):
    N_Key = 0;
    N_Value1 = 0;
    N_Value2 = 0;
    # Convert the Key and Values bytes to numeric.
    N_Key = Bytes_to_Integer( Key );
    N_Value1 = Bytes_to_Integer( Value1 );
    N_Value2 = Bytes_to_Integer( Value2 );
    print(N_Key, N_Value1, N_Value2)
    y = KeyList(); # (N_Key);
    y.Update(N_Key, N_Value1, N_Value2)
    return
# end of Update
As I should have noted before, "def" defines a
function. It can also be noted that,
unlike C# and Ada, the types of the function parameters are not provided. Python determines the types for itself from
the types used in the call. So it just
uses list for the three parameters.
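A small example of that dynamic typing (an illustrative function of my own, not from the application): the same untyped parameter accepts whatever the caller passes, and the type is only determined at run time:

```python
def describe(x):
    # the parameter has no declared type; Python checks types at run time
    return type(x).__name__

print(describe([1, 2, 3]))   # list
print(describe(42))          # int
print(describe("abc"))       # str
```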
This Update is small since I've put most of the code into
the KeyList class. It just calls the
Bytes_to_Integer function to convert each of the three lists to an
integer. It then instantiates an
instance of the KeyList class and invokes its Update passing the three
integers.
The Bytes_to_Integer function is similar to that of the
other two applications except that I remove any non-digits from the list that
was passed (although there shouldn't be any).
In this function I did name the characters that are the limits of the
ASCII digits.
# Convert Data to integer
def Bytes_to_Integer(Data):
    Digit = 0  # Numeric digit
    Number = 0 # Numeric result
    Start = 1  # String position of first numeric
    Nine = 57  # ASCII character for digit 9
    Zero = 48  # ASCII character for digit 0
    # Ignore all non-digits.
    for item in Data:
        if ord(item) < Zero or ord(item) > Nine:
            Data.remove(item)
        # end if
    for item in Data:
        Digit = ord(item) - Zero;
        Number = (Number * 10) + Digit;
        print(Number)
    # end for loop
    return Number;
# end of Bytes_to_Integer
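As an aside, the same conversion could be written more compactly with Python built-ins (a sketch of my own, not what the application uses; the lowercase name marks it as hypothetical):

```python
def bytes_to_integer(data):
    # keep only ASCII digits, join them into a string, and let int() convert
    digits = [ch for ch in data if '0' <= ch <= '9']
    return int(''.join(digits)) if digits else 0

print(bytes_to_integer(['2', '0']))      # 20
print(bytes_to_integer(['1', 'x', '3'])) # 13, the non-digit is dropped
```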
The KeyList class contains the Ndata list, where this list is
a Python list of namedtuples. The code
first determines whether the passed key is already in the list of
namedtuples, and then proceeds
depending upon whether it is and whether the values are already in the list for
the key. For a key and for the values,
it either adds (that is, appends) them to the list if new or increments the sum
of the instance of the value if already in the list, using the index of where
the match occurred, as in
self.Ndata[i] = d
to update the list.
In this application, unlike the previous two, value1 and value2 are
combined into the sum of instances rather than being maintained as two separate fields
in the namedtuple. (That would have
required a namedtuple of five fields:
the key, the first value with its sum of instances, and the second value
with its sum.)
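Because namedtuples are immutable, incrementing a sum means building a replacement tuple and storing it back into the list, which the _replace method makes concise (a sketch using the Dat declaration above with sample values of my own):

```python
import collections

Dat = collections.namedtuple('Dat', 'pkey pvalue psum')

Ndata = [Dat(pkey=3000, pvalue=1, psum=19)]
i = 0
# Fields of a namedtuple can't be assigned; _replace builds a new tuple
# with one field changed, which then replaces the list entry.
Ndata[i] = Ndata[i]._replace(psum=Ndata[i].psum + 1)
print(Ndata[i])   # Dat(pkey=3000, pvalue=1, psum=20)
```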
# Class to Update list of namedtuples and Report results
class KeyList():
    Ndata = [] # A list of namedtuples of collection Dat

    def Update(self, key, value1, value2):
        print("KeyList values",len(self.Ndata),key,value1,value2)
        d = Dat(pkey=key, pvalue=value1, psum=1) # assume first instance
        updateDone = False
        # Find if the key is already in the list
        dCount = 0;
        for i in range(len(self.Ndata)):
            obj = self.Ndata[i]
            if obj.pkey == key:
                dCount += 1
                print("dCount",dCount,key,value1,value2,obj.pvalue,obj.psum)
            # end if
        # end for loop
        # Add the new key with its values
        if dCount == 0:
            # Check if value2 is the same as value1 included in object d
            if value1 == value2:
                d = Dat(pkey=key, pvalue=value1, psum=2)
                print("new key",d)
                self.Ndata.append(d) # d contains two instances of value
                updateDone = True
            else:
                # Add two instances of new key for different value2
                print("new key",d)
                self.Ndata.append(d) # d contains first instance of value
                d = Dat(pkey=key, pvalue=value2, psum=1) # namedtuple fields can't be assigned; build a new tuple
                print("new key with new value2",d)
                self.Ndata.append(d)
                updateDone = True
            # end if
        else:
            # Add new instance of existing key with its values
            # Are either of the values already in the list for the key?
            vCount = 0
            matches = 0
            match1 = False
            match2 = False
            for i in range(len(self.Ndata)):
                obj = self.Ndata[i]
                sum = obj.psum
                if not match1 and obj.pkey == key and obj.pvalue == value1:
                    matches += 1
                    match1 = True
                    sum += 1
                    d = Dat(pkey=key, pvalue=value1, psum=sum)
                    print("existing key with existing first value",matches,d)
                # end if
                if not match2 and obj.pkey == key and obj.pvalue == value2:
                    matches += 1
                    match2 = True
                    sum += 1
                    d = Dat(pkey=key, pvalue=value2, psum=sum)
                    print("existing key with existing second value",matches,d)
                # end if
                if matches > 0: # key and at least one value already in list
                    print("existing key with existing value",matches,d)
                    self.Ndata[i] = d # replace entry with new sum
                    if matches == 2:
                        print("existing key with existing value",d)
                        self.Ndata[i] = d
                        updateDone = True
                        break # exit loop since update done
                    else:
                        obj = self.Ndata[i]
                        if match1:
                            # Add new instance of existing key with new 2nd value
                            sum = 1
                            d = Dat(pkey=key, pvalue=value2, psum=sum)
                            print("existing key with new 2nd value",d)
                            self.Ndata.append(d)
                            updateDone = True
                            break # exit loop since update done
                        else: # match2 is True
                            # Add new instance of existing key with new 1st value
                            sum = 1
                            d = Dat(pkey=key, pvalue=value1, psum=sum)
                            print("existing key with new 1st value",d)
                            self.Ndata.append(d)
                            updateDone = True
                            break # exit loop since update done
                        # end if
                    # end if
                else: # key but neither value in the list
                    print("no match for loop index",i)
                # end if
            # end of for loop
            # Add new tuple to list when neither value already in the list
            if not updateDone:
                # Check if value2 is the same as value1 included in object d
                if value1 == value2:
                    d = Dat(pkey=key, pvalue=value1, psum=2)
                    print("new paired value",d)
                    self.Ndata.append(d) # d contains two instances of value
                else:
                    # Add two instances of new key for different value2
                    d = Dat(pkey=key, pvalue=value1, psum=1)
                    print("new first value",d)
                    self.Ndata.append(d) # d contains first instance of value
                    d = Dat(pkey=key, pvalue=value2, psum=1) # namedtuple fields can't be assigned; build a new tuple
                    print("new second value",d)
                    self.Ndata.append(d)
                # end if
            # end if
        # end if
        return;
    # end of Update

    # Report Key and Value with maximum number of references
    def report(self):
        print("Report")
        Key = 0   # Key with most instances of a given value
        Value = 0 # Value of Key with most instances
        Sum = 0   # Number of instances of Value for the Key
        for i in range(len(self.Ndata)):
            obj = self.Ndata[i]
            print( i, obj.pkey, obj.pvalue, obj.psum );
            if Key == 0: # initialize
                Key = obj.pkey
                Value = obj.pvalue
                Sum = obj.psum
            else: # check if namedtuple has a value with a greater sum
                if obj.psum > Sum: # save key and value with greater sum
                    Key = obj.pkey
                    Value = obj.pvalue
                    Sum = obj.psum
                # end if
            # end if
        # end loop
        print( "Key and Value with greatest Sum", Key, Value, Sum )
        return
    # end report
# end of class
For some reason, Python doesn't know which instance of the
class is being referred to unless the "self" parameter is used to
indicate "this" one.
Therefore, self is specified as the first parameter in the function
parameter list and is used in such references as
for i in range(len(self.Ndata)):
    obj = self.Ndata[i]
where the Ndata list declared immediately after the declaration of
the class is being referenced. It took
me a while to get these matters sorted out,
such as the need to supply the self prefix rather than just reference
Ndata, which is declared within the class.
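A minimal sketch of why the self prefix matters (an illustrative class of my own, not from the application): Python passes the instance as the first argument automatically, and attributes must be reached through it:

```python
class Counter():
    total = 0   # attribute reached through self, like Ndata in KeyList

    def bump(self, amount):
        # without the self prefix, 'total' would just be a new local variable
        self.total = self.total + amount

c = Counter()   # instantiate; Python supplies c as self on each call
c.bump(5)
print(c.total)  # 5
```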
After finally getting the code to build the list correctly I
could have removed all the extra print statements that showed me which paths
were being taken. I have left them in
as an illustration of what can be done when there isn't a debugger to use while
verifying that code correctly solves a problem.
The Report function of the KeyList class just has to look
for the largest sum, so it is simpler than in the other two applications. The print statement towards the top of the
for loop in the function displays what was built in the class's Update
function. It results in
0 1000 1 13
1 1000 2 7
2 2000 1 16
3 2000 2 4
4 3000 1 20
5 4000 1 15
6 4000 3 2
7 4000 2 2
8 4000 4 1
9 5000 2 8
10 5000 1 7
11 5000 3 4
12 5000 4 1
The final print statement displays
Key and Value with greatest Sum 3000 1 20
It has occurred to me that, given the way the file is
arranged, the largest sum could have been determined in the Update
function. However, this wouldn't work
if the file wasn't completely ordered by key.
That is, while parsing the data of the file, a particular key could
appear to have the greatest value sum when a change of key occurred. But another key might actually have a
greater number of instances of a value if, after a change from one key to
another and the recording of its value and sum, the first key appeared again in the file
later on. Then, since all the keys and
values hadn't been maintained, new counts would be started for it. There would be no way of knowing whether the new
counts added to the previous counts would have been greater than those for the
intervening key and value that appeared to have the greatest number of
instances, since the previous key, value, and sum hadn't been retained. The way this application has been implemented
(as well as the previous two applications) prevents that by maintaining all the
possible combinations and then finding the one with the greatest sum once the
file has been completely parsed.
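The aggregate-everything-then-scan idea can be sketched compactly with a dictionary keyed on (key, value) pairs (my own condensed version with made-up sample records, not the application's code):

```python
# Count every (key, value) combination before choosing a winner, so a key
# that reappears later in the file is still counted correctly.
records = [(1000, 1), (2000, 1), (1000, 1), (2000, 2), (1000, 2)]

sums = {}
for key, value in records:
    sums[(key, value)] = sums.get((key, value), 0) + 1

# Only after all records are tallied is the greatest sum selected.
(best_key, best_value), best_sum = max(sums.items(), key=lambda kv: kv[1])
print(best_key, best_value, best_sum)   # 1000 1 2
```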