Continuing on from my previous explorations that followed Learn to Code Grand Rapids “Building a Real World App in VS 2017, Part I” of August 10 I decided to try Kotlin after having attended a MeetUp presentation of this new language.

It was supposed to be an extension of Java 8 (I think it was) for Android devices and to be able to gradually go from Java code to Kotlin with them working together while gradually switching over. As a new language it was supposed to be getting frequent releases. Well I must not have downloaded a real recent release – at least I couldn't compile with part Kotlin code and part Java code (of course my Java download was back with a 3.6 release or something like that so no where near Java 8).

In my previous two Max Column Sum by Key documents I described my initial C# attempt before I had the file containing the data that was aborted for that reason, then my application using Ada as the language after getting the data file, then with C# once again using the data file, and finally with Python as a new language for me. Then, in the second document, using Java and redoing the C# application once again.

This document will describe the application in Kotlin which ended up somewhat simpler since I had to work through some problems. While trying to find a way to do a similar structure to that used in "Max Column Sum by Key (part 2)" in Java and then C# once again and failing to accomplish it in Kotlin, I gave up and then a simpler solution occurred to me.

Following that discussion will be one concerning doing the same application in C.

Discussion of the Kotlin Application

I started out following the program structure of the Java version of the application with a Key class to encapsulate the data of the key field of the file. Remember that the file being used consisted of 20 records for each of 5 different keys with a pair of values in each record associated with the key with the objective of finding the key and value that had the most instances in the file. (Each key following "who cares" leading text and the key and value fields preceded by a horizontal tab character and the original file having a line feed character following the second value. Since I am using the UltraEdit editor which likes to ask if you want to convert to a DOS format when opening a file, I happened to reply in the affirmative early on so that records got a carriage return preceding the trailing LF when I was examining the format of the file.)

In the Java application and the later C# application there was the Key class to organize the data for a key including arrays to retain the different values associated with the particular key (taking the place of a C struct structure). Then a KeyTable class to keep track of all the data with an array of the Key class to retain the data on each different key found in the file. Plus an Update function to access the two classes to build the tables.

I tried to duplicate this in Kotlin – after all it was supposed to be able to gradually convert from a Java application to Kotlin. Kotlin had array constructs where a class or object could be assigned as the type of the array. (object being something similar to a class it seemed.) But try as I might I couldn't get an array of the Key class to work as I had in Java and C#.

So I decided to use a double dimension array of the Int type. Again there was a problem since it seemed from online searches that multiple dimension arrays weren't supported although some said it had been added supplying what looked to me as a strange syntax. But, most likely not in the version I had downloaded even if I could have figured out the syntax.

After fussing with the array of the Key class once again it dawned on me that I could do a work around for a double dimension array where I made a single dimension array of the Int type contain imaginary arrays of a second dimension. In that way I could have the multiple arrays of the past Key class value data for a particular key embedded in the single dimension array – one group of the value data for each key as follows.

// The double dimension arrays are simulated as

// +-----------+-----------+-----------+-----------+-----------+-----------+

// | | | | | | |

// +-----------+-----------+-----------+-----------+-----------+-----------+

// | |

// | elements +--> boundary at end of first array

// of first

// array

// Access the double array via v + k * (first array size) since arrays begin at 0

// [0,0] computes to 0 for the single dimension array;

// [1,0] computes to 1; assuming 30 elements in the k array

// [0,1] computes to 30;

// [1,1] computes to 31

var keyCount : Int = 0 // number of different keys

var keys = IntArray(30) // value of key - one in each array position

var valueCount = IntArray(30) // number of different values for a particular key

var values = IntArray(30*20) // particular values for each key

var sums = IntArray(30*20) // sum of instances of each value for each key

Here "elements of first array" means the values associated with the first key (the k index of the illustration) with v being the value index within a particular set of elements. Thus the code only needs to compute the actual index from the k and v indexes.

After getting this worked out, the Key class no longer had any purpose and so was discarded.

Then while implementing the Kotlin version of the Update function accessing the KeyTable class functions it occurred to me that it would be much simpler to move the update function inside the KeyTable class to directly access the data and that this could be accomplished more simply than I had in the past implementations by passing in all three fields from each record at a time. This is when it occurred to me that this new KeyTable function had the same purpose as the previous Update function so I changed the name I had given it to update (following the usual C convention of making the first character of a variable or function name lower case). When the code is presented it can be seen how this version of update is simpler than the previous versions.

At the beginning I had had some problems remembering the syntax for declaring variables and function parameters and function data return declarations since it is non-C like. It took me a while to get used to it and then I realized it had some characteristics of that of Ada – that is, that the variable is named first followed by a ':' and then its type. For instance

var index : Int = 0

val HT : Byte = 9

which in Ada would be (where case doesn't matter)

index : Integer := 0;

HT : constant Byte := 9;

where the Byte type in Ada needs to be defined as an 8 bit unsigned integer.

Whereas in C (for instance) it is

int index = 0;

with the type given first.

Also Kotlin can have constants by using the keyword val instead of var. While, of course, Ada, C, etc don't have a keyword to specify when a variable is being created.

As with Java an ending ';' can be given but isn't need.

For function declarations Kotlin uses

fun name(param1 : Int, param2 : String) : Int

whereas Ada would use

function name(param1 : Integer; param2 : String) return Integer;

function name(param1 : in Integer; param2 : in String) return Integer;

where it is specified that a function can only have parameters that input data to the function. Ada also has

procedure name(param1 : in out Integer; param2 : in String, param3 : out Integer);

that doesn't have a return statement but can output data via the out keyword (with param1 in this case being both input to the procedure and output). C function declarations, of course, are of the form

int name(int param1);

void name(int param1);

without the keyword to specify a function and with the return type preceding the function name and the type of the parameters preceding the parameter name.

So, in this regard, Katlin syntax is more like Ada (and the older and deader language Pascal) with the name first and then the type.

As can be seen in the Kotlin code, I also modified the parse function to just build an array of 3 integers to capture the data fields of interest (the key and the two values) and just count the number of fields processed (including the leading don't care field) rather than use a switch type of statement.

My Kotlin code for the application is (where Java IO is imported to read the file)

import java.io.File

import java.io.InputStream

// Update structure to contain the different values for each key with the

// number of instances of each combination

class KeyTable

{

// Data tracking which value of which key has the greatest number of references

var maxKey : Int = -1 // key with the greatest number of references for a value

var maxValue : Int = -1 // value with the greatest number of references

var maxSum : Int = -1 // number of references for key, value combination

// This class retains the data for all possible keys with their associated

// values. As such it simulates double dimension arrays for the values and

// the sum of the number of instances for a particular value. The first

// simulated dimension is the same as that for the keys array; one index for

// each key. The second simulated dimension is of the various different values

// for a particular key.

// The double dimension arrays are simulated as

// +-----------+-----------+-----------+-----------+-----------+-----------+

// | | | | | | |

// +-----------+-----------+-----------+-----------+-----------+-----------+

// | |

// | elements +--> boundary at end of first array for values of first key

// of first

// key

// Access the double array via v + k * (first array size) since arrays begin at 0

// [0,0] computes to 0 for the single dimension array;

// [1,0] computes to 1; assuming 30 elements in the key array and 20 for the values

// [0,1] computes to 30 for first value of 2nd key

// [1,1] computes to 31 for second value of 2nd key

var keyCount : Int = 0 // number of different keys

var keys = IntArray(30) // value of key - one in each array position

var valueCount = IntArray(30) // number of different values for a particular key

var values = IntArray(30*20) // particular values for each key

var sums = IntArray(30*20) // sum of instances of each value for each key

var savedData = IntArray(3) // global values from call to update

fun initialize()

{

for (k: Int in 0..29)

{

keys[k] = 0

valueCount[k] = 0

}

for (kv: Int in 0..(30*20)-1)

{

values[kv] = 0

sums[kv] = 0

}

} // end initialize

// Add key and its values

fun update(data : IntArray)

{

// This function first checks if the key is new and, if so, adds it to the

// keys array. It then does similar for the values associated with it.

// Note: data[0] is the key while data[1] and data[2] are the two values

// associated with the key.

savedData = data // save data to be updated for use by updateTotals

// Check whether the key is already in the table

var keyIndex : Int = -1

for (k : Int in 0..keyCount-1)

{

if (keys[k] == data[0])

{

keyIndex = k // key already in the table

break; // exit loop

}

} // end for loop

if (keyIndex < 0) // key not in the table

{ // add the key

keyIndex = keyCount

keys[keyIndex] = data[0]

keyCount++

} // end if

// add the values for the key

addValues(keyIndex, data[1], data[2])

} // end update

// Add the values or increment their sums

fun addValues(keyIndex : Int, value1 : Int, value2 : Int)

{

// Check whether the first value is already in the table

var valueIndex : Int = -1

var index : Int = keyIndex * 30 // location of any first value (0, 30, ...)

for (v : Int in 0..valueCount[keyIndex]-1)

{

index = index + v

if (values[index] == value1)

{ // value already in the table

valueIndex = index

sums[valueIndex]++ // increment its number of references

if (value1 == value2) // 2nd value the same

{

sums[valueIndex]++ // increment again

}

updateTotals(savedData[0],value1,sums[valueIndex]) // check if new max

break; // exit loop

}

} // end loop

if (valueIndex < 0) // value not yet in table

{ // add value to the table - index points to last value checked

index = index + valueCount[keyIndex] // next location

values[index] = value1

sums[index] = 1

valueCount[keyIndex]++

if (value1 == value2)

{

sums[index]++ // increment

updateTotals(savedData[0],value1,sums[index]) // check if new max

}

else

{

add2ndValue(keyIndex, value2)

}

} // end outer if

} // end addValues

// Add the second value or increment its sum

fun add2ndValue(keyIndex : Int, value2 : Int)

{

// Check whether the second value is already in the table

var valueIndex : Int = -1

var index : Int = keyIndex * 30 // location of any second value (0, 30, ...)

for (v : Int in 0..valueCount[keyIndex]-1)

{

index = index + v

if (values[index] == value2)

{ // value already in the table

valueIndex = index

sums[valueIndex]++ // increment its number of references

updateTotals(savedData[0],value2,sums[valueIndex]) // check if new max

break; // exit loop

}

} // end loop

if (valueIndex < 0) // value not yet in table

{ // add value to the table - index points to last value checked

index++ // index to next location

values[index] = value2

sums[index] = 1

valueCount[keyIndex]++

}

} // end add2ndValue

fun updateTotals(key : Int, value : Int, sum : Int)

{

if (sum > maxSum)

{

maxKey = key

maxValue = value

maxSum = sum

}

} // end updateTotals

// Report the key and value with the most references

fun report()

{

println("Key " + maxKey + " with value " + maxValue + " with maximum references of " + maxSum)

}

} // end class KeyTable

// Instantiate KeyTable class

val keyTable = KeyTable()

// Convert Byte array to Int

fun toInt(data: ByteArray, iS : Int, iE: Int): Int

{

val NINE: Int = 57 // ASCII character for digit 9

val ZERO: Int = 48 // ASCII character for digit 0

var index: Int = iE // loop in reverse

var digit: Int

var m: Int = 1 // multiplier for shift

var number: Int = 0 // Numeric result

while (index >= iS)

{

if ((data[index] >= ZERO) && (data[index] <= NINE))

{

digit = data[index] - ZERO; // convert ASCII to digit

number = number + (m * digit);

m = m * 10;

index--;

}

return number;

} // end toInt

// Parse

fun parse(buffer: ByteArray, length: Int)

{

val CR: Byte = 13 // carriage return

val HT: Byte = 9 // horizontal tab

val LF: Byte = 10 // line feed

// Parse each line of data in the buffer to obtain the three fields of

// interest, converting those fields to integers into an array, and

// then updating a data structure to retain the data for evaluation

// when the complete buffer has been parsed.

// Also, output the contents of the file buffer when not commented out.

var nextField: Int = 0 // index of the beginning of next field

var startField: Int // range of indexes of

var numFields: Int = 0 // index into dataFields array

var dataFields = IntArray(3) // Integer values of the three fields of interest

var index: Int = 0

while (true)

{

// Parse the buffer line

if (buffer[index] == HT) // beginning of a field

{

startField = nextField // save starting index

nextField = index + 1 // the next byte will contain part of next field

if (numFields > 0)

{

dataFields[numFields-1] = toInt(buffer, startField, index - 1) //convert and store in dataFields

} // end if numFields > 0

numFields++

} // end if HT

if ((buffer[index] == CR) || (buffer[index] == LF))

{

if (numFields < 4) // only do CR or LF once

{

dataFields[numFields-1] = toInt(buffer, nextField, index - 1) //convert and store in dataFields

keyTable.update(dataFields)

numFields++

}

if (buffer[index] == LF)

{

numFields = 0

}

} // end else if

index++

if (index == length)

{

break // out of while loop

}

} // end while

} // end of parse function

// Main entry point

fun main(args: Array<String>)

{

// Open the file.

try

{

// Open the inputStream.

var inputStream: InputStream = File( "C:\\Source\\LearnToCodeGR\\max-col-sum-by-Key.tsv").inputStream()

// Read from this input stream into an array of bytes

var buffer = ByteArray(8192)

var length = inputStream.read(buffer)

keyTable.initialize() // initialize the KeyTable class

// Parse the buffer and keep track of the needed data.

parse( buffer, length)

// Report the results

keyTable.report()

} // end try

catch (e: NoSuchFileException)

{

System.out.println("File not found" + e);

}

} // end main function

Kotlin, like Java and Python, doesn’t seem to have a compiler that runs in a Windows window. So it has to be run in a Command Prompt window. I, as usual, changed the folder in the window to that where I was developing the application (c:\Source\Kotlin) so the command was just

> kotlinc learntocodegr.kc -include-runtime -d learntocodegr.jar

since LearnToCodeGR.kc was the file name that I used and kotlinc.exe is the kotlin compiler. (With the path to the binary folder added to the Windows system path.)

To run the application I used the command

> java –jar learntocodegr.jar

which result in the output in the Command Prompt window of

Key 3000 with value 1 with maximum references of 20

as output by the report function of the KeyTable class. So the use of Java shows up here.

Discussion of the C Application

Once again I had the "main" function open the max-col-sum-by-Key.tsv file and read it. However, as I have been meaning to do for a while now, instead of reading the contents of the entire file into a buffer, I used a C function to only read in a line at a time. That is, until the Line Feed character and read the file as containing characters since it is now known that the file contains lines of characters rather than, for instance, a mix with binary numeric values. This also has the advantage of being able to process a much larger file since the buffer can be sized to a much smaller value.

And I used C's struct type that I've found missing in C#, etc, etc so its use can be observed. The object of the struct type are

struct KeyData keydata[30]; // array of possible keys with their values

in the declaration of the KeyTable structure type and

struct KeyTable keyTable; // static table

for the keyTable variable.

Otherwise, the C application is a rework of the Kotlin version of the application. So the two languages can be directly compared.

The contents of the max-col-sum-by-Key.tsv file is displayed in the main function. I've left it in but normally it would either be removed or commented out. When the application is run there is a blank line after each line read from the file since each line (except the last) has its own ending new line character and then my code outputs another new line using the \n new line character of C. Of course, the horizontal tabs in the lines are only seen as blank space since HTs (\t), like the LF (\n), aren't printable characters.

#include <stdio.h>

// Data about a particular key

struct KeyData {

int key; // numeric key

int valueCount; // number of different values associated with key

int values[20]; // different values associated with key

int sums[20]; // number of references to a value

};

// Data about all the keys of the file

struct KeyTable {

int keyCount; // number of different keys in the file

struct KeyData keydata[30]; // array of possible keys with their values

};

struct KeyTable keyTable; // static table

// Initialize keyTable

void initialize()

{

keyTable.keyCount = 0;

for (int i = 0; i < 30; i++)

{

keyTable.keydata[i].valueCount = 0;

}

} // end initialize

// Data to be retained as the key and its associated value with the maximum

// number of references

int maxKey = 0;

int maxValue = 0;

int maxSum = 0;

int savedData[3];

// Report the key and value with the most references

void report()

{

printf("\n");

printf("Key %d with value %d with maximum references of %d \n", maxKey, maxValue, maxSum );

}

void updateTotals(int key, int value, int sum)

{

if (sum > maxSum)

{

maxKey = key;

maxValue = value;

maxSum = sum;

}

} // end updateTotals

// Add the second value or increment its sum

void add2ndValue(int keyIndex, int value2)

{

// Check whether the second value is already in the table

int valueIndex = -1;

for (int v = 0; v < keyTable.keydata[keyIndex].valueCount; v++)

{

if (keyTable.keydata[keyIndex].values[v] == value2)

{ // value already in the table

valueIndex = v;

keyTable.keydata[keyIndex].sums[valueIndex]++; // increment its number of references

// check if new max

updateTotals(savedData[0],value2,keyTable.keydata[keyIndex].sums[valueIndex]);

break; // exit loop

}

} // end loop

if (valueIndex < 0) // value not yet in table

{ // add value to the table

int index = keyTable.keydata[keyIndex].valueCount;

keyTable.keydata[keyIndex].values[index] = value2;

keyTable.keydata[keyIndex].sums[index] = 1;

keyTable.keydata[keyIndex].valueCount++;

}

} // end add2ndValue

// Add the values or increment their sums

void addValues(int keyIndex, int value1, int value2)

{

// Check whether the first value is already in the table

int valueIndex = -1;

for (int v = 0; v < keyTable.keydata[keyIndex].valueCount; v++)

{

if (keyTable.keydata[keyIndex].values[v] == value1)

{ // value already in the table

valueIndex = v;

keyTable.keydata[keyIndex].sums[v]++; // increment its number of references

if (value1 == value2) // 2nd value the same

{

keyTable.keydata[keyIndex].sums[v]++; // increment again

}

// check if new max

updateTotals(savedData[0],value1,keyTable.keydata[keyIndex].sums[v]);

break; // exit loop

}

} // end loop

if (valueIndex < 0) // value not yet in table

{ // add value to the table - index points to last value checked

// index = index + valueCount[keyIndex] // next location

int index = keyTable.keydata[keyIndex].valueCount;

keyTable.keydata[keyIndex].values[index] = value1;

keyTable.keydata[keyIndex].sums[index] = 1;

keyTable.keydata[keyIndex].valueCount++;

if (value1 == value2)

{

keyTable.keydata[keyIndex].sums[index]++; // increment

// check if new max

updateTotals(savedData[0],value1,keyTable.keydata[keyIndex].sums[index]);

}

else

{

add2ndValue(keyIndex, value2);

}

} // end outer if

} // end addValues

// Update keyTable with data from a parsed line

void update(int* data)

{

// This function first checks if the key is new and, if so, adds it to the

// keys array. It then does similar for the values associated with it.

// Note: data[0] is the key while data[1] and data[2] are the two values

// associated with the key.

// save data to be updated for use by updateTotals since needed by multiple functions

savedData[0] = data[0];

savedData[1] = data[1];

savedData[2] = data[2];

// Check whether the key is already in the table

int keyIndex = -1;

for (int k = 0; k < keyTable.keyCount; k++)

{

if (keyTable.keydata[k].key == data[0])

{

keyIndex = k; // key already in the table

break; // exit loop

}

} // end for loop

if (keyIndex < 0) // key not in the table

{ // add the key

keyIndex = keyTable.keyCount;

keyTable.keydata[keyIndex].key = data[0];

keyTable.keyCount++;

} // end if

// add the values for the key

addValues(keyIndex, savedData[1], savedData[2]); //data[1], data[2]);

} // end update

// Convert character array to int

int toInt(char* data, int iS, int iE)

{

#define NINE '9' //57 // ASCII character for digit 9

#define ZERO '0' //48 // ASCII character for digit 0

int index = iE; // loop in reverse

int digit;

int m = 1; // multiplier for shift

int number = 0; // Numeric result

while (index >= iS)

{

if ((data[index] >= ZERO) && (data[index] <= NINE))

{

digit = data[index] - ZERO; // convert ASCII to digit

number = number + (m * digit);

m = m * 10;

index--;

}

return number;

} // end toInt

// Extract the fields of interest from the file's line and retain the data

void parse(int count, char* data)

{

#define CR '\r' // carriage return

#define HT '\t' // horizontal tab

#define LF '\n' // line feed

#define TRUE 1

// Parse each line of data in the buffer to obtain the three fields of

// interest, converting those fields to integers into an array, and

// then updating a data structure to retain the data for evaluation

// when the complete buffer has been parsed.

int nextField = 0; // index of the beginning of next field

int startField; // range of indexes of

int numFields = 0; // index into dataFields array

int dataFields[3]; // Integer values of the three fields of interest

int index = 0;

while (TRUE)

{

// Parse the buffer line

if (data[index] == HT) // beginning of a field

{

startField = nextField; // save starting index

nextField = index + 1; // the next byte will contain part of next field

if (numFields > 0)

{ //convert and store in dataFields

dataFields[numFields-1] = toInt(data, startField, index - 1);

} // end if numFields > 0

numFields++;

} // end if HT

if ((data[index] == CR) || (data[index] == LF))

{

if (numFields < 4) // only do CR or LF once

{ //convert and store in dataFields

dataFields[numFields-1] = toInt(data, nextField, index - 1);

// save the data to determine the key, value combination with

// maximum number of references

update(dataFields);

return; // finished with the line in the

}

} // end if

index++;

if (index == count)

{

break; // out of while loop

}

} // end while

} // end parse

void main(argc, argv)

int argc;

char* argv[];

{

FILE* file;

char buffer[30];

file = fopen("C:\\Source\\LearnToCodeGR\\max-col-sum-by-Key.tsv", "r");

int currPos;

int lastPos = 0;

int length;

initialize();

if(file == NULL)

{

perror("Error opening file");

}

else

{ // Read file records until end-of-file

while (!feof(file))

{

fgets(buffer, 30, (FILE*)file);

currPos = ftell(file);

length = currPos - lastPos; // number of characters read into buffer

lastPos = currPos;

for (int i=0; i < length; i++)

printf("%c ", buffer[i]);

printf("\n");

// Parse the line and update to retain all the necessary data.

parse(length, buffer);

}

fclose(file);

// Report the results.

report();

}

} // end main

my head stuck in the computer

Saturday, September 30, 2017

Max Column Sum by Key (part 3)

Discussion of the Kotlin Application

Discussion of the C Application

No comments:

About Me