The Empty Collection Conundrum: Are All Elements the Same When There Are None?

The Problem at Hand

Imagine you’re writing a function to check if all items in a list are the same. It might look something like this:

boolean allSame(List<?> items) {
    if (items.isEmpty()) {
        return ?;  // This is our puzzle
    }
    
    Object first = items.get(0);
    for (Object item : items) {
        if (!Objects.equals(first, item)) {  // java.util.Objects handles null elements safely
            return false;
        }
    }
    return true;
}

This function works fine for non-empty lists, but what about empty ones? Should we say all elements are the same (return true), or that they’re not (return false)?

The “King of France” Paradox

This situation is reminiscent of the classic logical puzzle: “Is the current king of France bald?” The trick is, France doesn’t have a king! So how can we answer?

In logic, we often treat such statements as vacuously true. Why? Because if we say “All kings of France are bald,” we’re not actually making any false statements - there are no kings of France to be not bald!

Mathematical Thinking

Let’s get a bit mathy for a moment. If we call our function f and our list L, we can say:

 If f(L) = true, then f(L - {x}) = true, for any x in L

In plain English: If all elements in a list are the same, removing any element shouldn’t change that fact.

Now, if we keep removing elements, we’ll eventually end up with an empty list. Following our logic, f([]) should be true.

Why This Matters

Choosing to return true for empty lists isn’t just philosophical navel-gazing. It has practical benefits:

  • Consistency: Our function behaves predictably for all lists, even empty ones.
  • Mathematical soundness: It aligns with principles of logic and set theory.
  • Easier testing: It gives us a clear expectation for the empty list case.

Putting It All Together

Here’s how our final function might look:

boolean allSame(List<?> items) {
    if (items.isEmpty()) {
        return true;  // Vacuously true
    }
    
    Object first = items.get(0);
    for (Object item : items) {
        if (!Objects.equals(first, item)) {  // java.util.Objects handles null elements safely
            return false;
        }
    }
    return true;
}
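
As a quick sanity check of the contract, here is a small usage sketch using nothing beyond the function above and the standard collections:

List<String> none = Collections.emptyList();
List<String> one = Collections.singletonList("a");
List<String> same = Arrays.asList("a", "a", "a");
List<String> mixed = Arrays.asList("a", "b", "a");

assert allSame(none);    // vacuously true
assert allSame(one);     // trivially true
assert allSame(same);    // true
assert !allSame(mixed);  // false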

Wrapping Up

When designing functions that work with collections, it’s crucial to consider edge cases like empty lists. By applying logical principles, we can create functions that are not only practical but also mathematically sound.

Remember to document your reasoning. A comment like “Returns true for empty lists (vacuously true)” can save future developers (including yourself!) a lot of head-scratching.

Table of Thai consonants with IPA transliteration and Russian letter names

The sources are available at https://github.com/vasily-kartashov/thai-consonants; the table itself looks like this:

Thai Consonants

Result Set Processor

When it comes to fetching data from a database, there’s often a dilemma of whether to join tables and retrieve all the data in one go, or fetch only the necessary data for the moment and risk N+1 query problems. Let’s assume we have a list of controllers, each with a set of sensors attached. The result should look something like:

[
    {
        "controllerId": "c001",
        "sensors": []
    },
    {
        "controllerId": "c002",
        "sensors": [
            {
                "sensorId": "s001",
                "active": false
            }
        ]
    }
]

One solution is to use the following SQL query:

SELECT controllers.id AS controller_id,
       sensors.id AS sensor_id,
       sensors.active
  FROM controllers
       LEFT JOIN sensors
         ON controllers.id = sensors.controller_id;

$controller = null;
foreach ($tuples as $tuple) {
    if ($tuple->controller != $controller) {
        $controller = $tuple->controller;
    }
    $controller->append($tuple->sensor);
}

However, this can become tedious and complex for more advanced use cases. Another solution is to use a foreach loop and fetch data for each controller individually:

foreach (SELECT controllers.id FROM controllers => $id) {
    SELECT * FROM sensors WHERE sensors.controller_id = $id;
}

This second solution also has its downsides and can lead to N+1 query problems.

This article presents a way to address this dilemma in PHP by using the concept of processing contexts. The basic idea is to define a ProcessingContext class that keeps track of a set of records and provides methods for manipulating and extracting those records.

The job of processing result sets is a fascinating area to explore. The core data structure is a set of records:

abstract class ProcessingContext
{
    private $records;
}

When there is no information about the structure of these records, the only meaningful operations for this context are collectAll and collectHead. This context can be called a Collector:

class Collector extends ProcessingContext
{
    public function collectAll(): array ...

    public function collectHead() ...

}

Once we know that the $records are associative arrays, we can do more interesting things with them. However, whatever we want to do with these records, it should always start with picking a subset of each record for further processing. The next context is the Selector. Even though knowing that the records are associative arrays allows us to add more operations to the context, we can still do what the Collector does, i.e. collectHead and collectAll:

class Selector extends Collector
{
    public function selectValue(string $field): Converter ...

    public function selectFields(string ...$fields): Converter ...

    public function map(string $keyField, string $valueField): MapConverter ...
}

What is a Converter or a MapConverter? A selector allows you to pick fields from each record and place them in some sort of structure. For example, selectValue lets you pick a value of a field and store it as a scalar, selectFields lets you fetch an embedded associative array from each record, and map lets you create a new key/value pair from the values of two fields. The Converter is the context in which the API user must decide what to do with the selected subrecord.

class Converter extends ProcessingContext
{
    public function name(string $name): Selector ...

    public function group(): Collector ...

    public function groupInto(string $name): Selector ...

    ...
}

class MapConverter extends Converter
{
    public function flatten(): Collector ...

    public function flattenInto(string $name): Selector ...

    ...
}

The name method puts the subrecord back into the record it was extracted from under a new name. The group method groups subrecords, using the remainder of each record as the group key. It does not put the groups back into the records, so the result of group is actually a Collector, i.e. its records are the groups extracted by the selector. The groupInto method not only groups subrecords but also pushes the groups back into the records.

I understand that the example may be complex and difficult to follow, so here is how I would simplify the example join above:

$query = "
    SELECT controllers.id,
           sensors.id AS sensor_id,
           sensors.active AS sensor_active
      FROM controllers
           LEFT JOIN sensors
             ON controllers.id = sensors.controller_id;
";
$procedure = $database->prepare($query);
$controllers = $procedure->processAll()
    ->selectByPrefix('sensor_')->groupInto('sensors')
    ->collectAll();

The records would look like this:

| id   | sensor_id | sensor_active |
+------+-----------+---------------+
| c001 | NULL      | NULL          |
| c002 | s001      | false         |

Then, we select by prefix and group them into a record called sensors:

| id   | sensors                       |
+------+-------------------------------+
| c001 | []                            |
| c002 | [{ id: s001, active: false }] |

And that’s it! If you’d like to see a working example, you can check out the example implementation and some unit tests for further reference.

Solving Codility exercises in shell script

This article examines the process of solving Codility programming challenges using Bash scripting. While Bash may not be the most intuitive choice for algorithmic problems, it offers an interesting perspective on problem-solving and can help improve one’s command of shell scripting.

The first exercise is the binary gap. The main job is done by the bc calculator. The takeaway from this one is the horrible pattern for dealing with trailing characters:

function convert {
  # convert to binary
  local b=$( echo "obase=2;$1" | bc )

  # remove trailing zeros
  local c="${b%"${b##*[!0]}"}"

  # split by 1s and run high water mark algorithm
  local i l r=0
  for i in ${c//1/ } ; do
    l=${#i} && (( l > r )) && (( r=l ))
  done

  echo "$r"
}

convert 20
convert 1041
convert 2043403647

The second exercise is quite a bit of pain. Passing arrays into and out of functions is complex. I unwrap the values when calling the function and collect them back after shifting away the first argument. Clearly this won’t work for functions that accept multiple arrays as arguments. The algorithm itself is trivial though.

function rotate {

  # get the first element and shift as array is passed 'unwrapped'
  local k="${1}"
  shift
  # wrap back the array and get its length
  local a=("${@}")
  local n="${#a[@]}"

  # calculate the split, swap sides
  (( k = n - k % n ))
  local p=("${a[@]:0:k}")
  local q=("${a[@]:k}")
  local r=("${q[@]}" "${p[@]}")

  echo "${r[@]}"
}

i=(3 8 9 7 6)
rotate 3 "${i[@]}"

i=("dude" "there" "you go")
rotate 2 "${i[@]}"

The third exercise is trivial once you know the trick: xor-folding the array leaves you with the xor of the unpaired elements. The logic translates directly into shell script.

function pair {
  local i r=0
  for i in "$@" ; do
    (( r ^= i ))
  done
  echo "$r"
}

pair 9 3 9 3 9 7 9

The Frog Jump Exercise is another application of (( ... )) magic. Very little is learned, except that local x=1 y=$x will not work in bash, and that to catch those kinds of errors you should always run your bash scripts with set -o nounset.

function jump {
  local a=$1 b=$2 c=$3
  local d=$(( b - a ))
  local j=$(( d / c ))
  (( j * c == d )) || (( j += 1 ))
  echo "$j"
}

jump 10 85 30

The Tape Equilibrium Exercise is where we discover that there is no abs function in shell (in Korn shell there is though).

function balance {
  local b=0 a=0 i d

  # compute the total weight of the bar
  for i in "$@" ; do
    (( b += i ))
  done

  # initialize the minimum with the most unbalanced (0 vs all) split
  local m=$b
  for i in "$@" ; do

    # move a piece from left to right
    (( a += i )) && (( b -= i ))

    #  compute the absolute difference
    (( b > a )) && (( d = b - a )) || (( d = a - b ))

    # update low water mark
    (( d < m )) && (( m = d ))
  done
  echo "$m"
}

balance 3 1 2 4 3

To solve the Permutation Missing Element Exercise we can use the fact that the sum of an arithmetic progression is a closed-form expression and can be computed in O(1). If we compare this theoretical sum to the actual sum of the array, the difference gives us the missing element.

function missing {
  local s=0 n="$#" i

  for i in "$@" ; do
    (( s += i ))
  done
  echo $(( (n + 2) * (n + 1) / 2 - s ))
}

missing 2 3 1 5

The Frog River One exercise kind of pushes us towards using proper ADTs, in this case hash maps. I cheated a bit here, storing numbers in a :-separated string a. As this data type doesn’t track the size of our ‘list’, we’d better store the size in an additional state variable w.

function wait {
  # get the width of the river
  local x=$1
  shift

  # a is a list :1:6:7: with O(n) for membership check
  local i a=':' d c=0 b w=0
  for i in "$@" ; do
    if [[ "$a" != *":${i}:"* ]] ; then
      # add item to list
      a="${a}${i}:"
      (( w += 1 ))
      (( w == x )) && { echo "$c" ; return ; }
    fi
    (( c += 1 ))
  done
}

wait 5 1 3 1 4 2 3 5 4

Another Permutations Exercise, another cheat :). I just sort the values, which is an O(n log n) operation, and check if the sorted order contains gaps. It’s not as fast as O(n) but definitely better than naive repeated linear scanning, which is O(n²).

function permchk {
  local i c=1
  local a=$( echo "$@" | tr " " "\n" | sort | tr "\n" " " )
  for i in $a ; do
    (( c == i )) && (( c += 1 )) || { echo false ; return ; }
  done
  echo true
}

permchk 4 3 1 2
permchk 4 1 3

The next Missing Integer Exercise is pretty much a variation on the theme: just add -u to sort to skip over duplicates.

function findmiss {
  local i c=1
  local a=$( echo "$@" | tr " " "\n" | sort -u | tr "\n" " " )
  for i in $a ; do
    (( c == i )) && (( c += 1 )) || { echo $c ; return ; }
  done
}

findmiss 1 3 6 4 1 2

So far the ugliest solution, for the Max Counters Exercise, where I try to use a fixed-length delimited string as a poor man’s array. The thorny path led me to a few realizations. Firstly, {1..n} expansions don’t “understand” variables, therefore {1..$n} won’t work. Secondly, there’s no simple way to repeat a string multiple times like "hello" * 5 in Python, so I had to write a function instead. Thirdly, it’s not so easy to work with strings as arrays, as the constant padding and trimming is not fun and adds significant clutter, so I’ll probably pass on this “data structure” in the future.

readonly W=5

# repeat string $1 $2 times
function rep {
  local a="" i
  for (( i = 0 ; i < $2 ; i++ )) ; do
    a="$a$1"
  done
  echo "$a"
}

# pad the value with spaces from left
function pad {
  printf "%${W}s" $1
}

function count {
  local n=$1
  local m=$( pad 0 )
  # initialize array "....0....0....0....0"
  local a=$( rep "$m" $n )
  shift

  local i v w b c
  for i in "$@" ; do
    if (( i > n )) ; then
      # refill with max
      a=$( rep "$m" $n )
    else
      (( b = (i - 1) * W ))
      (( c = i * W ))
      # the ugly bit, update the counter, and pad it again
      v=$( pad $(( ${a:b:W} + 1 )) )
      (( v > m )) && m=$v
      a="${a:0:b}${v}${a:c}"
    fi
    echo "$a"
  done
}

count 5 3 4 4 6 1 4 4

Following that was another counting exercise. The solution is trivial; I’m adding the code for completeness’ sake.

function pair {
  local i s=0 r=0
  for i in "$@" ; do
    (( s += i ))
  done
  for i in "$@" ; do
    (( i == 0 )) && (( r += s )) || (( s-- ))
  done
  echo "$r"
}

pair 0 1 0 1 1

After that, a simple exercise in rounding up and down. Move a right to the nearest number divisible by c, move b left to the nearest number divisible by c. Compute the number of intervals fitting in between and add one to count the points rather than the intervals.

function count {
  local a=$1 b=$2 c=$3 n=0
  (( a % c == 0 )) || (( a = (a / c + 1) * c ))
  (( b = (b / c) * c ))
  echo $(( (b - a) / c + 1 ))
}

count 6 11 2 # 6 8 10
count 6 10 2 # 6 8 10
count 7 12 3 # 9 12

At one point I’ll need to start using proper data types such as hash maps. Well, maybe next time. For the Genomic Range Query I’ve just created 4 arrays for the running counts, which will do for now but is clearly very verbose. Learnings from this exercise are that looping through the characters of a string needs a function call, that array concatenation +=(1) is very different from +=1, where the second expression just quietly promotes (demotes?) the array to a string, and that && || && || chains don’t always work for long statements with arithmetic context (( )). I will need to explore the last point some more.

function nucleo {
  local n=$1
  shift

  local a=0 c=0 g=0 t=0
  declare -a as=(0) cs=(0) gs=(0) ts=(0)
  for i in $( echo "$n" | fold -w1 ) ; do
    case "$i" in
      A) (( a++ )) ;;
      C) (( c++ )) ;;
      G) (( g++ )) ;;
      T) (( t++ )) ;;
    esac
    as+=($a) ; cs+=($c) ; gs+=($g) ; ts+=($t)
  done

  declare -a b vs
  for i in "$@" ; do
    b=( ${i//:/ } )
    p=${b[0]}
    q=$(( ${b[1]} + 1 ))
    if   (( as[q] - as[p] > 0 )) ; then vs+=(1)
    elif (( cs[q] - cs[p] > 0 )) ; then vs+=(2)
    elif (( gs[q] - gs[p] > 0 )) ; then vs+=(3)
    else                                vs+=(4)
    fi
  done
  echo "${vs[@]}"
}

nucleo CAGCCTA 2:4 5:5 0:6

By the way we probably don’t need ts as we infer existence of Ts indirectly in the else branch.

The next exercise is as boring as the rest of the summation exercises. The job is greatly obscured by the syntax. There are no learnings from this one, except that I am hitting fewer and fewer gotchas with bash arrays.

function minavg {
  # scan, use s as threshold
  declare a=(0)
  local i s=0
  for i in "$@" ; do
    (( s+= i ))
    a+=($s)
  done

  # do not divide, work with the average multiplied by 6
  local v r
  for (( i = 2 ; i < ${#a[@]} ; i++ )) {
    (( v = (a[i] - a[i - 2]) * 3 ))
    (( v < s )) && (( s = v )) && (( r = i - 2 ))
    if (( i > 2 )) ; then
      (( v = (a[i] - a[i - 3]) * 2 ))
      (( v < s )) && (( s = v )) && (( r = i - 3 ))
    fi
  }
  echo "$r"
}

minavg 4 2 2 5 1 5 8

The triangle exercise is actually the first one making bash shine… a bit. Making an array out of an input string and sorting it numerically in one line is not bad at all. It took me literally 2 minutes to solve, especially because they’ve put O(n log n) into the expected complexity. As one of my professors used to say, if your math teacher asks you ‘which number’, the right answer is almost always “0”, sometimes “1”.

function triangle {
  local i v
  declare -a t r=()
  for i in "$@" ; do
    t=($( echo "$i" | tr ":" "\n" | sort -n | tr "\n" " " ))
    r+=($(( t[0] + t[1] > t[2] )))
  done
  echo "${r[@]}"
}

triangle 10:2:5 1:8:20 1:1:1

Followed up by another short exercise leveraging sort -u. Nothing learned, except maybe that uniq compares adjacent lines only, so there’s no way to avoid explicit sorting.

function distcount {
  declare -a v=($( echo "$@" | tr " " "\n" | sort -u | tr "\n" " " ))
  echo "${#v[@]}"
}

distcount 2 1 1 2 3 1

And another one aiming at the same “skill”.

function maxprod {
  declare -a i=($( echo "$@" | tr " " "\n" | sort -n | tr "\n" " " ))
  local a b n=${#i[@]}
  (( a = i[0] * i[1] * i[n - 1] ))
  (( b = i[n - 3] * i[n - 2] * i[n - 1] ))
  (( a > b )) && echo "$a" || echo "$b"
}

maxprod -3 1 2 -2 5 6

At this point I have to admit I don’t commit nearly as many syntactic errors as before. So let’s keep going and see where it takes me.

I spent quite some time understanding the next exercise but decided to cheat a bit and take someone else’s solution. Interestingly, bash really excelled in this exercise; the complete solution is not much longer than Luca’s Python code. Learnings are numerous. Firstly, you can pipe a loop’s output straight into the next construct, thus avoiding intermediate storage. Secondly, a pipe runs in its own subshell, so assigning an outer variable from within piped code is horribly complex. Thirdly, once in pipes, stay in pipes. Still, I don’t know how to reference the input array $@ from an arithmetic context (( )); I will have to explore it later.

function discint {
  declare -a a=("$@")
  local i b c
  for (( i = 0 ; i < ${#@} ; i++ )) ; do
    echo $(( i - a[i] )) "\t1"
    echo $(( i + a[i] )) "\t-1"
  done | grep -v "^$" | sort -k1,1n -k2,2rn | while read _ i ; do
    (( i > 0 )) && (( b += c ))
    (( c += i ))
    echo "${b}"
  done | tail -1
}

discint 1 5 2 1 4 0

And after the last high, another low with the fish. A considerable amount of time is spent bending the data structures around to reduce the amount of clutter. I took the longer path of creating a pipeable function. I still haven’t figured out how to better return values from a pipe, other than by tail -1. Another pain is adding things to an empty array; there must be a better way than checking -z "${s+_}". This experience reminded me of my time at university, where all our code was highly optimized to run faster by completely disregarding readability and using counter-intuitive data types.

function unwrap {
  tr ' :' "\n\t" | while read -r w d ; do
    (( d == 1 )) && echo $(( w * -1 )) || echo "$w"
  done
}

function fish {
  declare -a s=()
  local i a l=0
  unwrap | while read i ; do
    # push new fish to the stack by default
    a=true
    while (( s[0] < 0 )) ; do
      if (( s[0] * -1 > i )) ; then
        # our upstream fish was eaten by the top of the stack
        a=false ; break
      else
        # top of the stack was eaten, reduce count and pop
        (( l-- )) ; s=("${s[@]:1}")
      fi
    done
    # adding new fish to the stack
    if [ $a == true ] ; then
      (( l++ ))
      [ -z "${s+_}" ] && s=($i) || s=("$i" "${s[@]}")
    fi
    # echo current length
    echo "$l"
  done | tail -1
}

echo 4:0 3:1 2:0 1:0 5:0 | fish

Followed by another stack exercise. Again the learning is to use data structures that reduce the amount of work done in bash, like translating brackets to numbers with an offset of 3, which lets us easily match opening and closing brackets in one operation.

function brackets {
  tr "{[(}])" "123456" | fold -w1 | while read c ; do
    if (( c < 4)) ; then
      [ -z "${s+_}" ] && s=($c) || s=($c "${s[@]}")
    else
      (( c - s[0] == 3 )) && s=("${s[@]:1}") || break
    fi
    [ -z "${s+_}" ] && echo true || echo false
  done | tail -1
}

echo "{[()()]}" | brackets
echo "([)()]" | brackets

The next one was quite interesting, as it required a bit of set-theoretic proof to bi-directionally map two sets, thus reducing the problem to a much simpler one. There is really just one learning, but an important one: you can create an initializing block inside piped code, hiding it behind some sort of [ -z "${s+_}" ] test. Getting more and more used to pipes and stuff.

function wall {
  tr " " "\n" | while read h ; do
    [ -z "${s+_}" ] && { declare -a s=() ; local n=0 c=0 ; }
    while (( n > 0 )) && (( s[n-1] > h )) ; do (( n-- )) ; done
    if (( n == 0 )) || (( s[n-1] != h )) ; then
      (( c++ ))
      s[n]=${h}
      (( n++ ))
    fi
    echo "$c"
  done | tail -1
}

echo 8 8 5 7 9 8 7 4 8 | wall

The EquiLeader is easier to solve if we recognize that if a number dominates both parts then it will dominate the complete sequence. Therefore we compute the dominating number and its count (stored in s) and then run through the sequence checking the balances.

function eqlead {
  declare -a s=($( echo "$@" | tr " " "\n" | sort | uniq -c | sort -k1,1nr | head -1 ))
  local i a=0 b=$# l=0 r="${s[0]}"

  for i in "$@" ; do
    (( a++ )) ; (( b-- ))
    if (( i == s[1] )) ; then
      (( l++ )) ; (( r-- ))
    fi
    (( l * 2 > a )) && (( r * 2 > b )) && echo $(( a - 1 ))
  done
}

eqlead 4 3 4 4 4 2

The Dominator should probably come before EquiLeader. Nothing interesting really, except that I tried not to loop over things explicitly and to use pipes as much as possible. I still need to capture intermediate results with $() because I don’t see any nice way of branching and merging pipes, although this probably points in the right direction. Another surprise is that uniq -c does some unexpected padding on the counter column, therefore I am not sure how to cut -f1 out of the results and have to go with read -r n _. By the way, read -r n alone would capture complete lines.

function dominator {
  declare -a c=($( tr ' ' "\n" \
                 | sort \
                 | uniq -c \
                 | sort -k1,1nr \
                 | while read -r n _ ; do echo "$n" ; done \
                 | tr "\n" ' ' ))
  local s=$( echo "${c[@]}" | tr ' ' '+' | bc )
  echo $(( c[0] * 2 >  s ))
}

echo 3 4 3 2 3 -1 3 3 | dominator

While looking into solving the next one I discovered a strangely popular topic among bash programmers: how to read the first and the last lines from a file. For documentation purposes, { IFS= read -r a ; echo "$a" ; tail -1 ; } worked for me. The algorithm itself is straightforward: if the new value is below the running minimum, update the minimum; otherwise check if trading at this point would increase the running max profit, and update it if required.

function mprofit {
  local i m=$1 b=0
  for i in "${@:1}" ; do
    if (( i > m )) ; then
      (( i - m > b )) && (( b = i - m ))
    else
      (( m = i ))
    fi
  done
  echo "$b"
}

mprofit 23171 21011 21123 21366 21013 21367

The next task made me realize I have to write routines to reverse arrays myself. I implemented Kadane’s algorithm.

function kadane {
  declare -a a=("$@") b=(0)
  local i n=$# c=0 d=0
  for (( i = 1 ; i < n - 1 ; i++ )) ; do
    (( c + a[i] > 0 )) && (( c += a[i] )) || (( c = 0 ))
    (( d < c )) && (( d = c ))
    b+=($d)
  done
  b+=(0)
  echo "${b[@]}"
}

function rev {
  local i
  declare -a a=()
  for i in "$@" ; do [ -z "${a+_}" ] && a=($i) || a=($i "${a[@]}") ; done
  echo "${a[@]}"
}

function mdslice {
  declare -a p=($( kadane "$@" )) q=($( rev $( kadane $( rev "$@" ) )))
  local n=$# m=0 s
  for (( i = 1 ; i < n - 1 ; i++ )) {
    (( s = p[i-1] + q[i+1] ))
    (( s > m )) && (( m = s ))
  }
  echo "$m"
}

mdslice 3 2 6 -1 4 5 -1 2

Skipping the max slice sum as this is just a simpler version of the last one.

The following is a set of exercises that I expect will make me use bc a lot. The first one is about finding the integer square root.

function minrec {
  local i n=$( echo "scale=0;sqrt($1)" | bc ) a=$1
  for (( i = n ; i > 0 ; i-- )) ; do
    (( a % i == 0 )) && { echo $(( 2 * (i + (a / i)) )) ; break ; }
  done
}

minrec 30

And the following Count Factors Exercise is pretty much the same loop, just a different counter and a different break condition.

function nfactors {
  local i n=$( echo "scale=0;sqrt($1)" | bc ) a=$1 r=0
  # set the counter to -1 if a is a square to not double count sqrt(a)
  (( n * n == a )) && (( r = -1 ))
  for (( i = n ; i > 0 ; i-- )) ; do
    (( a % i == 0 )) && (( r += 2 ))
  done
  echo "$r"
}

nfactors 24

For the Peaks Exercise I just took someone else’s code, as it was a fairly uninspiring problem. I’ve added comments to the logic. One interesting nuance of the algorithm is the part with the integer divisions c / i > f and c / i == f used to check whether a peak falls within the current group. Very nice indeed. No new bash gotchas.

function peaks {
  local n=$# i f g s c
  declare -a a=("$@") p=()

  # compute all the peaks
  for (( i = 2 ; i < n - 1; i++ )) ; do
    (( a[i-1] < a[i] )) && (( a[i] > a[i+1] )) && p+=($i)
  done

  for (( i = 1 ; i <= n ; i++ )) ; do
    # for all group sizes i that evenly divide n
    (( n % i != 0 )) && continue

    # number of peaks found f, number of groups g, success s
    f=0 ; g=$(( n / i )) ; s=1

    for c in "${p[@]}" ; do
      # overshot, breaking out
      (( c / i > f )) && { s=0 ; break ; }
      # falls into the group, increase counter
      (( c / i == f )) && (( f++ ))
    done
    # if found less than g set success status to 0
    (( f != g )) && s=0
    # super successful exit
    (( s == 1 )) && { echo "$g" ; break; }
  done
}

peaks 1 2 3 4 3 4 1 2 3 4 6 2

The count semi-primes exercise is trivial, but it’s nice to play with memoization in bash. I’ve added two indices, PRIMES_INDEX and PRIMES: the first to quickly answer whether a number is a prime, the second to loop quickly through the prime numbers.

declare -a PRIMES_INDEX=(0 0 1 1 0 1) PRIMES=(2 3 5)
P_CURRENT=5

function primes {
  local b=$1 f p
  while (( P_CURRENT * P_CURRENT <= b )) ; do
    (( P_CURRENT++ ))
    f=1
    for p in "${PRIMES[@]}" ; do
      (( P_CURRENT % p == 0 )) && { f=0 ; break; }
    done
    (( PRIMES_INDEX[P_CURRENT] = f ))
    (( f == 1 )) && PRIMES+=($P_CURRENT)
  done
  echo "${PRIMES[@]}"
}

function semipr {
  local i j c p

  for i in "$@" ; do
    r=(${i//:/ })
    (( c = 0 ))
    primes "${r[1]}"
    for (( j = r[0] ; j <= r[1] ; j++ )) ; do
      for p in "${PRIMES[@]}" ; do
        (( p * p > j )) && break;
        if (( j % p == 0 )) && (( PRIMES_INDEX[j / p] == 1 )) ; then
          (( c++ ))
          break
        fi
      done
    done
    echo "$c"
  done
}

semipr 1:26 4:10 16:20 12:2035

Very much the same applies to the count non-divisible exercise: it makes perfect sense to create indices during a linear scan to avoid searching later on. In this case we need an index of occurrences os, where os[i] gives the number of times the number i occurs in the input array. The array cs holds the number of non-divisors, which is uniformly set to n at the beginning and then reduced while we loop over the divisors.

function nondiv {
  declare -a cs=() os=() rs=()
  local i=0 n=$# j d

  while (( i < 2 * n )) ; do (( os[i++]=0 )) ; done
  for a in "$@" ; do (( os[a]++ )) ; done

  (( i = 0 )) ; while (( i < 2 * n )) ; do (( cs[i++] = n )) ; done

  for (( d = 1 ; d < 2 * n ; d++ )) ; do
    (( os[d] == 0 )) && continue
    for (( j = d ; j < 2 * n ; j += d )) ; do
      (( cs[j] -= os[d] ))
    done
  done

  for a in "$@" ; do
    rs+=("${cs[$a]}")
  done

  echo "${rs[@]}"
}

nondiv 3 1 2 3 6

Followed by chocolates by numbers, where you have two different velocities around a circle, m and n. Let’s linearize the positions, so a[i+1] = a[i] + n. To check that two linearized positions represent the same place on the circle we wrap around m; in other words, steps p and q represent the same position on the circle if a[p] % m == a[q] % m, or (a[p] - a[q]) % m == 0. And for the same two positions to be on the same progression we need the difference to be divisible by n as well: (a[p] - a[q]) % n == 0.

function gcd {
  local a=$1 b=$2 c
  while (( a % b != 0 )) ; do
    c=$a ; a=$b ; (( b = c % b ))
  done
  echo "$b"
}

function choc {
  local n=$1 m=$2 d=$( gcd "$1" "$2" )
  echo $(( n / d ))
}

choc 10 4

The common prime divisors exercise is interesting because of its reduce function. The GCD of the two numbers should be all you need in order to reduce them. Given an initial value a and gcd, a / gcd should be a product of divisors of that very same gcd. So we extract the common part of a / gcd and gcd and reduce a / gcd by this amount. We proceed until either the number is reduced to 1 or it cannot be reduced any further.

function gcd {
  local a=$1 b=$2 c
  while (( a % b != 0 )) ; do
    c=$a ; a=$b ; (( b = c % b ))
  done
  echo "$b"
}

function reduce {
  local a=$1 g=$2 f d r=1
  # first division
  (( f = a / g ))
  # need to be reduced as f is not a divisor of g
  while (( g % f != 0 )) ; do
    # find the amount the f needs to be reduced by
    d=$( gcd "$g" "$f" )
    # check if can be reduced
    (( d == 1 )) && { r=0 ; break ; } || (( f /= d ))
  done
  echo "$r"
}

function cpd {
  local i a b g p q v=0
  declare -a r
  for i in "$@" ; do
    r=(${i//:/ }) ; a="${r[0]}" ; b="${r[1]}"
    g=$( gcd "$a" "$b" )
    p=$( reduce "$a" "$g" ) ; q=$( reduce "$b" "$g" )
    (( p == 1 )) && (( q == 1 )) && (( v++ ))
  done
  echo "$v"
}

cpd 15:75 10:30 3:5 22:7744 44:23232

The ladder is a counting exercise which is just an implementation of Fibonacci numbers. Pretty much the only complexity is understanding that the number of paths leading to the n-th rung is fib(n).

declare -a FIB=(1 1 2)
declare FIB_I=2

function fib {
  local n=$1 f
  while (( FIB_I < n )) ; do
    (( FIB_I++ ))
    (( f = FIB[FIB_I - 2] + FIB[FIB_I - 1] ))
    FIB+=($f)
  done
  echo "${FIB[$n]}"
}

function lad {
  declare -a r
  local a b i f d
  for i in "$@" ; do
    r=(${i//:/ }) ; a="${r[0]}" ; b="${r[1]}"
    f=$( fib "$a" )
    d=$( echo "2^$b" | bc )
    echo $(( f % d ))
  done
}

lad 4:3 4:2 5:4 5:3 1:1

The Fibonacci frog is actually a dynamic programming exercise, which I was too lazy to implement properly just yet. I am traversing all possible paths instead, which is very bad: when I reach the same point c on two different paths, I compute the value function at point c twice. Nobody should ever do that. The learning is that writing recursive functions in bash is a pain, probably even more of a pain than writing iterative versions with manual stack handling.

declare -a FIB=(1 2)
declare FIB_I=1

# generate fibonacci numbers up to n
function fib {
  local n=$1 f
  while (( FIB[FIB_I] <= n )) ; do
    (( FIB_I++ ))
    (( f = FIB[FIB_I - 2] + FIB[FIB_I - 1] ))
    FIB+=($f)
  done
}

function jmp {
  # c - current position
  # d - next position
  # f - iterator over fibonacci numbers
  # m - length of min path, -1 if no path from current point
  # v - the length of sub path starting from d = c + f
  local c=$1 d f m v
  shift
  # n is the length of the padded array
  local n=$#
  declare -a as=("$@")

  # check if we reached the end
  if (( c == n - 1 )) ; then
    echo "0"
  else
    m=-1
    for f in "${FIB[@]}" ; do
      # check if jumping over f is a valid point
      (( d = c + f ))
      (( d < n )) || continue
      (( as[d] == 1 )) || continue
      # find the optimal path length starting from d
      v=$( jmp $d "$@" )
      # update minimum
      if (( v >= 0 )) ; then
        if (( m == -1 )) || (( v + 1 < m )) ; then
          (( m = v + 1 ))
        fi
      fi
    done
    echo "$m"
  fi
}

function frog {
  # pre-compute fibonacci numbers
  fib "$#"
  # pad the array with terminal 1 to denote terminal position
  jmp -1 "$@" 1
}

frog 0 0 0 1 1 0 1 0 0 0 0

Enough.

Using Spring Request-Scoped Beans for Better Auditing

Multi-layered apps are all about keeping different parts separate. The front end deals with JSON and HTTP headers, the service layer works with business objects, and the data layer doesn’t even know which business methods are running.

In this post, we’ll set up a simple way to track what’s happening during a request. We’ll time our controller methods and see who’s making the request. This approach is easy to expand, but we’ll keep it simple for now.

First, let’s make a timer that will count how long methods take:

@Aspect
@Component
public class Timer {
    @Autowired
    private Auditor auditor;

    @Pointcut("execution(* com.kartashov.auditing.controllers.*.*(..))")
    public void methods() {}

    @Around("methods()")
    public Object profile(ProceedingJoinPoint point) throws Throwable {
        long start = System.currentTimeMillis();
        Object result = point.proceed();
        long time = System.currentTimeMillis() - start;
        auditor.register(point.getSignature().toShortString(), time);
        return result;
    }
}

Make sure to mark it as @Component so Spring picks it up. We also need an Auditor to keep track of what’s happening during the request:

public class Auditor {
    private final static Logger logger = LoggerFactory.getLogger(Auditor.class);
    private List<Record> records = new ArrayList<>();
    private String remoteAddress;

    void register(String method, long time) {
        records.add(new Record(method, time));
    }

    public void setRemoteAddress(String remoteAddress) {
        this.remoteAddress = remoteAddress;
    }

    @PreDestroy
    public void trace() {
        logger.info("Audit record. Request from {}\n{}",
                remoteAddress,
                records.stream().map(Record::toString).collect(Collectors.joining("\n")));
    }

    static class Record {
        // Details left out for brevity
    }
}

We’ll add the remote address in our controllers. Here’s a simple example:

@RestController
public class PingController {
    @Autowired
    private Auditor auditor;

    @RequestMapping("/")
    public String ping(HttpServletRequest request) {
        auditor.setRemoteAddress(request.getRemoteAddr());
        return Instant.now().toString();
    }
}

Lastly, we need to set up our Spring Boot app:

@SpringBootApplication
@EnableAspectJAutoProxy
@Configuration
public class Application {
    public static void main(String... args) {
        SpringApplication.run(Application.class, args);
    }

    @Bean
    @Scope(value = "request", proxyMode = ScopedProxyMode.TARGET_CLASS)
    public Auditor auditor() {
        return new Auditor();
    }
}

This does three important things:

  • Turns on AspectJ proxies
  • Makes Auditor request-scoped
  • Tells Spring to use a special proxy that creates a new Auditor for each web request

When you run the app, you’ll see something like this:

2016-06-17 13:23:49.440  INFO 6738 : Audit record. Request from 0:0:0:0:0:0:0:1
PingController.ping(..): 24ms

You can build on this idea to do more complex auditing. The key point is that by using request-scoped Auditors, we keep our auditing separate from our main business logic.

Getting started with Spring Data Envers

At some point in every company’s adult life it starts reassessing the sanity and quality of its data set. One of the ways to regain trust is to introduce auditing and versioning of database entities, which is exactly what Hibernate Envers is for.

As far as Envers is concerned, all you need to do is sprinkle a couple of annotations around

@Entity
@Table(name = "users")
@Audited
@AuditTable("users_audit")
public class User {
    ...
}

The majority of the interesting features happen behind the scenes, including the creation of the revision tracking table and the users_audit table, as well as storing the audit records when you write data to the database. In a certain way Envers’ unique selling proposition is to be as transparent as possible while adding audit storing capabilities: while adding auditing you’re not required to change your existing tables.

Hiding complex stuff behind a facade has implications: it’s quite painful to fetch the objects back efficiently, and it may require legwork in the worst case scenario. On the positive side, the default simple setup is quite trivial.

Project setup

Let’s start with a simple application that has a set of users, and these users can send invitations. When an invitation is sent we want to capture the inviter’s name and email as well as the invitation email. If the user at some point changes her name or email, we still want to be able to get the original inviter data.

We’ll need the following dependencies for our Spring Boot project

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-envers</artifactId>
</dependency>
<dependency>
    <groupId>org.aspectj</groupId>
    <artifactId>aspectjweaver</artifactId>
</dependency>

For some odd reason, you’ll need to add the org.aspectj:aspectjweaver dependency to the project.

We’ll start with the User entity

@Entity
@Table(name = "users")
@Audited
@AuditTable("users_audit")
public class User {
    @Id
    @GeneratedValue
    private Long id;

    private String firstName, lastName, email;
}

For events we would like to be a tiny bit more flexible, anticipating a huge set of different event types ahead of us.

@Entity
@Table(name = "events")
@Inheritance(strategy = InheritanceType.JOINED)
@Audited
public abstract class Event {
    @Id
    @GeneratedValue
    private Long id;
}

@Entity
@Table(name = "invitation_events")
@Audited
@AuditTable("invitation_events_audit")
public class InvitationEvent extends Event {
    @ManyToOne
    private User inviter;
    private String invitationEmail;
}

Envers didn’t like InheritanceType.TABLE_PER_CLASS, hence the JOINED strategy above.

Now to the spring-data-envers magic. Create repositories extending RevisionRepository<E, ID, N> like the following

public interface UserRepository extends
        RevisionRepository<User, Long, Integer>,
        CrudRepository<User, Long> {
}

public interface EventRepository extends
        RevisionRepository<Event, Long, Integer>,
        CrudRepository<Event, Long> {
}

It’s also important to change repositoryFactoryBeanClass while enabling JPA repositories

@SpringBootApplication
@EnableJpaRepositories(repositoryFactoryBeanClass = EnversRevisionRepositoryFactoryBean.class)
public class Application {
    ...
}

You’re all set. Let’s run a simple query and see what Envers does for us:

User user = new User();
user.setEmail("old-email@tradition.test");
userRepository.save(user);

InvitationEvent event = new InvitationEvent();
event.setInviter(user);
event.setInvitationEmail("invitation-email@kartashov.com");
eventRepository.save(event);
Long eventId = event.getId();

user.setEmail("completely-new-email@tradition.test");
userRepository.save(user);

At this point, if we decide to just fetch the event by ID, we’ll get a link to the updated user row in the users table and won’t see the email address the invitation was actually sent from. This is where RevisionRepository comes into play. Let’s fetch the latest revision of our event and see what the inviter’s data looked like at the time the invitation was sent.

Event event = eventRepository
        .findLastChangeRevision(eventId).getEntity();
assert event.getInviter().getEmail().equals("old-email@tradition.test");
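
The RevisionRepository also lets us walk through every captured revision of an entity. A short sketch, using the same non-Optional accessors as in the snippet above (the way this version of Spring Data Envers exposes them):

Revisions<Integer, User> revisions = userRepository.findRevisions(user.getId());
for (Revision<Integer, User> revision : revisions) {
    // one line per revision: the old email first, the new email second
    System.out.println(revision.getRevisionNumber() + ": " + revision.getEntity().getEmail());
}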

Behind the scene

Let’s see what goes on behind the scenes. With ddl-auto set to create, create-drop or update, Envers creates all the auxiliary tables on application start. In our case we end up with the following schema:

Envers database schema

Let’s see what we are asking our database to do for us when we’re after the latest revision of an event.

In the first query we’re fetching all the revision IDs for the specified :event_id

    SELECT events_audit.revision_id
      FROM events_audit
CROSS JOIN revinfo
     WHERE events_audit.id = :event_id
       AND events_audit.revision_id = revinfo.rev
  ORDER BY events_audit.revision_id ASC

In the next step we’re fetching all the revisions for the IDs that we just fetched, which certainly doesn’t seem like the smartest thing to do, considering that we could have done it in the first query. One possible explanation is that Envers lets you extend the revinfo entity and add fields to it, and therefore prepares for the worst case scenario.

SELECT revinfo.rev,
       revinfo.revtstmp
  FROM revinfo
 WHERE rev IN (:revision_ids)

In the next step we’re dealing with this beauty, and whoever feels it’s not a masterpiece needs an urgent appointment with a doctor

         SELECT events_audit.id AS id1_1_,
                events_audit.revision_id AS revision2_1_,
                events_audit.revision_type AS revision3_1_,
                invitation_events_audit.invitation_email,
                invitation_events_audit.user_id,
                CASE
                    WHEN invitation_events_audit.id IS NOT NULL THEN 1
                    WHEN events_audit.id IS NOT NULL THEN 0
                END AS clazz
           FROM events_audit
LEFT OUTER JOIN invitation_events_audit invitation_events_audit
             ON events_audit.id = invitation_events_audit.id
            AND events_audit.revision_id = invitation_events_audit.revision_id
          WHERE events_audit.revision_id = (
                SELECT MAX(ea.revision_id)
                  FROM events_audit ea
                 WHERE ea.revision_id <= ?
                   AND events_audit.id = ea.id)
            AND events_audit.revision_type <> :revision_type
            AND events_audit.id = :event_id

A lot of the clutter is caused by joined table inheritance strategy.

The last query is for the lazily loaded user revision, and it pretty much mimics the previous one, only in this case we don’t have to deal with inheritance hierarchies:

SELECT users_audit.id,
       users_audit.revision_id,
       users_audit.revision_type,
       users_audit.email,
       users_audit.first_name,
       users_audit.last_name
  FROM users_audit
 WHERE users_audit.revision_id = (
       SELECT max(ua.revision_id)
         FROM users_audit ua
        WHERE ua.revision_id <= :revision_id
          AND users_audit.id = ua.id)
   AND users_audit.revision_type <> :revision_type
   AND users_audit.id = :user_id

We are looking for a revision (create or update), based on :user_id. Our target :revision_id might not have affected the user row, therefore we’re looking for max(revision_id) for this specific user, which is less than or equal to our target :revision_id.

Querying data

For querying the data you have a Criteria API which is similar to the JPA Criteria API. Now let’s say we want to find all invitations that were sent from a specific email address, no matter who owns this email now or whether it’s currently used by any user in the database. The best up-to-date documentation can be found in the Hibernate User Guide.

In our case, unfortunately, a surprise is waiting for us. An innocent-looking query throws an UnsupportedOperationException

return AuditReaderFactory.get(entityManager)
        .createQuery()
        .forRevisionsOfEntity(InvitationEvent.class, true, false)
        .traverseRelation("inviter", JoinType.LEFT)
        .add(AuditEntity.property("email").eq(email))
        .getResultList();

Let’s just hope that this will be addressed soon.

Implementing Null-Safe Comparators in Java

Introduction

Sorting complex domain objects in Java can present challenges, particularly when considering multiple criteria and null values. This article explores an efficient approach to creating robust, null-safe comparators using Java 8 features.

Problem Statement

Consider a scenario where users in a client database need to be sorted based on multiple criteria:

  • City (case-insensitive)
  • Zip code
  • Last name
  • First name

Additionally, null values for any of these properties should be handled by moving the corresponding user to the end of the list.

Example Dataset

To illustrate the sorting requirements, consider the following sample dataset:

| City      | Zip code | Last name | First name |
+-----------+----------+-----------+------------+
| Brisbane  | 4000     | Burke     | Jason      |
| Brisbane  | null     | Appleseed | Frank      |
| melbourne | 3001     | Collins   | Lindon     |
| Melbourne | 3003     | Collins   | Grant      |
| null      | 1000     | null      | Matthew    |

This dataset demonstrates various scenarios, including case differences in city names, missing zip codes, and null values for both city and last name.

Sample Data Structure

The User class might be structured as follows:

class User {
    @Nullable public String getCity() { ... }
    @Nullable public Integer getZipCode() { ... }
    @Nullable public String getFirstName() { ... }
    @Nullable public String getLastName() { ... }
}

Generalizing Comparator Composition

The process of composing multiple comparators can be generalized using the following algorithm:

result = 0
for (comparator : comparators):
    result = comparator.compare(a, b)
    if (result != 0):
        break

Implementing a Generic ComparatorBuilder

Leveraging Java 8’s functional interfaces, we can create a ComparatorBuilder class to simplify the creation of complex, null-safe comparators:

public class ComparatorBuilder<T> {

    private final List<Comparator<T>> steps = new ArrayList<>();

    public <S extends Comparable<S>>
    ComparatorBuilder<T> by(Function<T, S> property) {
        return by(property, Function.identity());
    }

    public <S extends Comparable<S>, Q>
    ComparatorBuilder<T> by(Function<T, Q> property, Function<Q, S> converter) {
        steps.add((T a1, T a2) -> {
            Q q1 = property.apply(a1);
            Q q2 = property.apply(a2);
            S s1 = q1 == null ? null : converter.apply(q1);
            S s2 = q2 == null ? null : converter.apply(q2);
            if (s1 == null) {
                return (s2 == null) ?  0 : 1;
            } else {
                return (s2 == null) ? -1 : s1.compareTo(s2);
            }
        });
        return this;
    }

    public Comparator<T> build() {
        return (T a, T b) -> {
            int result = 0;
            for (Comparator<T> step : steps) {
                result = step.compare(a, b);
                if (result != 0) {
                    break;
                }
            }
            return result;
        };
    }
}

Usage Example

To create a comparator for the User class using the ComparatorBuilder:

Comparator<User> comparator = new ComparatorBuilder<User>()
        .by(User::getCity, String::toLowerCase)
        .by(User::getZipCode)
        .by(User::getLastName)
        .by(User::getFirstName)
        .build();
Collections.sort(users, comparator);
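
To see the comparator against the sample dataset from the beginning, here is a minimal sketch; the User constructor used below is hypothetical and assumed to take city, zip code, last name and first name in that order:

List<User> users = Arrays.asList(
        new User(null, 1000, null, "Matthew"),
        new User("melbourne", 3001, "Collins", "Lindon"),
        new User("Brisbane", null, "Appleseed", "Frank"),
        new User("Melbourne", 3003, "Collins", "Grant"),
        new User("Brisbane", 4000, "Burke", "Jason"));

users.sort(comparator);
// Expected order: Burke, Appleseed, Collins (Lindon), Collins (Grant), Matthew.
// Nulls are pushed to the end at every level: the missing zip code within Brisbane,
// and the missing city for Matthew.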

Conclusion

This approach provides a clean, flexible, and type-safe method for creating complex comparators in Java. It effectively handles null values and allows for easy composition of multiple sorting criteria. By utilizing Java 8 features, we can create more maintainable and readable code for sorting operations.

Optimizing Database Performance with Hierarchical Caching in Spring

In large-scale applications, optimizing database performance is crucial. This article presents a sophisticated approach to enhancing read-only access and calculation efficiency using Spring’s caching mechanisms. We’ll explore how to implement a hierarchical caching strategy that significantly reduces database load and improves overall system performance.

The Challenge

Consider a system with two primary components:

  1. A Database class that retrieves time series data:
public class Database {
    public double[] load(String series) {
        ... // horribly expensive database access goes here
    }
}
  2. An Integrator class that performs calculations on this data:
public class Integrator {

    private Database database;

    public double[] run(String series) {
        double[] data = database.load(series);
        double[] result = new double[data.length];
        double sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
            result[i] = sum;
        }
        return result;
    }
}

This basic implementation has several inefficiencies:

  • It reloads the entire time series when new data is added.
  • It recalculates the integral even when the underlying data hasn’t changed.

The Solution: Hierarchical Caching

To address these issues, we’ll implement a multi-layered caching strategy using Spring’s caching annotations.

Step 1: Basic Caching

First, we’ll add caching to both the Database and Integrator classes:

public class Database {
    @Cacheable(cacheNames = "nominal", key = "#series")
    public double[] load(String series) {
        // Implementation
    }
}

public class Integrator {
    @Cacheable(cacheNames = "integral", key = "#series")
    public double[] run(String series) {
        // Implementation
    }
}

Step 2: Coordinated Cache Management

To maintain cache consistency when new data arrives, we introduce a Repository class:

public class Repository {
    private Database database;

    @Caching(
        evict = @CacheEvict(value = "integral", key = "#series"),
        put = @CachePut(value = "nominal", key = "#series")
    )
    public double[] update(String series, double value) {
        double[] existing = database.load(series);
        double[] updated = new double[existing.length + 1];
        System.arraycopy(existing, 0, updated, 0, existing.length);
        updated[existing.length] = value;
        return updated;
    }

    @Caching(evict = {
        @CacheEvict(value = "nominal", key = "#series"),
        @CacheEvict(value = "integral", key = "#series")
    })
    public void reset(String series) {
        // Cache reset logic
    }
}
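
For these annotations to have any effect, caching has to be switched on and the two caches need to exist. Here is a minimal configuration sketch, assuming the simple in-memory ConcurrentMapCacheManager (any CacheManager implementation would do), and keeping in mind that Database, Integrator and Repository must be Spring-managed beans for the caching proxies to apply:

@Configuration
@EnableCaching
public class CacheConfiguration {

    @Bean
    public CacheManager cacheManager() {
        // in-memory caches named exactly as referenced by the annotations above
        return new ConcurrentMapCacheManager("nominal", "integral");
    }
}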

Key Design Considerations

Update Mechanism

The update method in Repository manages cache updates efficiently:

  • Retrieves existing data: It fetches the current time series from the “nominal” cache or database. Example: If the series AAPL contains [100, 101, 102], it retrieves this array.
  • Appends a new value: It creates a new array with the existing data plus the new value at the end. Example: If updating AAPL with 103, the new array becomes [100, 101, 102, 103].
  • Updates the “nominal” cache: It stores the new array in the cache, replacing the old one. Example: The “nominal” cache for AAPL now contains [100, 101, 102, 103].
  • Invalidates the “integral” cache: It removes the corresponding entry from the “integral” cache. Example: The cached integral for AAPL is deleted, forcing recalculation on next access.

Cache Reset

The reset method provides a way to clear cached data:

  • Clears both caches: It removes entries from both “nominal” and “integral” caches for a given series. Example: Calling reset("AAPL") removes AAPL data from both caches.
  • No database interaction: It only affects the caches, not the underlying database. Example: After reset, the next data request will trigger a fresh database load.

Separation of Concerns: This design separates different aspects of the system:

  • Functional dependencies in Java code: The actual data processing logic is in the Java methods. Example: The integration calculation in the Integrator class.
  • Caching policies in annotations: Cache behavior is defined using Spring annotations. Example: @Cacheable(cacheNames = "integral", key = "#series") on the Integrator.run() method.

Conclusion

While this hierarchical caching approach may seem complex initially, it proves highly effective in systems with multiple interdependent calculations. By leveraging Spring’s caching annotations, we can significantly reduce database load, improve response times, and create a more scalable architecture.

This strategy is particularly valuable in read-heavy systems where data updates are less frequent than read operations. It allows for efficient data access and computation while maintaining data consistency across the caching layers.

Spring data JPA and inheritance

Let’s start adding user management to our application. There are two kinds of personal information that we want to capture at the moment: existing users and invitations sent out to new potential customers. Invitations and Users have a lot in common, so let’s start with the entities for our Spring Boot application

@Entity
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
public abstract class Person {

    private @Id @GeneratedValue(strategy = GenerationType.TABLE) Long id;
    private String firstName;
    private String lastName;
    private String email;

    ...
}

@Entity
public class Invitation extends Person {

    ...

}

@Entity
public class User extends Person {

    ...

}

Let’s start with keeping invitations and users in completely separate tables by using InheritanceType.TABLE_PER_CLASS

To work with the data set, create a simple repository

public interface PersonRepository extends JpaRepository<Person, Long> {

    Page<Person> findAll(Pageable pageable);
}

If we now issue a simple Pageable request

Pageable pageable = new PageRequest(0, 10, Sort.Direction.ASC, "firstName", "lastName");
Page<Person> firstPage = personRepository.findAll(pageable);
for (Person person : firstPage) {
    System.out.println(person);
}

So let’s see what kind of queries Hibernate issues in the background by placing the following application.properties in /src/test/resources

spring.jpa.show-sql = true
spring.jpa.properties.hibernate.format_sql = true

logging.level.org.hibernate.SQL = DEBUG
logging.level.org.hibernate.type.descriptor.sql.BasicBinder = TRACE

If we stay with the TABLE_PER_CLASS strategy, Hibernate issues the following query

SELECT person0_.id AS id1_1_,
       person0_.email AS email2_1_,
       person0_.first_name AS first_na3_1_,
       person0_.last_name AS last_nam4_1_,
       person0_.date AS date1_0_,
       person0_.clazz_ AS clazz_
  FROM ( SELECT id,
                email,
                first_name,
                last_name,
                null AS date,
                1 AS clazz_
           FROM user UNION ALL
         SELECT id,
                email,
                first_name,
                last_name,
                date,
                2 AS clazz_
           FROM invitation
       ) person0_
 ORDER BY person0_.first_name ASC,
          person0_.last_name ASC
 LIMIT ?

For the SINGLE_TABLE strategy we get

SELECT person0_.id AS id2_0_,
       person0_.email AS email3_0_,
       person0_.first_name AS first_na4_0_,
       person0_.last_name AS last_nam5_0_,
       person0_.date AS date6_0_,
       person0_.dtype AS dtype1_0_
  FROM person person0_
 ORDER BY person0_.first_name ASC,
          person0_.last_name ASC
 LIMIT ?

And for the JOINED strategy we get

SELECT person0_.id AS id1_1_,
       person0_.email AS email2_1_,
       person0_.first_name AS first_na3_1_,
       person0_.last_name AS last_nam4_1_,
       person0_2_.date AS date1_0_,
       CASE
           WHEN person0_1_.id IS NOT NULL THEN 1
           WHEN person0_2_.id IS NOT NULL THEN 2
           WHEN person0_.id IS NOT NULL THEN 0
       END as clazz_
  FROM person person0_
  LEFT OUTER JOIN user person0_1_
               ON person0_.id=person0_1_.id
  LEFT OUTER JOIN invitation person0_2_
               ON person0_.id=person0_2_.id
 ORDER BY person0_.first_name ASC,
          person0_.last_name ASC
 LIMIT ?

That’s it for now

Writing a simple query language with ANTLR

Let’s assume we now have our database of IoT devices and we’d like to provide a simple search interface that would allow us to send queries like the following

location within 10 km from (-37.814, 144.963) and status.stateOfCharge < 10%

Although it might sound like an intimidatingly complex task, the fact is it’s rather simple to achieve with ANTLR. Let’s begin with the query language. The following grammar will do for now.

grammar Query;

@header {
    package com.kartashov.postgis.antlr;
}

AND     : ',' | 'and' ;
OR      : 'or' ;
INT     : '-'? [0-9]+ ;
DOUBLE  : '-'? [0-9]+'.'[0-9]+ ;
WITHIN  : 'within' ;
FROM    : 'from' ;
ID      : [a-zA-Z_][a-zA-Z_0-9]* ;
STRING  : ( '"' (~["])* '"' | '\'' (~['])* '\'' )
        {
            String s = getText();
            setText(s.substring(1, s.length() - 1));
        }
        ;
EQ      : '=' '=' ? ;
LE      : '<=' ;
GE      : '>=' ;
NE      : '!=' ;
LT      : '<' ;
GT      : '>' ;
SEP     : '.' ;
WS      : [ \t\r\n]+ -> skip ;

query : expression ;

expression
    : expression AND expression # AndExpression
    | expression OR expression  # OrExpression
    | predicate                 # PredicateExpression
    | '(' expression ')'        # BracketExpression
    ;

reference : element (SEP element)* ;

element : ID ;

predicate
    : reference WITHIN amount FROM location # LocationPredicate
    | reference operator term               # OperatorPredicate
    ;

location : '(' latitude ',' longitude ')' ;

latitude : DOUBLE ;

longitude : DOUBLE ;

term
    : reference
    | value
    | amount
    ;

operator
    : LE
    | GE
    | NE
    | LT
    | GT
    | EQ
    ;

amount : value unit ;

value
   : INT          # IntegerValue
   | DOUBLE       # DoubleValue
   | STRING       # StringValue
   | ID           # StringValue
   ;

unit
   : '%'
   | ID
   ;

If you’re not using IntelliJ IDEA, please start doing so for this project: the ANTLR4 plugin can help you a great deal when debugging your grammars.

There are a lot of things missing from this language, from the very relaxed handling of units of measurement all the way to the absence of type checking for device properties. As they loved to say at my alma mater, we leave the proof of the theorem as an exercise for the curious reader.

With a language this simple it’s actually easy to just walk down the syntax tree and generate a JPQL statement along the way. We extend the QueryBaseVisitor class generated by the ANTLR Maven plugin. This pattern is widely used, so it’s worth understanding well: the autogenerated abstract parent registers callbacks, and by overriding them you can 1) emit the target-language constructs, 2) update the internal state of the visitor, and 3) adjust the walking algorithm by skipping certain branches of the syntax tree.

public class QueryVisitor extends QueryBaseVisitor<String> {

    private static final GeometryFactory geometryFactory = new GeometryFactory();

    private final Map<String, Object> parameters = new HashMap<>();

    public Map<String, Object> getParameters() {
        return parameters;
    }

    private String addParameter(Object value) {
        String name = "var" + parameters.size();
        parameters.put(name, value);
        return name;
    }

    @Override
    public String visitQuery(QueryParser.QueryContext ctx) {
        return "SELECT d FROM Device AS d WHERE " + visit(ctx.expression());
    }

    @Override
    public String visitBracketExpression(QueryParser.BracketExpressionContext ctx) {
        return visit(ctx.expression());
    }

    @Override
    public String visitAndExpression(QueryParser.AndExpressionContext ctx) {
        return visit(ctx.expression(0)) + " AND " + visit(ctx.expression(1));
    }

    @Override
    public String visitPredicateExpression(QueryParser.PredicateExpressionContext ctx) {
        return visit(ctx.predicate());
    }

    @Override
    public String visitOrExpression(QueryParser.OrExpressionContext ctx) {
        return "(" + visit(ctx.expression(0)) + " OR " + visit(ctx.expression(1)) + ")";
    }

    @Override
    public String visitOperator(QueryParser.OperatorContext ctx) {
        return ctx.getText();
    }

    @Override
    public String visitIntegerValue(QueryParser.IntegerValueContext ctx) {
        return addParameter(Integer.valueOf(ctx.getText()));
    }

    @Override
    public String visitDoubleValue(QueryParser.DoubleValueContext ctx) {
        return addParameter(Double.valueOf(ctx.getText()));
    }

    @Override
    public String visitStringValue(QueryParser.StringValueContext ctx) {
        return addParameter(ctx.getText());
    }

    @Override
    public String visitAmount(QueryParser.AmountContext ctx) {
        Amount<?> amount = Amount.valueOf(ctx.getText());
        @SuppressWarnings("unchecked")
        double value = amount.doubleValue((Unit) amount.getUnit().getStandardUnit());
        return addParameter(value);
    }

    @Override
    public String visitUnit(QueryParser.UnitContext ctx) {
        return ctx.getText();
    }

    @Override
    public String visitElement(QueryParser.ElementContext ctx) {
        return ctx.getText();
    }

    @Override
    public String visitOperatorPredicate(QueryParser.OperatorPredicateContext ctx) {
        String operator = visit(ctx.operator());
        String value = visit(ctx.term());
        String reference = visitReference(ctx.reference(), parameters.get(value).getClass());
        return reference + " " + operator + " :" + value;
    }

    public String visitReference(QueryParser.ReferenceContext ctx, Class<?> type) {
        List<String> elements = ctx.element().stream()
                .map(this::visitElement)
                .collect(Collectors.toList());
        String base = "d." + elements.get(0);
        if (elements.size() == 1) {
            return base;
        } else {
            List<String> tail = elements.subList(1, elements.size());
            String extract = "extract(" + base + ", '" + String.join("', '", tail) + "')";
            if (type == Integer.class) {
                return "CAST(" + extract + " AS integer)";
            } else if (type == Double.class) {
                return "CAST(" + extract + " AS double)";
            } else {
                return extract;
            }
        }
    }

    @Override
    public String visitLocationPredicate(QueryParser.LocationPredicateContext ctx) {
        String reference = visitReference(ctx.reference(), Point.class);
        String location = visit(ctx.location());
        String distance = visit(ctx.amount());
        return "distance(" + reference + ", :" + location + ") <= :" + distance;
    }

    @Override
    public String visitLocation(QueryParser.LocationContext ctx) {
        double latitude = Double.valueOf(ctx.latitude().getText());
        double longitude = Double.valueOf(ctx.longitude().getText());
        Point point = geometryFactory.createPoint(new Coordinate(latitude, longitude));
        point.setSRID(4326);
        return addParameter(point);
    }

    @Override
    public String visitTerm(QueryParser.TermContext ctx) {
        if (ctx.amount() != null) {
            return visit(ctx.amount());
        } else if (ctx.value() != null) {
            return visit(ctx.value());
        } else {
            return visit(ctx.reference());
        }
    }
}

Nothing here is particularly complex. I use JScience (never ever use it) to convert units of measurement to standard units (miles and kilometers to meters, percentages to plain double values).

The rest of the project is mostly boilerplate code, like the search service

@Service
@Transactional
public class SearchService {

    private static final Logger logger = LoggerFactory.getLogger(SearchService.class);

    @Autowired
    private EntityManager entityManager;

    @SuppressWarnings("unchecked")
    public List<Device> search(String query) throws IOException {

        logger.info("Parsing search query {}", query);

        ANTLRInputStream input = new ANTLRInputStream(
                new ByteArrayInputStream(query.getBytes(StandardCharsets.UTF_8)));
        QueryLexer lexer = new QueryLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);

        QueryParser parser = new QueryParser(tokens);
        ParseTree tree = parser.query();

        logger.info("Expression tree: {}", tree.toStringTree(parser));

        QueryVisitor visitor = new QueryVisitor();
        String jpqlQuery = visitor.visit(tree);

        logger.info("Resulting JPQL query:\n{}", jpqlQuery);

        Query queryObject = entityManager.createQuery(jpqlQuery);
        for (Map.Entry<String, Object> entry : visitor.getParameters().entrySet()) {
            queryObject.setParameter(entry.getKey(), entry.getValue());
        }
        return queryObject.getResultList();
    }
}

Now we can send queries to the database like the following

List<Device> devices = searchService
        .search("location within 10 km from (-37.814, 144.963) and status.stateOfCharge < 10%");

Rather neat. You can see the complete source code on GitHub

A word of warning: you probably should not build JPQL queries like this unless you’re PoC’ing, as I am here. A much sounder approach would be the JPA Criteria API.
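
For comparison, here is a minimal Criteria API sketch for the state-of-charge half of our example query (10% becomes 0.1 after the unit conversion). It assumes the javax.persistence.criteria imports, the entityManager from the service above, and a flat stateOfCharge attribute on Device; the extract/cast dance for nested status properties is left out:

// A minimal sketch, not the project's actual code:
// fetch devices whose stateOfCharge is below 0.1
CriteriaBuilder cb = entityManager.getCriteriaBuilder();
CriteriaQuery<Device> cq = cb.createQuery(Device.class);
Root<Device> device = cq.from(Device.class);
cq.select(device)
  .where(cb.lt(device.<Double>get("stateOfCharge"), 0.1));
List<Device> lowCharge = entityManager.createQuery(cq).getResultList();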