Using Spring request scoped beans for auditing

Every multi-layered application is built on the idea of separation of concerns. The presentation layer knows about raw JSON payloads and HTTP headers, the service layer only knows about domain objects, and the underlying data access layer doesn’t even know which business methods the current transaction spans.

For this post let’s assume we want a tracing context for the whole duration of a request: we want to time all the methods in our controllers and see who sent the request. The approach is easily extensible, so we’ll keep the example to a bare minimum.

Let’s create a timer aspect that measures execution time:

@Aspect
@Component
public class Timer {

    @Autowired
    private Auditor auditor;

    @Pointcut("execution(* com.kartashov.auditing.controllers.*.*(..))")
    public void methods() {}

    @Around("methods()")
    public Object profile(ProceedingJoinPoint point) throws Throwable {
        long start = System.currentTimeMillis();
        Object result = point.proceed();
        long time = System.currentTimeMillis() - start;
        auditor.register(point.getSignature().toShortString(), time);
        return result;
    }
}

Make sure the aspect is marked as @Component, otherwise Spring will not pick it up. We also need an Auditor that stores the execution context for the duration of the request.

public class Auditor {

    private final static Logger logger = LoggerFactory.getLogger(Auditor.class);

    private List<Record> records = new ArrayList<>();
    private String remoteAddress;

    void register(String method, long time) {
        records.add(new Record(method, time));
    }

    public void setRemoteAddress(String remoteAddress) {
        this.remoteAddress = remoteAddress;
    }

    @PreDestroy
    public void trace() {
        logger.info("Audit record. Request from {}\n{}",
                remoteAddress,
                records.stream().map(Record::toString).collect(Collectors.joining("\n")));
    }

    static class Record {
        ...
    }
}
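
The Record class is elided above; a plausible shape for it, inferred from the register() call and the trace output further down, might look like this (a sketch, not the original code):

static class Record {

    private final String method;
    private final long time;

    Record(String method, long time) {
        this.method = method;
        this.time = time;
    }

    @Override
    public String toString() {
        return method + ": " + time + "ms";
    }
}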

We also need to report the remote address from our controllers. For the sake of exposition I will set the remote address manually rather than use AspectJ for this.

@RestController
public class PingController {

    @Autowired
    private Auditor auditor;

    @RequestMapping("/")
    public String ping(HttpServletRequest request) {
        auditor.setRemoteAddress(request.getRemoteAddr());
        return Instant.now().toString();
    }
}

Finally, we need to explain to our Spring Boot application what we’re up to:

  • Enable AspectJ proxies
  • Make Auditor request scoped
  • Make sure that Spring autowires not the bean itself but a scoped proxy, so that a separate instance is created for every web request

@SpringBootApplication
@EnableAspectJAutoProxy
@Configuration
public class Application {

    public static void main(String... args) {
        SpringApplication.run(Application.class, args);
    }

    @Bean
    @Scope(value = "request", proxyMode = ScopedProxyMode.TARGET_CLASS)
    public Auditor auditor() {
        return new Auditor();
    }
}

We’re all set. Running the application will give us the following trace

2016-06-17 13:23:49.440  INFO 6738 : Audit record. Request from 0:0:0:0:0:0:0:1
PingController.ping(..): 24ms

This approach can be further extended with factories that produce different request scoped Auditors for different situations, as sketched below. The important point is that by autowiring request scoped Auditors we keep the cross-cutting auditing concern mostly separate from our business logic.
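
A minimal sketch of such a factory-based setup, assuming a hypothetical AuditorFactory bean that decides which Auditor variant to build for a given request:

@Bean
@Scope(value = "request", proxyMode = ScopedProxyMode.TARGET_CLASS)
public Auditor auditor(HttpServletRequest request, AuditorFactory auditorFactory) {
    // AuditorFactory is hypothetical: it could pick an Auditor flavour based on
    // a request header, the authenticated user, the tenant, and so on
    return auditorFactory.create(request.getHeader("X-Audit-Profile"));
}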

Getting started with Spring Data Envers

At some point in every company’s adult life it starts reassessing the sanity and quality of its data set. One way to regain that trust is to introduce auditing and versioning of database entities, which is exactly what Hibernate Envers is for.

As far as Envers is concerned, all you need to do is sprinkle a couple of annotations around:

@Entity
@Table(name = "users")
@Audited
@AuditTable("users_audit")
public class User {
    ...
}

Most of the interesting work happens behind the scenes, including the creation of the revision tracking table and the users_audit table, as well as the storing of audit records whenever you write data to the database. In a way, Envers’ unique selling proposition is to be as transparent as possible while adding audit storage: you are not required to change your existing database schema.

Hiding complex machinery behind a facade has implications, though: fetching the audited objects back efficiently can be painful and may require real legwork in the worst case. On the positive side, the default setup is trivial.

Project setup

Let’s start with a simple application that has a set of users, and these users can send invitations. When an invitation is sent we want to capture the inviter’s name and email as well as the invitation email. If the user later changes her name or email we still want to be able to retrieve the original inviter data.

We’ll need the following dependencies for our Spring Boot project:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-envers</artifactId>
</dependency>
<dependency>
    <groupId>org.aspectj</groupId>
    <artifactId>aspectjweaver</artifactId>
</dependency>

For some odd reason, you’ll need to add the org.aspectj:aspectjweaver dependency to the project explicitly.

We’ll start with the User entity

@Entity
@Table(name = "users")
@Audited
@AuditTable("users_audit")
public class User {
    @Id
    @GeneratedValue
    private Long id;

    private String firstName, lastName, email;
}

For events we would like to be a tiny bit more flexible, anticipating a large set of different event types ahead of us.

@Entity
@Table(name = "events")
@Inheritance(strategy = InheritanceType.JOINED)
@Audited
public abstract class Event {
    @Id
    @GeneratedValue
    private Long id;
}

@Entity
@Table(name = "invitation_events")
@Audited
@AuditTable("invitation_events_audit")
public class InvitationEvent extends Event {
    @ManyToOne
    private User inviter;
    private String invitationEmail;
}

Note that Envers didn’t like InheritanceType.TABLE_PER_CLASS, hence the JOINED strategy above.

Now to the spring-data-envers magic. Create repositories extending RevisionRepository<E, ID, N> like following

public interface UserRepository extends
        RevisionRepository<User, Long, Integer>,
        CrudRepository<User, Long> {
}

public interface EventRepository extends
        RevisionRepository<Event, Long, Integer>,
        CrudRepository<Event, Long> {
}

It’s also important to set repositoryFactoryBeanClass when enabling JPA repositories:

@SpringBootApplication
@EnableJpaRepositories(repositoryFactoryBeanClass = EnversRevisionRepositoryFactoryBean.class)
public class Application {
    ...
}

You’re all set. Let’s run a simple query and see what Envers does for us:

User user = new User();
user.setEmail("old-email@tradition.test");
userRepository.save(user);

InvitationEvent event = new InvitationEvent();
event.setInviter(user);
event.setInvitationEmail("invitation-email@kartashov.com");
eventRepository.save(event);
Long eventId = event.getId();

user.setEmail("completely-new-email@tradition.test");
userRepository.save(user);

At this point, if we simply fetch the event by ID we’ll get a link to the updated user row in the users table and won’t see the email address the invitation was actually sent from. This is where RevisionRepository comes into play. Let’s fetch the latest revision of our event and see what the inviter’s data looked like at the time the invitation was sent.

InvitationEvent event = (InvitationEvent) eventRepository
        .findLastChangeRevision(eventId).getEntity();
assert event.getInviter().getEmail().equals("old-email@tradition.test");
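
The same RevisionRepository also lets us list the full revision history of an entity. A quick sketch using the user we just modified (getter names are assumed):

Revisions<Integer, User> revisions = userRepository.findRevisions(user.getId());
for (Revision<Integer, User> revision : revisions) {
    // each revision carries its number and a snapshot of the entity at that point
    System.out.println(revision.getRevisionNumber() + ": " + revision.getEntity().getEmail());
}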

Behind the scenes

Let’s see what goes on behind the scenes. With ddl-auto set to create, create-drop or update, Envers creates all the auxiliary tables on application start. In our case we end up with the following schema:

Envers database schema

Let’s see what we’re asking our database to do when we’re after the latest revision of an event.

In the first query we’re fetching all the revision IDs for the specified :event_id

    SELECT events_audit.revision_id
      FROM events_audit
CROSS JOIN revinfo
     WHERE events_audit.id = :event_id
       AND events_audit.revision_id = revinfo.rev
  ORDER BY events_audit.revision_id ASC

In the next step we fetch all the revision entities for the IDs we just retrieved, which doesn’t seem like a particularly smart thing to do considering we could have done it in the first query. One possible explanation is that Envers lets you extend the revision entity and add fields to it, and therefore prepares for the general case.

SELECT revinfo.rev,
       revinfo.revtstmp
  FROM revinfo
 WHERE rev IN (:revision_ids)
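
As an aside, extending the revision entity is straightforward; a hedged sketch (the class and field names here are ours, not part of this project):

@Entity
@RevisionEntity(AuditRevisionListener.class)
public class AuditRevision extends DefaultRevisionEntity {
    private String username; // extra field stored with every revision; getter and setter omitted
}

public class AuditRevisionListener implements RevisionListener {
    @Override
    public void newRevision(Object revisionEntity) {
        // populate the extra field, e.g. from the security context
        ((AuditRevision) revisionEntity).setUsername("system");
    }
}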

In the next step we’re dealing with this beauty, and whoever feels it’s not a masterpiece needs an urgent appointment with a doctor

         SELECT events_audit.id AS id1_1_,
                events_audit.revision_id AS revision2_1_,
                events_audit.revision_type AS revision3_1_,
                invitation_events_audit.invitation_email,
                invitation_events_audit.user_id,
                CASE
                    WHEN invitation_events_audit.id IS NOT NULL THEN 1
                    WHEN events_audit.id IS NOT NULL THEN 0
                END AS clazz
           FROM events_audit
LEFT OUTER JOIN invitation_events_audit invitation_events_audit
             ON events_audit.id = invitation_events_audit.id
            AND events_audit.revision_id = invitation_events_audit.revision_id
          WHERE events_audit.revision_id = (
                SELECT MAX(ea.revision_id)
                  FROM events_audit ea
                 WHERE ea.revision_id <= ?
                   AND events_audit.id = ea.id)
            AND events_audit.revision_type <> :revision_type
            AND events_audit.id = :event_id

A lot of the clutter is caused by joined table inheritance strategy.

The last query lazily loads the user revision; it pretty much mimics the previous query, only here we don’t have to deal with inheritance hierarchies:

SELECT users_audit.id,
       users_audit.revision_id,
       users_audit.revision_type,
       users_audit.email,
       users_audit.first_name,
       users_audit.last_name
  FROM users_audit
 WHERE users_audit.revision_id = (
       SELECT max(ua.revision_id)
         FROM users_audit ua
        WHERE ua.revision_id <= :revision_id
          AND users_audit.id = ua.id)
   AND users_audit.revision_type <> :revision_type
   AND users_audit.id = :user_id

We are looking for a create or update revision of the given :user_id. Our target :revision_id might not have touched the user row at all, so we look for the max(revision_id) of this specific user that is less than or equal to the target :revision_id.

Querying data

For querying the data Envers offers a Criteria API similar to the JPA Criteria API. Now let’s say we want to find all invitations that were sent from a specific email address, no matter who owns this email now or whether it’s currently used by any user in the database. The best up-to-date documentation can be found in the Hibernate User Guide.

In our case, unfortunately, a surprise is waiting for us: the innocent-looking query below throws UnsupportedOperationException.

return AuditReaderFactory.get(entityManager)
        .createQuery()
        .forRevisionsOfEntity(InvitationEvent.class, true, false)
        .traverseRelation("inviter", JoinType.LEFT)
        .add(AuditEntity.property("email").eq(email))
        .getResultList();

Let’s just hope that this will be addressed soon.

Creating null-safe comparators in Java

Sorting domain objects can be challenging because it may require significant effort to consider all edge cases and produce a robust sorting order that meets business requirements. For example, consider a client database where users are sorted first by city, then by zip code, then by last name, and finally by first name. If any of these properties are not set, we want to move the user to the end of the list. It may also be necessary to ignore the case when comparing cities. An example of a simple result set in tabulated form is shown below:

| Brisbane  | 4000 | Burke     | Jason   |
| Brisbane  | null | Appleseed | Frank   |
| melbourne | 3001 | Collins   | Lindon  |
| Melbourne | 3003 | Collins   | Grant   |
| null      | 1000 | null      | Matthew |

The corresponding User class could look like this:

class User {
    @Nullable public String getCity() { ... }
    @Nullable public Integer getZipCode() { ... }
    @Nullable public String getFirstName() { ... }
    @Nullable public String getLastName() { ... }
}

Although it’s not an overly complex task, composing multiple comparators can be generalized. Here is a pseudo-code algorithm for this purpose:

result = 0
for (comparator : comparators):
    result = comparator.compare(a, b)
    if (result != 0):
        break

Using the flexibility of Java 8, we can create a generic ComparatorBuilder to simplify this task, particularly in regards to null checks:

public class ComparatorBuilder<T> {

    private final List<Comparator<T>> steps = new ArrayList<>();

    public <S extends Comparable<S>>
    ComparatorBuilder<T> by(Function<T, S> property) {
        return by(property, Function.identity());
    }

    public <S extends Comparable<S>, Q>
    ComparatorBuilder<T> by(Function<T, Q> property, Function<Q, S> converter) {
        steps.add((T a1, T a2) -> {
            Q q1 = property.apply(a1);
            Q q2 = property.apply(a2);
            S s1 = q1 == null ? null : converter.apply(q1);
            S s2 = q2 == null ? null : converter.apply(q2);
            if (s1 == null) {
                return (s2 == null) ?  0 : 1;
            } else {
                return (s2 == null) ? -1 : s1.compareTo(s2);
            }
        });
        return this;
    }

    public Comparator<T> build() {
        return (T a, T b) -> {
            int result = 0;
            for (Comparator<T> step : steps) {
                result = step.compare(a, b);
                if (result != 0) {
                    break;
                }
            }
            return result;
        };
    }
}

For example, we can create a comparator for the User class as follows:

Comparator<User> comparator = new ComparatorBuilder<User>()
        .by(User::getCity, String::toLowerCase)
        .by(User::getZipCode)
        .by(User::getLastName)
        .by(User::getFirstName)
        .build();
Collections.sort(users, comparator);

Overall, this approach is quite neat compared to the alternatives.
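
For comparison, roughly the same ordering can be assembled from the JDK’s own combinators, Comparator.comparing and Comparator.nullsLast, at the cost of repeating the null handling for every key:

Comparator<User> jdkComparator = Comparator
        .comparing(User::getCity, Comparator.nullsLast(String.CASE_INSENSITIVE_ORDER))
        .thenComparing(User::getZipCode, Comparator.nullsLast(Comparator.naturalOrder()))
        .thenComparing(User::getLastName, Comparator.nullsLast(Comparator.naturalOrder()))
        .thenComparing(User::getFirstName, Comparator.nullsLast(Comparator.naturalOrder()));

The builder mainly saves us from spelling out the nullsLast wrapping and the converter plumbing for every single key.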

Hierarchical caches with Spring

This article shows how to take stress off a large database by memoizing and caching results in the Database and Integrator services described below. The goal is to provide read-only access to the data and run calculations on it without hitting the database more often than necessary, using a range of Spring caching tools.

To start, you create a simple class called Database that loads a time series for you:

public class Database {
    public double[] load(String series) {
        ... // horribly expensive database access goes here
    }
}

You also create a class Integrator that uses this database to calculate the sum of the time series, producing an integral:

public class Integrator {

    private Database database;

    public double[] run(String series) {
        double[] data = database.load(series);
        double[] result = new double[data.length];
        double sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
            result[i] = sum;
        }
        return result;
    }
}

However, this design has drawbacks: if a new data point is added to a time series in the database, the entire series must be retrieved again, and even when there is no new data, requesting the integrated series still loads the original series and repeats the calculation.

One solution to this problem is memoization, but because this is a stateful system our caching strategy cannot be overly simplistic. To address this we will use Spring’s caching annotations.
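
A minimal configuration sketch to switch caching on, using a simple in-memory cache manager (any CacheManager implementation would do; the cache names match the annotations used below):

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        // one cache for the raw series, one for the integrated series
        return new ConcurrentMapCacheManager("nominal", "integral");
    }
}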

We will begin by adding caching to the Database and Integrator services:

public class Integrator {

    ...

    @Cacheable(cacheNames = "integral", key = "#series")
    public double[] run(String series) {
        ...
    }
}

public class Database {

    ...

    @Cacheable(cacheNames = "nominal", key = "#series")
    public double[] load(String series) {
        ...
    }
}

We have made progress by memoizing and caching everything, but the next step is crucial: we need to coordinate cache eviction with the arrival of new data. To achieve this, we introduce a new abstraction called a Repository:

public class Repository {

    ...

    private Database database;

    @Caching(
            evict = @CacheEvict(value = "integral", key = "#series"),
            put = @CachePut(value = "nominal", key = "#series")
    )
    public double[] update(String series, double value) {
        double[] e = database.load(series);
        double[] d = new double[e.length + 1];
        System.arraycopy(e, 0, d, 0, e.length);
        d[e.length] = value;
        return d;
    }

    @Caching(evict = {
            @CacheEvict(value = "nominal", key = "#series"),
            @CacheEvict(value = "integral", key = "#series")
    })
    public void reset(String series) {
    }
}

When it comes to this design, there are a few subtle details to keep in mind. The update method, for example, doesn’t update the database directly. Instead, it:

  • Retrieves the value from the “nominal” cache or the database by executing database.load()
  • Adds a value to the end of the loaded series
  • Stores the updated series in the “nominal” cache using the @CachePut annotation
  • Clears the “integral” cache for the updated series using the @CacheEvict annotation

Additionally, the reset method does not access the database at all. It simply resets both the “nominal” and “integral” caches using annotations.
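
Put together, the intended behaviour looks roughly like this (a sketch; the beans are assumed to be wired as Spring proxies so that the annotations actually apply):

double[] first  = integrator.run("temperature");  // loads from the database, fills both caches
double[] second = integrator.run("temperature");  // served straight from the "integral" cache
repository.update("temperature", 42.0);           // refreshes "nominal", evicts "integral"
double[] third  = integrator.run("temperature");  // recomputes the integral from the cached nominal series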

It’s worth noting that this approach may seem complex at first glance, but it becomes more practical when there are multiple hierarchical calculations that need to be run on the data. By specifying functional dependencies in the Java code and caching policies in the annotations, the concerns are separated and the code becomes more reusable and testable.

Spring Data JPA and inheritance

Let’s start adding user management to our application. There are two kinds of personal information we want to capture at the moment: existing users and invitations sent out to potential new customers. Invitations and users have a lot in common, so let’s start with the entities for our Spring Boot application:

@Entity
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
public abstract class Person {

    private @Id @GeneratedValue(strategy = GenerationType.TABLE) Long id;
    private String firstName;
    private String lastName;
    private String email;

    ...
}

@Entity
public class Invitation extends Person {

    ...

}

@Entity
public class User extends Person {

    ...

}

We start by keeping invitations and users in completely separate tables using InheritanceType.TABLE_PER_CLASS.

To work with this data we create a simple repository:

public interface PersonRepository extends JpaRepository<Person, Long> {

    Page<Person> findAll(Pageable pageable);
}

Now let’s issue a simple Pageable request:

Pageable pageable = new PageRequest(0, 10, Sort.Direction.ASC, "firstName", "lastName");
Page<Person> firstPage = personRepository.findAll(pageable);
for (Person person : firstPage) {
    System.out.println(person);
}

Let’s see what kind of queries Hibernate issues in the background by placing the following application.properties in src/test/resources:

spring.jpa.show-sql = true
spring.jpa.properties.hibernate.format_sql = true

logging.level.org.hibernate.SQL = DEBUG
logging.level.org.hibernate.type.descriptor.sql.BasicBinder = TRACE

If we stay with the TABLE_PER_CLASS strategy, Hibernate issues the following query:

SELECT person0_.id AS id1_1_,
       person0_.email AS email2_1_,
       person0_.first_name AS first_na3_1_,
       person0_.last_name AS last_nam4_1_,
       person0_.date AS date1_0_,
       person0_.clazz_ AS clazz_
  FROM ( SELECT id,
                email,
                first_name,
                last_name,
                null AS date,
                1 AS clazz_
           FROM user UNION ALL
         SELECT id,
                email,
                first_name,
                last_name,
                date,
                2 AS clazz_
           FROM invitation
       ) person0_
 ORDER BY person0_.first_name ASC,
          person0_.last_name ASC
 LIMIT ?

For SINGLE_TABLE strategy we get

SELECT person0_.id AS id2_0_,
       person0_.email AS email3_0_,
       person0_.first_name AS first_na4_0_,
       person0_.last_name AS last_nam5_0_,
       person0_.date AS date6_0_,
       person0_.dtype AS dtype1_0_
  FROM person person0_
 ORDER BY person0_.first_name ASC,
          person0_.last_name ASC
 LIMIT ?

And for the JOINED strategy we get

SELECT person0_.id AS id1_1_,
       person0_.email AS email2_1_,
       person0_.first_name AS first_na3_1_,
       person0_.last_name AS last_nam4_1_,
       person0_2_.date AS date1_0_,
       CASE
           WHEN person0_1_.id IS NOT NULL THEN 1
           WHEN person0_2_.id IS NOT NULL THEN 2
           WHEN person0_.id IS NOT NULL THEN 0
       END as clazz_
  FROM person person0_
  LEFT OUTER JOIN user person0_1_
               ON person0_.id=person0_1_.id
  LEFT OUTER JOIN invitation person0_2_
               ON person0_.id=person0_2_.id
 ORDER BY person0_.first_name ASC,
          person0_.last_name ASC
 LIMIT ?

That’s it for now

Writing a simple query language with ANTLR

Let’s assume we now have our database of IoT devices and we’d like to provide a simple search interface that would allow us to send queries like the following

location within 10 km from (-37.814, 144.963) and status.stateOfCharge < 10%

Although it might sound like an intimidatingly complex task, it’s actually rather simple to achieve with ANTLR. Let’s begin with the query language; the following grammar will do for now.

grammar Query;

@header {
    package com.kartashov.postgis.antlr;
}

AND     : ',' | 'and' ;
OR      : 'or' ;
INT     : '-'? [0-9]+ ;
DOUBLE  : '-'? [0-9]+'.'[0-9]+ ;
WITHIN  : 'within' ;
FROM    : 'from' ;
ID      : [a-zA-Z_][a-zA-Z_0-9]* ;
STRING  :  '"' (~["])* '"' | '\'' (~['])* '\''
        {
            String s = getText();
            setText(s.substring(1, s.length() - 1));
        }
        ;
EQ      : '=' '=' ? ;
LE      : '<=' ;
GE      : '>=' ;
NE      : '!=' ;
LT      : '<' ;
GT      : '>' ;
SEP     : '.' ;
WS      : [ \t\r\n]+ -> skip ;

query : expression ;

expression
    : expression AND expression # AndExpression
    | expression OR expression  # OrExpression
    | predicate                 # PredicateExpression
    | '(' expression ')'        # BracketExpression
    ;

reference : element (SEP element)* ;

element : ID ;

predicate
    : reference WITHIN amount FROM location # LocationPredicate
    | reference operator term               # OperatorPredicate
    ;

location : '(' latitude ',' longitude ')' ;

latitude : DOUBLE ;

longitude : DOUBLE ;

term
    : reference
    | value
    | amount
    ;

operator
    : LE
    | GE
    | NE
    | LT
    | GT
    | EQ
    ;

amount : value unit ;

value
   : INT          # IntegerValue
   | DOUBLE       # DoubleValue
   | STRING       # StringValue
   | ID           # StringValue
   ;

unit :
   | '%'
   | ID
   ;

If you’re not using IntelliJ IDEA, please start doing so for this project, as the ANTLR4 plugin can help you a great deal when debugging your grammars.

There are a lot of things missing in this language, from the very relaxed handling of units of measurement all the way to not checking the types of device properties. As they loved to say at my alma mater, we leave the proof of the theorem as an exercise for the curious reader.
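
The lexer, parser and base visitor classes used below are generated by the ANTLR Maven plugin; a sketch of the relevant pom.xml fragment (version omitted, the visitor option is the important bit):

<plugin>
    <groupId>org.antlr</groupId>
    <artifactId>antlr4-maven-plugin</artifactId>
    <configuration>
        <visitor>true</visitor>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>antlr4</goal>
            </goals>
        </execution>
    </executions>
</plugin>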

With a language this simple it’s actually easy to walk down the syntax tree and generate a JPQL statement on our way down. We extend the QueryBaseVisitor class generated by the ANTLR Maven plugin. This visitor pattern is widely used, so it’s worth understanding: the autogenerated parent class defines callbacks, and by overriding them you can 1) emit the target language constructs, 2) update the internal state of the visitor, and 3) adjust the walking algorithm by skipping certain branches of the syntax tree.

public class QueryVisitor extends QueryBaseVisitor<String> {

    private static final GeometryFactory geometryFactory = new GeometryFactory();

    private final Map<String, Object> parameters = new HashMap<>();

    public Map<String, Object> getParameters() {
        return parameters;
    }

    private String addParameter(Object value) {
        String name = "var" + parameters.size();
        parameters.put(name, value);
        return name;
    }

    @Override
    public String visitQuery(QueryParser.QueryContext ctx) {
        return "SELECT d FROM Device AS d WHERE " + visit(ctx.expression());
    }

    @Override
    public String visitBracketExpression(QueryParser.BracketExpressionContext ctx) {
        return visit(ctx.expression());
    }

    @Override
    public String visitAndExpression(QueryParser.AndExpressionContext ctx) {
        return visit(ctx.expression(0)) + " AND " + visit(ctx.expression(1));
    }

    @Override
    public String visitPredicateExpression(QueryParser.PredicateExpressionContext ctx) {
        return visit(ctx.predicate());
    }

    @Override
    public String visitOrExpression(QueryParser.OrExpressionContext ctx) {
        return "(" + visit(ctx.expression(0)) + " OR " + visit(ctx.expression(1)) + ")";
    }

    @Override
    public String visitOperator(QueryParser.OperatorContext ctx) {
        return ctx.getText();
    }

    @Override
    public String visitIntegerValue(QueryParser.IntegerValueContext ctx) {
        return addParameter(Integer.valueOf(ctx.getText()));
    }

    @Override
    public String visitDoubleValue(QueryParser.DoubleValueContext ctx) {
        return addParameter(Double.valueOf(ctx.getText()));
    }

    @Override
    public String visitStringValue(QueryParser.StringValueContext ctx) {
        return addParameter(ctx.getText());
    }

    @Override
    public String visitAmount(QueryParser.AmountContext ctx) {
        Amount<?> amount = Amount.valueOf(ctx.getText());
        @SuppressWarnings("unchecked")
        double value = amount.doubleValue((Unit) amount.getUnit().getStandardUnit());
        return addParameter(value);
    }

    @Override
    public String visitUnit(QueryParser.UnitContext ctx) {
        return ctx.getText();
    }

    @Override
    public String visitElement(QueryParser.ElementContext ctx) {
        return ctx.getText();
    }

    @Override
    public String visitOperatorPredicate(QueryParser.OperatorPredicateContext ctx) {
        String operator = visit(ctx.operator());
        String value = visit(ctx.term());
        String reference = visitReference(ctx.reference(), parameters.get(value).getClass());
        return reference + " " + operator + " :" + value;
    }

    public String visitReference(QueryParser.ReferenceContext ctx, Class<?> type) {
        List<String> elements = ctx.element().stream()
                .map(this::visitElement)
                .collect(Collectors.toList());
        String base = "d." + elements.get(0);
        if (elements.size() == 1) {
            return base;
        } else {
            List<String> tail = elements.subList(1, elements.size());
            String extract = "extract(" + base + ", '" + String.join("', '", tail) + "')";
            if (type == Integer.class) {
                return "CAST(" + extract + " integer)";
            } else if (type == Double.class) {
                return "CAST(" + extract + " double)";
            } else {
                return extract;
            }
        }
    }

    @Override
    public String visitLocationPredicate(QueryParser.LocationPredicateContext ctx) {
        String reference = visit(ctx.reference());
        String location = visit(ctx.location());
        String distance = visit(ctx.amount());
        return "distance(" + reference + ", :" + location + ") <= :" + distance;
    }

    @Override
    public String visitLocation(QueryParser.LocationContext ctx) {
        double latitude = Double.valueOf(ctx.latitude().getText());
        double longitude = Double.valueOf(ctx.longitude().getText());
        Point point = geometryFactory.createPoint(new Coordinate(latitude, longitude));
        point.setSRID(4326);
        return addParameter(point);
    }

    @Override
    public String visitTerm(QueryParser.TermContext ctx) {
        if (ctx.amount() != null) {
            return visit(ctx.amount());
        } else if (ctx.value() != null) {
            return visit(ctx.value());
        } else {
            return visit(ctx.reference());
        }
    }
}

Nothing here is particularly complex or complicated. I use JScience (never ever use it) to convert units of measurement to standard units (miles and kilometers to meters, percentages to plain double numbers).

The rest of the project is mostly boilerplate code, like the search service:

@Service
@Transactional
public class SearchService {

    private static final Logger logger = LoggerFactory.getLogger(SearchService.class);

    @Autowired
    private EntityManager entityManager;

    @SuppressWarnings("unchecked")
    public List<Device> search(String query) throws IOException {

        logger.info("Parsing search query {}", query);

        ANTLRInputStream input = new ANTLRInputStream(
                new ByteArrayInputStream(query.getBytes(StandardCharsets.UTF_8)));
        QueryLexer lexer = new QueryLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);

        QueryParser parser = new QueryParser(tokens);
        ParseTree tree = parser.query();

        logger.info("Expression tree: {}", tree.toStringTree(parser));

        QueryVisitor visitor = new QueryVisitor();
        String jpqlQuery = visitor.visit(tree);

        logger.info("Resulting JPQL query:\n{}", jpqlQuery);

        Query queryObject = entityManager.createQuery(jpqlQuery);
        for (Map.Entry<String, Object> entry : visitor.getParameters().entrySet()) {
            queryObject.setParameter(entry.getKey(), entry.getValue());
        }
        return queryObject.getResultList();
    }
}

Now we can send queries to the database like the following:

List<Device> devices = searchService
        .search("location within 10 km from (-37.814, 144.963) and status.stateOfCharge < 10%");

Rather neat. You can see the complete source code on GitHub.

A word of warning: you probably should not build JPQL queries like this unless you’re PoC’ing, like I do here. A much sounder approach would be to use the JPA Criteria API.

Adding JSONB support to Hibernate

As you may be aware, JPA @Converters allow for a seamless translation between JSON columns and domain objects. The limitation of that approach is that none of the JSON functions are usable in JPQL queries, forcing you to fall back to native SQL. To overcome this limitation we can implement a custom Hibernate dialect that supports the JSONB type and functions, exposing them at the JPQL level.

The complete source code for this implementation can be found on GitHub.

Our objective is to achieve three key goals:

  • the ability to have custom types backed by JSONB available in JPA entities
  • the ability to use jsonb_* functions in JPQL queries
  • the ability to add new types mapped to JSONB columns with minimal coding

Let’s start by creating a custom Hibernate dialect, basing it on PostgisDialect since we plan to use both PostGIS and JSON in our application. We need to override the registerTypesAndFunctions method:

@Override
protected void registerTypesAndFunctions() {
    super.registerTypesAndFunctions();
    registerColumnType(JSONTypeDescriptor.INSTANCE.getSqlType(), "jsonb");
    registerFunction("extract",
            new StandardSQLFunction("jsonb_extract_path_text", StandardBasicTypes.STRING));
}

By registering a column type we tell Hibernate how to handle columns of type jsonb. The JSONTypeDescriptor class registers the SQL type name and the two converters, known as the “binder” and the “extractor”, that translate between the database object and the domain object. The process is relatively simple, as JSONB arrives as a PGobject, which is essentially a tagged string.

By registering a function we make it, in this case jsonb_extract_path_text, available in JPQL queries. The StandardSQLFunction class describes how to translate a call of the standard form f(x, y, ...) into plain SQL, so this part is also straightforward.

However, just having a JSON string is only half of the solution. We also need to translate between domain objects and JSON strings. To accomplish this, we use the Jackson library for its powerful data binding capabilities:

public class JSONBackedTypeDescriptor<T> extends AbstractTypeDescriptor<T> {

    private static final Logger logger = LoggerFactory.getLogger(JSONBackedTypeDescriptor.class);
    private static final ObjectMapper objectMapper = new ObjectMapper();

    public JSONBackedTypeDescriptor(Class<T> type) {
        super(type);
        JavaTypeDescriptorRegistry.INSTANCE.addDescriptor(this);
    }

    @Override
    public String toString(T value) {
        try {
            return objectMapper.writeValueAsString(value);
        } catch (JsonProcessingException e) {
            logger.warn("Cannot convert map {} to string", e);
            return "{}";
        }
    }

    @Override
    public T fromString(String string) {
        try {
            return objectMapper.readValue(string, getJavaTypeClass());
        } catch (IOException e) {
            logger.warn("Cannot read value from {}", string, e);
            return null;
        }
    }

    @Override
    public <X> X unwrap(T value, Class<X> type, WrapperOptions options) {
        if (value == null) {
            return null;
        }
        if (type.isAssignableFrom(value.getClass())) {
            return type.cast(value);
        }
        if (String.class.isAssignableFrom(type)) {
            return type.cast(toString(value));
        }
        throw unknownUnwrap(type);
    }

    @Override
    public <X> T wrap(X value, WrapperOptions options) {
        if (value == null) {
            return null;
        }
        if (value.getClass().isAssignableFrom(getJavaTypeClass())) {
            return getJavaTypeClass().cast(value);
        }
        if (value instanceof String) {
            return fromString((String) value);
        }
        throw unknownWrap(value.getClass());
    }
}

Finally, we need to link the two converters together:

Domain Object <-- JavaTypeDescriptor --> String <-- SqlTypeDescriptor --> JSONB

A small abstract class ties the domain object, the JavaTypeDescriptor, the string representation, the SqlTypeDescriptor and the JSONB column together:

public abstract class JSONBackedType<T> extends AbstractSingleColumnStandardBasicType<T> {

    public JSONBackedType(JSONBackedTypeDescriptor<T> javaTypeDescriptor) {
        super(JSONTypeDescriptor.INSTANCE, javaTypeDescriptor);
    }

    @Override
    public String[] getRegistrationKeys() {
        return new String[] { getJavaTypeDescriptor().getJavaTypeClass().getCanonicalName() };
    }

    @Override
    public String getName() {
        return getJavaTypeDescriptor().getJavaTypeClass().getName();
    }
}

With all the necessary non-domain specific code in place, we can now proceed to create our entities and repositories. Let’s take a look at our Device entity, defined as follows:

@Entity
@Table(name = "devices")
public class Device {

    @Id
    @Column(name = "id")
    private String id;

    @Column(name = "location", nullable = false, columnDefinition = "geometry(Point,4326)")
    private Point location;

    @Column(name = "status", nullable = false, columnDefinition = "jsonb")
    @Type(type = "com.kartashov.postgis.types.StatusType")
    private Status status = new Status();

    ...
}

The Device entity is accompanied by an embedded Status object, which is defined as follows:

public class Status {

    private double stateOfCharge;

    private String lifeCycle;

    ...
}

To link the Java and SQL type descriptors, we have defined the StatusType class, which extends the JSONBackedType class:

public class StatusType extends JSONBackedType<Status> {

    private final static JSONBackedTypeDescriptor<Status> descriptor =
            new JSONBackedTypeDescriptor<>(Status.class);

    public StatusType() {
        super(descriptor);
    }
}

With our custom Hibernate dialect in place, we can now seamlessly integrate JSONB functionality into our JPQL queries. For instance, we can easily retrieve all devices with a state of charge greater than 10% by utilizing the following query within our DeviceRepository interface:

public interface DeviceRepository extends CrudRepository<Device, String> {

    @Query("SELECT d FROM Device d WHERE CAST(extract(d.status, 'stateOfCharge') float) > 0.1")
    List<Device> findHealthyDevices();

    ...
}

This JPQL query will be translated into the following SQL query:

select device0_.id as id1_0_,
       device0_.location as location2_0_,
       device0_.status as status3_0_
  from devices device0_
 where cast(jsonb_extract_path_text(device0_.status, 'stateOfCharge') as float4) > 0.1

With that we have achieved our goal: JSONB-backed types are available in our entities, and JSONB functions are usable directly from JPQL.

PostGIS for the very impatient

PostGIS is a powerful PostgreSQL extension designed to work with spatial data: points, lines, and polygons. To begin experimenting with it, it is often easiest to pull a Docker image with PostGIS already installed. The following commands will do so:

sudo apt-get install docker.io
sudo docker run --name postgis -e POSTGRES_PASSWORD=postgis -d mdillon/postgis:9.4

Once the image is pulled, you can access the psql console by executing:

sudo docker exec -it postgis psql -U postgres

At the prompt, you can check the versions of PostgreSQL and PostGIS:

SELECT version();
SELECT postgis_full_version();

As Docker runs the database in a container, you will need the IP address if you wish to access PostgreSQL from outside the container. The following command can be used to retrieve the IP address:

sudo docker inspect postgis | grep IPAddress

You can also use pgcli, a user-friendly CLI tool, to access the database:

pgcli -h 172.17.0.2 -U postgres

To further explore the capabilities of PostGIS, we can create a simple Spring Boot project. The source code for this project can be found on GitHub.
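
Besides the usual Spring Boot JPA starter, the project needs Hibernate Spatial and the PostgreSQL driver on the classpath; a sketch of the relevant dependencies (versions omitted):

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-spatial</artifactId>
</dependency>
<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
</dependency>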

Initialize the application with the following application.properties:

spring.jpa.database-platform = org.hibernate.spatial.dialect.postgis.PostgisDialect
spring.jpa.show-sql = true
spring.jpa.hibernate.ddl-auto = update

spring.datasource.url = jdbc:postgresql://172.17.0.1:5432/postgres?user=postgres&password=postgis
spring.datasource.driver-class-name = org.postgresql.Driver

In this example, we will use a device entity that concerns itself primarily with the location of the device:

@Entity
@Table(name = "devices")
public class Device {

    @Id
    @Column(name = "id")
    private String id;

    @Column(name = "location", nullable = false, columnDefinition = "geometry(Point,4326)")
    private Point location;

    ...
}

It is essential to define a custom type for the location attribute, so that Hibernate can link Java geometry types with SQL geometry types.

The Hibernate dialect allows us to send queries like the following through JPQL:

public interface DeviceRepository extends CrudRepository<Device, String> {

    @Query("SELECT d FROM Device AS d WHERE within(d.location, :polygon) = TRUE")
    List<Device> findWithinPolygon(@Param("polygon") Polygon polygon);
}

You can take a peek at the class org.hibernate.spatial.dialect.postgis.PostgisDialect to see what functions are available on the JPQL side and how they map to PostGIS functions.

Now we can test the search functionality

Device device = new Device();
device.setId("de-001");
Point point = (Point) new WKTReader().read("POINT(5 5)");
device.setLocation(point);
deviceRepository.save(device);

Polygon polygon = (Polygon) new WKTReader().read("POLYGON((0 0,0 10,10 10,10 0,0 0))");
List<Device> devices = deviceRepository.findWithinPolygon(polygon);

System.out.println(devices);

Working with JSON in Postgres 9.5

Docker is an incredibly useful tool that allows for easy experimentation with new software. It lets you install, suspend, and delete multiple versions of programs such as Postgres with minimal effort. Getting a Postgres 9.5 instance up is as simple as executing the following commands:

sudo apt-get install docker.io
sudo docker run --name postgres -e POSTGRES_PASSWORD=postgres -d postgres:9.5

Once Docker is installed, you can access an interactive Postgres shell by executing the following command:

sudo docker exec -it postgres psql -U postgres

With this in place we can delve into the new JSON functionality available in Postgres 9.5. We will begin by creating a sensors table and adding some data to it:

CREATE TABLE sensors (
    id     integer PRIMARY KEY,
    type   character varying(64) NOT NULL,
    config jsonb NOT NULL
);

CREATE INDEX config_gin ON sensors USING gin(config);

INSERT INTO sensors (id, type, config)
     VALUES (1, 'soil-moisture',    '{ "alpha": 0.543, "beta": -2.312, "enabled": true }'),
            (2, 'soil-temperature', '{ "enabled": true, "depth": 0.24 }'),
            (3, 'humidity', '{ "enabled": false, "height": 1.34, "device": { "version": "3.4", "supported": true } }');

We can then enable the humidity sensor:

UPDATE sensors
   SET config = config || '{ "enabled": true }'::jsonb
 WHERE id = 3;

Remove the alpha and the beta parameters used for soil moisture calibration:

UPDATE sensors
   SET config = config - 'alpha' - 'beta'
 WHERE type = 'soil-moisture';

And now let’s remove the supported tag from all device sections:

UPDATE sensors
   SET config = config #- '{device,supported}';

Fetch the device version information wherever it’s available:

WITH versioning AS (
    SELECT id, type, config #>> '{device,version}' AS version
      FROM sensors
)
SELECT *
  FROM versioning
 WHERE version IS NOT NULL;

Find all the sensors where the depth is specified:

SELECT *
  FROM sensors
 WHERE config ? 'depth';

Let’s find all properly placed sensors, where either the depth or the height are specified, but not both:

SELECT *
  FROM sensors
 WHERE config ?| array['depth', 'height']
   AND NOT config ?& array['depth', 'height'];

For further reading on the JSON functionality available in Postgres 9.5, I recommend consulting the official documentation.

Working with JSON in PostgreSQL

The following table, sensors, holds sensor data as an id, a location and a status, where the status column is a JSON object:

+------+------------+---------------------------------------+
|   id | location   | status                                |
|------+------------+---------------------------------------|
|    1 | backyard   | {"healthy": false, "temperture":0.0}  |
|    2 | frontyard  | {"healthy": false, "temperture":12.7} |
|    3 | unknown    | {"healthy": true, "temperture":28.1}  |
|    4 | farm       | {"healthy": false, "temperture":48.1, |
|      |            |  "errorCode": 78}                     |
+------+------------+---------------------------------------+
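
To reproduce this data set, something along these lines will do (the column types here are an assumption):

CREATE TABLE sensors (
    id       integer PRIMARY KEY,
    location text,
    status   json NOT NULL
);

INSERT INTO sensors (id, location, status)
     VALUES (1, 'backyard',  '{"healthy": false, "temperture": 0.0}'),
            (2, 'frontyard', '{"healthy": false, "temperture": 12.7}'),
            (3, 'unknown',   '{"healthy": true,  "temperture": 28.1}'),
            (4, 'farm',      '{"healthy": false, "temperture": 48.1, "errorCode": 78}');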

We can extract a list of all possible status fields with the following query:

SELECT DISTINCT json_object_keys(status) AS status_field
           FROM sensors
       ORDER BY status_field;

which produces the following result:

+----------------+
| status_field   |
|----------------|
| errorCode      |
| healthy        |
| temperture     |
+----------------+

To determine the highest temperature recorded by any sensor, we can use the following query:

SELECT MAX((status->>'temperture')::float)
  FROM sensors;

This query returns:

+-------+
|   max |
|-------|
|  48.1 |
+-------+

However, if we wish to determine the highest temperature recorded by only healthy sensors, we can use the following query:

SELECT MAX((status->>'temperture')::float)
  FROM sensors
 WHERE (status->>'healthy')::bool;

which produces the following result:

+-------+
|   max |
|-------|
|  28.1 |
+-------+

Additionally, we can retrieve the average temperature of healthy sensors grouped by location with the following query:

SELECT location,
       AVG((status->>'temperture')::float) AS temperature
  FROM sensors
 WHERE (status->>'healthy')::bool
 GROUP BY location;

which produces the following result:

+------------+---------------+
| location   |   temperature |
|------------+---------------|
| unknown    |          28.1 |
+------------+---------------+

We can further unpack this data using the functions json_each, json_each_text and friends, which expand the JSON status object into a set of key/value rows. In our next iteration we use the following SQL query to fetch the packed records:

SELECT id,
       json_each_text(status) AS status_field
  FROM sensors;

This query returns:

+------+-------------------+
|   id | json_each_text    |
|------+-------------------|
|    1 | (healthy,false)   |
|    1 | (temperture,0.0)  |
|    2 | (healthy,false)   |
|    2 | (temperture,12.7) |
|    3 | (healthy,true)    |
|    3 | (temperture,28.1) |
|    4 | (healthy,false)   |
|    4 | (temperture,48.1) |
|    4 | (errorCode,78)    |
+------+-------------------+

We then unfold this record column with the following query (this time using json_each):

SELECT id,
       (status).key,
       (status).value
  FROM (SELECT id,
               json_each(status) AS status
          FROM sensors) AS statuses;

This will yield the following table:

+------+------------+---------+
|   id | key        | value   |
|------+------------+---------|
|    1 | healthy    | false   |
|    1 | temperture | 0.0     |
|    2 | healthy    | false   |
|    2 | temperture | 12.7    |
|    3 | healthy    | true    |
|    3 | temperture | 28.1    |
|    4 | healthy    | false   |
|    4 | temperture | 48.1    |
|    4 | errorCode  | 78      |
+------+------------+---------+

What are the actual types of the key and value columns? Let’s check with pg_typeof, this time going back to json_each_text:

SELECT DISTINCT pg_typeof((status).key) AS key_type,
                pg_typeof((status).value) AS value_type
           FROM (SELECT id,
                        json_each_text(status) AS status
                   FROM sensors) AS statuses;

Which gives us

+------------+--------------+
| key_type   | value_type   |
|------------+--------------|
| text       | text         |
+------------+--------------+

Note that with json_each, as opposed to json_each_text, the value_type would be json rather than text, which preserves the data type information (as limited as it may be in JSON) instead of casting everything to text.

As a side note, the pgcli tool proved extremely handy while experimenting with Postgres. Further information on composite types can be found in the Postgres documentation: http://www.postgresql.org/docs/9.4/static/rowtypes.html