Adding JSONB support to Hibernate

As you may be aware, @Converters allow for the seamless translation of JSON objects to and from domain objects. However, this approach has a limitation: converted attributes are opaque to JPQL, so any query that needs JSON functions must fall back to native SQL. To overcome this, we can implement a custom Hibernate dialect that supports the JSONB type and its functions, exposing them at the JPQL level.

The full source code for this implementation can be found on GitHub.

Our objective is to achieve three key goals:

  • make custom JSON-backed types available in JPA entities
  • use jsonb_* functions in JPQL queries
  • easily add new types mapped to JSONB columns with minimal coding

Let’s start by creating a custom Hibernate dialect, basing it on PostgisDialect, as we plan to use both PostGIS and JSON in our application. We need to override the registerTypesAndFunctions function:

@Override
protected void registerTypesAndFunctions() {
    super.registerTypesAndFunctions();
    registerColumnType(JSONTypeDescriptor.INSTANCE.getSqlType(), "jsonb");
    registerFunction("extract",
            new StandardSQLFunction("jsonb_extract_path_text", StandardBasicTypes.STRING));
}

When registering a column type, we tell Hibernate how to handle columns of type jsonb. The JSONTypeDescriptor class registers the SQL type name along with the converters, known as the “binder” and the “extractor”, that translate between the database object and its string form. The process is relatively simple, as JSONB arrives as a PGobject, which is essentially a tagged string.

By registering a function, we make it available in queries, in this case jsonb_extract_path_text. The StandardSQLFunction class describes how to translate a standard call of the form f(x, y, ...) into plain SQL, which is also relatively straightforward.
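To make the f(x, y, ...) translation concrete: this is not Hibernate's internal code, just a minimal stdlib-only sketch of what rendering such a call amounts to (RenderSketch and its render helper are hypothetical names):

```java
// Toy illustration, not Hibernate's actual code: StandardSQLFunction in effect
// renders a registered call of the form f(x, y, ...) into plain SQL text.
import java.util.List;

public class RenderSketch {

    // Render a function name plus its already-rendered argument expressions.
    static String render(String name, List<String> args) {
        return name + "(" + String.join(", ", args) + ")";
    }

    public static void main(String[] args) {
        // Mirrors the shape of the call Hibernate emits for extract() below.
        System.out.println(render("jsonb_extract_path_text",
                List.of("device0_.status", "'stateOfCharge'")));
        // jsonb_extract_path_text(device0_.status, 'stateOfCharge')
    }
}
```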

However, just having a JSON string is only half of the solution. We also need to translate between domain objects and JSON strings. To accomplish this, we use the Jackson library for its powerful data binding capabilities:

public class JSONBackedTypeDescriptor<T> extends AbstractTypeDescriptor<T> {

    private static final Logger logger = LoggerFactory.getLogger(JSONBackedTypeDescriptor.class);
    private static final ObjectMapper objectMapper = new ObjectMapper();

    public JSONBackedTypeDescriptor(Class<T> type) {
        super(type);
        JavaTypeDescriptorRegistry.INSTANCE.addDescriptor(this);
    }

    @Override
    public String toString(T value) {
        try {
            return objectMapper.writeValueAsString(value);
        } catch (JsonProcessingException e) {
            logger.warn("Cannot convert value {} to string", value, e);
            return "{}";
        }
    }

    @Override
    public T fromString(String string) {
        try {
            return objectMapper.readValue(string, getJavaTypeClass());
        } catch (IOException e) {
            logger.warn("Cannot read value from {}", string, e);
            return null;
        }
    }

    @Override
    public <X> X unwrap(T value, Class<X> type, WrapperOptions options) {
        if (value == null) {
            return null;
        }
        if (type.isAssignableFrom(value.getClass())) {
            return type.cast(value);
        }
        if (String.class.isAssignableFrom(type)) {
            return type.cast(toString(value));
        }
        throw unknownUnwrap(type);
    }

    @Override
    public <X> T wrap(X value, WrapperOptions options) {
        if (value == null) {
            return null;
        }
        if (getJavaTypeClass().isInstance(value)) {
            return getJavaTypeClass().cast(value);
        }
        if (value instanceof String) {
            return fromString((String) value);
        }
        throw unknownWrap(value.getClass());
    }
}
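One subtlety in wrap() and unwrap() is the direction of the reflective type checks: it is easy to flip an isAssignableFrom call and silently reject valid values. A tiny stdlib-only sanity check (AssignabilitySketch is a hypothetical name, not part of Hibernate):

```java
// target.isInstance(value) asks "is value a T?", which is what wrap() needs;
// reversing the isAssignableFrom call flips the question and would wrongly
// reject subclasses of the target type.
public class AssignabilitySketch {
    public static void main(String[] args) {
        Object value = "hello"; // a String, which is a CharSequence

        System.out.println(CharSequence.class.isInstance(value));                  // true
        System.out.println(CharSequence.class.isAssignableFrom(value.getClass())); // true
        System.out.println(value.getClass().isAssignableFrom(CharSequence.class)); // false (reversed)
    }
}
```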

To ensure seamless integration, we need to link our two converters together:

Domain Object <-- JavaTypeDescriptor --> String <-- SqlTypeDescriptor --> JSONB

That is, the JavaTypeDescriptor translates between the domain object and its string representation, and the SqlTypeDescriptor between that string and the JSONB value in the database. An abstract base class wires the two together:

public abstract class JSONBackedType<T> extends AbstractSingleColumnStandardBasicType<T> {

    public JSONBackedType(JSONBackedTypeDescriptor<T> javaTypeDescriptor) {
        super(JSONTypeDescriptor.INSTANCE, javaTypeDescriptor);
    }

    @Override
    public String[] getRegistrationKeys() {
        return new String[] { getJavaTypeDescriptor().getJavaTypeClass().getCanonicalName() };
    }

    @Override
    public String getName() {
        return getJavaTypeDescriptor().getJavaTypeClass().getName();
    }
}
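Conceptually, this base class chains the two descriptors: the Java-side descriptor turns the domain object into a JSON string, and the SQL-side descriptor binds that string to the database. A toy stdlib-only sketch of the pipeline, with hypothetical stand-ins for both descriptors (the "jsonb:" tag merely mimics the PGobject wrapping, it is not a real format):

```java
import java.util.function.Function;

public class PipelineSketch {

    // Stand-in for the JavaTypeDescriptor: domain value -> JSON string.
    static final Function<Double, String> JAVA_DESCRIPTOR =
            v -> "{\"stateOfCharge\": " + v + "}";

    // Stand-in for the SqlTypeDescriptor's binder: string -> tagged database
    // value, roughly what wrapping the string in a PGobject amounts to.
    static final Function<String, String> SQL_DESCRIPTOR = s -> "jsonb:" + s;

    // JSONBackedType effectively chains the two stages into one pipeline.
    static final Function<Double, String> BIND = JAVA_DESCRIPTOR.andThen(SQL_DESCRIPTOR);

    public static void main(String[] args) {
        System.out.println(BIND.apply(0.42)); // jsonb:{"stateOfCharge": 0.42}
    }
}
```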

With all the necessary non-domain specific code in place, we can now proceed to create our entities and repositories. Let’s take a look at our Device entity, defined as follows:

@Entity
@Table(name = "devices")
public class Device {

    @Id
    @Column(name = "id")
    private String id;

    @Column(name = "location", nullable = false, columnDefinition = "geometry(Point,4326)")
    private Point location;

    @Column(name = "status", nullable = false, columnDefinition = "jsonb")
    @Type(type = "com.kartashov.postgis.types.StatusType")
    private Status status = new Status();

    ...
}

The Device entity is accompanied by an embedded Status object, which is defined as follows:

public class Status {

    private double stateOfCharge;

    private String lifeCycle;

    ...
}

To link the Java and SQL type descriptors, we have defined the StatusType class, which extends the JSONBackedType class:

public class StatusType extends JSONBackedType<Status> {

    private final static JSONBackedTypeDescriptor<Status> descriptor =
            new JSONBackedTypeDescriptor<>(Status.class);

    public StatusType() {
        super(descriptor);
    }
}

With our custom Hibernate dialect in place, we can now seamlessly integrate JSONB functionality into our JPQL queries. For instance, we can easily retrieve all devices with a state of charge greater than 10% by utilizing the following query within our DeviceRepository interface:

public interface DeviceRepository extends CrudRepository<Device, String> {

    @Query("SELECT d FROM Device d WHERE CAST(extract(d.status, 'stateOfCharge') AS float) > 0.1")
    List<Device> findHealthyDevices();

    ...
}

This JPQL query will be translated into the following SQL query:

select device0_.id as id1_0_,
       device0_.location as location2_0_,
       device0_.status as status3_0_
  from devices device0_
 where cast(jsonb_extract_path_text(device0_.status, 'stateOfCharge') as float4) > 0.1

This implementation effectively achieves our goal of integrating JSONB functionality into our JPQL queries, allowing for easy and efficient data retrieval from our devices.

PostGIS for the very impatient

PostGIS is a powerful PostgreSQL extension designed to work with spatial data, including points, lines, and polygons. To begin experimenting with this extension, it is often easiest to pull a Docker image with PostGIS already installed. The following command can be used to do so:

sudo apt-get install docker.io
sudo docker run --name postgis -e POSTGRES_PASSWORD=postgis -d mdillon/postgis:9.4

Once the image is pulled, you can access the psql console by executing:

sudo docker exec -it postgis psql -U postgres

At the prompt, you can check the version of PostgreSQL/PostGIS by running the following command:

SELECT version();
SELECT postgis_full_version();

As Docker runs the database in a container, you will need the IP address if you wish to access PostgreSQL from outside the container. The following command can be used to retrieve the IP address:

sudo docker inspect postgis | grep IPAddress

You can also use pgcli, a user-friendly CLI tool, to access the database:

pgcli -h 172.17.0.2 -U postgres

To further explore the capabilities of PostGIS, we can create a simple Spring Boot project. The source code for this project can be found on GitHub.

Initialize the application with the following configuration:

spring.jpa.database-platform = org.hibernate.spatial.dialect.postgis.PostgisDialect
spring.jpa.show-sql = true
spring.jpa.hibernate.ddl-auto = update

spring.datasource.url = jdbc:postgresql://172.17.0.1:5432/postgres?user=postgres&password=postgis
spring.datasource.driver-class-name = org.postgresql.Driver

In this example, we will use a device entity that concerns itself primarily with the location of the device:

@Entity
@Table(name = "devices")
public class Device {

    @Id
    @Column(name = "id")
    private String id;

    @Column(name = "location", nullable = false, columnDefinition = "geometry(Point,4326)")
    private Point location;

    ...
}

It is essential to define a custom type for the location attribute, so that Hibernate can link Java geometry types with SQL geometry types.

The Hibernate dialect allows us to send queries like the following through JPQL:

public interface DeviceRepository extends CrudRepository<Device, String> {

    @Query("SELECT d FROM Device AS d WHERE within(d.location, :polygon) = TRUE")
    List<Device> findWithinPolygon(@Param("polygon") Polygon polygon);
}

You can take a peek at the class org.hibernate.spatial.dialect.postgis.PostgisDialect to see which functions are available on the JPQL side and how they map to PostGIS functions.

Now we can test the search functionality:

Device device = new Device();
device.setId("de-001");
Point point = (Point) new WKTReader().read("POINT(5 5)");
device.setLocation(point);
deviceRepository.save(device);

Polygon polygon = (Polygon) new WKTReader().read("POLYGON((0 0,0 10,10 10,10 0,0 0))");
List<Device> devices = deviceRepository.findWithinPolygon(polygon);

System.out.println(devices);

Working with JSON in Postgres 9.5

Docker is an incredibly useful tool that makes it easy to experiment with new software: you can install, suspend, and delete multiple versions of a program such as Postgres with minimal effort. Installation is as simple as executing the following commands:

sudo apt-get install docker.io
sudo docker run --name postgres -e POSTGRES_PASSWORD=postgres -d postgres:9.5

Once Docker is installed, you can access an interactive Postgres shell by executing the following command:

sudo docker exec -it postgres psql -U postgres

With this tool, we can now delve into the new JSON functionality available in Postgres 9.5. We will begin by creating a sensors table and adding some data to it:

CREATE TABLE sensors (
    id     integer PRIMARY KEY,
    type   character varying(64) NOT NULL,
    config jsonb NOT NULL
);

CREATE INDEX config_gin ON sensors USING gin(config);

INSERT INTO sensors (id, type, config)
     VALUES (1, 'soil-moisture',    '{ "alpha": 0.543, "beta": -2.312, "enabled": true }'),
            (2, 'soil-temperature', '{ "enabled": true, "depth": 0.24 }'),
            (3, 'humidity', '{ "enabled": false, "height": 1.34, "device": { "version": "3.4", "supported": true } }');

We can then enable the humidity sensor:

UPDATE sensors
   SET config = config || '{ "enabled": true }'::jsonb
 WHERE id = 3;

Remove the alpha and the beta parameters used for soil moisture calibration:

UPDATE sensors
   SET config = config - 'alpha' - 'beta'
 WHERE type = 'soil-moisture';

And now let’s remove the supported tag from all device sections:

UPDATE sensors
   SET config = config #- '{device,supported}';

Fetch the device version information wherever it’s available:

WITH versioning AS (
    SELECT id, type, config #>> '{device,version}' AS version
      FROM sensors
)
SELECT *
  FROM versioning
 WHERE version IS NOT NULL;

Find all the sensors where the depth is specified:

SELECT *
  FROM sensors
 WHERE config ? 'depth';

Let’s find all properly placed sensors, where either the depth or the height are specified, but not both:

SELECT *
  FROM sensors
 WHERE config ?| array['depth', 'height']
   AND NOT config ?& array['depth', 'height'];

For further reading on the JSON functionality available in Postgres 9.5, I recommend consulting the official documentation.

Working with JSON in PostgreSQL

The following table, referred to as “sensors”, displays sensor data in the format of an id, location, and status. The status field is in the form of a JSON object.

+------+------------+----------------------------------------+
|   id | location   | status                                 |
|------+------------+----------------------------------------|
|    1 | backyard   | {"healthy": false, "temperature":0.0}  |
|    2 | frontyard  | {"healthy": false, "temperature":12.7} |
|    3 | unknown    | {"healthy": true, "temperature":28.1}  |
|    4 | farm       | {"healthy": false, "temperature":48.1, |
|      |            |  "errorCode": 78}                      |
+------+------------+----------------------------------------+

We can extract a list of all possible status fields with the following query:

SELECT DISTINCT json_object_keys(status) AS status_field
           FROM sensors
       ORDER BY status_field;

which produces the following result:

+----------------+
| status_field   |
|----------------|
| errorCode      |
| healthy        |
| temperature    |
+----------------+

To determine the highest temperature recorded by any sensor, we can use the following query:

SELECT MAX((status->>'temperature')::float)
  FROM sensors;

This query returns:

+-------+
|   max |
|-------|
|  48.1 |
+-------+

However, if we wish to determine the highest temperature recorded by only healthy sensors, we can use the following query:

SELECT MAX((status->>'temperature')::float)
  FROM sensors
 WHERE (status->>'healthy')::bool;

which produces the following result:

+-------+
|   max |
|-------|
|  28.1 |
+-------+

Additionally, we can retrieve the average temperature of healthy sensors grouped by location with the following query:

SELECT location,
       AVG((status->>'temperature')::float) AS temperature
  FROM sensors
 WHERE (status->>'healthy')::bool
 GROUP BY location;

which produces the following result:

+------------+---------------+
| location   |   temperature |
|------------+---------------|
| unknown    |          28.1 |
+------------+---------------+

We can take this further with functions such as json_each and json_each_text, which unnest a JSON object into a set of key/value records. As a first step, we fetch the packed records:

SELECT id,
       json_each_text(status) AS status_field
  FROM sensors;

This query returns:

+------+--------------------+
|   id | json_each_text     |
|------+--------------------|
|    1 | (healthy,false)    |
|    1 | (temperature,0.0)  |
|    2 | (healthy,false)    |
|    2 | (temperature,12.7) |
|    3 | (healthy,true)     |
|    3 | (temperature,28.1) |
|    4 | (healthy,false)    |
|    4 | (temperature,48.1) |
|    4 | (errorCode,78)     |
+------+--------------------+

We can then unfold these records into separate key and value columns:

SELECT id,
       (status).key,
       (status).value
  FROM (SELECT id,
               json_each(status) AS status
          FROM sensors) AS statuses;

This will yield the following table:

+------+-------------+---------+
|   id | key         | value   |
|------+-------------+---------|
|    1 | healthy     | false   |
|    1 | temperature | 0.0     |
|    2 | healthy     | false   |
|    2 | temperature | 12.7    |
|    3 | healthy     | true    |
|    3 | temperature | 28.1    |
|    4 | healthy     | false   |
|    4 | temperature | 48.1    |
|    4 | errorCode   | 78      |
+------+-------------+---------+

Note that with json_each_text both the key and the value columns have type text:

SELECT DISTINCT pg_typeof((status).key) AS key_type,
                pg_typeof((status).value) AS value_type
           FROM (SELECT id,
                        json_each_text(status) AS status
                   FROM sensors) AS statuses;

which gives us:

+------------+--------------+
| key_type   | value_type   |
|------------+--------------|
| text       | text         |
+------------+--------------+

Had we used json_each instead, the value_type would have been json rather than text. This matters because it preserves the (admittedly limited) type information JSON carries instead of casting everything to text.

As an aside, the pgcli tool proved extremely helpful while experimenting with Postgres. Further information on composite types can be found in the Postgres documentation: http://www.postgresql.org/docs/9.4/static/rowtypes.html

Using JPA with PostgreSQL JSON

As database developers, we sometimes face the challenge of storing data that has no fixed schema, such as user preferences or dynamic telemetry readings. When the set of keys is constantly changing or unknown, the data cannot be normalized. A common workaround is a “transposed” table with a schema along these lines:

| sensor_id | location        |
+-----------+-----------------+
|         1 |         unknown |
...

| status_field | status_value | sensor_id |
+--------------+--------------+-----------+
| temperature  | "0.9"        |         1 |
| healthy      | "false"      |         1 |
...

However, this approach has its downsides: unnecessary joins, a single value type (usually a string or a blob), and only a single layer of indirection. With the rise of NoSQL databases, the focus has shifted towards more flexible data structures as a solution to these issues.

A more sensible approach is to store the status data in the same table as the sensors, using a structured format such as XML or JSON. For example, in PostgreSQL:

| sensor_id | sensor_location | sensor_status           |
+-----------+-----------------+-------------------------+
|         1 |         unknown | { "temperature" : 0.9,  |
|           |                 |   "healthy" : false }   |
...

This approach allows for structured data in the database and makes it possible to use SQL queries to quickly find all unhealthy sensors, for example:

SELECT sensor_id
  FROM sensors
 WHERE (sensor_status->>'healthy')::boolean = false;

Another great aspect of this solution is that the same data can be used in the application logic, with converters handling the translation. For example, using the Jackson library, you can create a Status class whose annotations let it work seamlessly with the JSON data stored in the database:

public class Status {

   private final Map<String, Object> properties = new HashMap<>();

   @JsonAnyGetter
   public Map<String, Object> properties() {
       return properties;
   }

   @JsonAnySetter
   public void set(String key, Object value) {
       properties.put(key, value);
   }

   public Object get(String key) {
       return properties.get(key);
   }
}

And a converter class can convert between the database data and the domain Status object, taking care of the unpleasant serialization and deserialization work:

@Converter
public class StatusConverter implements AttributeConverter<Status, String> {

   private final ObjectMapper mapper = new ObjectMapper();

   @Override
   public String convertToDatabaseColumn(Status attribute) {
      try {
          return mapper.writeValueAsString(attribute);
      } catch (JsonProcessingException e) {
          throw new RuntimeException("Cannot serialize", e);
      }
   }

   @Override
   public Status convertToEntityAttribute(String dbData) {
      if (dbData == null) {
          return null;
      }
      try {
          return mapper.readValue(dbData, Status.class);
      } catch (IOException e) {
          throw new RuntimeException("Cannot deserialize", e);
      }
   }
}

With this approach you get a flexible, dynamic, and maintainable data structure in a well-defined format such as JSON; you can run complex queries against it in plain SQL; and you can use the same structure in your application logic. The entity then looks like this:

@Entity
@Table(name = "sensors")
public class Sensor {

    @Id
    @GeneratedValue
    @Column(name = "sensor_id")
    private Long id;

    @Column(name = "sensor_location")
    private String location;

    @Column(name = "sensor_status")
    @Convert(converter = StatusConverter.class)
    private Status status;

    ...
}

One tiny note: you will need to use the pgjdbc-ng driver for this. At least, it is the only one that worked for me.