20 Jun 2016
Every multi-layered application is built on the idea of separation of concerns. The presentation layer knows about raw JSON payloads and HTTP headers, the service layer only knows about domain objects, and the underlying data access layer doesn't even know which business methods the current transaction spans.
For this post let's assume we want a tracing context for the whole duration of a request. We want to time all the methods in our controllers and see who sent the request. The approach is easily extensible, so we'll keep the example to a bare minimum.
Let's create a timer aspect that counts execution time:
@Aspect
@Component
public class Timer {
@Autowired
private Auditor auditor;
@Pointcut("execution(* com.kartashov.auditing.controllers.*.*(..))")
public void methods() {}
@Around("methods()")
public Object profile(ProceedingJoinPoint point) throws Throwable {
long start = System.currentTimeMillis();
Object result = point.proceed();
long time = System.currentTimeMillis() - start;
auditor.register(point.getSignature().toShortString(), time);
return result;
}
}
Make sure the aspect is marked as @Component,
otherwise Spring will not pick it up. We also need to create an Auditor that stores the execution context for the duration of a request.
public class Auditor {
private final static Logger logger = LoggerFactory.getLogger(Auditor.class);
private List<Record> records = new ArrayList<>();
private String remoteAddress;
void register(String method, long time) {
records.add(new Record(method, time));
}
public void setRemoteAddress(String remoteAddress) {
this.remoteAddress = remoteAddress;
}
@PreDestroy
public void trace() {
logger.info("Audit record. Request from {}\n{}",
remoteAddress,
records.stream().map(Record::toString).collect(Collectors.joining("\n")));
}
static class Record {
...
}
}
We also need to report the remote address from our controllers. For the sake of exposition I will set the remote address manually rather than use AspectJ for this.
@RestController
public class PingController {
@Autowired
private Auditor auditor;
@RequestMapping("/")
public String ping(HttpServletRequest request) {
auditor.setRemoteAddress(request.getRemoteAddr());
return Instant.now().toString();
}
}
Finally we need to explain to our Spring Boot application what we're up to:
- Enable AspectJ proxies
- Make Auditor request scoped
- Make sure that Spring autowires not the object itself but a scoped proxy, so that a separate instance is created for every web request
@SpringBootApplication
@EnableAspectJAutoProxy
@Configuration
public class Application {
public static void main(String... args) {
SpringApplication.run(Application.class, args);
}
@Bean
@Scope(value = "request", proxyMode = ScopedProxyMode.TARGET_CLASS)
public Auditor auditor() {
return new Auditor();
}
}
We’re all set. Running the application will give us the following trace
2016-06-17 13:23:49.440 INFO 6738 : Audit record. Request from 0:0:0:0:0:0:0:1
PingController.ping(..): 24ms
This approach can be further extended with factories producing different request-scoped Auditors for different situations. The important point is that by autowiring request-scoped Auditors we keep the cross-cutting auditing concern mostly separate from our business logic.
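Under the hood, Spring implements such @Around advice with proxies. A minimal plain-JDK sketch of the same idea, using a dynamic proxy instead of AspectJ (the Service interface and names here are hypothetical, not part of the project above):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class TimingProxyDemo {

    interface Service {
        String ping();
    }

    static class ServiceImpl implements Service {
        public String ping() { return "pong"; }
    }

    // Stand-in for the Timer aspect: a JDK dynamic proxy that measures
    // every call and reports it to a callback (the "auditor").
    static Service timed(Service target, StringBuilder audit) {
        return (Service) Proxy.newProxyInstance(
                Service.class.getClassLoader(),
                new Class<?>[]{Service.class},
                (proxy, method, args) -> {
                    long start = System.nanoTime();
                    Object result = method.invoke(target, args);
                    long elapsed = System.nanoTime() - start;
                    audit.append(method.getName()).append(": ").append(elapsed).append("ns\n");
                    return result;
                });
    }

    public static void main(String[] args) {
        StringBuilder audit = new StringBuilder();
        Service service = timed(new ServiceImpl(), audit);
        System.out.println(service.ping()); // prints "pong"
        System.out.println(audit.toString().startsWith("ping:")); // prints "true"
    }
}
```

Spring AOP does essentially this, plus CGLIB subclassing for classes without interfaces, which is why the aspect only fires on calls that go through the proxy.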
16 Jun 2016
At some point in every company's adult life it starts reassessing the sanity and quality of its data set. One way to regain trust is to introduce auditing and versioning of database entities, which is exactly what Hibernate Envers is for.
As far as Envers is concerned, all you need to do is sprinkle a couple of annotations around:
@Entity
@Table(name = "users")
@Audited
@AuditTable("users_audit")
public class User {
...
}
The majority of interesting features happen behind the scenes, including the creation of the revision tracking table and the users_audit
table, as well as the storing of audit records when you write data to the database. In a way, Envers' unique selling proposition is to be as transparent as possible while adding audit storage capabilities: you are not required to change your existing database schema.
Hiding complexity behind a facade has implications, though. It can be quite painful to fetch the audited objects back efficiently, and in the worst case this requires some legwork. On the positive side, the default simple setup is trivial.
Project setup
Let's start with a simple application that has a set of users, and these users can send invitations. When an invitation is sent we want to capture the inviter's name and email as well as the invitation email. If the user at some point changes her name or email we still want to be able to get the original inviter data.
We’ll need the following dependencies for our spring boot project
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-envers</artifactId>
</dependency>
<dependency>
<groupId>org.aspectj</groupId>
<artifactId>aspectjweaver</artifactId>
</dependency>
For some odd reason, you'll need to add the org.aspectj:aspectjweaver dependency to the project.
We’ll start with the User entity
@Entity
@Table(name = "users")
@Audited
@AuditTable("users_audit")
public class User {
@Id
@GeneratedValue
private Long id;
private String firstName, lastName, email;
}
For events we would like to be a tiny bit more flexible, anticipating a huge set of different event types ahead of us.
@Entity
@Table(name = "events")
@Inheritance(strategy = InheritanceType.JOINED)
@Audited
public abstract class Event {
@Id
@GeneratedValue
private Long id;
}
@Entity
@Table(name = "invitation_events")
@Audited
@AuditTable("invitation_events_audit")
public class InvitationEvent extends Event {
@ManyToOne
private User inviter;
private String invitationEmail;
}
Note that Envers didn't like InheritanceType.TABLE_PER_CLASS, hence the joined strategy above.
Now to the spring-data-envers magic. Create repositories extending RevisionRepository<E, ID, N> like the following:
public interface UserRepository extends
RevisionRepository<User, Long, Integer>,
CrudRepository<User, Long> {
}
public interface EventRepository extends
RevisionRepository<Event, Long, Integer>,
CrudRepository<Event, Long> {
}
It's also important to change repositoryFactoryBeanClass when enabling JPA repositories:
@SpringBootApplication
@EnableJpaRepositories(repositoryFactoryBeanClass = EnversRevisionRepositoryFactoryBean.class)
public class Application {
...
}
You’re all set. Let’s run a simple query and see what Envers does for us:
User user = new User();
user.setEmail("old-email@tradition.test");
userRepository.save(user);
InvitationEvent event = new InvitationEvent();
event.setInviter(user);
event.setInvitationEmail("invitation-email@kartashov.com");
eventRepository.save(event);
Long eventId = event.getId();
user.setEmail("completely-new-email@tradition.test");
userRepository.save(user);
At this point, if we just fetch the event by ID, we'll get a link to the updated user row in the users table and won't see the email address the invitation was sent from. This is where RevisionRepository comes into play. Let's fetch the latest revision of our event and see what the inviter's data looked like at the time the invitation was sent.
Event event = eventRepository
.findLastChangeRevision(eventId).getEntity();
assert event.getInviter().getEmail().equals("old-email@tradition.test");
Behind the scenes
Let's see what goes on behind the scenes. With ddl-auto set to create, create-drop or update, Envers creates all the auxiliary tables on application start. In our case we end up with the audit tables users_audit, events_audit, and invitation_events_audit, plus the revision tracking table revinfo.
Let's see what we are asking our database to do when we request the latest revision of an event. In the first query we fetch all the revision IDs for the specified :event_id
SELECT events_audit.revision_id
FROM events_audit
CROSS JOIN revinfo
WHERE events_audit.id = :event_id
AND events_audit.revision_id = revinfo.rev
ORDER BY events_audit.revision_id ASC
In the next step we fetch all the revisions for the IDs we just retrieved, which hardly seems smart considering it could have been done in the first query. One possible explanation is that Envers lets you extend the revinfo
entity and add fields to it, and therefore prepares for the worst case scenario.
SELECT revinfo.rev,
revinfo.revtstmp
FROM revinfo
WHERE rev IN (:revision_ids)
In the next step we’re dealing with this beauty, and whoever feels it’s not a masterpiece needs an urgent appointment with a doctor
SELECT events_audit.id AS id1_1_,
events_audit.revision_id AS revision2_1_,
events_audit.revision_type AS revision3_1_,
invitation_events_audit.invitation_email,
invitation_events_audit.user_id,
CASE
WHEN invitation_events_audit.id IS NOT NULL THEN 1
WHEN events_audit.id IS NOT NULL THEN 0
END AS clazz
FROM events_audit
LEFT OUTER JOIN invitation_events_audit invitation_events_audit
ON events_audit.id = invitation_events_audit.id
AND events_audit.revision_id = invitation_events_audit.revision_id
WHERE events_audit.revision_id = (
SELECT MAX(ea.revision_id)
FROM events_audit ea
WHERE ea.revision_id <= :revision_id
AND events_audit.id = ea.id)
AND events_audit.revision_type <> :revision_type
AND events_audit.id = :event_id
A lot of the clutter is caused by joined table inheritance strategy.
The last query is the lazily loaded user revision, which pretty much mimics the previous query, only here we don't have to deal with inheritance hierarchies:
SELECT users_audit.id,
users_audit.revision_id,
users_audit.revision_type,
users_audit.email,
users_audit.first_name,
users_audit.last_name
FROM users_audit
WHERE users_audit.revision_id = (
SELECT max(ua.revision_id)
FROM users_audit ua
WHERE ua.revision_id <= :revision_id
AND users_audit.id = ua.id)
AND users_audit.revision_type <> :revision_type
AND users_audit.id = :user_id
We are looking for a revision (create or update) based on :user_id. Our target :revision_id might not have affected the user row, so we look for the max(revision_id) for this specific user that is less than or equal to our target :revision_id.
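The lookup rule, the greatest revision of a row that does not exceed the target revision, can be stated in a couple of lines of plain Java (a hypothetical illustration, not Envers code):

```java
import java.util.Arrays;
import java.util.OptionalInt;

public class RevisionLookup {

    // Greatest revision of a row that does not exceed the target revision,
    // mirroring the MAX(revision_id) ... <= :revision_id subquery above.
    static OptionalInt latestAtOrBefore(int[] rowRevisions, int target) {
        return Arrays.stream(rowRevisions).filter(r -> r <= target).max();
    }

    public static void main(String[] args) {
        // The user row was changed at revisions 1 and 3; the event we are
        // inspecting was written at revision 2, so the matching user state
        // is the one from revision 1.
        System.out.println(latestAtOrBefore(new int[]{1, 3}, 2)); // OptionalInt[1]
    }
}
```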
Querying data
For querying the data there is a Criteria API similar to the JPA Criteria API.
Now let's say we want to find all invitations that were sent from a specific email address, no matter who owns this email now or whether it's currently used by any user in the database. The best up-to-date documentation can be found in the Hibernate User Guide.
In our case, unfortunately, a surprise is waiting for us: the innocent-looking query below throws UnsupportedOperationException
return AuditReaderFactory.get(entityManager)
.createQuery()
.forRevisionsOfEntity(InvitationEvent.class, true, false)
.traverseRelation("inviter", JoinType.LEFT)
.add(AuditEntity.property("email").eq(email))
.getResultList();
Let’s just hope that this will be addressed soon.
10 Jun 2016
Sorting domain objects can be challenging because it may require significant effort to consider all edge cases and produce a robust sorting order that meets business requirements. For example, consider a client database where users are sorted first by city, then by zip code, then by last name, and finally by first name. If any of these properties are not set, we want to move the user to the end of the list. It may also be necessary to ignore the case when comparing cities. An example of a simple result set in tabulated form is shown below:
| Brisbane | 4000 | Burke | Jason |
| Brisbane | null | Appleseed | Frank |
| melbourne | 3001 | Collins | Lindon |
| Melbourne | 3003 | Collins | Grant |
| null | 1000 | null | Matthew |
The corresponding User class could look like this:
class User {
@Nullable public String getCity() { ... }
@Nullable public Integer getZipCode() { ... }
@Nullable public String getFirstName() { ... }
@Nullable public String getLastName() { ... }
}
Although it’s not an overly complex task, composing multiple comparators can be generalized. Here is a pseudo-code algorithm for this purpose:
result = 0
for (comparator : comparators):
result = comparator.compare(a, b)
if (result != 0):
break
Using the flexibility of Java 8, we can create a generic ComparatorBuilder
to simplify this task, particularly in regards to null checks:
public class ComparatorBuilder<T> {
private final List<Comparator<T>> steps = new ArrayList<>();
public <S extends Comparable<S>>
ComparatorBuilder<T> by(Function<T, S> property) {
return by(property, Function.identity());
}
public <S extends Comparable<S>, Q>
ComparatorBuilder<T> by(Function<T, Q> property, Function<Q, S> converter) {
steps.add((T a1, T a2) -> {
Q q1 = property.apply(a1);
Q q2 = property.apply(a2);
S s1 = q1 == null ? null : converter.apply(q1);
S s2 = q2 == null ? null : converter.apply(q2);
if (s1 == null) {
return (s2 == null) ? 0 : 1;
} else {
return (s2 == null) ? -1 : s1.compareTo(s2);
}
});
return this;
}
public Comparator<T> build() {
return (T a, T b) -> {
int result = 0;
for (Comparator<T> step : steps) {
result = step.compare(a, b);
if (result != 0) {
break;
}
}
return result;
};
}
}
For example, we can create a comparator for the User
class as follows:
Comparator<User> comparator = new ComparatorBuilder<User>()
.by(User::getCity, String::toLowerCase)
.by(User::getZipCode)
.by(User::getLastName)
.by(User::getFirstName)
.build();
Collections.sort(users, comparator);
Overall, this approach is quite neat compared to the alternatives.
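For comparison, the JDK's built-in comparator combinators can express the same ordering, with Comparator.nullsLast pushing missing values to the end. A sketch, using a hypothetical record standing in for the User class above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import static java.util.Comparator.comparing;
import static java.util.Comparator.nullsLast;

public class NullsLastDemo {

    // Hypothetical stand-in for the User class from the article.
    record User(String city, Integer zipCode, String lastName, String firstName) {}

    // nullsLast wraps each key comparator so that null keys sort to the end;
    // the city key is lower-cased to get a case-insensitive ordering.
    static final Comparator<User> COMPARATOR =
            comparing((User u) -> u.city() == null ? null : u.city().toLowerCase(),
                      nullsLast(Comparator.<String>naturalOrder()))
            .thenComparing(User::zipCode, nullsLast(Comparator.<Integer>naturalOrder()))
            .thenComparing(User::lastName, nullsLast(Comparator.<String>naturalOrder()))
            .thenComparing(User::firstName, nullsLast(Comparator.<String>naturalOrder()));

    public static void main(String[] args) {
        List<User> users = new ArrayList<>(Arrays.asList(
                new User(null, 1000, null, "Matthew"),
                new User("melbourne", 3001, "Collins", "Lindon"),
                new User("Brisbane", null, "Appleseed", "Frank"),
                new User("Brisbane", 4000, "Burke", "Jason"),
                new User("Melbourne", 3003, "Collins", "Grant")));
        users.sort(COMPARATOR);
        // Prints the rows in the order of the tabulated result set above.
        users.forEach(u -> System.out.println(
                u.city() + " " + u.zipCode() + " " + u.lastName() + " " + u.firstName()));
    }
}
```

The builder version still earns its keep when the chain is assembled dynamically or when the null policy needs to vary per property.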
19 May 2016
This article shows how to reduce the stress on a large database using memoization and caching in the Database and Integrator services. The goal is to provide read-only access to the data and perform calculations on top of it while keeping database trips to a minimum. We will use a range of Spring caching tools to achieve this.
To start, you create a simple class called Database
that loads a time series for you:
public class Database {
public double[] load(String series) {
... // horribly expensive database access goes here
}
}
You also create a class Integrator
that uses this database to calculate the sum of the time series, producing an integral:
public class Integrator {
private Database database;
public double[] run(String series) {
double[] data = database.load(series);
double[] result = new double[data.length];
double sum = 0;
for (int i = 0; i < data.length; i++) {
sum += data[i];
result[i] = sum;
}
return result;
}
}
However, this design has some drawbacks; if a new data point is added to the time series in the database, the entire time series must be retrieved again. Even if there is no new data, if we request the integrated time series, the original time series will still be loaded and the calculation will be performed.
One solution to this problem is to use memoization, but because this is a stateful system, our caching strategy cannot be overly simplistic. To address these issues, we will use a range of Spring caching tools.
We will begin by adding caching to the Database
and Integrator
services:
public class Integrator {
...
@Cacheable(cacheNames = "integral", key = "#series")
public double[] run(String series) {
...
}
}
public class Database {
...
@Cacheable(cacheNames = "nominal", key = "#series")
public double[] load(String series) {
...
}
}
We have made progress by memoizing and caching everything, but the next step is crucial: we need to coordinate cache eviction with the arrival of new data. To achieve this, we introduce a new abstraction, a Repository:
public class Repository {
...
private Database database;
@Caching(
evict = @CacheEvict(value = "integral", key = "#series"),
put = @CachePut(value = "nominal", key = "#series")
)
public double[] update(String series, double value) {
double[] e = database.load(series);
double[] d = new double[e.length + 1];
System.arraycopy(e, 0, d, 0, e.length);
d[e.length] = value;
return d;
}
@Caching(evict = {
@CacheEvict(value = "nominal", key = "#series"),
@CacheEvict(value = "integral", key = "#series")
})
public void reset(String series) {
}
}
When it comes to this design, there are a few subtle details to keep in mind. The update
method, for example, doesn’t update the database directly. Instead, it:
- Retrieves the value from the “nominal” cache or the database by executing
database.load()
- Adds a value to the end of the loaded series
- Stores the updated series in the “nominal” cache using the
@CachePut
annotation
- Clears the “integral” cache for the updated series using the
@CacheEvict
annotation
Additionally, the reset
method does not access the database at all. It simply resets both the “nominal” and “integral” caches using annotations.
It’s worth noting that this approach may seem complex at first glance, but it becomes more practical when there are multiple hierarchical calculations that need to be run on the data. By specifying functional dependencies in the Java code and caching policies in the annotations, the concerns are separated and the code becomes more reusable and testable.
05 Apr 2016
Let's start adding user management to our application. There are two kinds of personal information we want to capture at the moment: existing users and invitations sent out to potential new customers.
Invitations and users have a lot in common, so let's start with the entities for our Spring Boot application:
@Entity
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
public abstract class Person {
private @Id @GeneratedValue(strategy = GenerationType.TABLE) Long id;
private String firstName;
private String lastName;
private String email;
...
}
@Entity
public class Invitation extends Person {
...
}
@Entity
public class User extends Person {
...
}
We'll keep invitations and users in completely separate tables by using InheritanceType.TABLE_PER_CLASS.
To work with the data set, create a simple repository:
public interface PersonRepository extends JpaRepository<Person, Long> {
Page<Person> findAll(Pageable pageable);
}
Let's now issue a simple Pageable request:
Pageable pageable = new PageRequest(0, 10, Sort.Direction.ASC, "firstName", "lastName");
Page<Person> firstPage = personRepository.findAll(pageable);
for (Person person : firstPage) {
System.out.println(person);
}
Let's see what queries Hibernate issues in the background by placing the following application.properties in /src/test/resources:
spring.jpa.show-sql = true
spring.jpa.properties.hibernate.format_sql = true
logging.level.org.hibernate.SQL = DEBUG
logging.level.org.hibernate.type.descriptor.sql.BasicBinder = TRACE
If we stay with the TABLE_PER_CLASS strategy, Hibernate issues the following query:
SELECT person0_.id AS id1_1_,
person0_.email AS email2_1_,
person0_.first_name AS first_na3_1_,
person0_.last_name AS last_nam4_1_,
person0_.date AS date1_0_,
person0_.clazz_ AS clazz_
FROM ( SELECT id,
email,
first_name,
last_name,
null AS date,
1 AS clazz_
FROM user UNION ALL
SELECT id,
email,
first_name,
last_name,
date,
2 AS clazz_
FROM invitation
) person0_
ORDER BY person0_.first_name ASC,
person0_.last_name ASC
LIMIT ?
For the SINGLE_TABLE strategy we get:
SELECT person0_.id AS id2_0_,
person0_.email AS email3_0_,
person0_.first_name AS first_na4_0_,
person0_.last_name AS last_nam5_0_,
person0_.date AS date6_0_,
person0_.dtype AS dtype1_0_
FROM person person0_
ORDER BY person0_.first_name ASC,
person0_.last_name ASC
LIMIT ?
And for the JOINED strategy we get:
SELECT person0_.id AS id1_1_,
person0_.email AS email2_1_,
person0_.first_name AS first_na3_1_,
person0_.last_name AS last_nam4_1_,
person0_2_.date AS date1_0_,
CASE
WHEN person0_1_.id IS NOT NULL THEN 1
WHEN person0_2_.id IS NOT NULL THEN 2
WHEN person0_.id IS NOT NULL THEN 0
END as clazz_
FROM person person0_
LEFT OUTER JOIN user person0_1_
ON person0_.id=person0_1_.id
LEFT OUTER JOIN invitation person0_2_
ON person0_.id=person0_2_.id
ORDER BY person0_.first_name ASC,
person0_.last_name ASC
LIMIT ?
That’s it for now
30 Jan 2016
Let’s assume we now have our database of IoT devices and we’d like to provide a simple search interface that would allow us to send queries like the following
location within 10 km from (-37.814, 144.963) and status.stateOfCharge < 10%
Although it might sound like an intimidatingly complex task, it's actually rather simple to achieve with ANTLR. Let's begin with the query language. The following grammar will do for now.
grammar Query;
@header {
package com.kartashov.postgis.antlr;
}
AND : ',' | 'and' ;
OR : 'or' ;
INT : '-'? [0-9]+ ;
DOUBLE : '-'? [0-9]+'.'[0-9]+ ;
WITHIN : 'within' ;
FROM : 'from' ;
ID : [a-zA-Z_][a-zA-Z_0-9]* ;
STRING : '"' (~["])* '"' | '\'' (~['])* '\''
{
String s = getText();
setText(s.substring(1, s.length() - 1));
}
;
EQ : '=' '=' ? ;
LE : '<=' ;
GE : '>=' ;
NE : '!=' ;
LT : '<' ;
GT : '>' ;
SEP : '.' ;
WS : [ \t\r\n]+ -> skip ;
query : expression ;
expression
: expression AND expression # AndExpression
| expression OR expression # OrExpression
| predicate # PredicateExpression
| '(' expression ')' # BracketExpression
;
reference : element (SEP element)* ;
element : ID ;
predicate
: reference WITHIN amount FROM location # LocationPredicate
| reference operator term # OperatorPredicate
;
location : '(' DOUBLE ',' DOUBLE ')' ;
term
: reference
| value
| amount
;
operator
: LE
| GE
| NE
| LT
| GT
| EQ
;
amount : value unit ;
value
: INT # IntegerValue
| DOUBLE # DoubleValue
| STRING # StringValue
| ID # StringValue
;
unit :
| '%'
| ID
;
If you're not using IntelliJ IDEA, please start doing so for this project, as the ANTLR4 plugin can help you a great deal when debugging your grammars.
There are a lot of things missing in this language, from the very relaxed handling of units of measurement all the way to the absence of type checking of device properties. As they loved to say at my alma mater, we leave the proof of the theorem as an exercise for the curious reader.
With a language this simple it's actually easy to just walk down the syntax tree and generate the JPQL statement on our way down. We extend the QueryBaseVisitor class generated by the ANTLR Maven plugin. This visitor pattern is fairly widely used, so it helps to understand it well: the abstract autogenerated parent registers callbacks, and by overriding these callbacks you can 1) emit the target language constructs, 2) update the internal state of the visitor, and 3) adjust the walking algorithm by skipping certain branches of the syntax tree.
public class QueryVisitor extends QueryBaseVisitor<String> {
private static final GeometryFactory geometryFactory = new GeometryFactory();
private final Map<String, Object> parameters = new HashMap<>();
public Map<String, Object> getParameters() {
return parameters;
}
private String addParameter(Object value) {
String name = "var" + parameters.size();
parameters.put(name, value);
return name;
}
@Override
public String visitQuery(QueryParser.QueryContext ctx) {
return "SELECT d FROM Device AS d WHERE " + visit(ctx.expression());
}
@Override
public String visitBracketExpression(QueryParser.BracketExpressionContext ctx) {
return visit(ctx.expression());
}
@Override
public String visitAndExpression(QueryParser.AndExpressionContext ctx) {
return visit(ctx.expression(0)) + " AND " + visit(ctx.expression(1));
}
@Override
public String visitPredicateExpression(QueryParser.PredicateExpressionContext ctx) {
return visit(ctx.predicate());
}
@Override
public String visitOrExpression(QueryParser.OrExpressionContext ctx) {
return "(" + visit(ctx.expression(0)) + " OR " + visit(ctx.expression(1)) + ")";
}
@Override
public String visitOperator(QueryParser.OperatorContext ctx) {
return ctx.getText();
}
@Override
public String visitIntegerValue(QueryParser.IntegerValueContext ctx) {
return addParameter(Integer.valueOf(ctx.getText()));
}
@Override
public String visitDoubleValue(QueryParser.DoubleValueContext ctx) {
return addParameter(Double.valueOf(ctx.getText()));
}
@Override
public String visitStringValue(QueryParser.StringValueContext ctx) {
return addParameter(ctx.getText());
}
@Override
public String visitAmount(QueryParser.AmountContext ctx) {
Amount<?> amount = Amount.valueOf(ctx.getText());
@SuppressWarnings("unchecked")
double value = amount.doubleValue((Unit) amount.getUnit().getStandardUnit());
return addParameter(value);
}
@Override
public String visitUnit(QueryParser.UnitContext ctx) {
return ctx.getText();
}
@Override
public String visitElement(QueryParser.ElementContext ctx) {
return ctx.getText();
}
@Override
public String visitOperatorPredicate(QueryParser.OperatorPredicateContext ctx) {
String operator = visit(ctx.operator());
String value = visit(ctx.term());
String reference = visitReference(ctx.reference(), parameters.get(value).getClass());
return reference + " " + operator + " :" + value;
}
public String visitReference(QueryParser.ReferenceContext ctx, Class<?> type) {
List<String> elements = ctx.element().stream()
.map(this::visitElement)
.collect(Collectors.toList());
String base = "d." + elements.get(0);
if (elements.size() == 1) {
return base;
} else {
List<String> tail = elements.subList(1, elements.size());
String extract = "extract(" + base + ", '" + String.join("', '", tail) + "')";
if (type == Integer.class) {
return "CAST(" + extract + " integer)";
} else if (type == Double.class) {
return "CAST(" + extract + " double)";
} else {
return extract;
}
}
}
@Override
public String visitLocationPredicate(QueryParser.LocationPredicateContext ctx) {
String reference = visit(ctx.reference());
String location = visit(ctx.location());
String distance = visit(ctx.amount());
return "distance(" + reference + ", :" + location + ") <= :" + distance;
}
@Override
public String visitLocation(QueryParser.LocationContext ctx) {
double latitude = Double.valueOf(ctx.DOUBLE(0).getText());
double longitude = Double.valueOf(ctx.DOUBLE(1).getText());
Point point = geometryFactory.createPoint(new Coordinate(latitude, longitude));
point.setSRID(4326);
return addParameter(point);
}
@Override
public String visitTerm(QueryParser.TermContext ctx) {
if (ctx.amount() != null) {
return visit(ctx.amount());
} else if (ctx.value() != null) {
return visit(ctx.value());
} else {
return visit(ctx.reference());
}
}
}
Nothing here is particularly complex or complicated. I use JScience (never ever use it) to convert units of measurement to standard units (like miles and kilometers to meters, and percentages to double numbers).
The rest of the project is mostly boilerplate, like the search service:
@Service
@Transactional
public class SearchService {
private static final Logger logger = LoggerFactory.getLogger(SearchService.class);
@Autowired
private EntityManager entityManager;
@SuppressWarnings("unchecked")
public List<Device> search(String query) throws IOException {
logger.info("Parsing search query {}", query);
ANTLRInputStream input = new ANTLRInputStream(
new ByteArrayInputStream(query.getBytes(StandardCharsets.UTF_8)));
QueryLexer lexer = new QueryLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
QueryParser parser = new QueryParser(tokens);
ParseTree tree = parser.query();
logger.info("Expression tree: {}", tree.toStringTree(parser));
QueryVisitor visitor = new QueryVisitor();
String jpqlQuery = visitor.visit(tree);
logger.info("Resulting JPQL query:\n{}", jpqlQuery);
Query queryObject = entityManager.createQuery(jpqlQuery);
for (Map.Entry<String, Object> entry : visitor.getParameters().entrySet()) {
queryObject.setParameter(entry.getKey(), entry.getValue());
}
return queryObject.getResultList();
}
}
Now we can send queries to the database like the following:
List<Device> devices = searchService
.search("location within 10 km from (-37.814, 144.963) and status.stateOfCharge < 10%");
Rather neat. You can see the complete source code on GitHub.
A word of warning: you probably should not build JPQL queries like this unless you're PoC'ing like I do here. A much sounder approach would be to use the JPA Criteria API.
29 Jan 2016
As you may be aware, @Converters allow for the seamless translation of JSON objects to and from domain objects. However, this approach has a limitation: it prohibits the use of JSON functionality in JPQL queries, forcing one to rely solely on native SQL. To overcome this limitation, we can implement a custom Hibernate dialect that supports the JSONB type and its functions, thereby exposing them at the JPQL level.
To accomplish this, we create a custom Hibernate dialect based on PostgisDialect, as we plan to incorporate both PostGIS and JSON in our application, and override its registerTypesAndFunctions method. The source code for this implementation can be found on GitHub.
Our objective is to achieve three key goals:
- the ability to have custom types backed by JSON available in JPA entities
- the ability to use jsonb_* functions in JPQL queries
- the ability to easily add new types mapped to JSONB columns with minimal coding
With those goals in mind, let's override the registerTypesAndFunctions function of our dialect:
@Override
protected void registerTypesAndFunctions() {
super.registerTypesAndFunctions();
registerColumnType(JSONTypeDescriptor.INSTANCE.getSqlType(), "jsonb");
registerFunction("extract",
new StandardSQLFunction("jsonb_extract_path_text", StandardBasicTypes.STRING));
}
When registering a column type we tell Hibernate how to handle columns of type jsonb. The JSONTypeDescriptor class registers the SQL type name and the converters, known as the "binder" and the "extractor", that translate between the database object and the domain object. The process is relatively simple, as JSONB arrives as a PGobject, which is essentially a tagged string.
By registering a function we make it, in this case jsonb_extract_path_text, available in our queries. The StandardSQLFunction class explains how to translate the standard form f(x, y, ...) into plain SQL. This is also relatively straightforward.
However, just having a JSON string is only half of the solution. We also need to translate between domain objects and JSON strings. To accomplish this, we use the Jackson library for its powerful data binding capabilities:
public class JSONBackedTypeDescriptor<T> extends AbstractTypeDescriptor<T> {
private static final Logger logger = LoggerFactory.getLogger(JSONBackedTypeDescriptor.class);
private static final ObjectMapper objectMapper = new ObjectMapper();
public JSONBackedTypeDescriptor(Class<T> type) {
super(type);
JavaTypeDescriptorRegistry.INSTANCE.addDescriptor(this);
}
@Override
public String toString(T value) {
try {
return objectMapper.writeValueAsString(value);
} catch (JsonProcessingException e) {
logger.warn("Cannot convert value {} to string", value, e);
return "{}";
}
}
@Override
public T fromString(String string) {
try {
return objectMapper.readValue(string, getJavaTypeClass());
} catch (IOException e) {
logger.warn("Cannot read value from {}", string, e);
return null;
}
}
@Override
public <X> X unwrap(T value, Class<X> type, WrapperOptions options) {
if (value == null) {
return null;
}
if (type.isAssignableFrom(value.getClass())) {
return type.cast(value);
}
if (String.class.isAssignableFrom(type)) {
return type.cast(toString(value));
}
throw unknownUnwrap(type);
}
@Override
public <X> T wrap(X value, WrapperOptions options) {
if (value == null) {
return null;
}
if (value.getClass().isAssignableFrom(getJavaTypeClass())) {
return getJavaTypeClass().cast(value);
}
if (value instanceof String) {
return fromString((String) value);
}
throw unknownWrap(value.getClass());
}
}
To ensure seamless integration, it is essential to link our two converters together:
Domain Object <-- JavaTypeDescriptor --> String <-- SqlTypeDescriptor --> JSONB
This chains the domain object, the JavaTypeDescriptor, its string representation, the SqlTypeDescriptor, and finally the JSONB column. The following abstract class facilitates this connection:
public abstract class JSONBackedType<T> extends AbstractSingleColumnStandardBasicType<T> {
public JSONBackedType(JSONBackedTypeDescriptor<T> javaTypeDescriptor) {
super(JSONTypeDescriptor.INSTANCE, javaTypeDescriptor);
}
@Override
public String[] getRegistrationKeys() {
return new String[] { getJavaTypeDescriptor().getJavaTypeClass().getCanonicalName() };
}
@Override
public String getName() {
return getJavaTypeDescriptor().getJavaTypeClass().getName();
}
}
With all the non-domain-specific plumbing in place, we can now create our entities and repositories. Let’s take a look at our Device entity, defined as follows:
@Entity
@Table(name = "devices")
public class Device {
@Id
@Column(name = "id")
private String id;
@Column(name = "location", nullable = false, columnDefinition = "geometry(Point,4326)")
private Point location;
@Column(name = "status", nullable = false, columnDefinition = "jsonb")
@Type(type = "com.kartashov.postgis.types.StatusType")
private Status status = new Status();
...
}
The Device entity carries an embedded Status object, which is defined as follows:
public class Status {
private double stateOfCharge;
private String lifeCycle;
...
}
To link the Java and SQL type descriptors, we define the StatusType class, which extends JSONBackedType:
public class StatusType extends JSONBackedType<Status> {
private final static JSONBackedTypeDescriptor<Status> descriptor =
new JSONBackedTypeDescriptor<>(Status.class);
public StatusType() {
super(descriptor);
}
}
With our custom Hibernate dialect in place, we can use JSONB functions from JPQL queries. For instance, we can retrieve all devices with a state of charge above 10% with the following query in our DeviceRepository interface:
public interface DeviceRepository extends CrudRepository<Device, String> {
@Query("SELECT d FROM Device d WHERE CAST(extract(d.status, 'stateOfCharge') AS float) > 0.1")
List<Device> findHealthyDevices();
...
}
This JPQL query will be translated into the following SQL query:
select device0_.id as id1_0_,
device0_.location as location2_0_,
device0_.status as status3_0_
from devices device0_
where cast(jsonb_extract_path_text(device0_.status, 'stateOfCharge') as float4) > 0.1
This implementation effectively achieves our goal of integrating JSONB functionality into our JPQL queries, allowing for easy and efficient data retrieval from our devices.
31 Dec 2015
PostGIS is a powerful PostgreSQL extension designed to work with spatial data, including points, lines, and polygons. To begin experimenting with this extension, it is often easiest to pull a Docker image with PostGIS already installed. The following command can be used to do so:
sudo apt-get install docker.io
sudo docker run --name postgis -e POSTGRES_PASSWORD=postgis -d mdillon/postgis:9.4
Once the image is running, you can access the psql console by executing:
sudo docker exec -it postgis psql -U postgres
At the prompt, you can check the version of PostgreSQL/PostGIS by running the following command:
SELECT version();
SELECT postgis_full_version();
As Docker runs the database in a container, you will need the IP address if you wish to access PostgreSQL from outside the container. The following command can be used to retrieve the IP address:
sudo docker inspect postgis | grep IPAddress
You can also use pgcli, a user-friendly CLI tool, to access the database:
pgcli -h 172.17.0.2 -U postgres
To further explore the capabilities of PostGIS, we can create a simple Spring Boot project. The source code for this project can be found on GitHub.
Initialize the application with the following configuration:
spring.jpa.database-platform = org.hibernate.spatial.dialect.postgis.PostgisDialect
spring.jpa.show-sql = true
spring.jpa.hibernate.ddl-auto = update
spring.datasource.url = jdbc:postgresql://172.17.0.1:5432/postgres?user=postgres&password=postgis
spring.datasource.driver-class-name = org.postgresql.Driver
In this example, we will use a device entity that concerns itself primarily with the location of the device:
@Entity
@Table(name = "devices")
public class Device {
@Id
@Column(name = "id")
private String id;
@Column(name = "location", nullable = false, columnDefinition = "geometry(Point,4326)")
private Point location;
...
}
It is essential to define a custom type for the location attribute, so that Hibernate can link Java geometry types with SQL geometry types.
The Hibernate dialect allows us to send queries like the following through JPQL:
public interface DeviceRepository extends CrudRepository<Device, String> {
@Query("SELECT d FROM Device AS d WHERE within(d.location, :polygon) = TRUE")
List<Device> findWithinPolygon(@Param("polygon") Polygon polygon);
}
You can take a peek at the class org.hibernate.spatial.dialect.postgis.PostgisDialect to see which functions are available on the JPQL side and how they map to PostGIS functions.
Now we can test the search functionality:
Device device = new Device();
device.setId("de-001");
Point point = (Point) new WKTReader().read("POINT(5 5)");
device.setLocation(point);
deviceRepository.save(device);
Polygon polygon = (Polygon) new WKTReader().read("POLYGON((0 0,0 10,10 10,10 0,0 0))");
List<Device> devices = deviceRepository.findWithinPolygon(polygon);
System.out.println(devices);
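On the database side, within() maps to a PostGIS containment test, which for a point boils down to point-in-polygon. As a rough illustration of the predicate being evaluated (not of how PostGIS actually implements it), here is a dependency-free ray-casting sketch using the same coordinates as the test above:

```java
// Dependency-free sketch of the point-in-polygon predicate that within()
// evaluates on the database side. Classic ray-casting rule; the square and
// points below match the WKT literals used in the test above.
public class PointInPolygonSketch {

    // ring holds the polygon vertices as {x, y} pairs (closing vertex omitted)
    static boolean contains(double[][] ring, double px, double py) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1];
            double xj = ring[j][0], yj = ring[j][1];
            // toggle when the horizontal ray from (px, py) crosses edge (j, i)
            if ((yi > py) != (yj > py)
                    && px < (xj - xi) * (py - yi) / (yj - yi) + xi) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        double[][] square = {{0, 0}, {0, 10}, {10, 10}, {10, 0}};
        System.out.println(contains(square, 5, 5));   // the device at POINT(5 5)
        System.out.println(contains(square, 15, 5));  // a point outside the square
    }
}
```

Ray casting counts how many polygon edges a horizontal ray from the point crosses; an odd count means the point is inside.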
27 Dec 2015
Docker is an incredibly useful tool that allows for easy experimentation with new software. It enables the simultaneous installation, suspension, and deletion of multiple versions of programs, such as Postgres, with minimal effort. The installation process is as simple as executing the following command:
sudo apt-get install docker.io
sudo docker run --name postgres -e POSTGRES_PASSWORD=postgres -d postgres:9.5
Once Docker is installed, you can access an interactive Postgres shell by executing the following command:
sudo docker exec -it postgres psql -U postgres
With this tool, we can now delve into the new JSON functionality available in Postgres 9.5. We will begin by creating a sensors table and adding some data to it:
CREATE TABLE sensors (
id integer PRIMARY KEY,
type character varying(64) NOT NULL,
config jsonb NOT NULL
);
CREATE INDEX config_gin ON sensors USING gin(config);
INSERT INTO sensors (id, type, config)
VALUES (1, 'soil-moisture', '{ "alpha": 0.543, "beta": -2.312, "enabled": true }'),
(2, 'soil-temperature', '{ "enabled": true, "depth": 0.24 }'),
(3, 'humidity', '{ "enabled": false, "height": 1.34, "device": { "version": "3.4", "supported": true } }');
We can then enable the humidity sensor:
UPDATE sensors
SET config = config || '{ "enabled": true }'::jsonb
WHERE id = 3;
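The || operator performs a shallow merge: top-level keys from the right operand overwrite those in the left operand, while untouched keys are kept. The semantics can be sketched in Java with plain maps standing in for jsonb values (illustrative only):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonbConcatSketch {

    // Shallow merge mimicking Postgres jsonb ||: keys from the right
    // operand overwrite keys in the left operand; other keys are kept.
    static Map<String, Object> concat(Map<String, Object> left, Map<String, Object> right) {
        Map<String, Object> result = new LinkedHashMap<>(left);
        result.putAll(right);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("enabled", false);
        config.put("height", 1.34);
        // the '{ "enabled": true }' patch from the UPDATE above
        System.out.println(concat(config, Map.of("enabled", true)));
    }
}
```

Note that the merge is shallow: patching a nested object such as "device" replaces the whole sub-object rather than merging into it.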
Remove the alpha and beta parameters used for soil moisture calibration:
UPDATE sensors
SET config = config - 'alpha' - 'beta'
WHERE type = 'soil-moisture';
And now let’s remove the supported tag from all device sections:
UPDATE sensors
SET config = config #- '{device,supported}';
Fetch the device version information wherever it’s available:
WITH versioning AS (
SELECT id, type, config #>> '{device,version}' AS version
FROM sensors
)
SELECT *
FROM versioning
WHERE version IS NOT NULL;
Find all the sensors where the depth is specified:
SELECT *
FROM sensors
WHERE config ? 'depth';
Let’s find all properly placed sensors, where either the depth or the height is specified, but not both:
SELECT *
FROM sensors
WHERE config ?| array['depth', 'height']
AND NOT config ?& array['depth', 'height'];
For further reading on the JSON functionality available in Postgres 9.5, I recommend consulting the official documentation.
16 Nov 2015
The following table, referred to as “sensors”, holds sensor data in the form of an id, a location, and a status, where the status field is a JSON object.
+------+------------+----------------------------------------+
| id   | location   | status                                 |
|------+------------+----------------------------------------|
| 1    | backyard   | {"healthy": false, "temperature":0.0}  |
| 2    | frontyard  | {"healthy": false, "temperature":12.7} |
| 3    | unknown    | {"healthy": true, "temperature":28.1}  |
| 4    | farm       | {"healthy": false, "temperature":48.1, |
|      |            |  "errorCode": 78}                      |
+------+------------+----------------------------------------+
We can extract a list of all possible status fields with the following query:
SELECT DISTINCT json_object_keys(status) AS status_field
FROM sensors
ORDER BY status_field;
which produces the following result:
+--------------+
| status_field |
|--------------|
| errorCode    |
| healthy      |
| temperature  |
+--------------+
To determine the highest temperature recorded by any sensor, we can use the following query:
SELECT MAX((status->>'temperature')::float)
FROM sensors;
This query returns:
+-------+
| max |
|-------|
| 48.1 |
+-------+
However, if we wish to determine the highest temperature recorded by only healthy sensors, we can use the following query:
SELECT MAX((status->>'temperature')::float)
FROM sensors
WHERE (status->>'healthy')::bool;
which produces the following result:
+-------+
| max |
|-------|
| 28.1 |
+-------+
Additionally, we can retrieve the average temperature of healthy sensors grouped by location with the following query:
SELECT location,
AVG((status->>'temperature')::float) AS temperature
FROM sensors
WHERE (status->>'healthy')::bool
GROUP BY location;
which produces the following result:
+------------+---------------+
| location | temperature |
|------------+---------------|
| unknown | 28.1 |
+------------+---------------+
We can take this further with functions like json_each and json_each_text, which expand the JSON status object into a larger table. In our next iteration, we use the following query to fetch the packed records:
SELECT id,
json_each_text(status) AS status_field
FROM sensors;
This query returns:
+------+--------------------+
| id   | status_field       |
|------+--------------------|
| 1    | (healthy,false)    |
| 1    | (temperature,0.0)  |
| 2    | (healthy,false)    |
| 2    | (temperature,12.7) |
| 3    | (healthy,true)     |
| 3    | (temperature,28.1) |
| 4    | (healthy,false)    |
| 4    | (temperature,48.1) |
| 4    | (errorCode,78)     |
+------+--------------------+
We can then unfold these packed records with the following query:
SELECT id,
(status).key,
(status).value
FROM (SELECT id,
json_each(status) AS status
FROM sensors) AS statuses;
This will yield the following table:
+------+-------------+---------+
| id   | key         | value   |
|------+-------------+---------|
| 1    | healthy     | false   |
| 1    | temperature | 0.0     |
| 2    | healthy     | false   |
| 2    | temperature | 12.7    |
| 3    | healthy     | true    |
| 3    | temperature | 28.1    |
| 4    | healthy     | false   |
| 4    | temperature | 48.1    |
| 4    | errorCode   | 78      |
+------+-------------+---------+
What types do the key and value columns have? Let’s check, this time using json_each_text:
SELECT DISTINCT pg_typeof((status).key) AS key_type,
pg_typeof((status).value) AS value_type
FROM (SELECT id,
json_each_text(status) AS status
FROM sensors) AS statuses;
Which gives us
+------------+--------------+
| key_type | value_type |
|------------+--------------|
| text | text |
+------------+--------------+
Had we used json_each instead, the value_type would have been json rather than text. This matters because json_each preserves the (admittedly limited) type information that JSON carries, instead of casting every value to text.
The pgcli tool proved extremely handy while experimenting with Postgres. Further information on composite types can be found in the Postgres documentation: http://www.postgresql.org/docs/9.4/static/rowtypes.html