Result Set Processor
09 May 2017When it comes to fetching data from a database, there’s often a dilemma of whether to join tables and retrieve all the data in one go, or fetch only the necessary data for the moment and risk N+1 query problems. Let’s assume we have a list of controllers, each with a set of sensors attached. The result should look something like:
[
{
"controllerId": "c001",
"sensors": []
},
{
"controllerId": "c002",
"sensors": [
{
"sensorId": "s001",
"active": false
}
]
}
]
One solution is to use the following SQL query:
SELECT controllers.id AS controller_id,
sensors.id AS sensor_id
sensors.active
FROM controllers
JOIN sensors
ON contoller.id = sensors.controller_id;
$controller = null;
foreach ($tupes as $tuple) {
if ($tuple.controller != $controller) {
$controller = $tuple.controller;
}
$controller.append($tuple.sensor);
}
However, this can become tedious and complex for more advanced use cases. Another solution is to use a foreach loop and fetch data for each controller individually:
foreach (SELECT controllers.id FROM controllers => $id) {
SELECT * FROM sensors WHERE sensors.controller_id = $id;
}
This second solution also has its downsides and can lead to N+1 query problems.
In this article, the author presents a way to address this dilemma in PHP by using the concept of processing contexts. The basic idea is to define a ProcessingContext
class that keeps track of a set of records and provides methods for manipulating and extracting those records.
The job of processing result sets is a fascinating area to explore. The core data structure is a set of records:
abstract class ProcessingContext
{
private $records;
}
When there is no information about the structure of these records, the only meaningful operations for this context are collectAll
and collectHead
. This context can be called a Collector
:
class Collector extends ProcessingContext
{
public function collectAll(): array ...
public function collectHead() ...
}
Once we know that the $records are associative arrays, we can do more interesting things with them. However, whatever we want to do with these records, it should always start with picking a subset of each record for further processing. The next context is the Selector
. Even though knowing that the records are associative arrays allows us to add more operations to the context, we can still do what the Collector
does i.e. collectHead
and collectAll
:
class Selector extends Collector
{
public function selectValue(string $field): Converter ...
public function selectFields(strings... $fields): Converter ...
public function map(string $keyField, string $valueField): MapConverter ...
}
What is a Converter
or a MapConverter
? A selector allows you to pick fields from each record and place them in some sort of structure. For example, selectValue
lets you pick a value of a field and store it as a scalar, selectFields
lets you fetch an embedded associative array from each record, and map lets you create a new key/value pair from the values of two fields. The Converter
is the context in which the API user must decide what to do with the selected subrecord.
class Converter extends ProcessingContext
{
public function name(string $name): Selector ...
public function group(): Collector ...
public function groupInto(string $name): Selector ...
...
}
class MapConverter extends Converter
{
public function flatten(): Collector ...
public function flattenInto(string $name): Selector ...
...
}
So the name
method returns the subrecord back into the record it was extracted from under a new name. The group
method groups subrecords by using the remainder of each record as a group key. It does not return the group back into the record, so the result of group
is actually a collector, i.e. the records are the groups extracted by the selector. The groupInto
method not only groups subrecords but also pushes the groups back into the record.
I understand that the example provided may be complex and difficult to follow. Here is how I would simplify the example join above:
$query = "
SELECT controllers.id,
sensors.id AS sensor_id
sensors.active AS sensor_active
FROM controllers
JOIN sensors
ON contoller.id = sensors.controller_id;
";
$procedure = $database->prepare($query);
$procedure->processAll()
->selectByPrefix('sensor_')->group('sensors')
->collectAll();
The records would look like this:
| id | sensor_id | sensor_active |
+------+-----------+---------------+
| c001 | NULL | NULL |
| c002 | s001 | false |
Then, we select by prefix and group them into a record called sensors:
| id | sensors |
+------+-------------------------------+
| c001 | [] |
| c002 | [{ id: s001, active: false }] |
And that’s it! If you’d like oto see a working example, you can check out the example imeplementation and some unit tests for further reference.