✂️ Query Partitioning
Query partitioning is a time- and space-efficient way to split a query into n sub-queries that are mutually exclusive and collectively exhaustive (MECE). The original iterable is iterated only once, and the sub-queries are lazily evaluated.
Note
Partitioning is done lazily, so creating the partitions does not by itself incur a performance penalty. It can therefore be used freely, even where the logic may leave some sub-queries unconsumed.
One common use case is to split a query into two sub-queries, depending on a condition. So, for example, instead of writing:
numbers = [1, 2, 3, 4, 5]
even = [x for x in numbers if x % 2 == 0]
odd = [x for x in numbers if x % 2 != 0]
you can split the query once with partition(). Compared with the two comprehensions, this is more:
- Time-efficient. The iterable is only iterated once.
- Space-efficient. Both sub-queries are lazily evaluated, so the iterable is not materialized up front, which uses less memory.
- Readable. Less repetitive, more declarative.
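The single-pass, lazy behavior listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not fliq's actual implementation: the helper name `lazy_partition` and the per-partition buffering strategy are hypothetical and exist only for this example.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def lazy_partition(
    items: Iterable[T], predicate: Callable[[T], bool]
) -> Tuple[Iterator[T], Iterator[T]]:
    """Hypothetical helper: lazily split `items` into (non-matching, matching)."""
    source = iter(items)  # shared iterator, so each element is read only once
    buffers: Tuple[Deque[T], Deque[T]] = (deque(), deque())

    def drain(index: int) -> Iterator[T]:
        own = buffers[index]
        while True:
            if own:
                yield own.popleft()
                continue
            try:
                item = next(source)
            except StopIteration:
                return
            # False/True index the tuple as 0/1, so non-matching items come first
            buffers[bool(predicate(item))].append(item)

    return drain(0), drain(1)


pulled: List[int] = []


def numbers() -> Iterator[int]:
    for x in [1, 2, 3, 4, 5]:
        pulled.append(x)  # record every element read from the source
        yield x


odd, even = lazy_partition(numbers(), lambda x: x % 2 == 0)
assert pulled == []  # nothing has been read yet: partitioning is lazy

even_list = list(even)  # [2, 4]
odd_list = list(odd)    # [1, 3, 5]
# pulled is now [1, 2, 3, 4, 5]: the source was iterated exactly once
```

In this sketch, elements that belong to the other partition are held in a buffer until that partition is consumed, so memory use depends on how unevenly the partitions are read.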
partition()
Yields n queries, each containing the elements whose selected partition index matches that query's position. Supports infinite iterables, provided that there are only finite sequences between elements of different partitions.
Examples:
>>> from fliq import q
>>> first, second, third = q(range(10)).partition(lambda x: x % 3, n=3)
>>> first.to_list(), second.to_list(), third.to_list()
([0, 3, 6, 9], [1, 4, 7], [2, 5, 8])
>>> odd, even = q([1, 2, 3]).partition(lambda x: x % 2 == 0)
>>> odd.to_list(), even.to_list()
([1, 3], [2])
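The note on infinite iterables in the description can be illustrated with a small stand-alone sketch. It does not call fliq; `lazy_partition_n` is a hypothetical n-way helper written for this example, and the buffering it uses is an assumption rather than the library's confirmed internals.

```python
from collections import deque
from itertools import count, islice
from typing import Callable, Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def lazy_partition_n(
    items: Iterable[T], index_of: Callable[[T], int], n: int
) -> List[Iterator[T]]:
    """Hypothetical n-way helper; not part of fliq."""
    source = iter(items)
    buffers: List[Deque[T]] = [deque() for _ in range(n)]

    def drain(index: int) -> Iterator[T]:
        while True:
            # Read from the source until this partition has something to yield
            while not buffers[index]:
                try:
                    item = next(source)
                except StopIteration:
                    return
                buffers[index_of(item)].append(item)
            yield buffers[index].popleft()

    return [drain(i) for i in range(n)]


# count() never ends, yet each partition receives an element every three steps
first, second, third = lazy_partition_n(count(), lambda x: x % 3, n=3)
head_of_third = list(islice(third, 4))  # [2, 5, 8, 11]
head_of_first = list(islice(first, 2))  # [0, 3]
```

If the selector never assigned any element to a given partition, iterating that partition in this sketch would scan the infinite source forever, which is why a finite gap between elements of different partitions matters.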
Parameters:
- by (Union[IndexSelector[T], Predicate[T]]) – IndexSelector that returns partition index for each element, in the range [0, n). Or, a Predicate to be used for a binary partition (when n=2). In case of a Selector, the first query will contain the elements in partition 0, the second query will contain the elements in partition 1, and so on. In case of a Predicate, the first query will contain the elements that don't satisfy the predicate (for alignment between 0 and False as the first index).
- n (int, default: 2) – Optional. The number of partitions to create. Defaults to 2. Must be positive. When n=2, by can also be a Predicate.
Raises:
- ValueError – In case the partition index is outside the range [0, n).
- TypeError – In case the partition index is not an integer.
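The two error conditions above could be checked along the following lines. This is a sketch only: `resolve_partition_index` is a hypothetical helper, and the exact checks and messages fliq uses are not shown in this page.

```python
from typing import Any, List, Union


def resolve_partition_index(key: Any, n: int) -> int:
    """Hypothetical helper: validate a selector or predicate result against [0, n)."""
    if isinstance(key, bool):
        # Predicate result: False maps to 0 (first query), True maps to 1
        key = int(key)
    if not isinstance(key, int):
        raise TypeError(
            f"partition index must be an integer, got {type(key).__name__}"
        )
    if not 0 <= key < n:
        raise ValueError(f"partition index {key} is outside the range [0, {n})")
    return key


results: List[Union[int, str]] = []
for key in (True, 2, 3, "a"):
    try:
        results.append(resolve_partition_index(key, n=3))
    except (TypeError, ValueError) as exc:
        results.append(type(exc).__name__)
# results == [1, 2, "ValueError", "TypeError"]
```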