I’m still fairly new to Julia, even though I’ve been trying to learn it for a few years. It’s *extremely* powerful (fast, expressive, … whatever metric you want to use) but with that comes some complexity.

I saw this post in my feed and it seemed like a great bite-sized chunk of code to learn from. I *think* I understand everything that’s happening, even if I certainly couldn’t write that myself, with one exception.

The connection that for `Bool` data, `all()` is equivalent to `minimum()` (it’s false as soon as there is one 0, otherwise it’s true) and `any()` is equivalent to `maximum()` (if there’s a 1 it’s true) took me a moment, but seems pretty cool. That wasn’t the problem I had.
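
For anyone else connecting those dots, the equivalence is easy to check on a small `Bool` vector (one caveat I noticed: `minimum()`/`maximum()` throw on empty collections, where `all()`/`any()` don’t):

```
x = rand(Bool, 10)
all(x) == minimum(x)   # true for any non-empty Bool vector
any(x) == maximum(x)   # likewise
```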

The bit that surprised me was that for `ByRow` calculations on a `DataFrame`, `minimum()` is **faster** than `all()`. The reason this is so surprising to me is that I understand `all()` from an R perspective, and my understanding was that `all()` can short-circuit: as soon as it sees a `FALSE` it can ignore any other values, because the result is guaranteed to be `FALSE` (yes, yes, up to missingness). Surely a calculation of `minimum()` needs to evaluate every value at least once(?). Where this might (must?) fall apart is that I’m thinking purely of vectors. Sure enough, checking some timings on a vector in Julia shows `all()` is near-instantaneous (after compilation):

```
x = rand(Bool, 100_000_000)
@time all(x)
0.009047 seconds (218 allocations: 9.531 KiB, 99.85% compilation time)
false
@time all(x)
0.000002 seconds
false
@time minimum(x)
0.091183 seconds (85.03 k allocations: 4.461 MiB, 41.98% compilation time)
false
@time minimum(x)
0.052287 seconds
false
```
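
For what it’s worth, my mental model of `all()` is a short-circuiting loop like the sketch below. To be clear, this is just my assumption about the behaviour, not a claim about how Base actually implements it:

```
# Hand-rolled short-circuiting "all": bail out at the first false
function my_all(x::AbstractVector{Bool})
    for v in x
        v || return false
    end
    return true
end

@time my_all(x)   # on random data this returns almost immediately, like all(x)
```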

As expected, I get similar results from R (comparing `max()`, the analogue of `maximum()`, with `any()`):

```
x <- sample(c(TRUE, FALSE), 1e8, replace = TRUE)
microbenchmark::microbenchmark(
  max = max(x),
  any = any(x),
  times = 10
)
# Unit: nanoseconds
#  expr       min        lq        mean    median        uq       max neval
#   max 208741173 210539351 223219500.3 212388892 222673528 285974960    10
#   any       160       187      2403.4       295      5095      7451    10
```

So, what’s going on? I *think* the answer is that we’re not dealing with just a vector, it’s rows from a `DataFrame`, right? Now, from the R side, that’s complicated enough: `rowwise()` is a necessary thing because R stores a `data.frame` as a list of vectors representing *columns*, so extracting a row means slicing across all of those.
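
For reference, here’s a minimal sketch of how I’m reproducing the `ByRow` comparison (the sizes and auto-generated column names are my own, and the timings will obviously differ from the blog post’s):

```
using DataFrames
df = DataFrame(rand(Bool, 1_000_000, 10), :auto)   # columns x1..x10
@time combine(df, AsTable(:) => ByRow(all) => :res)
@time combine(df, AsTable(:) => ByRow(minimum) => :res)   # faster, per the blog post
```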

I can reproduce the speedup in Julia (and honestly, I struggle to find a clean and fast way to do it in R), but the statement “This time things are very fast, as row-wise aggregation for maximum and minimum is optimized.” got me thinking: where should I have learned that? Google isn’t showing me any relevant results, so is this just a known thing? I can imagine that such an optimization might exist, but can anyone provide a reference or guide? The author of the blog post used this optimization in a StackOverflow answer without challenge (no reference provided), so I feel like it’s potentially just something I should know.