Deriving Monad Comprehension

List comprehension, 或者说更 general 一点的 monad comprehension, 都是可以在 Lean 里边实现的，借助其强大的元编程功能。

实际上这个东西只是我一个库（LazyList，模拟Haskell惰性求值的表）的副产物，但具体实现的时候还是遇到了一些意想不到的地方，所以写成一篇文章来记录。关于所有的代码，见

lazylist

notch1p • Updated Jun 30, 2025

Introducing Comprehension

Basics

The notation for list comprehension is the operational interpretation of a set-builder notation.

集合在类型论里边通常用 dependent pair/coproduct 来表示。给定 universe , 有 formation rule

把 instantiate 成 , 就得到集合的一个 encoding:

将之记作 , 表示所有以类型的值为元素的集合。当然，如果需要编码一个完整的集合论，同样还应该包括集合论的外延公理 setext ，这个可以从函数外延和命题外延导出，而且涉及到几种不同 flavor （intensional/extensional/observational）的类型论之间处理函数外延的区别，是一个很有意思的东西，但是这里不多做说明。

（HoTT 则一般用 h-set 或者直接 , 但这里不讨论这个）

一句话即集合是类型上的谓语，可以发现这个定义本质上是一个满足的的子集/subtype.

比较一下（ Set Subtype 是 Lean 自带的语法）：


{(x, y) | (x > 2) ∧ y < 2}

Set from mathlib


{ xy : ℕ × ℕ // xy.1 > 2 ∧ xy.2 < 2}

Subtype


[(x, y) | x <- [0..], y <- [0..], x > 2 && y < 2]

Haskell 的列表推导式


(loop for x from 0
  collect
    (loop for y from 0
	    when (and (> x 2) (< y 2)) 
		    nconc (cons x y))
  into ls
  finally (return (remove-if #'null ls)))

不得不品尝我们 cl 抽象的 loop DSL


[(x, y) for x in [0..] for y in [0..] if x > 2 and y < 2]

假设 Python 真的支持惰性求值的话

给某个语言加入推导式，难点有四个：

从上面 type theoretic 的定义可以看出，集合构建式说到底还是一个命题——还不具有 operational semantics. 这要我们去赋予；

此外其通常用于描述无穷可数集合，然而在无穷集合/序列上做计算，对于Haskell之类 lazy 的语言是较为简单的；但此外的绝大多数即 strict langauge 是不好操作的，要求一个惰性的数据结构以及相适应的方法，在这里就是我实现的 LazyList.

有隐含的 pattern matching (比如 pair 的 destructuring)，虽然这里没有发生：只要语言支持 pattern matching 就没有任何问题，否则应该先去做 pattern matching.

区分 sequential / parallel. 即嵌套的 map 和平行 map ；命令式的话就是 for 循环：嵌套其实就是计算笛卡儿积，一般用于 nondeterministic search，后者可以翻译成一个 zip 或 zipWith .

Translating by Examples: Sequential

（这之后都用 Lean，我们会实现和 Haskell 类似的推导式）

对于上面这个例子，应该可以立即反应出来该如何翻译（只要知道推导式的语义的话）：


[(x, y) | x <- [0..], y <- [0..], x > 2 && y < 2] :=
	flatten $ filter (· matches some _) $
	[0..].map fun x => 
		[0..].map fun y => 
			if x > 2 && y < 2 then some (x, y) else none

发现有几个细节

虽然在这里 y <- [0..] 中不含 x ，但是 Haskell 的推导式是有 sequential binding 的——也就是说 x <- e1, y <- e2,.. 中 x 在 e2 中有定义， y 在其之后的表达式中有定义。

单纯的嵌套 map 肯定是不行，否则类型就是 List (List (Nat × Nat)) ，需要用 flatten 展平。

x > 2 && y < 2 这个 predicative filtering 可以把 (x, y) 即我们用来 map 的函数的 codomain 给 lift 到 Option 中，构造一个 anamorphism，说人话就是把 Nat -> Nat × Nat 转换成 Nat -> Option (Nat × Nat) .

正是以下 F-coalgebra :

特别地，我们希望直接构造出，最好是能够进行一些系统性的句法（syntactic）上的变换，因为 Lean 是 strict 的，采用复合不合适。
我们希望能把和 filterMap fuse 起来，减少一次遍历
其中 filterMap id = filter (· matches some _)

性能如何？虽然函数式已经没有性能可言，但是能提升的地方当然应该提升，不妨看看遍历性能：

嵌套 map 两次，这个不可能没有
flatten 一次
filter 一次

我们发现，一个简简单单的东西居然要在一个列表上跑 4 遍，这实在是很难说得过去。

（这是因为 Lean 不是 call-by-name/need，没有办法直接复合这几个操作，Haskell 在这里的翻译就直观得多）。

Translation #1

根据上面的分析，我们先设计一个 sequential comprehension 的语法：


comp ::=
	  ...
	| [$exp (where $p)? | ($pat <- $exp' in)+]

其中 p 是谓词，和 exp 一样允许 pat 中绑定的任何符号出现。

写成 Lean:


syntax "[" term " | " withoutPosition(sepBy((term (" ← " <|> " <- ") term), " in ")) "]" : term
syntax "[" term " where " term " | " withoutPosition(sepBy((term (" ← " <|> " <- ") term), " in ")) "]" : ter

性能改进

flatMap

我们发现 map 和 flatten 是可以复合起来的，即 flatMap 或 concatMap 或 >>= . 这个东西，和 monad 都是说烂了的玩意儿，没什么好说的。我就特别提一下 LazyList 的实现。首先给一个定义：


inductive LazyList (α : Type u)
  | nil
  | cons : α -> Thunk (LazyList α) -> LazyList α
-- ... （中略）
def appendH : LazyList α -> Thunk (LazyList α) -> LazyList α
  | [||], l => l.get
  | x ::' xs, l => x ::' appendH xs.get l

def append : LazyList α -> LazyList α -> LazyList α
  | nil, l => l
  | x ::' xs, l => x ::' ↑(append xs.get l)

instance : Append (LazyList α) := ⟨append⟩
instance : HAppend (LazyList α) (Thunk $ LazyList α) (LazyList α) := ⟨appendH⟩

-- ... (中略）
def flatMap (f : α -> LazyList β) : LazyList α -> LazyList β
  | nil => nil
  | x ::' xs => (f x) ++ (Thunk.mk $ fun _ => flatMap f xs.get)

可以看到， append appendH 都是 lazy 的。其中 ↑ 表示 coercion, 因为存在一个 α 到 Thunk α 的 Coe instance.

(++) 同时 dispatch 到 homogeneous 的 Append 和 heterogeneous 的 HAppend

(s₁ ++ s₂) 会根据 s₂ 的类型从这俩函数中选取一个。static multiple dispatch 确实是很爽。

那么，很自然的就会去想，依赖这两个方法的方法，是不是也是 lazy 的呢——显然不全是的。

因为绝大多数语言包括 Lean 都是 call-by-value，考虑第二个分支内的 recursive call 的两种情况

f x ++ flatMap f xs.get

f x ++ (Thunk.mk fun _ => flatMap f xs.get)

我们发现虽然不管哪个 (++) 都是 lazy 的，但是 flatMap 自身也是 recursive 且 recursive call 发生在 (++) 的参数位置，根据 CBV 定义，Lean 会先求参数，而不是直接代入，因而在一个无穷序列上计算时仍然会爆，所以情况一肯定会无法终止而爆栈。情况二则相当于是模拟了惰性求值将参数代入 (++) 的一步，只不过这一步被我们放在了 caller 而不是 callee 中： flatMap 的 recursive cal 被包在一个函数中，所以肯定不会 eagar eval.

我们 fuse 了 flatten 和 map ，那么应该放在哪里呢？嵌套的 Function application 都是从内向外展开的，对于一个链

考虑最内层的 map ，其 map 的应该是一个结构 F (..) 其中 (..) 一定不可能包含更多的 F （因为已经在最内层），这说明如果我们需要 F α 且要求，除了最内层的 map 应当保留之外，其他 map 全部应该改写为 flatMap 即

那么这样就减少了一次遍历（实际上是次，考虑个列表）。


[(x, y) where x > 2 && y < 2 | x <- [0..] in y <- [0..]] :=
	filter (· matches some _) $
	[0..].flatMap fun x => 
		[0..].filterMap fun y => 
			if x > 2 && y < 2 then some (x, y) else none

构造 `f'` 并 fuse `filterMap`

上面说过，如果能从AST的角度直接构造 f' 就好了。还是考虑上面那个例子：根据我们的语法，

exp 会被绑定到 (x, y)

出现的两个 binder pat （以后也可能是 pattern）

x 和 y
fun pat => e 已经支持 pattern matching，我们不必更改。

p 会被绑定到 x > 2 && y < 2

而翻译之后的产物是 if x > 2 && y < 2 then some (x, y) else none —— 很显然我们发现了两个 hole if (·) ... then some (·) .

那么完全可以写一个 macro 来方便我们构造：

（这实际上就是一条 denotational semantics 规则）


local macro f:term:50 " depends_on! " p:term:50 : term =>
  `(if $p then Option.some $f else Option.none)

根据上面的优化，我们已经把最内层以外的 map 全部优化成 flatMap 了，接下来想想，能不能优化掉 filter 呢？答案当然是可以的。我们只需要把 map 改成 filterMap 即可，且不管有没有 where 从句，我们统一把所有的 exp 变换成 exp depends_on! p 的形式，也就是说：

e ⇒ `($exp depends_on! Bool.true)

e where p ⇒ `($exp depends_on! $p)

亦即


[(x, y) where x > 2 && y < 2 | x <- [0..] in y <- [0..]] :=
	[0..].flatMap fun x => 
		[0..].filterMap fun y => 
			if x > 2 && y < 2 then some (x, y) else none

Sequential Binding

我们发现，如果使用上面的翻译，就已经是 sequential binding 了， x y 的 binding 已经具备先后顺序，这很好，我们不必多做处理。

Generalizing

因 flatMap = flip (· >>= ·) ，我们完全可以将

所有的 flatMap 替换成 bind

所有的 map 替换成 (· <$> ·)

然后这个语法就供所有 Monad 通用了。

之所以前面不提，是因为我感觉 Lean 的开发者不喜欢把 List 当成 Monad 来推理，诸如 Option 之类的都有 Monad instance，唯独 List 没有（但是 Functor 是有的）。

那么我们也入乡随俗，就不用 Monad ，我们用 Bind （虽然根本没差）。同时因为 Monad 没有 filterMap ，但是我们能够为同时实现了 Monad 和 Alternative

(本质上是因为实现 Alternative 相当于定义了某个 Monad 的 zero element)

去 derive 一个 filterMap . 我们把包含有 map 和 filterMap 的 class 称为 Mappable :



--   Extends `Functor`, with the addition of `filterMap`, for collections especially.
-- - Any `Monad` that has an `Alternative` instance is `Mappable`.
--
class Mappable (F : Type u -> Type v) extends Functor F where

  filterMap (p : α -> Option β) : F α -> F β
  
  -- Usually if there's `filterMap`, 
  -- there must be a `filter` method as well for collections. 
  -- But if it doesn't then we define `filter` in terms of `filterMap`.
  -- However a direct implementation has, very likely, better performance.
  filter (p : α -> Bool) : F α -> F α := filterMap (fun x => if p x then some x else none)
  
  map f := filterMap (some ∘ f)

(这里虽然定义了 Mappable.map 但是我们从来不用，我们还是用 Functor .

Mappable 延伸了（我们特意不用继承这个字眼以便和 java 等语言的 extends 区分开来） Functor .

只关注 filterMap 这一个方法： Array List LazyList 都已经直接定义了该方法，现在要为所有的 m s.t. Monad m Alternative m 导出一个 Mappable ，这实际上也简单：


@[default_instance] 
instance (priority := low) [Alternative m] [Monad m] : Mappable m where
  filterMap f xs := xs >>= fun x => 
    if let some x := f x then pure x else failure

有人可能会说，你这还需要一个 Alternative ，是不是依赖太强了？

这个依赖是必要的，据我所知， Haskell，至少是 ghc 的翻译，用的是

guard p *> e 或者 e <$ guard p
无论哪个，用到 guard 就需要 Alternative .
ghc 的翻译是很干净的，为什么我们做不到呢——当然是因为性能问题和 eager eval.

我觉得唯一的槽点就是 API 乱，因为我们下面实际的翻译，又用到 Functor 又用到 Mappable 还有 Bind . 确实不如 Haskell 统一用 Monad 来得舒服。这里大概率还有改进空间，不过我觉得不重要就没有管了。

Denotational semantics #1

既然已经优化好了，也定义好接口了，那就来写写 denotational semantics:

我们用表示作为推导式（）的 desugaring，那么有

(注意特指最左侧的表达式，且中不再嵌套推导式)

where E ::= (pat <- e' in E) .

Implement #1

数学总是看起来很美好的，但实现起来，就还有优化可做，但是就显得丑陋了：

我们发现还有优化可做：

不存在 predicate 的情况（即没有 where 从句）的情况下，没有必要去构造 f' 并 filterMap ，直接 map 即可——省几个 stack frames 嘛——因此我们分开处理两种情况；

我们只有两条完整的语法，所以 parse 之后会得到 i, l 两个数组，分别表示 pat 和 e' ；

当然我们也可以多加几条语法，然后直接在 macro_rules 上递归（即前面的 denotational semantics），不过性能恐怕不如直接 fold 数组来构造AST来得好。


macro_rules
	...  
  | `([ $f | $[$i <- $l]in* ]) => do
    if h : i.size >= 1 ∧ l.size >= 1 then
      let (iz, lz) := (i[i.size - 1], l[l.size - 1])
      let li := (l.zip i)[:l.size - 1]
      show MacroM Term from
      ``(Functor.map (fun $iz => $f) $lz) >>= li.foldrM fun (t, fb) a =>
        ``(Bind.bind $t (fun $fb => $a))
    else unreachable!
  | `([ $f | $[$i ← $l]in* ]) => ``([ $f | $[$i <- $l]in* ])
  | `([ $f where $p | $[$i <- $l]in* ]) => do
    let mf <- `($f depends_on! $p)
    let (iz, lz) := (i[i.size - 1]!, l[l.size - 1]!)
    let li := (l.zip i)[:l.size - 1]
    show MacroM Term from
    ``(Mappable.filterMap (fun $iz => $mf) $lz) >>=
      li.foldrM fun (t, fb) a => ``(Bind.bind $t (fun $fb => $a))
  | `([ $f where $p | $[$i ← $l]in* ]) => ``([ $f where $p | $[$i <- $l]in* ]

现在比如说我们求 n 以内的勾股数，就很简单：


def pyth (n : Nat) : LazyList (Nat × Nat × Nat) :=
  [(x, y, z) where x ^ 2 + y ^ 2 == z ^ 2
    | x <- [|1 ~~ n + 1|]
   in y <- [|x ~~ n + 1|]
   in z <- [|y ~~ n + 1|]]

其中 [||] 是我用来表示 LazyList literal 的语法，还加了点糖：


gen ::= [|num ~~ (num)? (: num)? |]
			| [|num to (num)? (by num)?|]

这些语法都会被翻译到 LazyList.gen start stop step ，产生一个左闭右开步数为 step 的无穷 / 有穷整数列.

小缺陷

一般来说， fun x => fun y => fun z => .. 应该是和 fun x y z => .. definitionally equal 的，不仅如此，前者在 WHNF 化之后，从 metalanguage 的角度来说，应该更是直接相等的—— rfl 能够直接证明两者相等印证了这点：


example : 
  (λ x => λ y => λ z => (x, y, z) : α -> β -> γ -> (α × β × γ)) 
  = 
  λ x y z => (x, y, z) := rfl

但是在 universe level inference 上 Lean 并非这样处理，特别是和 dependent product 相关，即便 result type 是 concrete 的时候：


(fun (x, _) (_, y) => (x, y)) <$> [(1,2), (3,4)] <*> [(5,6),(7,8)]
-- [(1, 6), (1, 8), (3, 6), (3, 8)]


(fun (x, _) => fun (_, y) => (x, y)) <$> [(1,2), (3,4)] <*> [(5,6),(7,8)]

-- stuck at solving universe constraint
--   max (?u.400287+1) (?u.400288+1) =?= max (?u.400212+1) (?u.400213+1)
-- while trying to unify
--   Prod.{?u.400213, ?u.400212} ?m.400220 ?m.400230 : Type (max ?u.400213 ?u.400212)
-- with
--   Prod.{?u.400213, ?u.400212} ?m.400220 ?m.400230 : Type (max ?u.400213 ?u.400212)

这里没有细究，能说明的是涉及到 (· × ·) 类型的 pattern matching 时，后者会加 explicit motive: 可能是这个导致的问题。

此外，Lean 的积类型的两个参数是 universe-polymorphic 的，恐怕这也造成了一些影响，考虑单参数 universe-polymorphic 的情况：


structure NProd (α β : Type u) where
  mkNp ::
  fst : α
  snd : β
  
open NProd in 
#eval 
  (fun {fst,..} => fun {snd,..} => (fst, snd)) 
 <$> [mkNp 1 2, mkNp 3 4] 
 <*> [mkNp 5 6, mkNp 7 8]

就能够正常推断。总的来说是一个很 obscure 的东西。

这个问题在我们推导式这里尤其明显：上面的例子其实是推导式的另一种翻译，用 Seq 而不是 Bind . 同时上面的式子等价于


[(x, y) | (x, _) <- [(1, 2), (3, 4)] in (_, y) <- [(5, 6), (7, 8)]]
= [(1, 2), (3, 4)] >>= fun (x, _) => 
		[(5, 6), (7, 8)]] >>= fun (_, y) =>
			(x, y)

可以看到，我们这个翻译也不是 WHNF，同样也会遇到上述问题。

不过这个问题很好解决，显式添加类型标注 (type ascription) 即可。 show List (Nat × Nat) from ...

实际使用，toplevel 的定义本就最好要写类型；local 有 local bidirectional type inference, 真正会报错的情况还是少数。

如果就是不想写类型标注，那只能考虑用 Seq 的翻译了，就像上面这个例子的第一种情况那样。下面介绍这一种翻译。

Translation #2

首先需要明确一个很基本的东西：

对任意的有

一般习惯性的理解成：


do 
	let x₁' <- x₁
	let x₂' <- x₂
	...
	let xₙ' <- xₙ
	pure (f x₁' x₂' ... xₙ')

即把 lift 到一个 Applicative 中，再 apply 到各个参数上。但是这个东西其实不是特别准确，在语义上。我认为最准确的叫法是 sequential application. 也就是说按顺序一个个 apply. 这很重要：

告诉我们把中的第个元素全部转换成，由于其需要多个参数，相当于 partial application, 因此相当于产生一个全是函数的列表，记作，

接着看，两边都是列表，一边是一列函数，另一边是一列参数。

固定一个，对，其结果为一个列表，往复这个过程并把所有的列表连接起来，就是结果。

有标准定义 f <*> x = f >>= (· <$> x)

从语义上来讲，特别是在列表上，相当于爆搜。因而我们也可以将推导式翻译成类似的形式，而且异常简单。

仍然先给出一个语法：


comp ::=
	  ...
	| [$exp (where $p)? | ($pat <- $exp',)+]

写成 Lean


syntax "[" term " | " withoutPosition((term (" ← " <|> " <- ") term),+) "]" : term
syntax "[" term " where"  term " | " withoutPosition((term (" ← " <|> " <- ") term),+) "]" : term

有 semantics (注意 <&> = flip <$> )

where E ::= (pat <- e', E) .

对于没有 where 从句的仍分开处理，直接用 <$> :


macro_rules
	...
  | `([ $f | $[$i <- $l],* ]) => do
    if h : l.size >= 1 then
      let is <- ``(fun $i* => $f)
      let fn <- ``(Functor.map $is)
      let l0 := l[0]
      show MacroM Term from
      ``($fn $l0) >>= l.foldlM (start := 1) fun a s =>
        ``(Seq.seq $a (fun _ : Unit => $s))
    else unreachable!
  | `([ $f | $[$i ← $l],* ]) => ``([ $f | $[$i <- $l],* ])
  | `([ $f where $p | $[$i <- $l],* ]) => do
    let mf <- ``($f depends_on! $p)
    ``(Mappable.filterMap id [ $mf | $[$i <- $l]in* ])
  | `([ $f where $p | $[$i ← $l],* ]) => do
    ``([ $f where $p | $[$i <- $l],* ])

我们前面说过， <$> 即 map 是可以和 filterMap 合并的，这里也如此吗？显然不是的，因为求值顺序不一样，具体来说， <$> 是左结合的。

我们这里是一个 left fold, <$> 最先求值；

和前面的 right fold, map 最后求值（在最内层）不一样。

所以此处不能合并，即这种翻译在 predicative filtering 的情况下必须要多遍历一遍。

这种方法主要是直观，也是我第一个想到的翻译，反而不是前面那种。但是其参考作用大于实际作用：

性能上并不好

而且显然不支持 sequential binding

但不存在上面说过的小缺陷：

注意 ``(fun $i* => $f)

其中 $i* 是 antiquote 的 splice 语法，可以看作是把 (i : Array (TSyntax `funBinder) 在 fun ... 处直接展平，这样就避免了 fun ... => fun ... => 手动 curry 导致的 universe level inference 问题。但我估计这东西大概率没有健壮性可言，稍微复杂一点可能仍会出现上述缺陷。

Translating by Examples: Parallel

个人感觉上面的 sequential 这种爆搜恐怕并不很常用，很常用的或许是 parallel comprehension，即


zipWith f xs ys = [| f.body | x <- xs | y <- ys |]

那么对于两个以上的列表怎么办呢——有一种 naive thought:

zip = zipWith (· , ·) ，记为 . right associative, binds tighter than . (因为 (· × ·) 就是右结合的)

考虑 zipWithN ，其工作在个列表上，那么

我们用表示 uncurried version of .

定义很美妙，跑起来像屎一样。

因为是平行的，理想情况下的时间复杂度应当是

很明显，对个长度为的列表来说复杂度是，同时还会存在一个超大的 intermediate data structure. 虽然后者可以通过 LazyList 缓解，但是就单纯的时间复杂度而言已经十分不理想。

土办法

上面所说的问题，在任何一个把 parallel comprehension, parallel for 解糖到 zip zipWith 的语言中都会发生。对于没有元编程，且又不想提供原生支持自动生成 zipN 的语言来说，最好也最简单的办法就是枚举。

还记得我原来工作过的一个语言，其 for...in 用 continuation-based iterator 实现，parallel for 则是用 iterator 的 zipWith() 方法。该语言并没有元编程，同样也没有在编译器内做 zipWithN （在我工作的时期，现在似乎也没有），因而采取的办法就是把 zipWith...zipWith15 全部枚举一遍，超出部分用仍用上面的 nested zip .

但我们仍要尽力减少这种 boilerplate 的东西，先考虑 zip zipWith 的关系：

一般来说 zip 都是直接用 zipWith 实现的，即 zip = zipWith (· , ·) . 考虑 zipN ，显然我们有

翻译成 Lean 的 macro prod_of! 就是


macro:max "prod_of!" n:num : term => 
  match n.getNat with
  | 0 => ``(())
  | 1 => ``(id)
  | n + 2 =>
    ``((· , ·)) >>= n.foldM fun _ _ a =>
      ``((· , $a))

💡

prod_of! binds tighter than function application. Prod nests to the right.

这样，我们就能够专心去实现 zipWithN 了。

那先给一个 Zippable 的定义，generalizing zip .


--  Similar to Haskell's `MonadZip`, but for collections especially.
--  But relaxes on the constraint that `m` must be a `Monad` first.
--  - Any `Monad` is `Zippable`.

class Zippable (m : Type u -> Type v) where

  zipWith  (f : α -> β -> γ) : m α -> m β -> m γ
  zipWith3 (f : α -> β -> γ -> δ) : m α -> m β -> m γ -> m δ
  zipWith4 (f : α -> β -> γ -> δ -> ζ) : m α -> m β -> m γ -> m δ -> m ζ  
  zipWith5 (f : α -> β -> γ -> δ -> ζ -> η) : m α -> m β -> m γ -> m δ -> m ζ -> m η
  zipWith6 (f : α -> β -> γ -> δ -> ζ -> η -> θ) : m α -> m β -> m γ -> m δ -> m ζ -> m η -> m θ
  zipWith7 (f : α -> β -> γ -> δ -> ζ -> η -> θ -> ι) : m α -> m β -> m γ -> m δ -> m ζ -> m η -> m θ -> m ι
  
  zip  : m α -> m β -> m (α × β) := zipWith prod_of! 2
  zip3 : m α -> m β -> m γ -> m (α × β × γ) := zipWith3 prod_of! 3
  zip4 : m α -> m β -> m γ -> m δ -> m (α × β × γ × δ) := zipWith4 prod_of! 4
  zip5 : m α -> m β -> m γ -> m δ -> m ε -> m (α × β × γ × δ × ε) := zipWith5 prod_of! 5
  zip6 : m α -> m β -> m γ -> m δ -> m ε -> m ζ -> m (α × β × γ × δ × ε × ζ) := zipWith6 prod_of! 6
  zip7 : m α -> m β -> m γ -> m δ -> m ε -> m ζ -> m η -> m (α × β × γ × δ × ε × ζ × η) := zipWith7 prod_of! 7

很多人看来是屎山，但确是很好用的办法

Generalizing 土办法

让我们先跳出 List 这个框架，想一下针对所有的 Monad 该怎么处理：

我们上面说过， <*> 主要是用来爆搜，但是考虑一下非线性/递归结构的 Monad 呢？实际上，就不存在这个问题，没有差别。也就是说对于任何非 Array List LazyList 的 Monad ，在语义上有

<*> = <+> .

所以我们可以实现一个通用的 Zippable instance:



instance (priority := low) [Monad m] : Zippable m where
  zipWith  := (· <$> · <*> ·)
  zipWith3 := (· <$> · <*> · <*> ·)
  zipWith4 := (· <$> · <*> · <*> · <*> ·)
  zipWith5 := (· <$> · <*> · <*> · <*> · <*> ·)
  zipWith6 := (· <$> · <*> · <*> · <*> · <*> · <*> ·)
  zipWith7 := (· <$> · <*> · <*> · <*> · <*> · <*> · <*> ·)

接着，我们再单独为 Array List LazyList 实现 Zippable , 根据 Lean 的 instance search algorithm, 这几个 instance 的优先级会比通用的高。这也可以通过 #synth 来验证。

这样，我们在确保所有 Monad 都能使用的情况下（通用性），也满足序列结构的特殊 zip 实现的性能要求。

其实通篇看下来会发现我们特意不去统一使用Monad 这个API，而且实现几乎都是从序列结构 List 等视角叙述的，这有几个原因：

第一是 Lean 的求值语义和 Haskell 不同，上面也强调过很多遍了；

第二是这些序列结构确实是有特殊的实现，没有必要为了统一而统一，失去本可以避免失去的性能优势

第三是大家都爱 generalizing，monad comprehension 确实是很美妙的统一，但是就我所知很少有人真的用 comprehension 的语法去写 monadic code, 绝大多数都是用 do ; 而 comprehension 则仍然主要用于序列结构，

那么很显然，我们的 comprehension 就一定要为序列结构而优化，generalization 都是次要的

实际上我们发现，在做足了必要的优化下也没有损失任何一点 generality，使用的 API 虽然较乱，但是这对库使用者来说是不可见的。

不仅如此，涉及到的几个 typeclass 本就有很强的关联性，实现起来bing

`Zippable List`

很简单，我们特意写 tailrec 版本的，用数组存中间数据，这里给出两个例子：


def zipWith4 (f : α -> β -> γ -> δ -> ζ) : List α -> List β -> List γ -> List δ -> List ζ := go #[] where
  go acc
  | a :: as, b :: bs, c :: cs, d :: ds => go (acc.push $ f a b c d) as bs cs ds
  | _, _, _, _ => acc.toList
def zipWith5 (f : α -> β -> γ -> δ -> ζ -> η) : List α -> List β -> List γ -> List δ -> List ζ -> List η := go #[] where
  go acc
  | a :: as, b :: bs, c :: cs, d :: ds, e :: es => go (acc.push $ f a b c d e) as bs cs ds es
  | _, _, _, _, _ => acc.toList

`Zippable LazyList`

同理，但是肯定没法写 tailrec 的，因为要保证 lazy property


def zipWith4 (f : α -> β -> γ -> δ -> ζ) : LazyList α -> LazyList β -> LazyList γ -> LazyList δ -> LazyList ζ
  | a ::' as, b ::' bs, c ::' cs, d ::' ds => f a b c d ::' zipWith4 f as.get bs.get cs.get ds.get
  | _, _, _, _ => [||]
def zipWith5 (f : α -> β -> γ -> δ -> ζ -> η) : LazyList α -> LazyList β -> LazyList γ -> LazyList δ -> LazyList ζ -> LazyList η
  | a ::' as, b ::' bs, c ::' cs, d ::' ds, e ::' es => f a b c d e ::' zipWith5 f as.get bs.get cs.get ds.get es.get

`Zippable Array`

我这里想特别谈谈 Array 的实现。 Array 在 prelude 中是由 List 实现的，但这不过是一个 reference implementation, 主要写给 typechecker/kernel 看 (因为 primitive 就没法 reason 了，不够 self-contained，需要构造出来). 那么显然这个性能是完全说不过去的。

在运行时编译器将用一个真正的， primitive的，适合编程的 Array 覆盖之。

所以在实现上，我们需要利用 Array 的这一点优势，通过 index 去访问它，就像命令式语言一样。

但是这里就涉及到一个问题，对某个 arr : Array α，Lean 有三种下标访问：

arr[i] 需要证明 i < arr.size，但是最快，没有任何下面提到的 overhead.

arr[i]? 产生一个 Option α , 但如此，对 n 个数组而言我们就要做次运行时越界检查,

同时 constructing/destructuring Option 又要做同样多次，i.e.


match as[i]?, bs[i]? .. with
| some a, some b, .. =>
| _, _, .. =>

这么多 overhead 在性能上不可接受。

arr[i]! 相当于命令式语言的下标访问，若越界运行时会 panic ，但是 Lean 毕竟是一个 theorem prover, 讲求 type soundness, 即便 panic ，我们也需要证明 Inhabited α ；同时这里的 result type 就是 α , 我们也不能通过 Nonempty α 并用选择公理 nonconstructively 直接选一个元素出来，否则整个 zipWithN 就是 noncomputable 的。

那么，留给我们的只有一种选择，即是用 canonical 的 arr[i] 来访问，这就涉及到我最爱的证明环节了。