この広告は、90日以上更新していないブログに表示しています。
ceronman/loxido: Rust implementation of the Lox programming language.
let bar= gc.alloc(LoxString::from_string("bar".to_owned()));
LoxString::from_string
andGcObject
implementation. They are staraightforward.
pubstructLoxString {pub header: GcObject,pub s:String,pub hash:usize,}impl LoxString {pubfnfrom_string(s:String)->Self {let hash=LoxString::hash_string(&s); LoxString { header:GcObject::new(ObjectType::LoxString), s, hash, }__snip__pubstructGcObject { marked:bool, next:Option<NonNull<GcObject>>, obj_type: ObjectType,}
Gc::alloc should be doing thetrick. I added a comment for each line.
pubfnalloc<T: Display+'static>(&mutself, object: T)-> GcRef<T> {unsafe {// Alloc object in heap.let boxed=Box::new(object);// Get raw pointer from Box::into_raw.// pointer is NonNull<T>.let pointer=NonNull::new_unchecked(Box::into_raw(boxed));// This assumes &T has GcObject as the first field with the exact sae alignhment.letmut header: NonNull<GcObject>=mem::transmute(pointer.as_ref());// Gc.first is linked list of GcObject(=header). header.as_mut().next=self.first.take();// The newly allocated one becomes head of the linked list.self.first=Some(header);// GcRef is a struct with one field pointer: NonNull<T>. GcRef { pointer } } }
Adding my comments inline.
fnmark_roots(&mutself) {// Mark VM stack.for&valuein&self.stack[0..self.stack_len()] {self.gc.mark_value(value); }// Mark call frames.for framein&self.frames[..self.frame_count] {self.gc.mark_object(frame.closure) }for&upvaluein&self.open_upvalues {self.gc.mark_object(upvalue); }self.gc.mark_table(&self.globals);self.gc.mark_object(self.init_string); }
mark_object
pushes in-use objects togrey_stack
.blacken_object
marks recursively in objects. Unlike RustGC this doesn't have Trace trait. So inblacken_object
it shouldenum all GcRef field.
fntrace_references(&mutself) {whileletSome(pointer)=self.grey_stack.pop() {self.blacken_object(pointer); } }
if the object is not marked release the object from heap.
fnsweep(&mutself) {letmut previous:Option<NonNull<GcObject>>=None;letmut current:Option<NonNull<GcObject>>=self.first;whileletSome(mut object)= current {unsafe {let object_ptr= object.as_mut(); current= object_ptr.next;if object_ptr.marked { object_ptr.marked=false; previous=Some(object); }else {ifletSome(mut previous)= previous { previous.as_mut().next= object_ptr.next }else {self.first= object_ptr.next }Box::from_raw(object_ptr); } } } }
Previously we exploredTrace
inTrace - Understanding Rust GC - higepon blog. Let's look intoFinalize
trait. This is really simple. EveryGC
-ed object should implement this finalize.
pubtraitFinalize {fnfinalize(&self) {}}
Let's look at finalize implementation forarray
andOption<T>
. They are empty. Hmm. I'm not quite sure why. Let me start fromcollect_garbage
and come back here later.
;; Arrayimpl<T: Trace,const N:usize> Finalizefor [T; N] {};;Option<T>impl<T: Trace> FinalizeforOption<T> {}
Incollect_garbage
after themark
returns unmarked objects (= not used objects) and it will callfinalize_glue()
for eachGC<T>
object.
let unmarked=mark(&mut st.boxes_start);if unmarked.is_empty() {return; }for nodein&unmarked {Trace::finalize_glue(&(*node.this.as_ptr()).data); }
finalize_glue
callsfinalize
in the Trait. So say an array is unmarked, thegc eventually calls emptytrace
of the array. Now I remember thatfinalize
give the object a chance to any necessary work before it's deallocated. For example an object may want to close a file associated with it.
Now we know that thegc collects unmarked objects and calls finalize for the unmarked objects. It's time tosweep
them.
unsafefnsweep(finalized:Vec<Unmarked>, bytes_allocated:&mutusize) {let _guard=DropGuard::new();for nodein finalized.into_iter().rev() {if (*node.this.as_ptr()).header.marked.get() {continue; }let incoming= node.incoming;letmut node=Box::from_raw(node.this.as_ptr());*bytes_allocated-=mem::size_of_val::<GcBox<_>>(&*node);*incoming= node.header.next.take(); } }
This is actually very interesting. It recreates Box from returning raw pointer using from_raw which makes sense!
Let's get back toroot
andunroot
.A Tour of Safe Tracing GC Designs in Rust - In Pursuit of Laziness has a good summary and examples of what is rooting and why we need it.
In one word: Rooting. In a garbage collector, the objects “directly” in use on the stack are the “roots”, and you need to be able to identify them.
structFoo { bar:Option<Gc<Bar>>}// this is a rootlet bar=Gc::new(Bar::new());// this is also a rootlet foo=Gc::new(Foo::new());// bar should no longer be a root (but we can't detect that!)foo.bar=Some(bar);// but foo should still be a root here since it's not inside// another GC'd objectlet v=vec![foo];
To track root objects, theGC maintains root counts inGC_BOX
. In shortGC_BOX
with root count > 0 is a root object.Rooting - Designing a GC in Rust explains it very well. Note that the count is incremented or decremented whenGC<T>
object is moved.
Box
.I want to understand howRust GC work to see if I can use it inmy Scheme VMinterpreter to be written in Rust.
GC-ed objects should implement Trace and Finalize. You should useGc::new
instead ofBox::new
to allocate objects in heap. Here is an example from theofficial document.
let x=Gc::new(1_u8);let y=Gc::new(Box::new(Gc::new(1_u8)));#[derive(Trace, Finalize)]structFoo { a: Gc<u8>, b:u8}let z=Gc::new(Foo {a: x.clone(), b:1})
z variable above isGc<Foo>
type. You canaccess fields of Foo likez.a
. But why? It doesn't have z field in it. Because it implementsDeref Trait Rust compiler can take care of it.
impl<T: Trace+ ?Sized> Dereffor Gc<T> {typeTarget= T;#[inline]fnderef(&self)->&T {&self.inner().value() }}
The definition of the structGC<T>
as follows. It has only 2 fields.
pubstructGc<T: Trace+ ?Sized+'static> { ptr_root: Cell<NonNull<GcBox<T>>>, marker: PhantomData<Rc<T>>,}
PerDesigning a GC in Rust - In Pursuit of Laziness the purpose oftrace
is a way of walking the fields of a given object and finding innerGc<T>
fields. For example if you have oneGC<T>
object, you should be able to find allGC<T>
fields or inner fields by the tracing.
Let's look into the Trace trait. I'm not sure what root and unroot are doing yet. We'll get back to here later.
/// The Trace trait, which needs to be implemented on garbage-collected objects.pubunsafetraitTrace: Finalize {/// Marks all contained `Gc`s.unsafefntrace(&self);/// Increments the root-count of all contained `Gc`s.unsafefnroot(&self);/// Decrements the root-count of all contained `Gc`s.unsafefnunroot(&self);/// Runs Finalize::finalize() on this object and all/// contained subobjectsfnfinalize_glue(&self);}
Here is one Trace implementation of for 'static lifetime struct.
unsafeimpl<T: ?Sized> Tracefor&'static T {unsafe_empty_trace!();}
unsafe_empty_trace!
is a macro which has no-op trace, root and unroot method. This makes sense because 'static lifetime indicates that the data pointed to by the reference lives for the entire lifetime of the running program. So we don't event need to track or sweep. Let's look at one more example. This is the trace for an array. I would guess this is marking all elements in the array. Let's confirm.
unsafeimpl<T: Trace,const N:usize> Tracefor [T; N] {custom_trace!(this, {for vin this {mark(v); } });}
custom_trace!
is implemented as follows. It defines inlinemark
method and call it in the$body
.
/// This rule implements the trace method.////// You define a `this` parameter name and pass in a body, which should call `mark` on every/// traceable element inside the body. The mark implementation will automatically delegate to the/// correct method on the argument.#[macro_export]macro_rules! custom_trace { ($this:ident,$body:expr)=> {#[inline]unsafefntrace(&self) {#[inline]unsafefnmark<T:$crate::Trace+ ?Sized>(it:&T) {$crate::Trace::trace(it); }let$this=self;$body }
Ithink$crate::Trace::trace(it);
is callingtrace()
forGc<T>
but not 100% sure.
unsafeimpl<T: Trace+ ?Sized> Tracefor Gc<T> {#[inline]unsafefntrace(&self) {self.inner().trace_inner(); }
Then it callstrace_inner()
forGcBox
.
/// Marks this `GcBox` and marks through its data.pub(crate)unsafefntrace_inner(&self) {let marked=self.header.marked.get();if!marked {self.header.marked.set(true);self.data.trace(); } }
Remember thatGcBox
is the raw pointer allocated inGc::new
and stored inptr_root
.
pubfnnew(value: T)->Self {assert!(mem::align_of::<GcBox<T>>()>1);unsafe {// Allocate the memory for the objectlet ptr=GcBox::new(value);// When we create a Gc<T>, all pointers which have been moved to the// heap no longer need to be rooted, so we unroot them. (*ptr.as_ptr()).value().unroot();let gc= Gc { ptr_root:Cell::new(NonNull::new_unchecked(ptr.as_ptr())), marker: PhantomData, };
Let's quickly look atGcBox
definition.
pub(crate)structGcBoxHeader {//XXX This is horribly space inefficient - not sure if we care// We are using a word word bool - there is a full 63 bits of unused data :(//XXX: Should be able to store marked in the high bit of roots? roots: Cell<usize>, next:Option<NonNull<GcBox<dyn Trace>>>, marked: Cell<bool>,}#[repr(C)]// to justify the layout computation in Gc::from_rawpub(crate)structGcBox<T: Trace+ ?Sized+'static> { header: GcBoxHeader, data: T,}
Okay. Now I have better understanding. Thetrace
method just setmarked=true
which means the ptr is in use. By the way who is calling thetrace
method and when? The answer iscollect_garbage
=>GcBox::trace_inner()
=>trace
. This is exciting! We're so close to the core of thegc. Let's look atcollect_garbage
.
/// Collects garbage.fncollect_garbage(st:&mut GcState) {__snip__unsafefnmark(head:&mutOption<NonNull<GcBox<dyn Trace>>>)->Vec<Unmarked> {// Walk the tree, tracing and marking the nodesletmut mark_head=*head;whileletSome(node)= mark_head {__snip__unsafe {let unmarked=mark(&mut st.boxes_start);if unmarked.is_empty() {return; }
Now we knowGcState.boxes_start
is the actual starting point ofmark
and it recursively marksGC<T>
objects. My next question is who's setting upboxes_start
? The answer is it is done inGcBox::new
whenever it allocates new GcBox, it maintains a link list of GcBox and set it toboxes_start
.
let gcbox=Box::into_raw(Box::new(GcBox { header: GcBoxHeader { roots:Cell::new(1), marked:Cell::new(false), next: st.boxes_start.take(), }, data: value, })); st.boxes_start=Some(unsafe {NonNull::new_unchecked(gcbox) });
And finally GcState is a thread local object as follows. So GcState is a kind of global variable which is local to a thread.
// The garbage collector's internal state.thread_local!(static GC_STATE: RefCell<GcState> = RefCell::new(GcState { bytes_allocated: 0, threshold: INITIAL_THRESHOLD, boxes_start: None,}));
Trace
trait.trace
method should recursively calltrace
method for inner objects.trace
forGcBox::new
ed objects.Next:Finalize, Sweep, root and unroot - Understanding Rust GC - higepon blog.
Mosh は R7RS に対応しつつあるのでecraven/r7rs-benchmarks というベンチマークを走らせてみた。今回は速度は気にせずにR7RSScheme 処理系としてコードを間違いなく正しく実行できるかに注力。結局57個中で3つのベンチマークで実行に失敗することが分かった。うち2つは比較的簡単に修正できた。最後の1つが手強かったので記録を残す。
失敗するのは conform というベンチマーク。期待される結果とは違うものをMosh が返してしまう。そして実行時間がやけに短い。conformベンチマークのスクリプト(r7rs-benchmarks/conform.scm)を見てみるとグラフ構造を作って何かやっているらしい。正直コードを理解しようとするのは3秒で諦めた。
この時点ではデバッグが難航する事は知る由もないのでデバッグ print を入れてなぜ期待と違うかを調べようとするがすぐに挫折。なぜならば
この状態で2-3時間無駄に使った。
これは難航しそうだとようやく気付いたので少し落ち着くことにした。正しい状態がよく分からないのが問題なのでGauche で実行して比べることとした。次に処理内容が分からないのは良いとして、メインの処理を何となく名前から特定した。ここでようやくメインの処理にデバッグ print を入れてGauche と比較できるようになり、ある関数で一瞬で間違った値が返っていることが分かった。
間違うポイントが分かったので勝利を確信。その関数への入力を維持したまま再現コードを小さくしていくことにした。ところがこれがかなり難しい。入力も出力もグラフなので文字列や数字を扱うのとは別の難しさがある。色々やっているうちにぐちゃぐちゃになってしまった。元に戻らなくなってしまい大反省。debug という git branch を作り少しずつ進むようにしたら急に捗ったし壊すことも無くなった。チェックポイントあるし壊しても大丈夫という心理的安全性大事。1日かけて小さなコードに絞ることができた。
(define (foo) (define (bar n) (cond ((= n1)#t) (else (let loop ((lst'(1))) (if (null? lst)#t (and (display"=> recusive1\n") (bar1) (display"=> loop\n") (loop (cdr lst)))))))) (bar0) )(foo)
このコードの(bar 1)
の呼び出しの後に(display "=> loop\n")
が呼ばれないことが分かった。これは明らかにおかしいなぜならば(bar 1)
は #t を返すから。
Scheme 処理系を書いたことのある人なら分かると思うが、これは色々怪しい。define の中に define があり named let があり末尾再帰があり。どこでバグっていてもおかしくない。
この時点でバグの原因候補は実行順に
というのを切り替えないといけない。その上この辺りを触るのは10年以上ぶりである!
最適化を OFF にすると再現しなくなったので
あたりが怪しい。
pass2 の最適化の途中。1週間のデバッグを終えた今の目で見ればおかしいところは明らか。
($if ($asm NUMBER_EQUAL ($lref n[1;0]) ($const1)) ($const#t) ($letrec ((loop[0;0] ($lambda[loop;2] (lst[2;0]) ($label #0 ($if ($asm NULL_P ($lref lst[2;0])) ($const#t) ($if ($call ($grefdisplay) ($const"=> recusive1\n")) ($if ($call[tail-rec] ($lref bar[2;0]) ($const1)) ($if ($call ($grefdisplay) ($const"=> loop\n")) ($call[jump] ($call[embed] ($lambda[loop;2] (lst[2;0]) label#0) ($const (1))) ($asm CDR ($lref lst[2;0]))) ($it)) ($it)) ($it)))))) ) ($call[embed] ($lambda[loop;2] (lst[2;0]) label#0) ($const (1)))))
pass2 の最適化後の iform は以下の通り。ここで時間を使いすぎた。
($define () foo ($lambda[foo;0] () ($call[embed40] ($lambda[bar;2] (n[1;0]) ($label #0 ($if ($asm NUMBER_EQUAL ($lref n[1;0]) ($const1)) ($const#t) ($call[embed70] ($lambda[loop;2] (lst[2;0]) ($label #1 ($if ($asm NULL_P ($lref lst[2;0])) ($const#t) ($if ($call ($grefdisplay) ($const"=> recusive1\n")) ($if ($call[jump00] ($call[embed40] ($lambda[bar;2] (n[1;0]) label#0) ($const0)) ($const1)) ($if ($call ($grefdisplay) ($const"=> loop\n")) ($call[jump00] ($call[embed70] ($lambda[loop;2] (lst[2;0]) label#1) ($const (1))) ($asm CDR ($lref lst[2;0]))) ($it)) ($it)) ($it))))) ($const (1)))))) ($const0))))
結局 pass2 の最適化で local call を埋め込む処理あたりで何かがおかしい事は分かるのだが。この iform がおかしいのか。後続の処理がおかしいのか分からないので後続も見る。実際のVM instruction 列を表示してみると。ますます分からない。
CLOSURE810#f011 ((reproduce.scm1) foo) LET_FRAME7 CONSTANT_PUSH0 ENTER;; Label #01 REFER_LOCAL_PUSH_CONSTANT01 BRANCH_NOT_NUMBER_EQUAL;; if (= n 1)5 CONSTANT#t LOCAL_JMP;; goto label #157 LET_FRAME5 REFER_LOCAL_PUSH0 DISPLAY1 CONSTANT_PUSH (1) ENTER1 REFER_LOCAL_BRANCH_NOT_NULL05 CONSTANT#t LOCAL_JMP38 FRAME6 CONSTANT_PUSH => recusive1 REFER_GLOBAL_CALLdisplay;; (display "=> recusive1\n")1;; # of args TEST;; (display ...) return value is true. So we skip the +1 next line and go to +2 line.29 CONSTANT_PUSH;; Come here after (display ...) call1 SHIFTJ;; adjust SP and FP1;; depth4;; diff0;; How many closure to go up? LOCAL_JMP;; Jump to label #0-42 TEST19 FRAME6 CONSTANT_PUSH => loop REFER_GLOBAL_CALLdisplay1 TEST10 REFER_LOCAL0 CDR_PUSH SHIFTJ110 LOCAL_JMP-43 LEAVE1 LEAVE;; label #21;; adjust stack RETURN;; return to the code (the code is taken from the top of stack). ** But we don't have the code in the stack?***0 DEFINE_GLOBAL foo HALT NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP)
動的にVM で実行される様子を stack と共に。
========================FRAME FP|0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)"========================REFER_GLOBAL FP|0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)"========================CALL |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)"========================LET_FRAME # LET_FRAME forlambda[foo] |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" FP|4: 4 # Save fp |5:"#(#(CLOSURE ...))" # Closure========================CONSTANT |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" FP|4: 4 |5:"#(#(CLOSURE ...))"========================PUSH |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" FP|4: 4 |5:"#(#(CLOSURE ...))" |6: 0 # Constant 0========================ENTER # Adjust fp |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" FP|6:0========================REFER_LOCAL # a=(Constant0) |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" FP|6: 0========================PUSH |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" FP|6:0 # Constant0 |7: 0 # Constant 0========================CONSTANT # a=(Constant 1) |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" FP|6:0 |7: 0========================BRANCH_NOT_NUMBER_EQUAL # a != stack-bottom(Constant 0) |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" FP|6:0 # Discarded stack top.========================LET_FRAME # LET_FRAME for loop. |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" FP|6: 0 |7:6 # Save fp |8: "#(#(CLOSURE ...))" # Push closure========================REFER_LOCAL # a=(Constant 0) REALLY??? |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" FP|6:0 |7: 6 |8:"#(#(CLOSURE ...))"========================PUSH |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" FP|6: 0 |7:6 |8: "#(#(CLOSURE ...))" |9:0 # push (Constant0)========================DISPLAY # Create adisplayand set it to closure. |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" FP|6: 0 |7:6 |8: "#(#(CLOSURE ...))"# Note stack is popped.========================CONSTANT # a=(1) |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" FP|6:0 |7: 6 |8:"#(#(CLOSURE ...))"========================PUSH |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" FP|6: 0 |7:6 |8: "#(#(CLOSURE ...))" |9:1 # (1)========================ENTER |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" FP|9:1 # New FP.========================REFER_LOCAL # a=(1) |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" FP|9:1========================BRANCH_NOT_NULL # Go toelse. |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" FP|9:1========================FRAME # Push stuff. |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" FP|9:1 |10: "(#(#f ...)" # closure |11:9 # fp |12: 54 # pc + n |13:"(#(CLOSURE ...)" # Current codes========================CONSTANT # a="=> recursive1" |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" FP|9:1 |10: "(#(#f ...)" |11:9 |12: 54 |13:"(#(CLOSURE ...)"========================PUSH |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" FP|9:1 |10: "(#(#f ...)" |11:9 |12: 54 |13:"(#(CLOSURE ...)" |14: "=> recusive1\n"========================REFER_GLOBAL # a=<display> |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" |6:0 |7: 6 |8:"#(#(CLOSURE ...))" FP|9: 1 |10:"(#(#f ...)" |11: 9 |12:54 |13: "(#(CLOSURE ...)" |14:"=> recusive1\n"========================CALL # call a=<display>. Note codes is now body ofdisplay. |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" |9:1 |10: "(#(#f ...)" |11:9 |12: 54 |13:"(#(CLOSURE ...)" FP|14: "=> recusive1\n" |15: () #display has optional-arg'()========================REFER_LOCAL # a="=> recursive1\n" |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" |9:1 |10: "(#(#f ...)" |11:9 |12: 54 |13:"(#(CLOSURE ...)" FP|14: "=> recusive1\n" |15: ()========================BRANCH_NOT_NULL # a isnot null so we go to Else. But this is really weird. |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" |9:1 |10: "(#(#f ...)" |11:9 |12: 54 |13:"(#(CLOSURE ...)" FP|14: "=> recusive1\n" |15: ()========================REFER_LOCAL # a="=> recursive1" |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" |9:1 |10: "(#(#f ...)" |11:9 |12: 54 |13:"(#(CLOSURE ...)" FP|14: "=> recusive1\n" |15: ()========================PUSH |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" |9:1 |10: "(#(#f ...)" |11:9 |12: 54 |13:"(#(CLOSURE ...)" FP|14: "=> recusive1\n" |15: () |16: "=> recusive1\n"========================REFER_FREE 0 # Point codes + 6 |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" |6:0 |7: 6 |8:"#(#(CLOSURE ...))" |9: 1 |10:"(#(#f ...)" |11: 9 |12:54 |13: "(#(CLOSURE ...)" FP|14:"=> recusive1\n" |15: () |16:"=> recusive1\n"========================TAIL_CALL # call <display>and jump to#(RETURN1 ...)=> recusive1 |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" |6: 0 |7:6 |8: "#(#(CLOSURE ...))" |9:1 |10: "(#(#f ...)" |11:9 |12: 54 |13:"(#(CLOSURE ...)" FP|14: "=> recusive1\n"========================RETURN |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" |6:0 |7: 6 |8:"#(#(CLOSURE ...))" FP|9: 1 # this is ()========================TEST # Return value of display is #t |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" |6:0 |7: 6 |8:"#(#(CLOSURE ...))" FP|9: 1========================CONSTANT |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" |6:0 |7: 6 |8:"#(#(CLOSURE ...))" FP|9: 1========================PUSH |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" |6:0 |7: 6 |8:"#(#(CLOSURE ...))" FP|9: 1 # (1) |10:1 #1========================SHIFTJ140 # Adjust frame for jump. Stack is now for bar call. |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" FP|6: 1========================LOCAL_JMP |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" FP|6:1========================REFER_LOCAL # a=1 |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" FP|6: 1========================PUSH |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" FP|6:1 |7: 1========================CONSTANT |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" FP|6:1 |7: 1========================BRANCH_NOT_NUMBER_EQUAL # Now 1=1. Jump to else. |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" FP|6:1========================CONSTANT#t |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)" |4: 4 |5:"#(#(CLOSURE ...))" FP|6: 1========================LOCAL_JMP 66 |0:"#(#(FRAME ...))" |1: 0 |2:6 |3: "(#(FRAME ...)" |4:4 |5: "#(#(CLOSURE ...))" FP|6:1========================LEAVE |0: "#(#(FRAME ...))" |1:0 |2: 6 |3:"(#(FRAME ...)"========================RETURN========================HALT
これを脳内で何回も再生し iform と行ったり来たりして実行していくと(bar 1)
を実行した後に local_jump したら戻ってこれないのは自明であることに気づく。この lambda 埋め込みは間違いである。それに気づけばあとは簡単で(and
の途中で埋め込まれてしまったのは(bar 1)
呼び出しが末尾再帰と誤認されたのが原因。
この1行の修正でお仕事完了。
Scheme で MNIST からの学び - higepon blog の続き。
前回のプロトタイプをもとにMosh のシステムライブラリとして f64array というものを導入した。double 型の2次元の行列を表現したもの。追加した手続きは最低限のもの。
f64array の構築、setter/getter、dot product 。プロファイラで見てみると行列の shape を返す手続きも無視できない回数呼ばれている。おそらく行列同士の計算での broadcast で呼ばれているのだろう。
R7RS では処理系ごとの違いをうまく扱える仕組みが用意されているのでGauche とMosh 両方で動くように整えてみよう。行列の初期化で make-normal-generator を使うのだがGauche にはSRFI-194 がないようなので data.random ライブラリから似たような手続きを持ってきて rename している。ところでSRFI-194 の author は Shiro さんなのであった。
(cond-expand [gauche (import(rename (data random) (reals-normal$ make-normal-generator)))] [(library (srfi194)) (import(only (srfi 194) make-normal-generator))])
またMosh では .mosh.sld という拡張子のライブラリを優先して読み込むという機能があるので以下のようにMosh が読み込むライブラリ、Mosh 以外が読み込むライブラリを分けることができる。ここで行列の実装を分岐している。
そしてどちらの行列ライブラリにも共通なコードは(include "matrix-common.scm")
のように直接ファイルを include する。
というわけでめでたくMosh とGauche の両方で動く MNIST デモができました。他のR7RS 処理系でも少しの手直しで動くのではないかと期待している。PR 待ってます。コードはmosh/tests/mnist at master · higepon/mosh · GitHub。
久しぶりにMoshScheme に触っている。良い機会なのでゼロから作るDeep Learning ―Pythonで学ぶディープラーニングの理論と実装 を参考にScheme で動く MNIST デモを実装しようと思い立つ。MNIST は 0 から 9 までの手書き数字の認識デモ。Neural Networkフレームワークのデモやチュートリアルでよく使われるあれだ。なおScheme にはPython でいうところの numpy は存在しないので必要な行列演算はすべてScheme で実装することとする。
特に工夫のないナイーブな実装。3重ループ。
(define (matrix-mul a b) (unless (= (matrix-shape a1) (matrix-shape b0)) (error"matrix-mul shapes don't match" (matrix-shape a) (matrix-shape b))) (let* ([nrows (matrix-shape a0)] [ncols (matrix-shape b1)] [m (matrix-shape a1)] [mat (matrix nrows ncols)]) (define (mul row col) (let loop ([k0] [ret0]) (if (= k m) ret (loop (+ k1) (+ ret (* (mat-at a row k) (mat-at b k col))))))) (do ((i0 (+ i1))) ((= i nrows) mat) (do ((j0 (+ j1))) ((= j ncols)) (mat-at mat i j (mul i j))))))
2層の NN を hidden_size = 10 batch_size=3 で動かしてみたが 1 batch 回したプロファイラの結果。matrix-mul は 32165 回呼ばれ 95%(98 秒)を占めている。
Time% msec calls name location 95 98500 32165 (matrix-mul a b) <transcoded-textual-input-port <binary-input-port tests/mnist.scm>>:175 1 1140 64344 (matrix-map proc a) <transcoded-textual-input-port <binary-input-port tests/mnist.scm>>:96
1 は明らかに遅いのでC++ 側で実装してみる。98秒->75秒だが思ったよりも速くない。逆に言えばVM ループは思ったよりも速い。
92 74590 32165 matrix-mul 1 680 64543 (matrix-element-wise op a...) <transcoded-textual-input-port <binary-input-port tests/mnist.scm>>:186
ここでC++ 側の実装を注意して見る必要がある。途中の計算結果の保持にScheme Object を使っていること。行列に Fixnum もしくは Flonum が存在するので型チェックをしてくれる Arithmetic::add と mul を利用して計算していること。これだとmost inner loop で heap allocation が発生してしまう。
Object ret = Object::makeVector(numRowsA);for (size_t i =0; i < numRowsA; i++) { ret.toVector()->set(i, Object::makeVector(numColsB)); }for (size_t i =0; i < numRowsA; i++) {for (size_t j =0; j < numColsB; j++) { Object value = Object::makeFlonum(0.0);for (size_t k =0; k < numColsA; k++) { Object aik = a->ref(i).toVector()->ref(k); Object bkj = b->ref(k).toVector()->ref(j); value = Arithmetic::add(value, Arithmetic::mul(aik, bkj)); } ret.toVector()->ref(i).toVector()->set(j, value); } }
もしも UniformVector なら内部データの型チェックは不要になり、すべての計算を double のまま行うことが可能なはず。これを模した適当な実装をやってみる。不正なデータを入力されたらインタプリタが落ちてしまうがデモなのでよしとする。98秒が60m秒になり実用的な速度に落ち着いた。heap allocation は避けたいと思わせる結果。
1 60 32165 matrix-mul
double value =0.0;for (size_t k =0; k < numColsA; k++) { Object aa = a->ref(i).toVector()->ref(k); Object bb = b->ref(k).toVector()->ref(j);double aik = aa.isFixnum() ? aa.toFixnum() : aa.toFlonum()->value();double bkj = bb.isFixnum() ? bb.toFixnum() : bb.toFlonum()->value(); value += aik * bkj; } ret.toVec
a.t
は a の転置。(t a)
だと若干意味合いが違うし広すぎる。Quote saved.
Login to quote this blog
Failed to save quote. Please try again later.
You cannot quote because this article is private.