Reflection: reflective programming and how Go serialization works

Reflective programming

What is reflection

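In computing, reflective programming, or reflection, is the ability of a program to access, inspect, and modify its own state or behavior at runtime. In a nutshell, reflection lets a running program "observe" and modify its own behavior.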

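Programming languages also have a concept closely related to reflection: introspection. There is no real need to draw a sharp line between the two. Wikipedia explains their relationship as follows: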

In computing, type introspection is the ability of a program to examine the type or properties of an object at runtime. Some programming languages possess this capability.

Introspection should not be confused with reflection, which goes a step further and is the ability for a program to manipulate the values, meta-data, properties and/or functions of an object at runtime. Some programming languages, e.g. Java, also possess that capability.
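To make the distinction concrete in Go, here is a minimal sketch (the Point type and both helper names are hypothetical): describe only examines type information, which is introspection, while shiftX goes a step further and changes a field of the value at runtime, which is reflection in the sense above.

```go
package main

import (
	"fmt"
	"reflect"
)

// Point is a hypothetical example type.
type Point struct {
	X, Y int
}

// describe only examines the value (introspection): it reads type
// information and never changes anything. It assumes v is a struct.
func describe(v any) string {
	t := reflect.TypeOf(v)
	return fmt.Sprintf("%s (kind %s, %d fields)", t.Name(), t.Kind(), t.NumField())
}

// shiftX goes a step further (reflection): it modifies a field at runtime.
// p must be a pointer to a struct with an int field X; otherwise the field
// is not settable and the call panics.
func shiftX(p any, delta int64) {
	f := reflect.ValueOf(p).Elem().FieldByName("X")
	f.SetInt(f.Int() + delta)
}

func main() {
	p := Point{X: 1, Y: 2}
	fmt.Println(describe(p)) // Point (kind struct, 2 fields)
	shiftX(&p, 10)
	fmt.Println(p.X) // 11
}
```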

Reflection in different languages

Go

package main

import (
	"fmt"
	"reflect"
)

type Foo struct{}

func (f *Foo) Hello() {
	fmt.Println("hello")
}

func main() {
	// Without reflection
	f := Foo{}
	f.Hello()

	// With reflection
	fT := reflect.TypeOf(Foo{})
	fV := reflect.New(fT)

	m := fV.MethodByName("Hello")
	if m.IsValid() {
		m.Call(nil)
	}
}

Python

class Foo:
	def hello(self):
		print('hello')

# Without reflection
obj = Foo()
obj.hello()

# With reflection
obj = globals()['Foo']() # globals() Return a dictionary representing the current global symbol table.
getattr(obj, 'hello')()  # getattr(object, name) Return the value of the named attribute of object.

# With eval
eval('Foo().hello()')

Java

import java.lang.reflect.Method;

// Without reflection
Foo foo = new Foo();
foo.hello();

// With reflection
try {
    // Alternatively: Object obj = Foo.class.getDeclaredConstructor().newInstance();
    Object obj = Class.forName("complete.classpath.and.Foo")
            .getDeclaredConstructor()
            .newInstance();

    Method m = obj.getClass().getDeclaredMethod("hello");
    m.invoke(obj);
} catch (ReflectiveOperationException e) {
    // Covers ClassNotFoundException, NoSuchMethodException,
    // InstantiationException, IllegalAccessException, InvocationTargetException
}

Pros and cons of reflection

Pros

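Reflection gives statically typed languages a degree of dynamic behavior at runtime, driven by type information.
It helps avoid hard-coding to some extent, making code more flexible and general.
Code contained in a string can be parsed and executed at runtime as if it were a source code statement (as with eval in the Python example above).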
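As an illustration of the second point, the sketch below fills any struct from a map[string]string keyed by field name, without the loader hard-coding a single field. The Config type and fillFromMap are hypothetical names, and only string and integer fields are handled.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// Config is a hypothetical type; fillFromMap never mentions its fields.
type Config struct {
	Host string
	Port int
}

// fillFromMap sets each exported field of the struct that dst points to
// whose name appears in src. Only string and integer kinds are handled.
func fillFromMap(dst any, src map[string]string) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Pointer || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a pointer to a struct, got %T", dst)
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		name := t.Field(i).Name
		raw, ok := src[name]
		if !ok {
			continue
		}
		f := v.Field(i)
		if !f.CanSet() { // unexported fields are not settable
			continue
		}
		switch f.Kind() {
		case reflect.String:
			f.SetString(raw)
		case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
			// Parse with the field's bit size so out-of-range input is rejected.
			n, err := strconv.ParseInt(raw, 10, f.Type().Bits())
			if err != nil {
				return fmt.Errorf("field %s: %w", name, err)
			}
			f.SetInt(n)
		default:
			return fmt.Errorf("field %s: unsupported kind %s", name, f.Kind())
		}
	}
	return nil
}

func main() {
	var c Config
	err := fillFromMap(&c, map[string]string{"Host": "localhost", "Port": "8080"})
	fmt.Println(c, err) // {localhost 8080} <nil>
}
```

Adding a field to Config requires no change to fillFromMap, which is the flexibility the point above refers to; the price is that mistakes such as passing a non-pointer are only caught at runtime.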

Cons

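Reflective programming demands fairly advanced knowledge, including frameworks, relational mapping, and object interaction, in order to achieve more generic code execution, and it takes a lot of time and experience to avoid the problems reflection can introduce.
The concepts and syntax of reflection are fairly abstract, and overusing reflection makes code harder to read.
Reflection moves some checks from compile time to runtime. Method calls and object references go through the indirection layer that reflection provides instead of direct address references; this adds flexibility at the cost of performance.
Because reflection bypasses the compiler's strict checks, an incorrect modification (for example, setting a value that is not settable or has the wrong type) makes the program panic.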

A study of Go serialization

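Serialization and deserialization are widely used in programming. Go's implementation of them relies heavily on reflection, so below we study how the encoding/json package serializes values in order to understand reflection in Go more deeply.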

The Go reflect package

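Go provides the reflect package for reflection. Its most commonly used functions are TypeOf and ValueOf, which make it easy to get an object's type and to read and modify its fields. See the reflect package documentation for details; here is a short example: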

package main

import (
	"fmt"
	"reflect"
)

type Item struct {
	Name  string `json:"name"`
	Price int    `json:"price"`
}

func (r Item) Hello() string {
	fmt.Printf("name = %s\n", r.Name)
	return r.Name
}

func main() {
	i := Item{
		Name:  "Jason",
		Price: 10,
	}
	t := reflect.TypeOf(i)
	v := reflect.ValueOf(i)
	// type information
	fmt.Printf("%v\n", t)
	fmt.Printf("%v\n", t.Name())
	fmt.Printf("%v\n", t.Kind())
	fmt.Printf("%v\n", t.String())
	fmt.Printf("%v\n", t.Field(0))
	fmt.Printf("%v\n", t.NumField())
	fmt.Printf("%v\n", t.Field(1))
	// value information
	fmt.Printf("%v\n", v)
	// value of the 1st field
	fmt.Printf("%v\n", v.Field(0))
	// value of the 2nd field
	fmt.Printf("%v\n", v.Field(1))
	fmt.Printf("%v\n", v.FieldByName("Price"))
	// call methods
	fmt.Printf("return value of call: %v\n", v.Method(0).Call(nil))
	fmt.Printf("return value of call: %v\n", v.MethodByName("Hello").Call(nil))
	fmt.Printf("struct methods num = %v\n", v.NumMethod())

	// To modify the struct, note that we pass its address here
	m := reflect.ValueOf(&i)
	fmt.Println(v.Kind())
	fmt.Println(m.Kind())
	// Elem() returns the value that an interface or pointer refers to
	m.Elem().Field(0).SetString("JASON")
	m.Elem().FieldByName("Price").SetInt(100)
	fmt.Printf("%v\n", i.Name)
	fmt.Printf("%v\n", i.Price)

	fmt.Printf("%v\n", m.Type().Elem().Field(0).Tag.Get("json"))
}
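The calls above can be combined into a loop. The sketch below (tagMap is a hypothetical helper) walks every exported field, reads its json tag, and keys the field value by that name, which is roughly the information encoding/json needs for a struct. It is simplified: tag options such as omitempty and the special "-" tag are not handled, and the argument is assumed to be a struct.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

type Item struct {
	Name  string `json:"name"`
	Price int    `json:"price"`
}

// tagMap keys each exported field value by its json tag name, falling back
// to the Go field name. Anything after a comma in the tag is ignored.
func tagMap(s any) map[string]any {
	v := reflect.ValueOf(s)
	t := v.Type()
	out := make(map[string]any, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		sf := t.Field(i)
		if !sf.IsExported() {
			continue
		}
		name, _, _ := strings.Cut(sf.Tag.Get("json"), ",")
		if name == "" {
			name = sf.Name
		}
		out[name] = v.Field(i).Interface()
	}
	return out
}

func main() {
	fmt.Println(tagMap(Item{Name: "Jason", Price: 10})) // map[name:Jason price:10]
}
```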

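P.S. We won't dig into how Go's reflect is implemented under the hood here. It relies mainly on Go's underlying interface implementation, which is covered in my earlier article on interface internals, "Probing interface variables in Go applications with eBPF".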

JSON Marshal

JSON serialization in Go is very convenient, as the following simple example shows:

package main

import (
	"encoding/json"
	"fmt"
)

type Item struct {
	Name  string `json:"name"`
	Price int    `json:"price"`
}

func main() {
	i := Item{
		Name:  "Jason",
		Price: 10,
	}
	res, err := json.Marshal(i)
	if err != nil {
		fmt.Println("marshal failed, error: ", err.Error())
		return
	}
	fmt.Printf("%s\n", res) // {"name":"Jason","price":10}
}

Next, let's follow the source of json.Marshal to see how Go's standard library implements serialization:

1. json.Marshal

func Marshal(v any) ([]byte, error) {
	e := newEncodeState()
	defer encodeStatePool.Put(e)

	err := e.marshal(v, encOpts{escapeHTML: true})
	if err != nil {
		return nil, err
	}
	buf := append([]byte(nil), e.Bytes()...)

	return buf, nil
}

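newEncodeState() obtains an encoding state (encodeState), reusing one from encodeStatePool when possible; its buffer is where the serialized bytes are assembled.
e.marshal does the actual serialization. Because e is put back into the pool when Marshal returns, the result is first copied out of e.Bytes() into a new slice.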
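The pool-and-copy pattern in Marshal can be sketched in isolation. In this simplified version, a sync.Pool of bytes.Buffer stands in for encodeStatePool, and encodeGreeting is a hypothetical function that does no JSON escaping. The key point is the final copy: once the buffer is back in the pool, another caller may reuse it and overwrite its memory.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool plays the role of encodeStatePool in this sketch.
var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// encodeGreeting borrows a buffer, writes into it, and copies the result
// out before the deferred Put returns the buffer to the pool.
func encodeGreeting(name string) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	buf.WriteString(`{"hello":"`)
	buf.WriteString(name) // no escaping: illustration only
	buf.WriteString(`"}`)

	// Returning buf.Bytes() directly would alias pooled memory,
	// so copy it, as Marshal does with append([]byte(nil), e.Bytes()...).
	return append([]byte(nil), buf.Bytes()...)
}

func main() {
	a := encodeGreeting("a")
	b := encodeGreeting("b")
	fmt.Println(string(a), string(b)) // {"hello":"a"} {"hello":"b"}
}
```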

2. e.marshal

func (e *encodeState) marshal(v any, opts encOpts) (err error) {
	defer func() {
		if r := recover(); r != nil {
			if je, ok := r.(jsonError); ok {
				err = je.error
			} else {
				panic(r)
			}
		}
	}()
	e.reflectValue(reflect.ValueOf(v), opts)
	return nil
}

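The encoder serializes v based on its reflected value, reflect.ValueOf(v); the main work happens in e.reflectValue. Errors raised deep inside the encoder are thrown as a jsonError panic and turned back into an ordinary error by the deferred recover, while any other panic is re-raised.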
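The error handling in e.marshal follows a pattern that can be shown on its own: deep inside a recursion, an error is raised as a panic of a private wrapper type, and a single deferred recover at the top converts it back into an error. encError, encodeDepth, and run are hypothetical names chosen for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// encError plays the role of jsonError: a private wrapper that marks panics
// raised on purpose by our own encoder.
type encError struct{ error }

// fail aborts encoding from any depth of recursion.
func fail(err error) { panic(encError{err}) }

// encodeDepth is a hypothetical recursive encoder that nests n levels of
// brackets and gives up once its depth budget (limit) runs out.
func encodeDepth(n, limit int) string {
	if limit == 0 {
		fail(errors.New("nesting limit reached"))
	}
	if n == 0 {
		return "x"
	}
	return "[" + encodeDepth(n-1, limit-1) + "]"
}

// run has the same shape as encodeState.marshal: our own panics become an
// ordinary error, anything else is re-panicked.
func run(n int) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			if ee, ok := r.(encError); ok {
				err = ee.error
				return
			}
			panic(r)
		}
	}()
	return encodeDepth(n, 3), nil
}

func main() {
	s, err := run(2)
	fmt.Println(s, err) // [[x]] <nil>
	_, err = run(5)
	fmt.Println(err) // nesting limit reached
}
```

The recursive code never has to thread an error return through every level; the cost is that the recover must re-panic anything it does not own, exactly as e.marshal does.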

3. e.reflectValue

func (e *encodeState) reflectValue(v reflect.Value, opts encOpts) {
	valueEncoder(v)(e, v, opts)
}

func valueEncoder(v reflect.Value) encoderFunc {
	if !v.IsValid() {
		return invalidValueEncoder
	}
	return typeEncoder(v.Type())
}

func typeEncoder(t reflect.Type) encoderFunc {
	if fi, ok := encoderCache.Load(t); ok {
		return fi.(encoderFunc)
	}

	// To deal with recursive types, populate the map with an
	// indirect func before we build it. This type waits on the
	// real func (f) to be ready and then calls it. This indirect
	// func is only used for recursive types.
	var (
		wg sync.WaitGroup
		f  encoderFunc
	)
	wg.Add(1)
	fi, loaded := encoderCache.LoadOrStore(t, encoderFunc(func(e *encodeState, v reflect.Value, opts encOpts) {
		wg.Wait()
		f(e, v, opts)
	}))
	if loaded {
		return fi.(encoderFunc)
	}

	// Compute the real encoder and replace the indirect func with it.
	f = newTypeEncoder(t, true)
	wg.Done()
	encoderCache.Store(t, f)
	return f
}

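e.reflectValue gets (or builds) the encoder for the type obtained through reflection and then calls it on the value.
typeEncoder caches encoders per reflect.Type in encoderCache. To handle recursive types, it first stores an indirect placeholder func that waits for the real encoder; it then builds the real encoder for the concrete type with newTypeEncoder and stores it in the cache.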
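The caching half of typeEncoder can be sketched without the recursive-type handling. In this simplified version (encoderFor, encode, and the builds counter are hypothetical), an encoder is built once per reflect.Type and stored in a sync.Map; LoadOrStore makes concurrent builders agree on a single stored encoder. The counter is not concurrency-safe and is only there to show that the cache is hit.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"sync"
)

// encoderFunc is a simplified encoder signature for this sketch.
type encoderFunc func(v reflect.Value) string

var (
	cache  sync.Map // reflect.Type -> encoderFunc
	builds int      // counts encoder builds; not concurrency-safe, demo only
)

// encoderFor returns the cached encoder for t, building it on first use.
// Unlike typeEncoder, it does not handle recursive types.
func encoderFor(t reflect.Type) encoderFunc {
	if f, ok := cache.Load(t); ok {
		return f.(encoderFunc)
	}
	builds++
	var f encoderFunc
	switch t.Kind() {
	case reflect.Int, reflect.Int64:
		f = func(v reflect.Value) string { return strconv.FormatInt(v.Int(), 10) }
	case reflect.String:
		f = func(v reflect.Value) string { return strconv.Quote(v.String()) }
	default:
		f = func(reflect.Value) string { return "null" } // placeholder for other kinds
	}
	// If another goroutine stored an encoder first, use that one.
	actual, _ := cache.LoadOrStore(t, f)
	return actual.(encoderFunc)
}

func encode(x any) string {
	v := reflect.ValueOf(x)
	return encoderFor(v.Type())(v)
}

func main() {
	fmt.Println(encode(1), encode(2), encode("a")) // 1 2 "a"
	fmt.Println(builds)                           // 2 (one for int, one for string)
}
```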

4. newTypeEncoder

func newTypeEncoder(t reflect.Type, allowAddr bool) encoderFunc {
	// If we have a non-pointer value whose type implements
	// Marshaler with a value receiver, then we're better off taking
	// the address of the value - otherwise we end up with an
	// allocation as we cast the value to an interface.
	if t.Kind() != reflect.Pointer && allowAddr && reflect.PointerTo(t).Implements(marshalerType) {
		return newCondAddrEncoder(addrMarshalerEncoder, newTypeEncoder(t, false))
	}
	if t.Implements(marshalerType) {
		return marshalerEncoder
	}
	if t.Kind() != reflect.Pointer && allowAddr && reflect.PointerTo(t).Implements(textMarshalerType) {
		return newCondAddrEncoder(addrTextMarshalerEncoder, newTypeEncoder(t, false))
	}
	if t.Implements(textMarshalerType) {
		return textMarshalerEncoder
	}

	switch t.Kind() {
	case reflect.Bool:
		return boolEncoder
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return intEncoder
	case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
		return uintEncoder
	case reflect.Float32:
		return float32Encoder
	case reflect.Float64:
		return float64Encoder
	case reflect.String:
		return stringEncoder
	case reflect.Interface:
		return interfaceEncoder
	case reflect.Struct:
		return newStructEncoder(t)
	case reflect.Map:
		return newMapEncoder(t)
	case reflect.Slice:
		return newSliceEncoder(t)
	case reflect.Array:
		return newArrayEncoder(t)
	case reflect.Pointer:
		return newPtrEncoder(t)
	default:
		return unsupportedTypeEncoder
	}
}

func boolEncoder(e *encodeState, v reflect.Value, opts encOpts) {
	b := e.AvailableBuffer()
	b = mayAppendQuote(b, opts.quoted)
	b = strconv.AppendBool(b, v.Bool())
	b = mayAppendQuote(b, opts.quoted)
	e.Write(b)
}

// strconv.AppendBool, called by boolEncoder above
func AppendBool(dst []byte, b bool) []byte {
	if b {
		return append(dst, "true"...)
	}
	return append(dst, "false"...)
}

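newTypeEncoder first checks whether the type implements json.Marshaler or encoding.TextMarshaler, and otherwise picks an encoder based on the value's kind. Taking bool as an example, boolEncoder appends the serialized bytes of the boolean ("true" or "false") to a buffer, and e.Write(b) then writes those bytes into the encodeState's buffer.
Encoders for composite kinds (struct, map, slice, array, pointer, interface) in turn invoke the encoders of their fields or elements, so serialization proceeds recursively, writing bytes into the encodeState's buffer until every field has been serialized.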
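The kind-based dispatch and recursion described above can be condensed into a toy encoder. This is a simplified sketch, not how encoding/json is actually structured: there is no encoder cache, no Marshaler support, no omitempty, only a few kinds are handled, and strings are quoted with strconv.Quote, which is close to but not identical to JSON escaping. toyMarshal and encode are hypothetical names.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"strings"
)

// encode dispatches on v.Kind(), in the spirit of newTypeEncoder, and
// recurses into slice elements, struct fields, pointers, and interfaces.
func encode(b *strings.Builder, v reflect.Value) error {
	switch v.Kind() {
	case reflect.Bool:
		b.WriteString(strconv.FormatBool(v.Bool()))
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		b.WriteString(strconv.FormatInt(v.Int(), 10))
	case reflect.String:
		b.WriteString(strconv.Quote(v.String()))
	case reflect.Slice:
		if v.IsNil() {
			b.WriteString("null")
			return nil
		}
		b.WriteByte('[')
		for i := 0; i < v.Len(); i++ {
			if i > 0 {
				b.WriteByte(',')
			}
			if err := encode(b, v.Index(i)); err != nil {
				return err
			}
		}
		b.WriteByte(']')
	case reflect.Struct:
		t := v.Type()
		b.WriteByte('{')
		first := true
		for i := 0; i < t.NumField(); i++ {
			sf := t.Field(i)
			if !sf.IsExported() {
				continue
			}
			// Use the json tag name if present, else the Go field name.
			name, _, _ := strings.Cut(sf.Tag.Get("json"), ",")
			if name == "" {
				name = sf.Name
			}
			if !first {
				b.WriteByte(',')
			}
			first = false
			b.WriteString(strconv.Quote(name))
			b.WriteByte(':')
			if err := encode(b, v.Field(i)); err != nil {
				return err
			}
		}
		b.WriteByte('}')
	case reflect.Pointer, reflect.Interface:
		if v.IsNil() {
			b.WriteString("null")
			return nil
		}
		return encode(b, v.Elem())
	default:
		return fmt.Errorf("toy encoder: unsupported kind %s", v.Kind())
	}
	return nil
}

// toyMarshal is the entry point, playing the role of json.Marshal.
func toyMarshal(x any) (string, error) {
	var b strings.Builder
	if err := encode(&b, reflect.ValueOf(x)); err != nil {
		return "", err
	}
	return b.String(), nil
}

type Item struct {
	Name   string   `json:"name"`
	Price  int      `json:"price"`
	Tags   []string `json:"tags"`
	OnSale bool
}

func main() {
	s, err := toyMarshal(Item{Name: "Jason", Price: 10, Tags: []string{"a", "b"}})
	fmt.Println(s, err) // {"name":"Jason","price":10,"tags":["a","b"],"OnSale":false} <nil>
}
```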

The three laws of reflection in Go

Rob Pike, one of the creators of Go, described the laws of reflection:

Reflection goes from interface value to reflection object.
Reflection goes from reflection object to interface value.
To modify a reflection object, the value must be settable.

With Rob Pike's three laws in mind, here are my thoughts on when to use reflection in Go:

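Use reflection judiciously inside libraries and frameworks: encapsulate the complex logic internally, keep the complexity to yourself, and expose simple interfaces to users.
Business logic outside of libraries and frameworks has no need for reflection.
Don't reach for reflection as the first solution unless there is truly no other way.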