c语言实现lua函数调用次数和耗时统计

之前写了一个纯lua实现的模块纯lua实现统计函数运行时间和调用次数。但是统计的hook函数本身就太耗时了，简单使用还行，如果想要在本身就对性能比较敏感的地方使用，就完全不可行了。所以写了一版纯c语言的版本。

实现的完整代码在：https://github.com/rondsny/lua-profile/blob/main/lua-profiler.c

设置hook

原理是一样的，只是换了c实现而已。lua的c api提供了lua_sethook函数。

void lua_sethook (lua_State *L, lua_Hook f, int mask, int count);
设置一个调试用钩子函数。

参数 f 是钩子函数。 mask 指定在哪些事件时会调用： 它由下列一组位常量构成 LUA_MASKCALL， LUA_MASKRET， LUA_MASKLINE， LUA_MASKCOUNT。 参数 count 只在掩码中包含有 LUA_MASKCOUNT 才有意义。 对于每个事件，钩子被调用的情况解释如下：

call hook: 在解释器调用一个函数时被调用。 钩子将于 Lua 进入一个新函数后， 函数获取参数前被调用。
return hook: 在解释器从一个函数中返回时调用。 钩子将于 Lua 离开函数之前的那一刻被调用。 没有标准方法来访问被函数返回的那些值。
line hook: 在解释器准备开始执行新的一行代码时， 或是跳转到这行代码中时（即使在同一行内跳转）被调用。 （这个事件仅仅在 Lua 执行一个 Lua 函数时发生。）
count hook: 在解释器每执行 count 条指令后被调用。 （这个事件仅仅在 Lua 执行一个 Lua 函数时发生。）

我们同样监听LUA_HOOKCALL和LUA_HOOKRET这两种事件。对应的是一个函数被调用和返回时候的hook。对于尾调用其实意义不是很大，我们从上一层统计也能包含，只是统计的位置会不一样，所以这里不对尾调用的情况额外区分统计。

在调用lua_sethook的时候，我们第三个参数传入LUA_MASKCALL和LUA_MASKRET。

1	lua_sethook(L, profiler_hook, LUA_MASKCALL \| LUA_MASKRET, 0);

统计

统计上也是类似的，只不过c这边需要一个map结构，本来是想起一个lua_State，然后用来记录统计数据，因为lua的table结构本身就是一个非常好的map结构。后来考虑尽量减少性能损耗，就额外实现了哈希表结构。这里其实只是需要一个哈希表而已，实现只要没问题都可以。

首先是监听到call的时候，创建一个item用于记录调用时间和函数信息，然后塞入到一个后入先出的数组结构中。

当发生return的时候，我们就从数组中拿出最后一个item，这个item必然是当前return的call（此处把尾调用的return记录到最开始的call里面）。此时进行时间差统计，并且去除调用函数的时间，则是当前函数的执行时间。然后统计到哈希表中。如果没有则创建记录，有则把执行时间和调用次数统计上去。

static void profiler_hook(lua_State *L, lua_Debug *arv) {
    lua_Debug ar;
    lua_getstack(L, 0, &ar);
    lua_getinfo(L, "nSlf", &ar);

    if (arv->event == LUA_HOOKCALL) {
        const void* point = lua_topointer(L, -1);
        uint64_t now = get_high_precision_time();
        struct call_item* item = (struct call_item*)malloc(sizeof(struct call_item));
        item->point = point;
        item->name = ar.name ? ar.name : "?";
        item->source = ar.source ? ar.source : "?";
        item->line = ar.currentline;
        item->call_time = now;
        item->sub_cost = 0;
        push_call_item(item);
    }
    else if (arv->event == LUA_HOOKRET) {
        struct call_item* item = pop_call_item();
        if (item == NULL) {
            return;
        }
        uint64_t now = get_high_precision_time();
        uint64_t cost = now - item->call_time;
        uint64_t key = (uint64_t)((uintptr_t)item->point);
        profile_record* record = (profile_record*)game_hashtbl_get(hashtbl, key);
        if(record==NULL){
            record = (profile_record*)malloc(sizeof(profile_record));
            record->call_count = 1;
            record->total_time = cost - item->sub_cost;
            record->line = item->line;
            strncpy(record->name, item->name, sizeof(record->name));
            strncpy(record->source, item->source, sizeof(record->source));
            game_hashtbl_insert(hashtbl, key, record);
        }else{
            record->call_count++;
            record->total_time += cost - item->sub_cost;
        }
        if(CALL_STACK->head){
            CALL_STACK->head->sub_cost += cost;
        }
        free(item);
    }
}

性能提升

纯lua版本数据：

use profiler_pure_lua
======== profiler start ========
total time is   22.453125
========================================
        time, percent,   count, function
----------------------------------------
52.5470021956,  99.52%, 8160000, deep2                         : test_profiler.lua:1
0.2536309999,   0.48%,     200, call_func                     : test_profiler.lua:14
0.0007711950,   0.00%,       2, loop                          : test_profiler.lua:23
0.0006873300,   0.00%,      10, nono                          : test_profiler.lua:49
0.0004256395,   0.00%,       5, clock                         : [C]:-1
0.0001266750,   0.00%,       1, foo                           : test_profiler.lua:35
0.0000041250,   0.00%,       1, foo2                          : test_profiler.lua:31
0.0000000000,   0.00%,       1, stop                          : ./profiler_pure_lua.lua:80
======== profiler end total_time ==========     52.802648159995

c语言版本数据：

use c unction
total time is   2.0
0.000321 (0.01%)           1                 print  =[C]:-1
0.000169 (0.00%)           2                     ?  =[C]:-1
0.000011 (0.00%)           1                 clock  =[C]:-1
0.000021 (0.00%)          10                  nono  @test_profiler.lua:50
0.000010 (0.00%)           1                   foo  @test_profiler.lua:36
0.000000 (0.00%)           1                  foo2  @test_profiler.lua:32
0.000167 (0.00%)           2                  loop  @test_profiler.lua:24
0.035776 (0.68%)         200             call_func  @test_profiler.lua:15
5.232884 (99.31%)     8160000                 deep2  @test_profiler.lua:2
total time is   5.269360

以上例子可以看到，c语言版本运行统计耗时是5.26秒，纯lua的耗时是52.8秒。而实际的运行时间，没有hook的耗时是0.59秒。c语言版本是2秒，纯lua的耗时是22.45秒。

从数据上看，我们并不允许线上产生22秒的卡顿，但是极端情况下，卡2秒可能还是可以接收的。改成c语言实现后，甚至可以在线上使用。