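[C in Depth] Pitfall: An Array Overflow That Silently Modifies Memory

Pointers make C convenient to program in, but they also bring many potential memory-safety problems. Consider the following example: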

#include <stdio.h>

int main() {

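	char string_buff[12];            /* 12 bytes: no room for the terminating '\0' */
	unsigned int i_not_zero = 0xFF;

	/* Writes 12 characters plus '\0' = 13 bytes, one byte past the array. */
	sprintf(string_buff, "Hello,world!");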

	printf("i = %x\r\n", i_not_zero);

	return 0;
}

The (incorrect) output of this program is:

i = 0

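The code above uses sprintf to write into the char array string_buff. The array length was declared without accounting for the string terminator '\0': "Hello,world!" is 12 characters long, so sprintf writes 13 bytes, and the last byte lands on whatever sits next to the array in memory (in this example i_not_zero, built with MinGW gcc). Writing past the end of an array is undefined behavior in C, so the exact symptom depends on the compiler and on how it lays out the stack. In a real program this kind of bug is usually hard to find, especially when other code legitimately modifies the same variable, because one naturally starts by inspecting the code that is known to touch it.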
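
One way to guard against this kind of overflow is to pass the buffer size along with the buffer and let snprintf enforce it. The following is a minimal sketch, not the original author's code; the helper name write_greeting is hypothetical and introduced only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original post): formats the greeting
 * into dst without ever writing more than dst_size bytes.
 * Returns 1 if the whole string, including '\0', fit; 0 if it was truncated. */
int write_greeting(char *dst, size_t dst_size)
{
    /* snprintf writes at most dst_size bytes, terminator included, and
     * returns the number of characters the full string would need. */
    int needed = snprintf(dst, dst_size, "%s", "Hello,world!");
    return needed >= 0 && (size_t)needed < dst_size;
}
```

Called as write_greeting(buf, sizeof buf) on a real array, a 13-byte buffer holds the complete string, while a 12-byte buffer receives a truncated but still terminated string and the function reports the truncation instead of corrupting a neighbouring variable.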

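This is only an overflow caused by carelessness. In some extreme exploit code, attackers deliberately use the addresses of variables and functions to reach back into the stack and modify it, and thereby gain system privileges. Pointers are an important (but obscure) part of C, and they quietly lower the reliability and security of a system and put the programmer's debugging skills to the test.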

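Hello, 2017!

Another new year. Time rushes on; it gives you no chance to look back before it puts you on the road again.

This year I published about 10 posts:

  • The new "C in Depth" series grew out of the misuses of C that I came across while working as a teaching assistant for an embedded systems course. I consider C and C++ the two most important languages in robotics, so I would like to add more content in this area.
  • The other posts introduced the newly released Raspberry Pi 3. Besides the Raspberry Pi 3, I also picked up several Raspberry Pi Zeros this year. Because the Zero is hard to buy, I stocked up on five or six at once. They are intended as nodes in the smart home system, but the use cases are not yet clear, so I have not written a dedicated series on them.

Plans for next year:

  • The smart home system I promised is not fully developed yet; I will find time to publish it once it is complete. Finished so far: deployment of the central server, the NAS, the media center, and one sensor node (which has been reporting temperature and humidity data for half a year). The CO2 and PM2.5 sensors and the wireless networking modules have been purchased, but I have not had time to debug them. Another difficulty is sending control data down to the nodes and displaying sensor data and system parameters. I would like to base this on a browser/server architecture (Flask + socket.io / Node.js + Ajax). I have no background in web programming, and real-time and asynchronous issues are involved, so I have not yet had time to solve this.
  • More topics on robotics. The lab's theme is still robotics; last year drifted somewhat off course (into embedded systems), and this year returns to the main theme. The topics I want to focus on are reinforcement learning, probabilistic decision making, machine vision, ROS, and machine (deep) learning.
  • I now live abroad long-term and have long thought about converting the site entirely to English. I am still torn about it, though: much of the content is useful to readers in China, so this year I will keep writing in both languages.

Finally, I wish everyone a smooth 2017 in work and study!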

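[C in Depth] Pitfall: Returning a Pointer to a Local Array from a Function

Here is another pointer problem, again from an undergraduate's code. The code is meant to convert a string consisting entirely of lowercase letters into the corresponding uppercase string: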

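#include <string.h>

char *convert_to_upper_case(char *string) {
    char p[100];    /* local array: exists only during this call */
    int i = 0;
    
    for(; i < strlen(string); i++ ) {
            p[i] = string[i] - ('a' - 'A'); 
    }
    p[i] = '\0';
    
    return p;       /* returns the address of a local array */
    
}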

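However, this code does not do what was intended, for the following reasons:

  1. When the main program calls convert_to_upper_case(), memory for p is allocated on the stack;
  2. The function body correctly fills the character array p and returns the address of its first element as a pointer;
  3. When the function returns, all of its local variables are popped off the stack, including p[100];
  4. The main program receives the returned pointer and dereferences it. However, the character array it points to has already been popped off the stack, so the data read through it is undefined.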

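To implement the intended behavior correctly, the destination pointer should be passed to the function as an extra argument, with the caller responsible for providing the memory. Alternatively, malloc() can be used to allocate the memory on the heap, but then the matching free() must be called to release it, otherwise memory will leak.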
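
The caller-provided buffer approach can be sketched as follows. This is only an illustration under my own assumptions: the function name to_upper_copy, its size parameter, and the use of toupper() from <ctype.h> (instead of the subtraction in the student's code) are not from the original post.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical variant: the caller owns the output buffer, so the result
 * stays valid after the function returns.
 * Returns dst on success, or NULL if dst cannot hold the result. */
char *to_upper_copy(char *dst, size_t dst_size, const char *src)
{
    size_t i;

    if (dst == NULL || src == NULL || dst_size == 0) {
        return NULL;
    }
    for (i = 0; src[i] != '\0'; i++) {
        if (i + 1 >= dst_size) {   /* keep one byte for the terminator */
            dst[0] = '\0';
            return NULL;
        }
        /* toupper leaves non-lowercase characters unchanged */
        dst[i] = (char)toupper((unsigned char)src[i]);
    }
    dst[i] = '\0';
    return dst;
}
```

A caller would declare something like char out[100]; and call to_upper_copy(out, sizeof out, "hello"). Because out belongs to the caller's stack frame, it is still alive when the result is used.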

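[C in Depth] A Mistaken Pointer Assignment

There is always another story to tell about pointers.

Recently, while supervising the course project of an undergraduate Embedded Systems course, I ran into a very strange bug. A piece of code had to implement I2C communication; the core code was already provided by a software library, so students only had to fill in a struct and call the API. One student's code looked like this: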

struct I2C_CONFIG {
  // ...
  char *buff;
  int length;
  // ...
};

struct I2C_CONFIG cfg;
char *i2c_buff;

void I2C_init() 
{
  // ...
  cfg.buff = i2c_buff;
  cfg.length = sizeof(i2c_buff);
  // ...
}

void I2C_send(char *new_buff)
{
  // ...
  i2c_buff = new_buff;
  I2C_MasterTransferData(LPC_I2C1, cfg);
  // ...
}

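At first glance nothing looks wrong: I2C_init() first initializes the struct cfg, I2C_send() sets the pointer to the data to be sent, and then the I2C API is called to send the data.

Since the code never produced the expected behavior, I took a closer look. I noticed that it uses an intermediate variable, char *i2c_buff. I2C_init() does assign i2c_buff to cfg.buff, but cfg.buff is itself a plain pointer variable, not a "pointer to a pointer", so the assignment is a simple copy by value: the value of i2c_buff (initially 0, a null pointer) is copied into cfg.buff. Later, I2C_send() changes where the intermediate variable i2c_buff points, but that has no effect on cfg.buff, which still holds the address i2c_buff had at initialization. The buffer pointer new_buff that was supposed to be sent is therefore never passed to I2C_MasterTransferData()! To fix this, the updated value of i2c_buff must be assigned to cfg.buff again: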

void I2C_send(char *new_buff)
{
  // ...
  i2c_buff = new_buff;
  cfg.buff = i2c_buff;
  I2C_MasterTransferData(LPC_I2C1, cfg);
  // ...
}
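
The copy-by-value behavior can be reproduced without any I2C hardware. The sketch below is a stand-alone illustration; struct demo_cfg, cfg_tracks and snapshot_demo are hypothetical names, not part of the student's code or of the I2C library.

```c
#include <stddef.h>

/* Hypothetical stand-in for the configuration struct in the post. */
struct demo_cfg {
    char *buff;
};

/* Returns 1 if cfg->buff refers to the same memory as current. */
int cfg_tracks(const struct demo_cfg *cfg, const char *current)
{
    return cfg->buff == current;
}

/* Replays the init/send sequence and reports whether cfg.buff ends up
 * pointing at the data. reassign selects the buggy (0) or fixed (1) path. */
int snapshot_demo(int reassign)
{
    static char data[] = "payload";
    char *i2c_buff = NULL;        /* intermediate pointer, initially null */
    struct demo_cfg cfg;

    cfg.buff = i2c_buff;          /* copies the value NULL, creates no link */
    i2c_buff = data;              /* cfg.buff does not see this change */
    if (reassign) {
        cfg.buff = i2c_buff;      /* the fix: copy the new value again */
    }
    return cfg_tracks(&cfg, data);
}
```

Here snapshot_demo(0) returns 0 and snapshot_demo(1) returns 1: assigning one pointer to another takes a snapshot of the address at that moment and does not keep the two variables in sync.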

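This code has another bug that is easy to miss: I2C_init() uses sizeof to determine the size of the buffer. sizeof is an operator that yields the size of its operand's type, so for the pointer char *i2c_buff, sizeof(i2c_buff) is 4 (on this 32-bit target) rather than the actual size of the buffer. The size of a pointer is not the size of the buffer it points to!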
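
The difference is easy to check in isolation. In the sketch below, all names (pointer_size, array_size, struct sized_cfg, cfg_set_buffer) are hypothetical, and the struct uses size_t for the length as an assumption of mine; it only illustrates that the length has to travel with the pointer explicitly.

```c
#include <stddef.h>

/* Size of the pointer itself: typically 4 on a 32-bit MCU and 8 on a
 * 64-bit host, regardless of what the pointer refers to. */
size_t pointer_size(void)
{
    char *p = NULL;
    return sizeof p;
}

/* Size of an actual array object: here sizeof sees the full array type. */
size_t array_size(void)
{
    char storage[32];
    return sizeof storage;
}

/* Hypothetical config struct: the length is stored next to the pointer. */
struct sized_cfg {
    char  *buff;
    size_t length;
};

/* The caller, which still knows the real buffer size, passes it in. */
void cfg_set_buffer(struct sized_cfg *cfg, char *buf, size_t len)
{
    cfg->buff = buf;
    cfg->length = len;
}
```

A caller that owns char tx[32]; would write cfg_set_buffer(&cfg, tx, sizeof tx);, where sizeof is applied to the array itself and not to a pointer derived from it.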

The Limitations of Classical PID Controller and Its Advanced Derivations

Since its founding by N. Wiener in 1947, control theory has evolved for more than 60 years and is still full of challenges and opportunities. The most important principle of control theory, in my opinion, is the feedback mechanism. Without feedback and a closed loop, almost no control algorithm or technique can be applied. The idea of feedback is that comparing the reference input with the actual output yields an error signal, which the controller then uses to track the reference and eliminate the difference between the two. Apart from Watt’s steam engine, one could say that the first formal application of (negative) feedback was the amplifier invented by H.S. Black. It was an ingenious idea when it first appeared in 1927 and proved to be an extremely useful way to solve electronic and control problems. The idea of output feedback has since been extended to state feedback and error feedback to achieve state control and estimation in more advanced control techniques.

Classical control is the foundation of control theory and concentrates on analysing the stability and performance of a controlled plant. However, classical control theory mainly deals with linear, single-input single-output (SISO) systems. Although traditional techniques such as the PID controller are still widely used in industry, they are poorly suited to more complex engineering scenarios such as those found in aerospace, chemistry and biology. Another problem of classical control is that all parameters are designed and tuned for the current system model, which leaves the system vulnerable to later disturbances and parameter variations.

To address these problems of the classical PID controller, more advanced approaches have been developed. To control a multi-input multi-output (MIMO) system with a classical approach, one has to divide the system into separate modes and control each mode separately. However, if the system inputs and outputs are coupled with each other and cannot be decoupled, this method is no longer practicable. This is where the state-space method comes in: it overcomes this limitation of classical control by using state variables. One advantage of the state-space representation is that it is expressed in matrices and is therefore very computer-friendly. It is also defined in the time domain instead of the frequency domain, and each state can have some extent of physical meaning, which gives clues about what is happening inside the controlled plant. One milestone that made the state-space method more practicable was the invention of the Kalman filter. The Kalman filter uses a series of past measurements, taken in the presence of noise, to estimate the current state of the system. It can work as a state estimator, or simply as a special filter that uses the physical system model to suppress process and measurement noise.
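
To make the estimator idea concrete, here is a minimal sketch of a one-dimensional Kalman filter. It assumes the simplest possible model, a scalar state that follows a random walk and is measured directly; the struct and function names are hypothetical, and the noise variances q and r are assumed to be known, with r > 0.

```c
/* Scalar Kalman filter for the assumed model
 *   x[k] = x[k-1] + w,  w ~ N(0, q)   (process noise)
 *   z[k] = x[k]   + v,  v ~ N(0, r)   (measurement noise) */
struct kalman1d {
    double x;   /* current state estimate */
    double p;   /* variance of the estimate */
    double q;   /* process noise variance */
    double r;   /* measurement noise variance, assumed > 0 */
};

void kalman1d_init(struct kalman1d *kf, double x0, double p0,
                   double q, double r)
{
    kf->x = x0;
    kf->p = p0;
    kf->q = q;
    kf->r = r;
}

/* Processes one measurement z and returns the updated estimate. */
double kalman1d_update(struct kalman1d *kf, double z)
{
    double k;

    /* Predict: the random-walk model leaves x unchanged, uncertainty grows. */
    kf->p += kf->q;

    /* Update: blend prediction and measurement by their uncertainties. */
    k = kf->p / (kf->p + kf->r);      /* Kalman gain, between 0 and 1 */
    kf->x += k * (z - kf->x);
    kf->p *= (1.0 - k);
    return kf->x;
}
```

Each call weighs the new measurement against the accumulated history: a large r makes the filter trust its own prediction more, while a large q makes it follow the measurements more closely.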

Optimal control methods such as MPC and LQR are another development beyond classical control. In most circumstances there is more than one control input that can drive the system to work properly, and what we need is to find the optimal one. Optimal control transforms the control problem into an optimisation problem, which minimises an objective function to get the best outcome. Another advantage of optimal control, and of MPC in particular, is that it can take constraints into consideration. One defect of the PID controller is that it does not explicitly handle system constraints such as actuator saturation or output limits. In an optimal control formulation, designing a controller with constraints becomes feasible.
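
As a small illustration of turning a control problem into an optimisation problem, the sketch below computes an LQR gain for a scalar discrete-time system. It assumes the plant x[k+1] = a*x[k] + b*u[k], the cost sum(q*x^2 + r*u^2) with r > 0, and a fixed number of Riccati iterations chosen by the caller; the function name is hypothetical. It covers the unconstrained case only, since constraints would require an MPC-style formulation.

```c
/* Scalar discrete-time LQR, a sketch under the assumptions above.
 * Iterates the Riccati recursion
 *   P <- q + a^2 P - (a b P)^2 / (r + b^2 P)
 * and returns the feedback gain K for the control law u = -K x. */
double lqr_gain_scalar(double a, double b, double q, double r,
                       int iterations)
{
    double p = q;   /* initial guess for the cost-to-go weight */
    int i;

    for (i = 0; i < iterations; i++) {
        double abp = a * b * p;
        p = q + a * a * p - (abp * abp) / (r + b * b * p);
    }
    return (a * b * p) / (r + b * b * p);
}
```

For a = b = q = r = 1 the recursion settles at P = (1 + sqrt(5))/2, which gives K of about 0.618 and a closed-loop pole a - b*K of about 0.382. Raising r makes control effort more expensive and lowers the gain, which is the kind of trade-off the objective function encodes.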

It is also well known that no system is constant: some parameters vary with time or with the working condition. In classical control, the controller is designed only for the current system model and thus may lose performance or even become unstable when the system changes or uncertainties are present. In this respect, adaptive control or robust control may be more applicable. Both are designed to cope with uncertainties. The difference is that adaptive control identifies the system model and changes its parameters in real time, whereas robust control fixes its parameters before it is deployed to the plant. Because adaptive control has to re-identify the system model every few periods, it needs much more computation time. What’s more, since the control parameters in an adaptive controller keep changing, it may be difficult to prove its stability. On the other hand, the gain of a robust controller is designed before it is applied to the system, so no additional calculation is needed during operation. Since a robust controller is designed to tolerate a whole range of uncertainties rather than tuned for one nominal model, its performance may not be as good as that of other controllers. But since real control problems are never ideal, it is meaningful to include uncertainties and disturbances in the system model.

Some more advanced control techniques, such as neural network control and expert control, are being discussed today. In my opinion, these new approaches have the potential to become the next generation of control theory. With the development of computer science, it is now possible to model extremely complex networks. Such a controller can, in effect, store the possible system states and their corresponding solutions in a database and, each time, simply search for the best solution according to the current system data. New techniques such as machine learning can also be absorbed into the controller, making it more flexible so that it can handle different control problems with the same configuration.

However, no matter how powerful the control method is, there are rarely situations where we do not need to make trade-offs. As human beings, we always need to make decisions and balance income against expense. Being too greedy is like giving an infinite gain to a helicopter: it may work at the beginning but will crash as soon as there is any disturbance. So push yourself while keeping in mind that you have limitations. Take it easy, be adaptive to the environment and always try to find the optimal solution for your life.
