New C features proposal

Michał ‘mina86’ Nazarewicz | 18 April 2010

As the committee gathered to discuss what the new C standard will look like, I did some thinking of my own. I thought about features that I would love to see in C. I even collected the thoughts of my twisted mind and condensed them into a text file.

What is outrageous is that, since I believe information and ideas want to be free and shared, I have decided to post my concepts on the net instead of imprisoning them in my wicked brain.

Maybe someone will find them useful somehow. Maybe some committee member will even read them and bring them up at the next meeting.

This post has also been sent to the comp.std.c newsgroup.

I Default Function Argument

  1. Default values for function arguments have proved useful in C++ programs as well as in other languages which have such a notion. They allow for a more condensed function call in the usual cases while still allowing for configuration when required.
  2. The syntax and semantics of such a feature should be the same as in C++.
  3. Example:

    struct node {
        struct node *next;
        int key;
        const char *value;
    };
    
    const char *get(struct node *first, int key, const char *def = NULL) {
        for (; first; first = first->next) {
            if (first->key == key)
                return first->value;
        }
        return def;
    }
  4. In the current standard such syntax is invalid, thus it will not affect existing code. Moreover, many C implementations are also capable of accepting C++ code, thus implementing this feature would be an easy task in many cases.
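For comparison, here is a minimal sketch of how a default value can be approximated in C99 today by counting macro arguments. The names get_impl, GET_PICK, GET_2 and GET_3 are my own and purely illustrative; the proposal itself would make this machinery unnecessary.

```c
#include <stddef.h>

struct node {
    struct node *next;
    int key;
    const char *value;
};

/* Three-argument worker; the macros below supply the default. */
static const char *get_impl(const struct node *first, int key,
                            const char *def) {
    for (; first; first = first->next) {
        if (first->key == key)
            return first->value;
    }
    return def;
}

/* Select GET_2 or GET_3 depending on how many arguments were passed. */
#define GET_PICK(a1, a2, a3, name, ...) name
#define GET_2(first, key)      get_impl((first), (key), NULL)
#define GET_3(first, key, def) get_impl((first), (key), (def))
#define get(...) GET_PICK(__VA_ARGS__, GET_3, GET_2, unused)(__VA_ARGS__)
```

With these definitions get(list, 5) expands to get_impl(list, 5, NULL), while get(list, 5, "none") passes the third argument through. The trick works only for a trailing argument and produces rather cryptic diagnostics when misused.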

II Named Arguments

  1. It has been observed that calls to functions with many arguments are somewhat unreadable, since it is hard to keep track of which argument means what.
  2. To mitigate this problem, named arguments could be added to the C language with a syntax similar to the one used in structure and union initialisation.
  3. Example:

    void message(const char *msg,
                 enum level .level, enum colour .colour);
    
    /* … */
    message("A message", .level = LVL_INFO, .colour = WHITE);
    message("An error", .colour = RED, .level = LVL_ERROR);
  4. The name of the argument would become part of a function’s prototype thus the following would be illegal:

    void f1(int .bar);
    void f1(int .baz); /* different name */
    void f2(int bar);
    void f2(int .bar); /* unnamed vs. named */
  5. If different names (or a lack of names) were used in different translation units, behaviour would still be well defined as long as the order and types of the arguments match.
  6. Similarly, when taking function’s address or casting between pointers to functions the names of arguments would be ignored, hence the following would be still well defined and perfectly legal:

    int cmp(const void *.a, const void *.b);
    
    int arr[10];
    
    qsort(arr, sizeof arr / sizeof *arr, sizeof *arr, cmp);
    cmp(.a = &arr[0], .b = &arr[1]);
    
    int (*foo)(const void *.x, const void *.y);
    foo = cmp;
    foo(.x = &arr[0], .y = &arr[1]);
        /* same as cmp(.a = &arr[0], .b = &arr[1]) */
    
    int (*bar)(const void *.b, const void *.a);
    bar = cmp;
    bar(.a = &arr[0], .b = &arr[1]);
        /* same as cmp(.a = &arr[1], .b = &arr[0]) */
  7. Named arguments would have to be specified using their name, thus, with earlier message() declaration, the following would be invalid:

    message("A message", LVL_INFO, WHITE);
  8. Since the names of named arguments would have to be specified, such arguments could appear in any order, intermixed with unnamed arguments. For instance:

    void format(const char *format,
                enum level .level, enum colour .colour,
                ...) {
        va_list ap;
    
        /* … do something with level and colour … */
    
        va_start(ap, colour);
        vprintf(format, ap);
        va_end(ap);
        putchar('\n');
    }
    
    format("%s is ready", dish_name,
           .level = LVL_INFO, .colour = WHITE);
    format(.level = LVL_ERROR, .colour = RED,
           "%s is burnt", cake_name);
    format("we are out of %s",
           .level = LVL_WARN, .colour = WHITE,
           resource_name);
  9. The proposed syntax is invalid in the current standard, so this change would not impact existing code. At the same time, implementations already handle such syntax in initialiser lists, thus reusing it for named arguments should be simple.
  10. Named arguments would be especially powerful in conjunction with default values. In particular, format() could be declared as:

    void format(const char *format,
                enum level .level = LVL_INFO,
                enum colour .colour = WHITE,
                ...);

    And then, previous calls could be shortened to:

    format("%s is ready", dish_name);
    format(.level = LVL_ERROR, .colour = RED,
           "%s is burnt", cake_name);
    format("we are out of %s",
           .level = LVL_WARN,
           resource_name);
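Until something like this exists, named arguments can be imitated in C99 by packing them into a structure and letting a macro build a compound literal from designated initialisers. The sketch below is only an approximation under my own assumptions: struct message_args, message_impl and the out/size buffer parameters are hypothetical and exist only to make the result easy to inspect.

```c
#include <stdio.h>

enum level  { LVL_INFO, LVL_WARN, LVL_ERROR };
enum colour { WHITE, RED };

/* The "named arguments" travel in one structure. */
struct message_args {
    enum level level;
    enum colour colour;
};

/* Formats the message into out and returns the length snprintf reports. */
static int message_impl(char *out, size_t size, const char *msg,
                        struct message_args args) {
    return snprintf(out, size, "[level=%d colour=%d] %s",
                    (int)args.level, (int)args.colour, msg);
}

/* Everything after msg becomes a designated initialiser list. */
#define message(out, size, msg, ...) \
    message_impl((out), (size), (msg), (struct message_args){ __VA_ARGS__ })
```

A call such as message(buf, sizeof buf, "An error", .colour = RED, .level = LVL_ERROR) accepts the names in any order. Members that are not mentioned are zero-initialised, which acts as a crude default, but unlike the proposal the named part cannot be intermixed with positional arguments.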

III Inlined Structures and Unions

  1. The new standard draft allows the declaration of anonymous structures and anonymous unions in structures and unions:

    struct {
        union {
            int i;
            struct {
                short s;
                char c;
            };
        };
    } a;
    
    /* All valid: */
    a.i;
    a.s;
    a.c;
  2. This technique is already used by some C implementations and has proved useful; however, it is limited to a simple case. A syntax which, by means of the inline keyword, achieved a similar effect for named structures (especially ones defined elsewhere) or for fields that otherwise have a name could be handy.
  3. Proposed syntax looks as follows:

    inlined-member:
        structure-or-union-type identifier_opt inline inline-list_opt
    inline-list:
        ( identifiers-list_opt )
        ( identifiers-list_opt , )
    identifiers-list:
        identifier
        identifiers-list , identifier
  4. The inline keyword would mean that all the members of the structure or union are accessible from the namespace the declaration is in.
  5. An optional list of identifiers would make only the listed identifiers accessible from that namespace. An empty list would mean that no identifiers are inlined; this is still useful for casting, as described in the next section.
  6. An optional identifier would allow accessing the structure as a whole, as well as each individual member, via that name.
  7. Example:

    struct s1 { int i1, j1; };
    struct s2 { int i2, j2; };
    struct s3 { int i3, j3; };
    
    struct {
        struct s1   inline;
        struct s2 s inline;
        struct s3 a inline(i3);
        struct s3 b inline(j3);
    } s;
    
    /* To reach each of the fields: */
    s.i1;
    s.j1;
    s.i2; /* same as */ s.s.i2;
    s.j2; /* same as */ s.s.j2;
    s.i3; /* same as */ s.a.i3;
    s.j3; /* same as */ s.b.j3;
    s.a.j3;
    s.b.i3;
  8. This makes it easy to build structures that share some common data. For instance:

    struct header_head {
        unsigned length;
        unsigned type;
    };
    
    struct foo_header {
        struct header_head inline;
        unsigned some_field_1;
        unsigned some_field_2;
    };
    
    struct bar_header {
        struct header_head inline;
        unsigned some_field_1;
        unsigned some_field_2;
    };
    
    union header {
        struct header_head inline;
        struct foo_header foo;
        struct bar_header bar;
    };
    
    
    union header header;
    if (header.type == FOO_HEADER_TYPE) {
        handle_foo(&header.foo);
    } else if (header.type == BAR_HEADER_TYPE) {
        handle_bar(&header.bar);
    } else {
        error();
    }
  9. This would not be limited to members of structures or unions but could be used in other scopes as well. For instance:

    void do_stuff(int use_double) {
        union {
            float f;
            double d;
        } inline;
        if (use_double)
            /* … do stuff using "d" */
        else
            /* … do stuff using "f" */
    }
  10. Proposed syntax is not valid C99 syntax so it would not affect existing code.
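To show what the draft's anonymous members can and cannot do, here is a rough approximation of "struct header_head head inline;" using an anonymous union and an anonymous structure. It is a sketch only: the fields of header_head have to be repeated by hand, which is exactly the duplication the inline keyword is meant to remove, and header_length() is a hypothetical helper.

```c
struct header_head {
    unsigned length;
    unsigned type;
};

struct foo_header {
    union {
        struct header_head head;   /* the header as a whole */
        struct {                   /* the same fields, "inlined" by hand */
            unsigned length;
            unsigned type;
        };
    };
    unsigned some_field_1;
    unsigned some_field_2;
};

/* Works on the shared part only, whatever structure it is embedded in. */
static unsigned header_length(const struct header_head *h) {
    return h->length;
}
```

Given struct foo_header h, both h.length and h.head.length name the same storage, because the two structures in the union share a common initial sequence. Keeping the hand-written copy in sync with header_head is left entirely to the programmer.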

IV Casting to Inlined Members

  1. Inlined structures and unions would be especially powerful if implicit casting to an inlined member were allowed. This would allow for object-oriented programming without the need for explicit casts and/or run-time checking.
  2. Example:

    struct point {
        double x, y;
    };
    struct circle {
        struct point inline;
        double r;
    };
    
    void moveBy(struct point *p, double x, double y) {
        p->x += x;
        p->y += y;
    }
    
    void paintCircle(struct circle *c);
    
    struct circle *c;
    for (int i = 0; i < 10; ++i) {
        paintCircle(c);
        moveBy(c, 1.0, 1.0);
    }
  3. Should an implicit cast be ambiguous, it should be disallowed, i.e.:

    struct rectangle {
        struct point tl inline(x, y);
        struct point br inline();
    };
    
    struct rectangle *r;
    moveBy(r, 1.0, 1.0); /* illegal */
  4. For compatibility reasons, an explicit cast to a pointer to the type of one of the inlined members would keep its present meaning, so its result could differ from that of the implicit conversion.
  5. Explicit casting to an inlined member would not need to be supported by the language, as users could name an inlined member and use the & operator to acquire its address. For instance:

    moveBy(&r->tl, 1.0, 1.0);
    moveBy(&r->br, 1.0, 1.0);
  6. Implicit casting of a pointer to one structure or union to a pointer to another structure or union is currently not valid, thus this change would not affect existing code.
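For reference, this is roughly what the same thing looks like in C today: the base structure is a named member, and the conversion is spelled out either by taking the member's address or, if the member comes first, by an explicit cast. The names centre, move_by and circle_as_point are illustrative only.

```c
struct point {
    double x, y;
};

struct circle {
    struct point centre;   /* first member, so &circle == &circle.centre */
    double r;
};

static void move_by(struct point *p, double dx, double dy) {
    p->x += dx;
    p->y += dy;
}

/* The conversion that the proposal would perform implicitly. */
static struct point *circle_as_point(struct circle *c) {
    return &c->centre;
}
```

Here move_by(circle_as_point(&c), 1.0, 1.0) does what moveBy(c, 1.0, 1.0) would do under the proposal. A cast such as (struct point *)&c is also valid, but only because centre happens to be the first member.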

V The _Containerof operator

  1. Users may sometimes face the following problem: having a pointer to a member of a structure, how do I get a pointer to the structure itself? A solution would be especially useful in conjunction with inlined members and implicit casting to inlined members (as I will show).
  2. To solve this issue the following construct, similar to one used in the Linux kernel, could be used:

    type *_Containerof(pointer, type, accessor);
  3. Here accessor would be a list of identifiers separated by dots such that, if X were an object of type type, X.accessor would yield an object of the same type as *pointer.
  4. The result of the construct would be a pointer to an object X of type type such that &X.accessor == pointer.
  5. For inlined members, either of the following variants could be used:

    type *_Containerof(pointer, type);
    type *_Containerof(pointer, type, auto);

    (I’ll assume the former in the rest of the text.)

  6. Since identifiers starting with an underscore followed by an upper-case letter are reserved for future versions of the standard, this change would not affect conforming programs.
  7. A C1x-specific header (for instance stdc1x.h) could be provided which would define a containerof() macro, as:

    #define containerof(...) _Containerof(__VA_ARGS__)
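The three-argument form can already be approximated with offsetof, much like the macro found in the Linux kernel. The following is a simplified sketch without the kernel's type checking; the structures and circle_radius() are made up for the example.

```c
#include <stddef.h>

/* Step back from a member's address to the start of its container. */
#define containerof(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct figure {
    double x, y;
};

struct circle {
    double r;
    struct figure base;   /* deliberately not the first member */
};

/* Receives only the embedded figure, yet can reach the whole circle. */
static double circle_radius(struct figure *f) {
    struct circle *c = containerof(f, struct circle, base);
    return c->r;
}
```

The macro trusts the caller completely: passing a pointer that does not point at the named member of an object of the given type is undefined behaviour, which a built-in _Containerof could at least partly diagnose.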

VI Object Oriented Programming

  1. All of the above features would allow for object-oriented programming with minimal need for explicit casts and/or run-time checking:

    struct point {
        double x, y;
    };
    
    void point_init(struct point *p,
                    double x = 0.0, double y = 0.0) {
        p->x = x;
        p->y = y;
    }
    
    
    struct figure {
        const struct figure_vtab *vtab;
        struct point inline;
    };
    
    struct figure_vtab {
        double (*area)(struct figure *f);
    };
    
    void figure_move_by(struct figure *f, double x, double y) {
        f->x += x;
        f->y += y;
    }
    
    double figure_area(struct figure *f) {
        return f->vtab->area(f);
    }
    
    void figure_init(struct figure *f,
                     double x = 0.0, double y = 0.0) {
        point_init(f, x, y);
    }
    
    
    struct circle {
        struct figure inline;
        double r;
    };
    
    double circle_area_vtab(struct figure *f) {
        struct circle *c = containerof(f, struct circle);
        return M_PI * c->r * c->r;
    }
    
    void circle_init(struct circle *c, double x = 0.0, double y = 0.0,
                     double r = 1.0) {
        figure_init(c, x, y);
        c->r = r;
    
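        /* A compound literal inside a function has automatic storage
           duration, so a static table is used to outlive this call. */
        static const struct figure_vtab vtab = {
            .area = circle_area_vtab,
        };
        c->vtab = &vtab;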
    }
    
    
    struct rectangle {
        struct figure inline;
        double width;
        double height;
    };
    
    double rectangle_area_vtab(struct figure *f) {
        struct rectangle *r = containerof(f, struct rectangle);
        return r->width * r->height;
    }
    
    void rectangle_init(struct rectangle *r, double x = 0.0, double y = 0.0,
                        double w = 1.0, double h = 1.0) {
        figure_init(r, x, y);
        r->width = w;
        r->height = h;
    
        /* Static, so that the table outlives this call. */
        static const struct figure_vtab vtab = {
            .area = rectangle_area_vtab,
        };
        r->vtab = &vtab;
    }

VII Function Overloading

  1. Function overloading is a feature that may help programmers concentrate on the general idea of the code they are developing without thinking too much about the types being used.
  2. Functions declared in math.h are a perfect example of such a situation. The existence of functions such as sin, sinf and sinl forces the programmer to think about the underlying type even though it may not be that important.
  3. The situation gets worse when one decides that a different type is to be used. In such a situation, all calls to type-dependent functions have to be traced and replaced. This operation is rather boring and error-prone.
  4. The situation gets even more complicated when one wants to develop code where the type used depends on a macro definition or a typedef. This may be even more tricky if the macro or typedef comes from an external library and there is no clear way of telling what type it is during the preprocessing stage.
  5. All such situations would be trivial if function overloading was allowed.
  6. To maximise compatibility with older implementations a syntax taken from C++ could be used for functions with external linkage:

    extern "C++" float mysin(float x) { return sinf(x); }
    extern "C++" double mysin(double x) { return sin(x); }
    extern "C++" long double mysin(long double x) { return sinl(x); }

    and an analogous syntax for functions with internal linkage:

    static "C++" float mysin(float x) { return sinf(x); }
    static "C++" double mysin(double x) { return sin(x); }
    static "C++" long double mysin(long double x) { return sinl(x); }
  7. In the current standard, function overloading is invalid thus it will not affect existing code. Moreover, many C implementations are also capable of accepting C++ code thus implementing this feature would be an easy task in many such cases.
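For the math.h case specifically, C99 already ships tgmath.h with type-generic macros. For user-defined functions, a similar effect can be sketched with a C11 generic selection, as below. The functions twicef, twiced and twicel are hypothetical stand-ins chosen so that the example does not depend on the maths library; a mysin macro could dispatch to sinf, sin and sinl in the same way.

```c
/* One implementation per type, in the sinf/sin/sinl naming style. */
static float       twicef(float x)       { return x + x; }
static double      twiced(double x)      { return x + x; }
static long double twicel(long double x) { return x + x; }

/* Pick the implementation from the static type of the argument. */
#define twice(x) _Generic((x),      \
        float:       twicef,        \
        long double: twicel,        \
        default:     twiced)(x)
```

With this macro twice(1.5f) calls twicef and yields a float, while twice(2) falls through to the double version. Unlike real overloading, the list of types is fixed in one place and cannot be extended from another header.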

VIII Type Agnostic Limits Functions

  1. Macros defined in the limits.h header suffer from the same problems as described above. It would be nice to have constructs such as _Type_min, _Type_max, etc. which, the same as sizeof, take an expression or a type as an argument and return the smallest, biggest, etc. value representable by the type.
  2. Identifiers starting with an underscore followed by a capital letter are reserved, thus this feature would not affect existing code.
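A partial approximation is possible with a C11 generic selection that maps the type of an expression to the matching limits.h or float.h macro. TYPE_MAX and TYPE_MIN are names I made up for this sketch; it covers only the listed types and, unlike the proposed operators, accepts expressions but not type names.

```c
#include <float.h>
#include <limits.h>

/* Largest value representable in the type of expression x. */
#define TYPE_MAX(x) _Generic((x),                        \
        short: SHRT_MAX,  unsigned short: USHRT_MAX,     \
        int:   INT_MAX,   unsigned int:   UINT_MAX,      \
        long:  LONG_MAX,  unsigned long:  ULONG_MAX,     \
        float: FLT_MAX,   double:         DBL_MAX)

/* Smallest (most negative) value representable in the type of x. */
#define TYPE_MIN(x) _Generic((x),                        \
        short: SHRT_MIN,  unsigned short: 0,             \
        int:   INT_MIN,   unsigned int:   0u,            \
        long:  LONG_MIN,  unsigned long:  0ul,           \
        float: -FLT_MAX,  double:         -DBL_MAX)
```

If the type of a variable later changes from int to long, TYPE_MAX(variable) follows along without any edits at the point of use, which is the property the section asks for. An expression of a type missing from the list is rejected at compile time.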

IX Nested Static Functions

  1. It is often desirable to limit the scope of an identifier, either to avoid erroneous usage from outside of the scope or simply not to pollute the outer scope.
  2. Users may define static variables in a local scope, so why not let them define static functions in a local scope as well? For instance:

    struct foo;
    void sort_foo(struct foo *foos, size_t n) {
        static int cmp(const void *_a, const void *_b) {
            const struct foo *a = _a, *b = _b;
            /* … */
            return /* … */;
        }
        qsort(foos, n, sizeof *foos, cmp);
    }
  3. Such a function would behave as if it were a normal function, except that it could not be accessed by name from outside of the outer function, and it would have access to all other static functions and variables defined in the same scope prior to itself.
  4. The current standard does not allow function nesting, thus this change would have no effect on existing code.
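For contrast, the standard way of writing the example today puts the comparison function at file scope. The sketch below assumes a hypothetical key member in struct foo and the name cmp_foo; the only difference from the proposal is that cmp_foo stays visible to the rest of the translation unit.

```c
#include <stdlib.h>

struct foo {
    int key;   /* hypothetical member, used only for ordering */
};

/* File-scope helper: visible to the whole translation unit. */
static int cmp_foo(const void *pa, const void *pb) {
    const struct foo *a = pa, *b = pb;
    return (a->key > b->key) - (a->key < b->key);
}

void sort_foo(struct foo *foos, size_t n) {
    qsort(foos, n, sizeof *foos, cmp_foo);
}
```

Nothing stops another function in the same file from calling cmp_foo by accident, which is the kind of leakage a nested static function would prevent.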

X Compound Functions

  1. The current standard has introduced compound literals for defining unnamed objects. A similar syntax could be used to define functions:

    struct foo;
    void sort_foo(struct foo *foos, size_t n) {
        qsort(foos, n, sizeof *foos,
              (int ()(const void *_a, const void *_b)){
            const struct foo *a = _a, *b = _b;
            /* … */
            return /* … */;
        });
    }
  2. This change would prevent namespace pollution and allow a shorter notation for callback functions. It would also look natural to users familiar with functional programming.
  3. In the current standard such syntax is invalid, thus this change would not affect existing code.

XI Unnamed Arguments

  1. It sometimes happens that a function takes an argument but does not use it. This is most common for callback functions or “virtual methods” when simulating object-oriented programming.
  2. At the same time, some implementations allow diagnostics about unused variables to be turned on. Such diagnostics may help in identifying problems in the code.
  3. However, when the two are combined, warnings about unused arguments are produced even when it is the user’s intent not to use them. A common technique is to cast such an argument to void at the beginning of the function to silence the warning.
  4. C++ allows for a name of an argument to be omitted in function definition which solves the above problem. The same could be implemented in C.
  5. The current standard requires names to be present for all of a function’s arguments in a function definition, so this change would not affect conforming code. At the same time, many C implementations are also capable of compiling C++ code, thus implementing this feature could be easy and straightforward.
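The cast-to-void technique mentioned above looks as follows in practice. The callback type predicate_fn and the functions around it are invented for this sketch; the point is only the (void)user_data line.

```c
#include <stddef.h>

typedef int (*predicate_fn)(int value, void *user_data);

/* Needs no context, yet has to match predicate_fn. */
static int is_positive(int value, void *user_data) {
    (void)user_data;   /* deliberately unused; silences the warning */
    return value > 0;
}

/* Uses user_data as a pointer to the threshold. */
static int is_above(int value, void *user_data) {
    const int *threshold = user_data;
    return value > *threshold;
}

/* Counts the elements for which the predicate returns non-zero. */
static size_t count_if(const int *values, size_t n,
                       predicate_fn pred, void *user_data) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        count += pred(values[i], user_data) != 0;
    return count;
}
```

With unnamed arguments, is_positive could simply be declared as taking (int value, void *) and the extra statement would disappear.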

XII The alignof operator

  1. The new C standard draft proposes an alignof operator that takes a type as its operand. Why can it not take an expression as its operand, the same way sizeof does?
  2. The alignof operator is added in the new standard, thus this change would not impact existing code any further than adding alignof itself did.
  3. Since, in the current standard, alignof is a perfectly valid identifier, why not define an _Alignof operator and add a stdc1x.h header with the following macro:

    #define alignof _Alignof
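As it turned out, C11 as published spells the operator _Alignof and provides the alignof macro through the stdalign.h header. A small sketch of its use, with a made-up struct sample, shows the restriction discussed in point 1: the operand has to be a type name.

```c
#include <stdalign.h>
#include <stddef.h>

struct sample {
    char c;
    double d;
};

/* alignof takes a type name; alignof(some_object) is not standard C11. */
static size_t sample_alignment(void) {
    return alignof(struct sample);
}
```

To query the alignment of an existing object one therefore has to repeat its type by hand, which is what an expression operand, as with sizeof, would avoid.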

XIII The Tags

  1. Even though structures, unions and enumerations use different keywords for identifying the type, they still share a single namespace for tags. In particular, the following is invalid:

    struct foo { int a; };
    union  foo { int b; char c; };
    enum   foo { A, B, C };
  2. How about using different namespaces for those tags?