Opinions On A New Programming Language

Wajideus
Member
Posts: 47
Joined: Thu Nov 16, 2017 3:01 pm
Libera.chat IRC: wajideus


Post by Wajideus »

So, this is just something I noticed, but my ownership model actually parallels true object-oriented programming and CSP (communicating sequential processes).

Objects (functions) can only share data with other objects if the data is immutable. The exception is that data can be sent (transferred) instead of shared. Once data has been sent, it can no longer be mutated by the sender, unless the receiver sends it back.

Traditional pass-by-value semantics entails creating a copy of data and then sending the copy. The use of "const" implies that data is immutable to both the callee and the caller for the lifetime of the callee. Problems arise when the callee shares the const data with another object that outlives the callee: once the callee returns, the data becomes mutable to the caller again, while the longer-lived object still assumes it is immutable when it is actually now volatile.

One possible solution to this problem is to use a reference counter, and not allow the object to become mutable to the caller again until there are no more references to the object.
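
As a rough illustration of the reference-counting idea, here is a minimal Python sketch. The `Loan` class and its `share`, `release`, and `write` methods are hypothetical names invented for this example, not part of the language being designed; the only point is that the caller cannot write again until every outstanding reference has been released.

```python
class Loan:
    """Hypothetical wrapper: data is read-only while any reference is out."""

    def __init__(self, value):
        self._value = value
        self._refs = 0  # number of outstanding shared (read-only) references

    def share(self):
        # Hand out a read-only reference and count it.
        self._refs += 1
        return self._value

    def release(self):
        # A holder gives its reference back.
        if self._refs == 0:
            raise RuntimeError("no outstanding references to release")
        self._refs -= 1

    def write(self, value):
        # The caller may mutate again only once nobody else can observe it.
        if self._refs > 0:
            raise PermissionError("data is still shared; it stays immutable")
        self._value = value


loan = Loan("draft")
seen_by_callee = loan.share()      # callee (or a longer-lived object) holds it
try:
    loan.write("changed")          # rejected: a reference is still out
    blocked = False
except PermissionError:
    blocked = True
loan.release()                     # last reference returned
loan.write("changed")              # now allowed
```

A real implementation would presumably do this bookkeeping in the compiler or runtime rather than in user code; the sketch only shows the rule itself.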
Schol-R-LEA
Member
Posts: 1925
Joined: Fri Oct 27, 2006 9:42 am
Location: Athens, GA, USA


Post by Schol-R-LEA »

If you think about it, this is a very similar problem to two other common issues in language design and semantics: the 'upward local reference problem', where a reference to a local variable is returned by the function or inserted into a mutable argument; and the 'upward funarg problem' for nested functions, in which returning a reference to a nested function creates a closure, capturing the outer function's state, which then has to become persistent.

So what you are actually talking about is capturing a part of the callee's state - but in this case, it isn't the variable, but the condition of the variable. However, since here you are trying to avoid capturing the state accidentally, it simplifies things a bit in terms of runtime handling, though possibly at the cost of some added compile-time analysis.

Now, as a not-very-brief digression, I should mention that this gets a bit involved when talking about nested functions, hence why it is called the 'upward funarg problem' - the entire nested state needs to be captured to avoid dangling references in the enclosed function, which could involve capturing several layers of function environments (especially in functional languages and the Lisp family languages, where things like blocks and loops are often syntactic sugar over anonymous functions).

This presents problems in terms of both run-time and compile-time support, and in particular, naive approaches to run-time support either require that all function environments be on the heap rather than on the stack, which presents serious problems WRT memory management, or else involve adding a lot of behind-the-scenes 'spooky action at a distance' which the programmer wouldn't be able to see. Similarly, the compile-time handling can be pretty complicated, and can also complicate or hamper some optimization rules the compiler designer might want to apply.

This is why C simply forbids nested functions - supporting closures would be a serious break from the rest of the language's semantics, and the compile-time techniques needed to handle it efficiently would have added far more complexity to the compiler than the language designers wanted (especially in the early 1970s, when some methods weren't available to them yet). Some compilers (including GCC) support them, but only as a non-standard extension. Most languages whose syntax and semantics are partially derived from C, such as C++, Java, C#, JavaScript, etc., follow suit for the general case, but many have added (either from the outset or later on) special cases for things like lambda functions, and/or allow nesting but have rules regarding returning function references.
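
To make the upward funarg problem from this digression concrete, here is a small Python sketch (Python keeps captured environments alive automatically, which is exactly the run-time cost being discussed). The function names are made up for the example. The innermost function escapes upward while still depending on variables from two enclosing environments, so neither environment can simply live on the stack.

```python
def make_accumulator(start):
    total = start                  # local to make_accumulator

    def make_stepper(step):
        def advance():
            nonlocal total         # reaches two environments up
            total += step          # 'step' lives one environment up
            return total
        return advance             # the nested function escapes upward

    return make_stepper


stepper_factory = make_accumulator(10)   # make_accumulator has returned...
advance_by_5 = stepper_factory(5)        # ...and so has make_stepper
first = advance_by_5()                   # yet 'total' and 'step' still exist
second = advance_by_5()
captured = advance_by_5.__code__.co_freevars   # names kept alive for 'advance'
```

Here `captured` lists the variables the closure had to keep alive after both outer calls finished.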

End of digression.

Fortunately for you, this doesn't apply to this specific problem, which is closer to the problem of dangling references to local variables. Still, the compile-time solutions to all three of them are somewhat similar (for languages which do permit upward funargs, and thus support closures), which is why I mentioned it.

One model in particular might help here: copy-on-return semantics, which is similar to copy-on-write semantics but involves making a copy of all or part of the local state from the local environment to the upward one, or giving an error which requires the programmer to make such a copy in order to resolve the conflict. You probably don't want to have the compiler do this automatically in a systems language (more on this below), but you could make it an idiom of the language, possibly enforced by compiler error checking.

The solution that seems most promising to me would be to add a compile-time check to see if references to const arguments might be returned, or added to an object whose lifespan persists beyond that of the function call.

Since (AFAIK) as a practical matter you cannot determine this absolutely through static analysis, you probably want to be conservative: treat any case where you can't show that the reference won't escape (which should be easier to do in static analysis) as a case where it will.

You as the language and compiler designer have a few different options about what to do when it does find such a conflict:
  1. Silently insert an operation that casts away const-ness in the upward reference, possibly by copying the data. This is a simple approach from the programmer's point of view, but it involves spooky-action-at-a-distance - the programmer doesn't know what the compiler and the run-time code are doing. It also adds some compiler complexity and overhead, and may add run-time overhead as well. It is a decent approach for application programming, where that is the kind of unnecessary detail which the language might want to suppress, but in a systems language it is unacceptable.
  2. Make it an error, and require the programmer to explicitly change the code to eliminate the conflict, probably by explicitly casting away const-ness but possibly in some other way. It is a more flexible approach than the previous one, giving the programmer more control over how it is handled, avoids spooky-action-at-a-distance, and doesn't require as much compiler cleverness. This parallels Brendan's favored solution for detecting and resolving other kinds of potential run-time conflicts such as those for array bounds and type ranges, and is a solid model, though it does require additional programmer effort.
  3. Take the previous method as a base case, but give additional syntax and semantics for abstracting it away for some common cases, which would somewhat reduce the cognitive load for the client-programmers while still allowing flexibility and transparency, but at the cost of additional (possibly significant) compiler complexity. This can make sense for bounds checking in type definitions (and is pretty much my own planned solution for those issues), but I am not sure how you would do this in this instance, as it is sort of a case-by-case issue. Still, it is something to think about I suppose - maybe you could add parameter modifiers that tell the compiler how to resolve it, and then give an error message if a conflict exists that doesn't have an applicable modifier so the programmer can either add one or address it in some other way.
  4. Warn about it, but don't require the programmer to handle it. This is a very unsafe approach, but unfortunately is the kind of ostrich attitude seen all too often in existing languages. I would recommend staying away from this sort of wishful thinking.
Note that for strict-FP languages, where data mutability isn't allowed anyway, this wouldn't really come up. It also wouldn't apply to non-strict Lisps, as all function arguments follow pass-by-value semantics as a rule regardless of the implementation (which would be up to the compiler to resolve) - if you need to mutate an argument in Lisp, you need to use a macro (or a fexpr, in dialects that support them - though like with macros, it is always possible to add them by writing a pass-through preprocessor, then writing wrapper functions for the read, eval, and compile forms which apply that preprocessor, and finally writing your own REPL that calls your preprocessed reader instead of the standard one).
Rev. First Speaker Schol-R-LEA;2 LCF ELF JAM POEE KoR KCO PPWMTF
Ordo OS Project
Lisp programmers tend to seem very odd to outsiders, just like anyone else who has had a religious experience they can't quite explain to others.

Post by Wajideus »

Schol-R-LEA wrote:One model in particular might help here: copy-on-return semantics, which is similar to copy-on-write semantics but involves making a copy of all or part of the local state from the local environment to the upward one, or giving an error which requires the programmer to make such a copy in order to resolve the conflict
The point of ownership semantics is to avoid having to copy by communicating data instead of allowing random access.

Imagine you have a book and a pen; and you can only talk with me by writing. There are 2 scenarios that can unfold.

1. You write your message in the book, and then give it to me. I can show the book to whoever I want; even put it on public display, however you're the only one with a pen so no one else can write to it. Also, you can't write to it either until the book is solely in your possession again. In this case, the book represents immutable data.

2. You write your message in the book, and then give me both the book and the pen. We've now reversed roles. I can write in the book and give you both the book and the pen back, or just the book. In this case, the book represents mutable data that is shared by communication.


This technique is more robust than const-ness because it doesn't allow the caller to change the value of an object that's passed by reference, which in turn prevents dangling pointers caused by the caller destroying an object that's referenced elsewhere because the callee promoted its scope.
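
The book-and-pen analogy can be sketched in a few lines of Python. Everything here (the `Book` class, `give`, `can_write`, the string names for the two parties) is a hypothetical model made up for illustration, and the checks run at run time, whereas the language would presumably enforce them at compile time.

```python
class Book:
    """Hypothetical model of the analogy: whoever holds the pen may write."""

    def __init__(self, owner):
        self.pages = []
        self.holder = owner        # who currently has the book
        self.pen = owner           # who currently has the pen

    def write(self, who, text):
        # Writing needs both the pen and sole possession of the book.
        if who != self.pen or who != self.holder:
            raise PermissionError(f"{who} cannot write right now")
        self.pages.append(text)

    def give(self, sender, receiver, with_pen=False):
        if sender != self.holder:
            raise PermissionError(f"{sender} does not hold the book")
        self.holder = receiver
        if with_pen and self.pen == sender:
            self.pen = receiver    # scenario 2: roles are reversed


def can_write(book, who):
    try:
        book.write(who, f"note from {who}")
        return True
    except PermissionError:
        return False


book = Book("you")
book.write("you", "hello")
book.give("you", "me")                       # scenario 1: book only
scenario_1 = (can_write(book, "you"), can_write(book, "me"))
book.give("me", "you")                       # book comes back
book.give("you", "me", with_pen=True)        # scenario 2: book and pen
scenario_2 = (can_write(book, "you"), can_write(book, "me"))
```

In scenario 1 nobody can write while the book is away from its owner; in scenario 2 only the receiver can.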

Post by Schol-R-LEA »

Hmmn, OK. I don't think I had fully understood what you were saying before about that; I am still not sure if I do now, and I will have to review your earlier posts again. This sounds interesting.

Post by Schol-R-LEA »

I meant to get back to this earlier, sorry.
Wajideus wrote:It's definitely better in terms of flexibility and simplicity. It's one of the things I love about languages like Lisp and Self (Smalltalk).
I assume that by that parenthetical you mean that Self is a descendant of Smalltalk, not that it is a version of Smalltalk. The latter would be like saying that C# and Java are the same language (actually, while Self uses the Smalltalk message syntax, semantically they are probably more different than Java and C# are). Just sayin'.

(As a related aside, the tendency of both Lispers and non-Lispers to speak of the Lisp family as just 'Lisp', is... problematic. The differences between the languages in the family - Common Lisp, Scheme, Racket, Clojure, older versions of Dylan, nuLisp - are as great as those between C, C++, C#, Java, JavaScript, PHP, Perl, and the other Algol-family languages which use C's bracket syntax as their structural basis. However, there's a long history of seeing Lisp as a continuum rather than a single language, even among Lisp programmers at least up until the late-1980s, and the one thing they all have in common - using sexprs to represent both built-in forms and functions in a mostly-uniform syntax - means that you can mostly-imitate any other Lisp just with a few functions, macros and read-macros, or even by slapping together a rump meta-circular interpreter for the other language that you can call, so the lines can easily get blurred. steps off of soapbox)
Wajideus wrote:The downside though is that you pay for the overhead of runtime compilation and dynamic binding. This specific language is intended for the same niche as C/C++, so that's a tradeoff I'm not willing to make.
Yeah... well, that trade-off isn't as sharply divided as most people get the impression it is, and often depends less on the languages and more on the implementations. I've mentioned Stalin∇ before, and while the language it is for (VLAD) isn't quite the same as any standard Scheme, it is definitely a Lisp; I see no reason why a similar super-optimizing finishing compiler couldn't be applied to any other type of language, though maybe not to any specific existing language. The key to this is in separating the semantic model from the implementation model in a way that lets you use different implementations so long as they produce the same expected results.

More to the point, just because the language defaults to doing something one way, doesn't mean it can't have pragmas, modifier keywords, and specializing syntax that allow the programmer to tell (or at least request or suggest to) the compiler to use some other semantics. A simple example (if now long out of date) is the register keyword in C - yeah, it is pretty much a no-op today, but that's more to do with improvements in register allocation algorithms since C was designed than a flaw in the idea of hints. The same holds as a runtime operation for the System.gc() method in Java - the garbage collector is free to ignore the request, if it concludes that there's no need to collect garbage right now, but in most instances, it will take the hint and collect it. (The closest .NET counterpart is System.GC.Collect(), though that one is documented as forcing a collection rather than merely suggesting one.)

An analogy - not a strong one, but still - could be made between this and using an instance of an abstract class. The parent class defines the expectation - that the variable accepts a message or method invocation with a given signature, in accordance with LSP, and that the method at least generally follows the intended semantics (e.g., a 'move' method causes the object to move, somehow) - but leaves the actual implementation to the child class. The child class's designer is still free to implement it as they choose, and more to my point here, can use overloading to have a method whose semantics are related, but whose signature is more specialized than that defined by the parent class, indicating that it should use a specific form of the idea rather than the generic one.

Here, the same relation holds between the language and the implementation - the language doesn't, and perhaps shouldn't, define how the compiler implements something, but probably should have modifiers, pragmas, etc. that say, "do it this way, please" with varying degrees of forcefulness.
Wajideus wrote:
Schol-R-LEA wrote:Maybe this is a Lisp-ish view, as non-toy Lisp compiler implementations are full of this sort of thing, and some are even used by (non-toy) interpreters. The highly dynamic approach inherent to Lisp makes them necessary for anything like acceptable performance. Even things as basic to the language as lists are often not what they seem to be - nearly every serious Lisp interpreter since the days of the MIT AI Lab (circa 1965) has used 'CDR-coding' to snap the pointers for lists when they are created, turning the linked lists into tagged arrays of elements (which could be addresses to a datum or another list, or any tagged datum that would fit into the same space as an address, or in some cases more exotic things such as tagged indexes into blocks of homogeneous data elements which could be treated as a unit by the garbage collector) as a way of reducing both space and time overhead.
It sounds more or less like it's just an optimization for linked lists that happens to work because the lists in Lisp are value types (immutable).
While it is definitely an optimization, I should point out that lists aren't immutable in most Lisp family languages. The forms set-car! and set-cdr! (or rplaca and rplacd in Common Lisp and many older dialects) allow you to change the car (the first element of the list) and cdr (the remaining sub-list following the car), respectively. The actual semantics vary from language to language, though.
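
For anyone who hasn't used those forms, here is a small Python model of what mutating a cons cell means. The `Cons` class and `to_list` helper are hypothetical stand-ins written for this example (no real Lisp is involved); the comments show the Scheme forms they imitate.

```python
class Cons:
    """A hypothetical cons cell: 'car' is the head, 'cdr' is the rest."""

    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr


def to_list(cell):
    # Walk the chain of cons cells and collect each car.
    items = []
    while cell is not None:
        items.append(cell.car)
        cell = cell.cdr
    return items


xs = Cons(1, Cons(2, Cons(3)))
ys = xs.cdr                 # ys shares structure with xs: the list (2 3)

xs.car = 10                 # like (set-car! xs 10)
ys.cdr = Cons(99)           # like (set-cdr! ys '(99)); visible through xs too

xs_items = to_list(xs)
ys_items = to_list(ys)
```

Because `xs` and `ys` share cells, the change made through `ys` shows up in `xs` as well, which is why such lists cannot be treated as immutable value types.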
Wajideus wrote:
Schol-R-LEA wrote:In my view, it would make sense to have virtual (or at least potentially virtual) be the default, and have some modifier indicate when something has to be resolved early. Similarly, a variable in a 'parent class', 'interface', 'template', or 'typeclass' could be resolved as the actual type/class if the compiler can determine early on that it can only be an object of a certain sub-class, even if the programmer can't be certain.
Yeah, as I mentioned before, it's certainly debatable. Being virtual never causes problems, but being non-virtual makes a system more rigid and hard to change. That being said, I also think it's important to base the design of the language on what we do empirically rather than a one-size-fits-all approach. It would get old rather quickly if I had to constantly sprinkle keywords all over the place to improve performance because the compiler is making assumptions that are rarely the case.
OK, that is a fair point. The matter of defaulting things in a meaningful way is pretty much undecidable, because 'reasonable' is going to be different for different circumstances. Mind you, this is one of the reasons I am thinking in terms of having separate out-of-band modifiers, and I've been giving thought to ways to abstract them out as a matter of code as well - a kind of quasi-modal system where the default can be reset per module or even per-library, without it spilling over into other modules (well, not too much, at least - leaky abstractions and all that).

It's sort of like something I say sometimes about Linux versus Windows versus MacOS - they all suck, but with Linux, the existence of different distros, alternate file systems and system initialization models, redundant and competing GUIs and applications, etc. give you some measure of ability to choose the way your particular system sucks to let you keep the suckiness to a manageable level for yourself. Of course, most people would find any amount of Linux to be too much suck, and unless you are a programmer yourself, the consistency of Windows or MacOS is likely to count more than the flexibility of Linux (which is why I tend to see them as complementary rather than rivals - though they all still suck).


Post by Wajideus »

Took a break from language design for a while, but I've got another update.


I've more or less figured out how I want error handling to work. It's sort of a hybrid of using error codes and exceptions. Below is an example of a new type of assert/else statement used to generate an error (ignore the `$` here, it's just a dummy syntax for a parametric type):

Code: Select all

func getValue(values: [$t], index: uint) {
    assert index < countOf(values) else OutOfBounds;
    return values[index];
}
The `else` part of the statement is optional and specifies a type of error. This error gets added to the metadata of the function under the `failsIf` attribute, much like how Java methods have an explicit `throws` clause. To handle an error, there's a try-case statement:

Code: Select all

var a = [int] {1, 2, 3};
try a.getValue(0)
{
    case OutOfBounds:
        print "fail";
}
Like Java, any unhandled errors are propagated up the call stack.

The first and most obvious difference between this new syntax and exceptions is that error handling is done on function calls, not blocks.

The second difference is that errors are just numeric values. These values are indices into the `failsIf` error list in the function's metadata. For example, if a function `failsIf OutOfBounds, OutOfMemory`, then the `OutOfBounds` error returned by that function will have a value of 0 and the `OutOfMemory` error will have a value of 1. This means that the OutOfBounds returned by one function is not numerically the same as the OutOfBounds returned by another function.

The third difference between this mechanism and exceptions is that error codes are returned to the caller via a hidden function parameter instead of by jumping.
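
A rough Python model of the second and third differences, under some loud assumptions: the `fails_if` attribute, the `load_value` wrapper, and its error ordering are all invented for this sketch, and the hidden out-parameter is modeled as a second return value. It shows that the same named error gets a different number in each function, and that propagation means translating the callee's index into the caller's own list.

```python
def get_value(values, index):
    # Hypothetical 'failsIf OutOfBounds' list for this function.
    # Returns (value, err); err is None or an index into get_value.fails_if.
    if not 0 <= index < len(values):
        return None, get_value.fails_if.index("OutOfBounds")
    return values[index], None

get_value.fails_if = ["OutOfBounds"]


def load_value(values, index):
    # Hypothetical 'failsIf OutOfMemory, OutOfBounds' list.
    value, err = get_value(values, index)
    if err is not None:
        # Propagate: translate the callee's index into this function's own list.
        name = get_value.fails_if[err]
        return None, load_value.fails_if.index(name)
    return value, None

load_value.fails_if = ["OutOfMemory", "OutOfBounds"]


ok_value, ok_err = get_value([1, 2, 3], 0)
_, inner_err = get_value([1, 2, 3], 7)      # OutOfBounds as seen by get_value
_, outer_err = load_value([1, 2, 3], 7)     # OutOfBounds as seen by load_value
```

In a compiled implementation the translation table would presumably be fixed at compile time, which is also why callers need rebuilding when a callee's `failsIf` list changes.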


The only caveat with this error handling system is that when the `failsIf` attribute of a function changes, all functions that call that function need to be updated. Granted, it's uncommon for functions to add new error codes out of the blue, and the callers would need to be updated anyhow to properly propagate the new errors upstream.


----

EDIT:

Just a little something I wanted to add which isn't particularly important. I'm pondering the idea of splitting the `enum` type into two separate types for states and flags and splitting the `uint` type into two separate types for cardinal and ordinal numbers. Intrinsically they're the same, but semantically, they're different.

----

EDIT 2:

I'm admittedly discontent with the declaration syntax because it's a bit more verbose than I want it to be. I'd really like it to be more C like because it's closer to my comfort zone, but that creates the following problem:

Code: Select all

print x, y;     // function call
int i, j;          // declaration
The only way to differentiate between the two is to either require parentheses around function parameters all the time (which I think is stupid) or to alter the declaration syntax by adding a keyword and/or punctuation.

Another annoyance I have is that functions and structures are declared with typenames on the left, whereas everything else is declared with typenames on the right:

Code: Select all

func myfn() {
     var x: int;
}
Technically speaking, this is shorthand for `const myfn: func() {...}`, but it still annoys me.

There's another solution though. If I replace block-scope with function scope, identifiers can be implicitly declared. This requires though that identifiers inside of functions are dynamically typed. It's basically the same thing that Python does.

Code: Select all

func val(array<$t> a, uint i) -> t {
    if i < a.count {
        value = a[i];
    }
    return value;    // error, value is either `t` or undefined
}
This would actually be kind of interesting I think. A hybrid static/dynamic approach.
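
For comparison, this is what the equivalent function does in Python itself, where the same situation is only caught at run time. The sketch mirrors the `val` example above; the difference in the proposed design is that the compiler would reject the function up front.

```python
def val(a, i):
    if i < len(a):
        value = a[i]      # 'value' becomes a local of the whole function
    return value          # unbound if the branch above was skipped


in_range = val([10, 20, 30], 1)

try:
    val([10, 20, 30], 5)
    failed = False
except UnboundLocalError:
    failed = True
```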

Post by Wajideus »

So, it's been quite a while, but I have some interesting new stuff. First, a snippet:

Code: Select all

Allocator Obj
{
    allocate Func*(count Int = 1, instance %T!) Bool
    {
        object Obj* = \type_of(instance);
        if (object.load_address == NULL)
            load_object(object);
        object.instance_count++;
        instance = NULL;
        return TRUE;
    }

    free Func*(instance %T!)
    {
        object Obj* = \type_of(instance);
        object.instance_count--;
        if (object.instance_count == 0)
            unload_object(object);
    }
}

allocator Allocator = new Allocator();
This has a lot of interesting stuff going on, so I'm going to break it down.

The first thing is that unlike in C, objects (structs) are reference types. This behavior can be overridden using the ^ punctuator, which is semantically equivalent to doing a shallow copy.

Code: Select all

v1 Vector3;
v2 Vector3 = v1;
v2.x = 5;         // this changes v1.x to 5
v2 = v1^;
v2.x = 7;         // this only affects v2.x
The second thing is that the ! punctuator is used to specify a parameter that's initialized by the function. This means that the function itself treats the parameter as though it hasn't been initialized yet (preventing a read until after the first assignment) and that the parameter must be initialized before the function returns. Because the language mandates that you cannot pass a reference as a parameter unless that reference has been initialized, the language understands that this is valid:

Code: Select all

x Int*;
allocator.allocate(x);
defer allocator.free(x);
and that this is invalid:

Code: Select all

x Int*;
allocator.free(x);
allocator.allocate(x);
The third interesting bit is the final syntax for functions, which makes function pointers waaayy easier to read than in C/C++ and allows them to be initialized.

The final interesting bit is the use of the % and \ punctuators. Just like printf substitution, a % is used to specify a parameter and \ does immediate evaluation. This means that `\type_of` is evaluated at compile time. In this particular case, `type_of` is a built-in function, but this can be done with any function whatsoever.

The allocator code snippet aside, the language does support pointer arithmetic; however, brackets are not used.

Code: Select all

x Int;
y Int* = x*;
y* + 1 = 6;
y*++;
if (y != 6)
    print("it should be!");
Part of the reasoning behind not using brackets is that the semantics for pointers is different than the semantics for arrays. Pointers are not bounds checked and use offsets which begin at 0. Arrays are bounds checked and use indices which begin at 1. Arrays can also use negative indices to indicate values at the end of the array.
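
A minimal Python sketch of the array indexing rule just described. The `Array` class is hypothetical, and treating index 0 as an error is an assumption of this sketch (the description above only says indices begin at 1 and that negative indices count from the end).

```python
class Array:
    """Hypothetical bounds-checked array: indices 1..n, or -1..-n from the end."""

    def __init__(self, items):
        self._items = list(items)

    def _offset(self, index):
        n = len(self._items)
        if 1 <= index <= n:
            return index - 1          # 1 is the first element
        if -n <= index <= -1:
            return n + index          # -1 is the last element
        raise IndexError(f"index {index} out of bounds for length {n}")

    def __getitem__(self, index):
        return self._items[self._offset(index)]

    def __setitem__(self, index, value):
        self._items[self._offset(index)] = value


arr = Array([10, 20, 30])
first, last = arr[1], arr[-1]

try:
    arr[0]                # 0 is not a valid index in this scheme
    zero_ok = True
except IndexError:
    zero_ok = False
```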

Post by Wajideus »

Important
About 2 weeks ago (as of 01/31/2018), I started having a huge number of breakthroughs that stemmed from a complete shift in my ideology and how I approach language design. This has completely changed the language 100% for the better, and everything mentioned before this post is thus invalidated.

Design Guidelines (Summary)
  • Don't make coders repeat themselves.
  • Don't try to prevent mistakes.
  • "Correct" means "deterministic".
Design Guidelines (Details)
  • I assert that the primary factor for deciding on the syntax of a language and adding new features is that coders shouldn't have to repeat themselves. Repetition is noise, which makes code difficult to read, write, and change.
  • I assert that trying to prevent coders from making mistakes is a bad thing. Preventative measures are always overly obtuse, inefficient, and difficult to comprehend and execute in practice. The reduction of errors should come naturally as a side-effect of the simplicity of the language and its ability to express the algorithm which is being written.
  • I assert that mathematical correctness is not true correctness in the field of programming; rather, true correctness is determinism based on well-defined behavior. In the context of unsigned 8-bit integers, "2 + 255 = 1" is a correct expression because it's defined to be so.
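
The third guideline's example can be checked directly. Python integers don't wrap on their own, so this sketch masks the result to 8 bits; `add_u8` is a name made up for the example.

```python
def add_u8(a, b):
    # Unsigned 8-bit addition: the result is defined modulo 2**8.
    return (a + b) & 0xFF


wrapped = add_u8(2, 255)      # 257 mod 256
no_wrap = add_u8(2, 3)
```

The result is "wrong" arithmetically but fully deterministic, which is the sense of correctness the guideline is after.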

Modules

Adhering to the first design guideline (the DRY principle), I wanted to allow the ability to import multiple modules at the same time, and I also didn't want the symbols from libraries to automatically pollute the current scope. So I decided that I would borrow the use of `.*` from Java to explicitly indicate this.

Code: Select all

import {
    io.*;
    math.*;
};
I also decided that I wanted a way of bundling modules together, which didn't depend on namespace nesting like in C++:

Code: Select all

import std {
    io.*;
    math.*;
};

Cleaning Up Resources

A very common issue, and one of the remaining 3 justifications for the existence of goto, is cleaning up resources when you have multiple points of failure. For that, I decided to borrow the `defer` statement from Go: each defer pushes its statement onto a stack, so the deferred statements are executed in the reverse of the order in which they were deferred. So in the below code:

Code: Select all

FILE *f1, *f2;
f1 = fopen("file1", "rb");
if (!f1)
    return;
defer fclose(f1);
f2 = fopen("file2", "rb");
if (!f2)
    return;
defer fclose(f2);
return;
the final return statement will execute `fclose(f2)` before `fclose(f1)`.
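
The last-in, first-out behavior can be tried out in Python with `contextlib.ExitStack`, whose registered callbacks run in reverse order when the block exits. This is only an analogue of `defer`, and `open_resource` / `close_resource` are stand-ins invented for the sketch so that it doesn't touch real files.

```python
from contextlib import ExitStack

order = []   # records the order in which cleanups actually run


def open_resource(name):
    # Stand-in for fopen(); returns a handle name.
    return name


def close_resource(name):
    # Stand-in for fclose().
    order.append(f"close {name}")


def work():
    with ExitStack() as deferred:
        f1 = open_resource("file1")
        deferred.callback(close_resource, f1)   # like: defer fclose(f1);
        f2 = open_resource("file2")
        deferred.callback(close_resource, f2)   # like: defer fclose(f2);
        return "done"                           # cleanups run on the way out


result = work()
```

An early return placed before the second `callback` would run only the first cleanup, matching the early `return` paths in the snippet above.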


Unconditional Loops

A very common idiom is the use of an unconditional loop. It's not just `while(true)` or `for(;;)`; C's `do/while` statement is basically an unconditional loop with an `if (!condition) break;` statement at the end of it. I intended to fix this, and eliminate the second justification for goto, by adding a `recur` statement:

Code: Select all

recur {
    getEvents();
    if (quit)
        break;
    update();
    render();
}

Labelled Loop

To kill the 3rd and final justification for goto, which is to break/continue in nested loops, I decided to keep labels for loops and switch statements.


Bounded Arrays

I wanted to have bounded arrays, but I just couldn't decide on what the maximum size of a bounded array should be. Turns out, it's very dependent on what you're doing. So then I thought, why not do this:

Code: Select all

uint len;
int val[&len];
No allocation is being performed, we're just saying that `val` is a pointer which is bounded by `len`; and unlike the fixed-length arrays in C, `val` can be redirected.


Use All The Things

Two particular things I wanted to deal with were the ability to use the symbols of a module in a specific scope, and allow for compositional polymorphism of structures by flattening the namespaces. For this, I added a `use` keyword:

Code: Select all

struct vec2 { float x, y; };

struct vec3 {
    use vec2;
    float z;
};

vec3 v;
v.x = 5;
As I thought about this idea more, I began to wonder, what if you could `use` an anonymous procedure to overload the [] and () operators of the struct?

Code: Select all

struct Table {
    int val;
    use proc(Table *this, char *key) int * {
        return &this.val;
    }
}

Table table;
table["test"] = 6;


struct Reader {
    FILE *f;
    use proc(Reader *this, int len, void *buf) int {
        return fread(buf, 1, len, this.f);
    }
}

Reader read;
int i;
read.f = fopen("test.txt", "rb");
defer fclose(read.f);
read(sizeof(i), &i);
This is an extremely powerful abstraction that afaik, has never been used before. But we're only getting started!


Like Closures...But Waaaaayyy Better

Before I came up with the idea for using procs in structs, I was thinking hard about how you could bind data to a procedure, and came up with this idea:

Code: Select all

proc Read(Reader *reader, int len, void *data) int;

Reader reader;
int i;
var read = Read(reader, ..);
read(sizeof(i), &i);
What's happening under the hood is this:

Code: Select all

var read = proc(int len, void *data) int { return Read(reader, len, data); }
In other words, the `..` causes all the previous arguments to be captured, and a new anonymous function is created which closes over the function being called and the captured values. This also means that you can do this:

Code: Select all

Read(reader, ..)(sizeof(int), ..)(&i);
Turns out, this can be used to create iterators, coroutines, generators, and trampolines. I eventually found out there was a technical name for it called currying. I'm currently exploring the idea of using the `=>` operator together with currying as a way of creating pipelines.
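Binding only the leading arguments like this is often called partial application. A minimal C sketch of what `Read(reader, ..)` might lower to is a function pointer stored next to the captured argument. Everything here is an assumption made for illustration: `reader` reads from a memory buffer rather than a `FILE *`, and `bound_read`, `bind_reader`, and `call_bound_read` are invented names.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the thread's Reader: reads from a memory buffer. */
struct reader {
    const unsigned char *src;
    size_t remaining;
};

/* The full procedure, in the shape of `proc Read(Reader *, int, void *) int`. */
int reader_read(struct reader *r, int len, void *data)
{
    size_t n = (len > 0) ? (size_t)len : 0;

    if (n > r->remaining)
        n = r->remaining;
    memcpy(data, r->src, n);
    r->src += n;
    r->remaining -= n;
    return (int)n;
}

/* The procedure plus the captured first argument. */
struct bound_read {
    int (*fn)(struct reader *, int, void *);
    struct reader *reader;
};

struct bound_read bind_reader(struct reader *r)
{
    struct bound_read b = { reader_read, r };
    return b;
}

/* Calling the bound value supplies only the remaining arguments. */
int call_bound_read(struct bound_read b, int len, void *data)
{
    return b.fn(b.reader, len, data);
}
```

The bound value only stays valid while the captured `reader` is alive, which is exactly the lifetime question the ownership model earlier in the thread has to answer.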

Despite the fact that we can easily create trampolines, I do want to mention that tail-call recursion will still be a feature.


Stateful Machines

Another one of the problems on my todo list was to fix the broken switch statements in C. First (and least important) is the name. `switch` and `case` are very common words. I decided to replace them with `when`, `is`, and `or`. The `or` keyword doesn't mean `logical or` btw. It's a fallthrough case; because the `when` statement breaks by default.

But I wasn't done with switch statements yet. I wanted some way to build state machines out of `when` statements. After much pondering, the idea hit me to treat the value you're checking like a tape: you read a value and either reach a breaking state or `shift` to the next state:

Code: Select all

when ("hello") {
    is 'h':
    or 'l':
        print("consonant");
        shift;
    is 'e':
    or 'o':
        print("vowel");
        shift;
}
With the newly changed semantics, I decided to replace the `continue` keyword with `shift`.
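As I read the semantics, the `when`/`shift` example corresponds to the following loop around a `switch` in C. This is a sketch of one interpretation, not a definition of the feature; `classify` is a hypothetical name, and it records `'c'` or `'v'` instead of printing so the result can be checked.

```c
#include <stddef.h>

/* Classify each character of `tape`, writing 'c' (consonant) or 'v' (vowel)
 * into `out`. Stops at the first character that no case matches, which
 * includes the terminating '\0'. Returns the number of characters consumed. */
size_t classify(const char *tape, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return 0;

    while (n + 1 < out_size) {
        switch (tape[n]) {
        case 'h':
        case 'l':           /* `or`: explicit fallthrough case */
            out[n] = 'c';
            break;
        case 'e':
        case 'o':
            out[n] = 'v';
            break;
        default:
            goto stop;      /* no case matched: the breaking state */
        }
        n++;                /* `shift`: advance the tape and re-enter */
    }
stop:
    out[n] = '\0';
    return n;
}
```

For the input `"hello"` this yields `"cvccv"`, one classification per character read from the tape.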


EDIT:

I forgot to mention, procedures are const by default, which was the original reason for having a `var` keyword. To create a function pointer, you'd use `var proc`. Both `const` and `var` can be used as either a modifier or a way to declare an identifier with type inference.

Code: Select all

var proc glBindBuffer(uint target, uint buf);

...
glBindBuffer = dlsym(dl, "symbol");


const pi = 3.1415926535;
var text = "stuff";

EDIT #2:

A proposal for the order of operations:

Code: Select all

    • Unary Suffix Operators  () [] . ++ --
    • Unary Prefix Operators  & * ! ~ ++ -- + -
    • Multiplication          *
    • Division & Remainder    / %
    • Addition & Subtraction  + -
    • Bitmasking Operators    & | ^
    • Bitshifting Operators   << >>
    • Relational Operators    == < > != <= >=
    • Logical Operators       && || ^^
    • Ternary Conditions      ?:
    • Assignment Operators    = += -= *= /= %= <<= >>= &= |= ^=
There are a lot of differences between this and the C order of operations. For example:

Code: Select all

1 ^ 4 << 1 == 10         // in C, `1 ^ 4 << 1` evaluates to `9`

x & y == 7              // in C, this would be evaluated `x & (y == 7)`

4 / 2 * 2 == 1          // in C, this would be evaluated `(4 / 2) * 2 = 4`

false || true && false  // in C, this would be evaluated `false || (true && false)`
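The C side of these comparisons can be checked directly. The functions below are a small sketch: the `c_*` ones rely on standard C precedence, and the `proposed_*` ones spell out the proposed grouping with explicit parentheses. A compiler may warn about the unparenthesized forms, which is rather the point.

```c
/* Standard C: shifts bind tighter than `^`, so this is 1 ^ (4 << 1). */
int c_xor_shift(void) { return 1 ^ 4 << 1; }

/* Proposed order: bitmasking binds tighter than shifting. */
int proposed_xor_shift(void) { return (1 ^ 4) << 1; }

/* Standard C: `*` and `/` share a level and group left to right. */
int c_div_mul(void) { return 4 / 2 * 2; }

/* Proposed order: multiplication binds tighter than division. */
int proposed_div_mul(void) { return 4 / (2 * 2); }

/* Standard C: `==` binds tighter than `&`, so this is x & (y == 7). */
int c_mask_eq(int x, int y) { return x & y == 7; }

/* Proposed order: the mask is applied before the comparison. */
int proposed_mask_eq(int x, int y) { return (x & y) == 7; }
```

With `x = 7` and `y = 15`, the C grouping gives `0` while the proposed grouping gives `1`.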

Re: Opinions On A New Programming Language

Post by Wajideus »

I touched on this briefly before, but now I'm going to go into more detail.


Because the language has bounded arrays, and because we can bake parameters into procedures, we can do something like this:

Code: Select all

proc Order($T lval, T rval) int;

proc qsort(Order order, int len, $T val[&len]) int, T[&0];


proc run() {
    int lpeople_len, rpeople_len;
    char *lpeople[&lpeople_len] = {
        "Bob", "Jill", "Tom", "Sue", "Joe"
    };
    char *rpeople[&rpeople_len];
    lpeople => qsort(by_name, ..) => rpeople;
}
There are a couple of problems with the syntax, which is why I said I'm still working on it. That aside, the `qsort` procedure has the `by_name` Order baked into it, which changes the type signature to `proc(int len, $T val[&len]) int, T[&0]`. In other words, it takes a bounded array and returns a bounded array.

We can use `=>` to compose procedures like this into a pipeline that does mapping, filtering, and reducing operations; and then dumps the result into an identifier.
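For reference, the plain C version of the `lpeople => qsort(by_name, ..) => rpeople` line looks roughly like this. It uses the standard `qsort` and `strcmp`; `by_name` mirrors the name in the post, and `sorted_copy` is a hypothetical helper standing in for the pipeline.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator in the shape `qsort` expects; plays the role of `by_name`.
 * Each argument points at one array element, i.e. at a `const char *`. */
int by_name(const void *lhs, const void *rhs)
{
    const char *const *l = lhs;
    const char *const *r = rhs;
    return strcmp(*l, *r);
}

/* Copy `len` names from `src` into `dst`, then sort the copy in place. */
void sorted_copy(const char *dst[], const char *const src[], size_t len)
{
    memcpy(dst, src, len * sizeof *src);
    qsort(dst, len, sizeof *dst, by_name);
}
```

Note how the length has to be passed by hand here; in the proposed language it would travel with the bounded array.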


EDIT:

Also, something I forgot to mention about struct operator overloading yesterday:

Code: Select all

my_struct[x] = 5;
is equivalent to:

Code: Select all

*my_struct(x) = 5;
which is why the anonymous `proc` returned a pointer to an `int`.
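In C terms, this is the familiar pattern of an accessor that returns a pointer so the call can sit on the left of an assignment. The sketch below assumes a tiny fixed-size table; `table`, `TABLE_SLOTS`, and `table_at` are invented for the example and store key pointers without copying them.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SLOTS 4

/* Hypothetical fixed-size string-to-int table. */
struct table {
    const char *keys[TABLE_SLOTS];
    int vals[TABLE_SLOTS];
};

/* Plays the role of the struct's `use proc`: returns a pointer to the slot
 * for `key`, claiming the first free slot if the key is new.
 * Returns NULL when the table is full. */
int *table_at(struct table *t, const char *key)
{
    for (size_t i = 0; i < TABLE_SLOTS; i++) {
        if (t->keys[i] == NULL) {
            t->keys[i] = key;   /* stores the pointer, not a copy */
            return &t->vals[i];
        }
        if (strcmp(t->keys[i], key) == 0)
            return &t->vals[i];
    }
    return NULL;
}
```

`*table_at(&t, "test") = 6;` is then the C spelling of `table["test"] = 6;`, with the caveat that real code should check for `NULL` first.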

Re: Opinions On A New Programming Language

Post by Wajideus »

Something fairly neat that I just realized you can do with my language's semantics:

Code: Select all

struct File {
    FILE *fp;
    Reader read;
    Writer write;
};

struct Reader {
    FILE *fp;
    use proc(use Reader reader, int len, void *data) int {
        return fread(data, 1, len, fp);
    }
}

struct Writer {
    FILE *fp;
    use proc(use Writer writer, int len, void *data) int {
        return fwrite(data, 1, len, fp);
    }
}

proc construct(use File file, str name, str mode) {
    fopen(name, mode) => fp => read.fp => write.fp;
}

int i;
File file("test.txt", "rb");
file.read(sizeof(int), &i);
It's like having methods, except the methods are more like traits in that they have their own state and can behave like arrays if their usage proc returns a pointer.

Re: Opinions On A New Programming Language

Post by Wajideus »

Another new feature for overloading functions based on type signature:

Code: Select all

struct Entity {
    use Vector3 position;
};

struct Crate {
    use Entity entity;
};

interf Updateable {
    proc update();
};

impl Updateable (Crate crate) {
    update = update_crate(crate, ..);
};

proc update(Updateable Entity entities[n], int n) {
    for (entities) {
        it.update();
    }
}

proc update_crate(Crate crate) {
}

proc run() {
    Crate crate;
    update(&crate, 1);
}
What's happening here is that an `interf` declares a vtable type, and an `impl` defines a const vtable associated with a type. Unlike C++, the data structure itself doesn't contain a pointer to the vtable. The pointer only gets passed when we use the interface as a modifier; as there is a cast from `Crate` -> `Updateable Crate`. The implementation of an updateable crate basically just bakes the crate instance into the first parameter of the `update_crate` function.
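A C sketch of that layout, under the assumption that `Updateable T` is passed as a pair of pointers (object and vtable). The names `updateable_vtable`, `updateable`, `as_updateable`, and `update_all` are hypothetical, and `crate` gets an `updates` counter only so the effect is observable.

```c
#include <stddef.h>

/* `interf Updateable`: a vtable type. */
struct updateable_vtable {
    void (*update)(void *self);
};

/* `Updateable T`: an object pointer paired with a vtable pointer.
 * The object itself stores no vtable pointer. */
struct updateable {
    void *self;
    const struct updateable_vtable *vtable;
};

struct crate {
    float x, y, z;
    int updates;    /* counts calls, for demonstration only */
};

void update_crate(void *self)
{
    struct crate *c = self;
    c->updates++;
}

/* `impl Updateable (Crate crate)`: one constant vtable for the type. */
const struct updateable_vtable crate_updateable = { update_crate };

/* The cast from `Crate` to `Updateable Crate`. */
struct updateable as_updateable(struct crate *c)
{
    struct updateable u = { c, &crate_updateable };
    return u;
}

/* Calls `update` on each item through its vtable. */
void update_all(const struct updateable *items, size_t n)
{
    for (size_t i = 0; i < n; i++)
        items[i].vtable->update(items[i].self);
}
```

This is the same "fat pointer" shape used for Rust trait objects and Go interface values, as opposed to the per-object vtable pointer C++ uses.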

Re: Opinions On A New Programming Language

Post by Wajideus »

There's a huge semantic shift I want to talk about here. Unlike C/C++, this language is actually prototypal in nature. Any const non-primitive value (this includes enums, unifs, structs, and procs) can be used as a prototype to declare a new identifier. So when you type:

Code: Select all

struct Person {
    var name = "bob";
}
This is more than a constant; it's a prototype, which lets you do:

Code: Select all

Person person;
person.name = "tom";
The existence of the `var` keyword in the language is not only for type inference, but also to override the fact that anything declared with `enum`, `unif`, `struct`, and `proc` is const by default.


Now that I've touched that base, I'm going to cover the next bit: first-class types and generics (parametric polymorphism).

Code: Select all

struct vec3(proto t) {
    t x, y, z;
}

vec3(float) position;
The `proto` keyword is a prototype that represents a prototype. It can also be used to do overloading:

Code: Select all

proc add(proto t, t x, t y) t {
    return x + y;
}

const addf = add(float, ..);
This would work like C++ templates, where only the necessary code paths for `add` are generated, and the execution of `add` is inlined. Aside from generics, I'm also thinking about using it to provide type introspection, but haven't quite fleshed out the idea yet. Still, I think this is an immense syntactic improvement over C++.
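The closest thing plain C offers is stamping out one function per type with a macro, which gives a feel for what `add(float, ..)` would have to generate. `DEFINE_ADD`, `addf`, and `addi` are names chosen for this sketch.

```c
/* Hypothetical macro: generates one `add` per type, roughly what
 * `const addf = add(float, ..);` would instantiate. */
#define DEFINE_ADD(T, name) \
    T name(T x, T y) { return x + y; }

DEFINE_ADD(float, addf)
DEFINE_ADD(int, addi)
```

Only the instantiations that are actually requested exist in the binary, which matches the "only the necessary code paths are generated" behaviour described above.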



Edit:

So, now that there's support for default parameter values, first-class types, and interfaces, I'm really leaning away from supporting function overloading. I just can't think of a good argument for having it anymore, and there's a huge bonus for not having it: I can avoid name mangling and make this language a perfect drop-in replacement for C that anything can interop with.


Edit #2:

I just noticed something really awesome that you can do with prototypes:

Code: Select all

interf updateable {
    proc update();
}

impl updateable (crate c) {
    update = update_crate(&c, ..);
}

proc update_crate(crate *c) {
}

proc doit(updateable proto t, t things[n], int n = 1) {
    for (things) {
        it.update();
    }
}

proc run() {
    crate mycrate;
    doit(crate, &mycrate);
}
Here, we're using interfaces to constrain the prototype `t` of `doit` to things that are `updateable`.

Re: Opinions On A New Programming Language

Post by Wajideus »

Touching up on pipelines a bit more:

Code: Select all

proc Get(char c[n], int n) int;

interf getable {
    Get get;
};

impl getable (char *s) {
    get = proc(char **sp = &s, char c[n], int n) int {
        char *s = *sp;
        int total = 0;
        for (c) {
            if (!*s)
                break;
            total++;
            it = *s++;
        }
        *sp = s;    // remember the position for the next call
        return total;
    }
}

proc makewords(Get get) Get;
proc lowercase(Get get) Get;
proc sort(Get get) Get;
proc unique(Get get) Get;
proc print(Get get);

proc run() {
    getable char *s = "an interesting example of a pipeline";
    makewords(s.get) => lowercase => sort => unique => print;
}
This one is a bit hard to wrap your mind around at first, but basically, we defined an interface so that we could bind a vtable to a string. Then, we used the `get` method of the vtable as a parameter to `makewords` to start the pipeline. After that, each `=>` operator passes the return value of the previous expression as an argument to the procedure on the right. By the time the `print` function is reached, we have a stack of getters, where each one closes over the previous one.

So `print` calls the getter returned by `unique`, which calls the getter returned by `sort`, etc. And this continues until `s.get` returns 0 by reaching the end of the string.
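The pull-based chain can be sketched in C with a getter represented as a function pointer plus a state pointer. This is a cut-down illustration with a single filter stage, not the full `makewords`/`sort`/`unique` pipeline; `getter`, `string_get`, `lowercase_get`, and `drain` are all hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

/* A getter fills `buf` with up to `n` chars and returns how many it wrote;
 * 0 means end of input. The closure is the function plus its state. */
struct getter {
    size_t (*get)(void *state, char *buf, size_t n);
    void *state;
};

/* Source stage: `state` is a cursor into a NUL-terminated string. */
size_t string_get(void *state, char *buf, size_t n)
{
    const char **cursor = state;
    size_t total = 0;

    while (total < n && **cursor != '\0') {
        buf[total++] = **cursor;
        (*cursor)++;
    }
    return total;
}

/* Filter stage: `state` is the upstream getter it closes over. */
size_t lowercase_get(void *state, char *buf, size_t n)
{
    struct getter *upstream = state;
    size_t got = upstream->get(upstream->state, buf, n);

    for (size_t i = 0; i < got; i++)
        buf[i] = (char)tolower((unsigned char)buf[i]);
    return got;
}

/* Sink: pulls from `g` until it returns 0 or `out` is full.
 * NUL-terminates `out` and returns the number of chars stored. */
size_t drain(struct getter g, char *out, size_t out_size)
{
    size_t len = 0;

    if (out_size == 0)
        return 0;
    while (len + 1 < out_size) {
        size_t got = g.get(g.state, out + len, out_size - 1 - len);
        if (got == 0)
            break;
        len += got;
    }
    out[len] = '\0';
    return len;
}
```

Each stage only runs when the stage after it asks for data, so the sink drives the whole chain, just as `print` does in the example.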

One thing to note is that this system doesn't take buffering into account. It's also really complex to implement, which is a code smell imo. I may consider supporting channels as first-class types instead.


Also, I never mentioned this, but unlike pretty much all other languages in existence, this language treats parameters and struct members as sets, represented by a directed acyclic graph. This means that they can cross-reference each other to express meaningful relationships, as long as there is no recursion. For example, you can't do:

Code: Select all

proc add(proto t = typeof(x), t x, t y) t;
because `t` depends on `typeof(x)`, which depends on `t`.
alexfru
Member
Member
Posts: 1112
Joined: Tue Mar 04, 2014 5:27 am

Re: Opinions On A New Programming Language

Post by alexfru »

Sorry, didn't read the whole thing. But I wonder how friendly is your language w.r.t. reading code written in it. 'cause way too often in order to write new code (or change existing code) we first need to read lots of code written by someone else. From my experience dealing with large C++ projects I can tell that C++ is not at all friendly, unless the reader's brain has a special C++ compiler lobe. Like, when I see foo or foo() I don't know what's behind it, where to even look for it. Is it a macro? Or a global? Or a class member? Something overloaded? And then there's something truly ugly, e.g. the "one definition rule", meaning if you accidentally define something that happens to be already defined elsewhere in the thousands of lines of unfamiliar code, the compiler will not even try to warn you and you'll end up with broken code, which you may not notice for a while. It looks like the modern C++ is an exercise in expressive text compression, good until the time you need to decompress code written in it in order to understand it. IDEs don't cut it because they don't really "understand" C++ code or don't have the full build context (which files are in the project, what macros are predefined, etc). Where does your language stand w.r.t. this problem? Is it for machines or for people?
Solar
Member
Member
Posts: 7615
Joined: Thu Nov 16, 2006 12:01 pm
Location: Germany

Re: Opinions On A New Programming Language

Post by Solar »

You can write bad code in any language. I feel the C++ bashing is misplaced and OT here.
Every good solution is obvious once you've found it.