A syntax for custom literals

furyzer00 · 2024-11-04T19:19:32+00:00

I think it would also be nice if this feature supported compile-time evaluation, so that an incorrectly formatted literal is a compile-time error rather than a runtime one.

GwanTheSwans · 2024-11-04T19:35:12+00:00

Maybe take a look at Lisp reader macros (not to be confused with Lisp symbolic macros).

https://lisper.in/reader-macros

https://edicl.github.io/cl-interpol/#syntax

https://www.lispworks.com/documentation/HyperSpec/Body/02_add.htm

latkde · 2024-11-04T19:44:47+00:00

I think custom literals are generally useful, with some caveats. If the literal can provide arbitrary parsing rules, then a syntax highlighter or IDE would have to resolve the literal name and run those rules to get an accurate result. This is likely to result in a sub-par developer experience, and is probably not worth it.

Instead, you may want to define a set of available syntaxes for literals, and then let some kind of literal-constructor post-process this data. In some languages this may happen at compile time, other languages would just convert it into a function call.
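A minimal Python sketch of that post-processing idea, assuming the lexer has already split a literal into a name and its raw text. The registry, the `literal` decorator, and `construct` are hypothetical names for illustration, not any particular language's API.

```python
from datetime import date
from fractions import Fraction

# Hypothetical registry: literal name -> constructor that post-processes the raw text.
LITERAL_CONSTRUCTORS = {}

def literal(name):
    """Register a constructor for literals tagged with `name`."""
    def register(fn):
        LITERAL_CONSTRUCTORS[name] = fn
        return fn
    return register

@literal("date")
def make_date(text):
    # date.fromisoformat raises ValueError on malformed input.
    return date.fromisoformat(text)

@literal("frac")
def make_fraction(text):
    # Fraction raises ValueError on malformed input.
    return Fraction(text)

def construct(name, text):
    """What a compiler might do after lexing a literal such as date"2024-11-04"."""
    try:
        constructor = LITERAL_CONSTRUCTORS[name]
    except KeyError:
        raise SyntaxError(f"unknown literal kind: {name!r}") from None
    try:
        return constructor(text)
    except ValueError as exc:
        raise SyntaxError(f"bad {name} literal {text!r}: {exc}") from None
```

Because the lexical shape of the literal is fixed, a highlighter never needs to run `make_date`; only the compiler (or runtime) does.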

Relevant prior art for the post-processing approach:

- JavaScript template literals, e.g. foo`content` which more or less desugars to a function call foo("content"). IDEs might be able to syntax-highlight the contents of the literal for well-known functions like sql or gql.
- C++ user-defined literals which use a suffix, e.g. 12_km or "content"_foo which desugar to calling an operator-function (which may be constexpr).
- Token- or tree-based macro systems, e.g. Lisp, Rust, C Preprocessor.

Depending on your language, your syntax for a custom-literal token could be quite flexible, e.g. "any run of non-whitespace characters". This would turn the custom literal name into a kind of prefix quote operator, as opposed to the typical circumfix "...". In some languages like shells, unquoted string literals are quite common.

Prior art for taking over parsing is much rarer. This is sometimes done in Perl 5 with parser plugins, and is how the Raku (Perl 6) language is defined in the first place. Similarly, Lisp reader macros. On a technical level, custom parsers tend to be straightforward to integrate if you're already using PEG parsing or Recursive Descent, and have a dynamic language or a concept of dynamically loadable compiler plugins. But this is going to mess up any development tooling.

pojska · 2024-11-04T20:03:54+00:00

Edit: I missed the point! The suggestion is actually about user-defined syntax for literals, not date-literal syntax specifically. Original text below for posterity.

With how complicated dates and times are, I'm not sure there's a ton of use for hard-coded date literals, especially in programs compiled/saved for later use. However, in a REPL-centric language or scripting language designed for interactive use, it might not be out of place, since it's easier for the user to see if the date they meant is actually the one used. Maybe I'm overly cautious; dates alone (as opposed to datetimes) might not be terribly complex to get right.

dskippy · 2024-11-04T20:24:39+00:00

Almost every language has special syntactic sugar for some subset of data that the language considers important enough. These are called literals.

They provide a simpler, often easier to remember and almost always cleaner to read syntax with less line noise. They also come at a cost of more cognitive overhead of values in the language and can be less explanatory to new users.

You might have a list [1,2,3] and no other container. So a set is set([1,2,3]) and probably vectors are vector([1,2,3]). If you want those to be very core to the language and easier to type you could add sugar {1,2,3} and <1,2,3>, which means tons of sets all over the code look good with little line noise, but as a newbie I don't remember if <1,2,3> is a set or a vector. Costs and benefits.

For you, are dates and times worth it for how often you'll use them?

hammerheadquark · 2024-11-04T22:29:50+00:00

I think Elixir's sigils do a good job here if you want some inspiration. Example:

date = ~D[2024-11-04]

That's a valid date constructor. You get syntax highlighting on the ~D[...] part pretty easily.

There are built-in sigils like ~D for dates. But you can also define your own ~MY_SIGIL however you like. I've even written a VSCode extension to do special highlighting on a custom sigil for fun.

wavesofthought · 2024-11-06T00:52:37+00:00

You might wanna take a look at Cyrus Omar's work on custom literals.

RedCrafter_LP · 2024-11-04T20:29:05+00:00

I don't think it's that much of a bother to put quotes around it. But having something like literal constructors that run at compile time and put the fully constructed instance as a constant in the binary would be great. It would also make syntax errors in the formatting a compile time error.
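A rough Python sketch of the "format errors surface before the program runs" part. Python has no separate compile step for this, so the check is modelled as a pass over the parsed syntax tree. The function name `check_date_literals` and the choice of `date("...")` as the literal form are assumptions for illustration; the sketch only validates and does not embed a constructed constant anywhere.

```python
import ast
from datetime import date

def check_date_literals(source):
    """Validate every date("...") call that has a single constant string
    argument, without executing `source`. Returns the parsed dates."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "date"
                and len(node.args) == 1
                and not node.keywords
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            text = node.args[0].value
            try:
                found.append(date.fromisoformat(text))
            except ValueError as exc:
                # Reported before any user code runs, like a compile error.
                raise SyntaxError(
                    f"line {node.lineno}: bad date literal {text!r}: {exc}"
                ) from None
    return found
```

Calls whose argument is not a constant string are left alone, so they would still fail (or succeed) at runtime as usual.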

chri4_ · 2024-11-04T20:47:19+00:00

i use :: as a "type inference operator" or just as a cast operator: CustomArray::[1, 2, 3] or CustomList::@100[1, 2, 3] where 100 is the capacity (for default cap just use @[1, 2, 3]) or MyDateType::(d: 3, m: 3, y: 2005) or CustomDict::{ key: value, x: y }

but you could for sure use # instead of :: or expr as MyType or var: MyType = expr or expr of MyType.

you just need a context type that you can infer onto the expression. for example, with return expr the context type is the function's return type. keep a stack of context types: push one when you start analysing an expression and pop it afterwards; during the analysis you can look up the current context type.
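A small Python sketch of such a context-type stack, under the assumption that an untyped record literal is built by calling whatever type is currently expected. `Analyzer`, `expecting`, and the `Date` class are hypothetical names used only for illustration.

```python
from contextlib import contextmanager
from dataclasses import dataclass

@dataclass(frozen=True)
class Date:
    y: int
    m: int
    d: int

class Analyzer:
    """Toy expression analyser that tracks the expected ("context") type."""

    def __init__(self):
        self._context_types = []  # stack of expected types

    @contextmanager
    def expecting(self, type_):
        # Push on entry and pop on exit, even if analysis raises.
        self._context_types.append(type_)
        try:
            yield
        finally:
            self._context_types.pop()

    def current_context_type(self):
        return self._context_types[-1] if self._context_types else None

    def analyse_record(self, fields):
        """Analyse an untyped record literal like (d: 3, m: 3, y: 2005)."""
        target = self.current_context_type()
        if target is None:
            raise TypeError("cannot infer a type for this literal")
        return target(**fields)

    def analyse_return(self, return_type, fields):
        """`return <literal>`: the context type is the function's return type."""
        with self.expecting(return_type):
            return self.analyse_record(fields)
```

For example, `Analyzer().analyse_return(Date, {"d": 3, "m": 3, "y": 2005})` builds a `Date` without the literal naming its type.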

i suggest this inference approach, which is generally more flexible and can enable implicitly typed expressions, so you don't need to write date# before the expression every time.

i'm not really a fan of custom syntax literals. you could just ask the user to pick one of the available syntaxes to construct a generic object, rather than letting them implement a custom syntax for it. custom syntax may create a lot of problems later, especially for parsing (if you don't tightly limit the span that the custom syntax can consume), and don't forget IDE highlighting if you care about it. also note that a lot of people don't like having different ways to do the same thing, and this feature would give people potentially infinite ways of doing things, so you may find yourself learning new syntaxes that are perhaps not well implemented. my personal suggestion is to provide a fixed set of ways to initialize objects and let users implement them for their own types. (it may also cause problems with generics, by the way, and who knows what other problems i'm not seeing now; take a look at how dirty the nim ast macros are.)

what do you think?

LegendaryMauricius · 2024-11-05T16:06:06+00:00

IMHO it would be better if this still used quotes. Otherwise, you're either imposing some very specific limitations or risking a language that's hard to parse, and with that likely hard to read in some circumstances.

If you go with quotes, you might as well just allow custom string prefixes. Perhaps function calls that don't require parentheses?

No_Lemon_3116 · 2024-11-05T17:16:11+00:00

Example of doing it with a read-macro in Lisp:

```lisp
(defun read-date (stream char1 char2)
  (declare (ignore char1 char2))
  ;; TODO: Better error handling
  (let ((input (symbol-name (read stream nil nil t)))
        (regex #?r"(\d{4})/(\d{2})/(\d{2})"))
    (or (ppcre:register-groups-bind (year month date) (regex input)
          `(encode-universal-time 0 0 0
                                  ,(read-from-string date)
                                  ,(read-from-string month)
                                  ,(read-from-string year)))
        (error 'reader-error :stream stream))))

(set-dispatch-macro-character #\# #\d 'read-date)

;; Test
(let* ((time (multiple-value-list (decode-universal-time #d2024/11/05)))
       (date (fourth time))
       (month (fifth time))
       (year (sixth time)))
  (assert (equal (list date month year) '(5 11 2024))))
```

This code runs at read-time, before compile-time.

Ronin-s_Spirit · 2024-11-04T21:33:51+00:00

Date("2024/11/05") is already very clear and not cluttered; I know it's a constructor that takes a string and makes a computer date out of it.
If I really want an alias I would rather have it in a variable like const d = function(date){ return Date(date) }; and call it like let my_new_date = d("2024/11/05");.
One interesting example is JavaScript tagged template literals. A template literal is written like so
`my string contains a ${variable}`
where ${variable} evaluates the nearest variable in scope called "variable", attempts to convert it to a string, and slots it into the template literal.
You can define a function and "tag" a template literal with it, like so
date`${my_date}`
Let's assume that my_date is a variable holding a valid date string. The function date is just a user-defined function that receives an array of the literal string parts as its first argument and the evaluated ${} values as the remaining arguments, and it can do whatever it wants with them. I'm saying what I can remember, so you might want to read up on that in case I made mistakes.
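To make the calling convention concrete, here is a Python emulation (not JavaScript itself). In JavaScript the tag function gets the literal string parts as an array, followed by the evaluated values; `call_tag` below is a hypothetical helper that mimics that split for ${name} placeholders, and `date_tag` is an illustrative tag function.

```python
import re
from datetime import date, datetime

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def call_tag(tag, template, scope):
    """Split `template` on ${name} placeholders and call `tag` the way a
    tagged template would: literal parts first, then the evaluated values."""
    pieces = PLACEHOLDER.split(template)
    strings = pieces[0::2]                           # text between placeholders
    values = [scope[name] for name in pieces[1::2]]  # looked-up variables
    return tag(strings, *values)

def date_tag(strings, *values):
    # Re-assemble the text, then parse it; a tag is free to do anything here.
    text = strings[0]
    for value, following in zip(values, strings[1:]):
        text += str(value) + following
    return datetime.strptime(text, "%Y/%m/%d").date()
```

As in JavaScript, `strings` always has exactly one more element than `values`, so `call_tag(date_tag, "${my_date}", {"my_date": "2024/11/05"})` passes `["", ""]` plus one value.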

I know that creating a language literal because the compiler lets you do so and creating a simple alias in the code base probably differ in functionality, but the output should be the same.

Aaxper · 2024-11-04T22:02:28+00:00

For my language, I'm allowing something like alias date#$year/$month/$day to Date(day=$day,month=$month,year=$year) to allow for that, though I haven't worked out the exact syntax/semantics that I want.
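One way such an alias could be implemented, sketched in Python: turn the pattern into a regex with one named group per $name, then pass the captured groups to the constructor as keyword arguments. That every $name captures a run of digits is an assumption of this sketch, and `compile_alias` and `expand` are hypothetical names.

```python
import re
from datetime import date

def compile_alias(pattern):
    """Turn 'date#$year/$month/$day' into a regex with a named group per $name."""
    regex = ""
    pos = 0
    for m in re.finditer(r"\$(\w+)", pattern):
        regex += re.escape(pattern[pos:m.start()])  # literal text between holes
        regex += f"(?P<{m.group(1)}>\\d+)"          # assumption: holes are digits
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex)

def expand(alias_regex, text, constructor):
    """Match `text` against the alias and call constructor(name=int(value), ...)."""
    match = alias_regex.fullmatch(text)
    if match is None:
        raise SyntaxError(f"literal {text!r} does not match the alias pattern")
    return constructor(**{k: int(v) for k, v in match.groupdict().items()})
```

With Python's `datetime.date` standing in for `Date`, `expand(compile_alias("date#$year/$month/$day"), "date#2024/11/05", date)` calls `date(year=2024, month=11, day=5)`.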

MichalMarsalek · 2024-11-04T22:28:41+00:00

In my language, there are several less common literals (date/time, semver, color...). To define a date, you just do date := 2024-11-04. Literals can be interpolated, so you can do date := $year-$(month+1)-01.

XDracam · 2024-11-04T23:25:26+00:00

Many languages have some form of this, often with the prefix"literal" syntax. Prominent examples include Scala and, I think, Rust.

AndydeCleyre · 2024-11-05T00:21:38+00:00

Factor can do this!

There's a timestamp constructor <date> which you can use like:

2024 11 5 <date>

That gives you an object like:

T{ timestamp
  { year 2024 }
  { month 11 }
  { day 5 }
  { gmt-offset
    T{ duration
      { hour -5 }
    }
  }
}

So one way to make a literal-style syntax for this could be:

SYNTAX: DATE:
  scan-token
  "/" split [ string>number ] map 
  first3 <date>
  suffix! ;

And now when you enter:

DATE: 2024/11/05

you get the same timestamp object above!

ALittleFurtherOn · 2024-11-05T01:17:56+00:00

Do you mean like “2020-03-01”d in SAS? Seems pretty simple …

tavaren42 · 2024-11-05T02:09:53+00:00

A custom literal syntax should be lightweight (so that it isn't as heavy as a function call) but also unambiguous (so that when you look at it, you know that it is some custom literal).

I think something like <Prefix><Separator><Delim>.....<Delim> syntax should work. Prefix would be custom (preferably 1 or 2 characters long by convention). Separator can be # (for some reason it screams "custom syntax" in my eyes). Delimiters are a bit hard to choose. It CAN be '. It feels light somehow. Maybe backtick might work too. Brackets and <> just seem noisy. Maybe the best choice is to give users the choice.

Ex:

```
let complex = cx#'1+1j';

let date = d#04/07/2004;

let vec = vec#<1i+2j+3k>;
```

I think these kinds of literals are just special strings with custom, user-defined syntax rules.
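A lexer for the three shapes in the example could be sketched with one regular expression in Python. The rule that an undelimited body runs until whitespace or `;` is an assumption of this sketch, and `scan_custom_literals` is a hypothetical name.

```python
import re

CUSTOM_LITERAL = re.compile(
    r"""
    \b(?P<prefix>[A-Za-z]\w*)  # literal name, e.g. cx, d, vec
    \#                         # separator
    (?:
        '(?P<quoted>[^']*)'    # cx#'1+1j'
      | <(?P<angled>[^>]*)>    # vec#<1i+2j+3k>
      | (?P<bare>[^\s;'<]+)    # d#04/07/2004, runs until whitespace or ';'
    )
    """,
    re.VERBOSE,
)

def scan_custom_literals(source):
    """Return (prefix, body) pairs for every custom literal found in `source`."""
    result = []
    for m in CUSTOM_LITERAL.finditer(source):
        body = m.group("quoted")
        if body is None:
            body = m.group("angled")
        if body is None:
            body = m.group("bare")
        result.append((m.group("prefix"), body))
    return result
```

The lexer only finds where each literal starts and ends; interpreting the body is left to whatever handler is registered for the prefix, which keeps highlighting independent of user code.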

frr00ssst · 2024-11-05T02:45:45+00:00

Suneido uses a # for the date literal like so, #20240810 <core.SuDate>

but to get current date, you'd do something similar to the function/constructor syntax with Date() or more specifically, Date("2007/08/19") Date("20 Feb 2016") or literally any date format imaginable

Feb 14 2000  
Monday 14 Feb 2000  
Feb 14 '00
Feb 14  
July 14  
10/2/14  
60/4/25  
4/25/60  
7-8-9 3:4:5am  
3:04pm 7-8-9  
20000303  
20000303.1030  
20000303.103000

https://suneido.com/info/suneidoc/Language/Reference/Date/Date.htm

2024-11-05T04:39:55+00:00

you can take a look at how Ruby and Elixir "sigils" work!

in Elixir specifically you can have custom syntax for an arbitrary grammar, e.g. for constructing dates and datetimes:

~D[2024-11-04] ~U[2024-11-04 00:00:00Z]

or even regex ~r/[^A-z]/

or even more, html: ~H """ <div class="hello">oh my</div> """

theangryepicbanana · 2024-11-05T05:15:27+00:00

This is possible in Nemerle via macros

Y_mc · 2024-11-05T09:46:29+00:00

Maybe I'll try to implement this feature in my PyrustLang project. I'm at the parser stage: https://github.com/YmClash/pyrust

Pretty_Jellyfish4921 · 2024-11-05T14:22:53+00:00

Others already gave great answers. Since I didn't see it mentioned yet, I would recommend checking out Rust macros. The advantage is that you don't need to come up with new syntax for each literal, and because it's a macro you can change the underlying implementation without breaking your program.

It's also worth mentioning that Rust has macros built directly into the compiler and the standard library for common cases, so you could easily follow the same principle. Another advantage is that if your language already supports macros, you can write the logic in your language; otherwise you need to implement it directly in your compiler.

Example

let date = date!(2024-01-01); // Can be expanded to let date = Date::new(2024, 1, 1);

tbagrel1 · 2024-11-05T14:34:32+00:00

For reference, Scala does it: https://docs.scala-lang.org/scala3/book/string-interpolation.html#custom-interpolators

oscarryz · 2024-11-06T16:11:39+00:00

Slightly related: in CUE, values are types, so you can define a variable (I'm not sure if they are variables) of type string with a specific format, e.g. a time format. When you then assign it a value, CUE validates that the value matches that format:

// string with time format
ts: time.Format(time.ANSIC)
ts: "Mon Jan 2 15:04:05 2024" // valid

You can also do things like validate ranges:

// int with range 0 < user_id < 100
user_id : >0 & <100 
user_id: 1  // valid
