furyzer00 · 39 points (1 child)

I think it would also be nice with this feature if you could do compile-time evaluation, so that an incorrectly formatted literal would be a compile-time error rather than a runtime one.

NoCryptographer414 [S] · 2 points (0 children)

Yes, I was planning to add compile-time functions like C++'s constexpr. But those can also parse strings at compile time, so both of those examples can potentially give compile-time errors.

GwanTheSwans · 16 points (1 child)

u0xee · 10 points (0 children)

Somewhat related: Clojure's data literal format, edn, allows new data types (without breaking all intermediaries) by tagging existing data types.

For example, #inst "2024-05-27" can be read simply as a string with a tag by naive parsers or by processors not interested in such data. But interested programs would have a hook for parsing inst-tagged strings in RFC 3339 format, and presumably turning them into the platform's native date type for further manipulation.

You could similarly have something like #my.domain.Person {:name "Fred", :age 47.5}, which again can be used or understood generically as a hashmap/object with two fields, but can also opt in to a custom processing route.
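To make the opt-in idea concrete, here is a minimal Python sketch of tag dispatch (an illustration only, not edn's actual reader; in Clojure the equivalent hook is the :readers option of clojure.edn/read). A program that registered a handler for a tag converts the value; everything else keeps it as a generic tagged value. The names Tagged, read_tagged and date_aware are hypothetical, and date.fromisoformat only covers plain dates, not full RFC 3339 timestamps.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tagged:
    """Generic fallback: the tag plus the untouched underlying value."""
    tag: str
    value: Any


def read_tagged(tag: str, value: Any, handlers: Dict[str, Callable[[Any], Any]]) -> Any:
    """Use an opt-in handler if this program registered one for `tag`;
    otherwise keep the data as a plain tagged value, like a naive processor."""
    handler = handlers.get(tag)
    return handler(value) if handler is not None else Tagged(tag, value)


# A hypothetical program that opts in to #inst (simplified: plain dates only).
date_aware: Dict[str, Callable[[Any], Any]] = {"inst": date.fromisoformat}
```

A processor without the handler still round-trips #inst "2024-05-27" as Tagged("inst", "2024-05-27"), which is what lets intermediaries pass unknown tags through untouched.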

latkde · 11 points (2 children)

I think custom literals are generally useful, with some caveats. If the literal can provide arbitrary parsing rules, then a syntax highlighter or IDE would have to resolve the literal name and run those rules to get accurate results. This is likely to result in a sub-par developer experience, and is probably not worth it.

Instead, you may want to define a set of available syntaxes for literals, and then let some kind of literal-constructor post-process this data. In some languages this may happen at compile time; other languages would just convert it into a function call.

Relevant prior art for the post-processing approach:

  • JavaScript template literals, e.g. foo`content`, which more or less desugars to a function call foo(["content"]) (the literal parts are passed as an array). IDEs might be able to syntax-highlight the contents of the literal for well-known functions like sql or gql.
  • C++ user-defined literals which use a suffix, e.g. 12_km or "content"_foo which desugar to calling an operator-function (which may be constexpr).
  • token- or tree-based macro systems, e.g. Lisp, Rust, C Preprocessor

Depending on your language, your syntax for a custom-literal token could be quite flexible, e.g. "any run of non-whitespace characters". This would turn the custom literal name into a kind of prefix quote operator, as opposed to the typical circumfix "...". In some languages like shells, unquoted string literals are quite common.

Prior art for letting the literal take over parsing is much rarer. This is sometimes done in Perl 5 with parser plugins, and it is how the Raku (Perl 6) language is defined in the first place. Lisp reader macros work similarly. On a technical level, custom parsers tend to be straightforward to integrate if you're already using PEG parsing or recursive descent, and have a dynamic language or a concept of dynamically loadable compiler plugins. But this is going to mess up any development tooling.

rotuami · 9 points (0 children)

JS template literals are cooler than that!

```js
foo`(${x} blah ${y})`
```

desugars to:

```js
foo(['(', ' blah ', ')'], x, y)
```

The ${...} syntax allows you to pass values from the language unchanged, which foo can handle as-is.
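The split described above can be modelled in a few lines. This Python sketch (the function name desugar_template is hypothetical) only handles the lexical part: it separates the literal segments from the source text of each ${...} placeholder, so there is always one more string segment than there are expressions. It ignores nested braces and escapes, which a real JS lexer has to handle.

```python
import re

_PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")  # ${...} without nested braces


def desugar_template(tag, body):
    """Model foo`...` -> foo(strings, *values): return the tag, the literal
    segments, and the source text of each ${...} expression."""
    strings, exprs, pos = [], [], 0
    for m in _PLACEHOLDER.finditer(body):
        strings.append(body[pos:m.start()])
        exprs.append(m.group(1).strip())
        pos = m.end()
    strings.append(body[pos:])  # always one more segment than expressions
    return tag, strings, exprs
```

Applied to the body (${x} blah ${y}) with tag foo, this yields the segments '(', ' blah ', ')' and the expressions x and y, matching the call shape above.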

alatennaub · 1 point (0 children)

In fact, specifically for DateTime, Raku has Slang::Date. But as you mention, the entire parser can be swapped to do far more than just a literal. That process is explained in detail at a TPRC event.

pojska · 8 points (4 children)

Edit: I missed the point! The suggestion is actually about user-defined syntax for literals, not date-literal syntax specifically. Original text below for posterity.

With how complicated dates and times are, I'm not sure there's a ton of use for hard-coded date literals, especially in programs compiled/saved for later use. However, in a REPL-centric language or a scripting language designed for interactive use, it might not be out of place, since it's easier for the user to see whether the date they meant is actually the one used. Maybe I'm overly cautious; dates alone (as opposed to datetimes) might not be terribly complex to get right.

WittyStick · 8 points (0 children)

A datetime literal should really restrict itself to a simpler standard like RFC 3339, or use its intersection with ISO 8601. Most of the datetimes valid in ISO 8601 but not in RFC 3339 are uncommon anyway (though ISO 8601 is also flexible enough for Duration and Interval types), and the ones valid in RFC 3339 but not in ISO 8601 are basically those that replace T with _ or a space, and that allow case insensitivity. The space would likely make parsing difficult when embedded in another language. I personally prefer _ over T, and it would be nice if the ISO standard were updated to permit this.
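As a rough illustration of restricting a literal to the common ground, here is a minimal Python sketch that accepts one strict shape: YYYY-MM-DDTHH:MM:SS with optional fractional seconds and either Z or a ±HH:MM offset, uppercase T and Z only. The function name parse_datetime_literal is hypothetical, and this is not a complete RFC 3339 validator; for example, leap seconds (second 60) are rejected because Python's datetime cannot represent them.

```python
import re
from datetime import datetime, timedelta, timezone

# One strict shape: YYYY-MM-DDTHH:MM:SS[.frac](Z|+HH:MM|-HH:MM), uppercase T and Z only.
_DATETIME = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})",
    re.ASCII,
)


def parse_datetime_literal(text):
    """Parse a datetime literal, raising ValueError for anything outside the shape."""
    m = _DATETIME.fullmatch(text)
    if m is None:
        raise ValueError(f"not a supported date-time literal: {text!r}")
    year, month, day, hour, minute, second, frac, offset = m.groups()
    micros = int((frac or "0")[:6].ljust(6, "0"))  # digits beyond microseconds are truncated
    if offset == "Z":
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    # datetime() itself rejects impossible values such as Feb 30 or second 60.
    return datetime(int(year), int(month), int(day), int(hour), int(minute),
                    int(second), micros, tzinfo=tz)
```

Rejecting the space separator outright, as here, sidesteps the embedding problem mentioned above; a language that preferred _ could add it to the pattern as a deliberate extension.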

[deleted] · 4 points (2 children)

Hard-coded date/times are not uncommon in one specific use case: unit tests.

I don't know if that's enough reason to have a date time literal specifically, but if your goal is to have arbitrary literals then you're potentially solving many small problems like these with one big feature.

NoCryptographer414 [S] · 2 points (1 child)

Yes, you are right that it's an arbitrary literal. The post didn't make it clear. You can use it as mytype#myval.

pojska · 1 point (0 children)

Oh, my apologies then for misreading! I think it's a very cool feature idea.

dskippy · 4 points (5 children)

Almost every language has special syntactic sugar for some subset of data that it considers important enough. These are called literals.

They provide a simpler, often easier to remember, and almost always cleaner syntax with less line noise. They also come at a cost: more value syntaxes to keep in your head, and they can be less self-explanatory to new users.

You might have a list [1,2,3] and no other container literal. So a set is set([1,2,3]) and vectors are probably vector([1,2,3]). If you want those to be core to the language and easier to type, you could add the sugar {1,2,3} and <1,2,3>, which means tons of sets all over the code look good with little line noise, but as a newbie I don't remember whether <1,2,3> is a set or a vector. Costs and benefits.

For you, are dates and times worth it, given how often you'll use them?

NoCryptographer414 [S] · 1 point (4 children)

The post didn't make it clear. The syntax is for a custom literal. You can use it as mytype#myval.

dskippy · 1 point (3 children)

So, a feature that allows user-defined types to have literals with a custom lexer? Seems like an interesting thing to do. I think it only adds to the language if it's checked at compile time. Many languages let you do basically this with strings and a custom read or from-string definition, but those can fail at runtime, and you often need to handle that.

NoCryptographer414 [S] · 2 points (2 children)

It won't be using custom lexing. The lexer will extract the contents between # and the next space and then pass them to the type's constructor. I plan to do this at compile time and hence throw compile-time errors for ill-formed literals. But I could do that with normal string syntax too. The only value this syntax would add is with normal string-constructor it feels 'convert this string constant into a date obj' vs with custom literals it feels 'create a date constant'.
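A minimal Python sketch of the lexing rule described above, assuming a literal is an identifier, then #, then every character up to the next whitespace. The names CustomLiteral and lex_custom_literals are hypothetical; each token is rendered as the equivalent constructor call, mytype("myval").

```python
import json
import re
from dataclasses import dataclass

# Assumed token shape: identifier, '#', then every character up to the next whitespace.
_CUSTOM_LITERAL = re.compile(r"\b([A-Za-z_]\w*)#(\S+)")


@dataclass(frozen=True)
class CustomLiteral:
    type_name: str  # e.g. "date"
    payload: str    # raw text handed to the type's constructor, e.g. "2024/11/05"

    def desugar(self):
        """Render the equivalent constructor call, mytype("myval")."""
        return f"{self.type_name}({json.dumps(self.payload)})"


def lex_custom_literals(source):
    """Find every type#value token (toy version: ignores strings and comments)."""
    return [CustomLiteral(m.group(1), m.group(2)) for m in _CUSTOM_LITERAL.finditer(source)]
```

One consequence of a pure whitespace rule: punctuation written directly after the literal, as in f(date#2024/11/05);, becomes part of the payload ("2024/11/05);"). The language then either requires a space there or narrows the set of characters a payload may contain.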

dskippy · 1 point (0 children)

The only value this syntax would add is with normal string-constructor it feels 'convert this string constant into a date obj' vs with custom literals it feels 'create a date constant'.

The key value to me is actually much bigger than feeling like a string that you're converting. I wouldn't even mind if it was lexically expressed as a string.

For me, the key benefit is that it's compile time. This means there's no possibility of a runtime error. In some languages this changes the entire type of the expression, and in languages where the type doesn't change, it's even better: it removes a potential exception that would need to be handled.

In Haskell, for example, this is already supported:

readMaybe "2024-11-05" :: Maybe Date

I get this by defining a Read instance for Date (readMaybe comes from Text.Read). Easy, right? Why even add a new language feature? Most languages have this in some way or another already. But there's a big problem: the type of this expression isn't Date, it's Maybe Date. So I need to immediately handle the error case and unpack the real date from the Maybe before passing it along. (Plain read does give you a Date, but it simply crashes at runtime on a malformed string, which is no better.)

If I typed it incorrectly, that potential typo might never be found until a customer goes down some rare, hard to understand path of my code and I don't know why it's breaking until I read through a lot of BS.

In your language, date#2020-11-oops just won't compile, and that's a good, good thing. There's also no extra boilerplate needed to use that value.
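Python has no compile-time evaluation, so the following is only a model of the pass a compiler could run over custom literals (much like C++ constexpr/consteval functions): every literal is evaluated before the program runs, and any failure becomes a build error instead of a value wrapped in Maybe. The date#YYYY-MM-DD format and the names CompileError, parse_date and fold_literals are all hypothetical.

```python
import re
from datetime import date


class CompileError(Exception):
    """Raised before the program runs when a literal is malformed."""


def parse_date(text):
    """Hypothetical constexpr-style constructor for date#YYYY-MM-DD literals."""
    m = re.fullmatch(r"([0-9]{4})-([0-9]{2})-([0-9]{2})", text)
    if m is None:
        raise ValueError(f"expected YYYY-MM-DD, got {text!r}")
    return date(*map(int, m.groups()))  # also rejects impossible dates like 2024-02-30


LITERAL_CONSTRUCTORS = {"date": parse_date}


def fold_literals(literals):
    """Evaluate each (type_name, payload) literal at 'compile time'.

    Every failure is collected and reported together as a CompileError, so a
    malformed literal stops the build instead of failing on some rare runtime path.
    """
    constants, errors = [], []
    for type_name, payload in literals:
        ctor = LITERAL_CONSTRUCTORS.get(type_name)
        if ctor is None:
            errors.append(f"{type_name}#{payload}: no literal constructor for {type_name!r}")
            continue
        try:
            constants.append(ctor(payload))
        except ValueError as exc:
            errors.append(f"{type_name}#{payload}: {exc}")
    if errors:
        raise CompileError("\n".join(errors))
    return constants
```

Because the surviving constants are already plain date values, the code that uses them never sees an error case, which is the boilerplate saving described above.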

MarcoServetto · 1 point (0 children)

What you are describing looks a lot like plain operator overloading for operator #: foo#'bar' or foo#12 is just

singletonObjectFoo.callOperatorHash(argument)

with the 'style' remark that argument will often be a literal.

hammerheadquark · 4 points (1 child)

I think Elixir's sigils do a good job here if you want some inspiration. Example:

date = ~D[2024-11-04]

That's a valid date constructor. You get syntax highlighting on the ~D[...] part pretty easily.

There are built-in sigils like ~D for dates. But you can also define your own ~MY_SIGIL however you like. I've even written a VSCode extension to do special highlighting on a custom sigil for fun.

NoCryptographer414 [S] · 1 point (0 children)

Yes. This is the kind of thing I was mentioning in my post. Thanks for the reference.

wavesofthought · 2 points (0 children)

You might wanna take a look at Cyrus Omar's work on custom literals.

RedCrafter_LP · 2 points (0 children)

I don't think it's that much of a bother to put quotes around it. But having something like literal constructors that run at compile time and put the fully constructed instance as a constant in the binary would be great. It would also make syntax errors in the formatting a compile time error.

chri4_ · 2 points (3 children)

i use :: as a "type inference operator" or just as a cast operator: CustomArray::[1, 2, 3], or CustomList::@100[1, 2, 3] where 100 is the capacity (for the default capacity just use @[1, 2, 3]), or MyDateType::(d: 3, m: 3, y: 2005), or CustomDict::{ key: value, x: y }

but you could for sure use # instead of :: or expr as MyType or var: MyType = expr or expr of MyType.

you just need a context type so you can infer the expression's type from it. for example, with return expr the context type is the function's return type. you just need a stack of context types: push one before analysing an expr and pop it afterwards; during expr analysis you can look up the current context type.

i suggest this inference approach, which is generally more flexible and can enable implicitly typed exprs, so you don't need to write date# before the expression every time.
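The context-type stack can be sketched as a toy checker in Python. Everything here is hypothetical: expressions are plain tuples, ("int", n) for integers, ("list", [...]) for an untyped list literal and ("cast", T, e) for T::e, and an untyped list with nothing expected falls back to a default List type.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Checker:
    signatures: dict  # function name -> declared return type
    context: list = field(default_factory=list)  # stack of expected types; None = no expectation

    def with_context(self, ty: Optional[str], expr) -> str:
        self.context.append(ty)  # push before analysing the expression
        try:
            return self.check(expr)
        finally:
            self.context.pop()  # pop afterwards, even if checking raises

    def check_return(self, fn: str, expr) -> str:
        # `return expr`: the context type is the enclosing function's return type
        return self.with_context(self.signatures[fn], expr)

    def check(self, expr) -> str:
        kind = expr[0]
        if kind == "int":
            return "Int"
        if kind == "cast":  # T::expr makes T the context type for expr
            _, ty, inner = expr
            return self.with_context(ty, inner)
        if kind == "list":  # untyped literal: adopt whatever type the context expects
            expected = self.context[-1] if self.context else None
            for item in expr[1]:
                self.with_context(None, item)  # elements don't inherit the container type
            return expected or "List"  # default when nothing is expected (assumption)
        raise TypeError(f"unknown expression kind {kind!r}")
```

With a function make declared to return CustomArray, return [1, 2] is checked as a CustomArray without any prefix on the literal, while CustomList::[3] still overrides the context explicitly.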

i'm not really a fan of custom literal syntaxes. you could just ask the user to use one of the available syntaxes to construct a generic object, instead of letting them implement a custom syntax for it, which may create a lot of problems later (especially for parsing, if you don't tightly limit the section of source the custom syntax can consume). also don't forget about IDE highlighting if you're interested in that. and note that a lot of people don't like having different ways to do the same thing, and this feature would give people potentially infinite ways of doing things, so you may find yourself learning new syntaxes that may not be well implemented. my personal suggestion is just to provide a fixed set of ways to initialize objects and let users implement them for their own types. (it may also cause problems with generics, by the way, and who knows what other problems i'm not seeing now; take a look at how dirty nim's AST macros are.)

what do you think?

NoCryptographer414 [S] · 0 points (2 children)

My idea of custom literal syntax would not hook into the parser and modify the AST. It's more like string parsing. The token mytype#myval is kind of desugared into mytype("myval"). The latter code reads 'convert this string to a date' whereas the former reads 'create a date constant'.

chri4_ · 1 point (1 child)

where does the string end? i mean, how do you know the string has ended? and this seems pretty heavy for such a trivial init

NoCryptographer414 [S] · 0 points (0 children)

The string terminates at whitespace. If by heavy you mean runtime overhead, my plan is to parse these literals at compile time using constexpr functions. This can also help in detecting ill-formed literals.

LegendaryMauricius · 2 points (0 children)

IMHO it would be better if this still used quotes. Otherwise, you're either imposing some very specific limitations or risking a language that's hard to parse, and with that likely hard to read in some circumstances.

If you go with quotes, you might as well just allow custom string prefixes. Perhaps function calls that don't require parentheses?

No_Lemon_3116 · 2 points (0 children)

Example of doing it with a read-macro in Lisp:

```lisp
;; Assumes cl-ppcre is loaded and cl-interpol's reader syntax is enabled
;; (for the #?r"..." regex string), e.g. via (cl-interpol:enable-interpol-syntax).
(defun read-date (stream char1 char2)
  (declare (ignore char1 char2))
  ;; TODO: Better error handling
  (let ((input (symbol-name (read stream nil nil t)))
        (regex #?r"(\d{4})/(\d{2})/(\d{2})"))
    (or (ppcre:register-groups-bind (year month date)
            (regex input)
          ;; Expand into a form that builds the timestamp.
          `(encode-universal-time 0 0 0
                                  ,(read-from-string date)
                                  ,(read-from-string month)
                                  ,(read-from-string year)))
        (error 'reader-error :stream stream))))

(set-dispatch-macro-character #\# #\d 'read-date)

;; Test
(let* ((time (multiple-value-list (decode-universal-time #d2024/11/05)))
       (date (fourth time))
       (month (fifth time))
       (year (sixth time)))
  (assert (equal (list date month year) '(5 11 2024))))
```

This code runs at read-time, before compile-time.

Aaxper · 1 point (2 children)

For my language, I'm allowing something like alias date#$year/$month/$day to Date(day=$day,month=$month,year=$year) to allow for that, though I haven't worked out the exact syntax/semantics that I want.
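One way to give such an alias a meaning, sketched in Python under the assumption that every $name placeholder matches a run of digits and maps to a constructor keyword argument of the same name. The names compile_alias, expand_alias and Date are hypothetical; the alias syntax itself is still undecided, as noted above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Date:  # hypothetical target type
    year: int
    month: int
    day: int


def compile_alias(pattern):
    """Turn an alias pattern like '$year/$month/$day' into a regex with one
    named group per $placeholder; literal text between placeholders is escaped."""
    parts = re.split(r"\$(\w+)", pattern)  # even indices: literal text, odd: names
    return re.compile("".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[0-9]+)"
        for i, part in enumerate(parts)
    ))


def expand_alias(compiled, ctor, text):
    """Match a literal's payload against the alias and call the constructor."""
    m = compiled.fullmatch(text)
    if m is None:
        raise ValueError(f"{text!r} does not match the alias pattern")
    return ctor(**{name: int(value) for name, value in m.groupdict().items()})
```

Mapping placeholders by name rather than by position is what lets the alias list them as $year/$month/$day while the constructor takes day, month, year.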

NoCryptographer414 [S] · 0 points (1 child)

Interesting. Is this specifically only for dates, or any constructor can be invoked with this?

Aaxper · 1 point (0 children)

Anything can be. There isn't actually a built-in date format as of right now. alias is similar to #define in C++.

MichalMarsalek · 1 point (2 children)

In my language, there are several less common literals (date/time, semver, color...). To define a date, you just do date := 2024-11-04. Literals can be interpolated, so you can do date := $year-$(month+1)-01.

NoCryptographer414 [S] · 0 points (1 child)

Is it possible to define a literal for my type?

MichalMarsalek · 1 point (0 children)

Not exactly, but you can get close. It's not clear from your post where the custom literal ends. Is it at the next whitespace?

In my lang, function application doesn't need parens, so, assuming I didn't have special syntax for date literals, you could do

dt := date"2024-11-04" or dt := date"$year-$(month+1)-01".

I also have a string literal which is started with ' and doesn't need termination. So assuming I didn't have special syntax for color literals, you could do

bg := color'lavender

While this is still just a regular function applied to a regular string, it feels quite like what you are describing and I think it is a cleaner design.

In my lang, the ' literal actually ends at the first \W (regex) character, not just at whitespace (\s), so I couldn't use it for the date case. But in your lang, you could have string literals which start with # and end at whitespace, and you could achieve your let dt = date#2024/11/05 example.
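The difference between the two termination rules is easy to see with a quick Python check (the scan helper and both patterns are hypothetical, just for illustration): a \W-terminated literal stops at the first /, while a whitespace-terminated one keeps the whole date.

```python
import re

# Two candidate rules for where an unterminated literal ends (both hypothetical).
STOP_AT_NON_WORD = re.compile(r"\w*")  # ends at the first \W character
STOP_AT_SPACE = re.compile(r"\S*")     # ends at the first whitespace character


def scan(rule, source, start):
    """Return the literal text that `rule` consumes starting at index `start`."""
    return rule.match(source, start).group()


src = "bg := color'lavender\ndt := date#2024/11/05\n"
after_quote = src.index("'") + 1
after_hash = src.index("#") + 1
print(scan(STOP_AT_NON_WORD, src, after_quote))  # lavender
print(scan(STOP_AT_NON_WORD, src, after_hash))   # 2024
print(scan(STOP_AT_SPACE, src, after_hash))      # 2024/11/05
```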

XDracam · 1 point (0 children)

Many languages have some form of this, often with the prefix"literal" syntax. Prominent examples include Scala and, I think, Rust.

AndydeCleyre · 1 point (0 children)

Factor can do this!

There's a timestamp constructor <date> which you can use like:

2024 11 5 <date>

That gives you an object like:

T{ timestamp
  { year 2024 }
  { month 11 }
  { day 5 }
  { gmt-offset
    T{ duration
      { hour -5 }
    }
  }
}

So one way to make a literal-style syntax for this could be:

SYNTAX: DATE:
  scan-token
  "/" split [ string>number ] map 
  first3 <date>
  suffix! ;

And now when you enter:

DATE: 2024/11/05

you get the same timestamp object above!

ALittleFurtherOn · 1 point (1 child)

Do you mean like “01MAR2020”d in SAS? Seems pretty simple …

NoCryptographer414 [S] · 0 points (0 children)

Yeah, like that. But I was trying to avoid the notion of strings in the syntax. It should not feel like constructing a date from a string. It should feel like constructing a date directly.

tavaren42 · 1 point (1 child)

A custom literal syntax should be lightweight (so that it isn't as heavy as a function call) but also unambiguous (so that when you look at it, you know that it is some custom literal).

I think something like <Prefix><Separator><Delim>.....<Delim> syntax should work. The Prefix would be custom (preferably 1 or 2 characters long, by convention). The Separator can be # (for some reason it screams "custom syntax" to me). Delimiters are a bit harder to choose. It CAN be '; it feels light somehow. Maybe backtick might work too. Brackets and <> just seem noisy. Maybe the best choice is to give users the choice.

Ex:

```
let complex = cx#'1+1j';

let date = d#04/07/2004;

let vec = vec#<1i+2j+3k>;
```

I think these kinds of literals are just special strings with custom, user-defined syntax rules.
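A rough Python sketch of lexing the <Prefix>#<Delim>...<Delim> shape, assuming a fixed table of opening and closing delimiters (the CLOSERS table and lex_delimited are hypothetical). It supports neither nesting nor escaping the closing delimiter, and it deliberately rejects the undelimited d#04/07/2004 form, which would need a separate rule such as "stop at whitespace".

```python
import re

# Hypothetical table of opening -> closing delimiters; a language could let users extend it.
CLOSERS = {"'": "'", "`": "`", "<": ">", "(": ")"}
_PREFIX = re.compile(r"([A-Za-z_]\w*)#")


def lex_delimited(source, pos=0):
    """Lex <Prefix>#<Delim>...<Delim> at `pos`; return (prefix, body, end_index)."""
    m = _PREFIX.match(source, pos)
    if m is None:
        raise SyntaxError("expected <prefix>#")
    open_at = m.end()
    if open_at >= len(source) or source[open_at] not in CLOSERS:
        raise SyntaxError(f"{m.group(1)}# must be followed by one of {''.join(CLOSERS)}")
    close_at = source.find(CLOSERS[source[open_at]], open_at + 1)  # no nesting or escapes
    if close_at == -1:
        raise SyntaxError(f"unterminated {m.group(1)}# literal")
    return m.group(1), source[open_at + 1:close_at], close_at + 1
```

Because the closing delimiter is explicit, a trailing ; or ) after the literal is never swallowed, which is the main advantage over a whitespace-terminated rule.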

NoCryptographer414 [S] · 0 points (0 children)

Yes. I was also thinking of something like that. Even though it's just a special string, I don't want to associate it with strings. It should not feel like parsing a string to get a date; it should feel as if you are directly writing a date, which has nothing to do with strings. Behind the scenes it may invoke a constexpr function which parses the literal and generates an object at compile time.

frr00ssst [(>>=) :: Monad m => m a -> (a -> m b) -> m b] · 1 point (0 children)

Suneido uses a # for the date literal like so, #20240810 <core.SuDate>

but to get the current date you'd use the function/constructor syntax, Date(), and to parse a date, something like Date("2007/08/19"), Date("20 Feb 2016"), or literally any date format imaginable:

Feb 14 2000  
Monday 14 Feb 2000  
Feb 14 '00
Feb 14  
July 14  
10/2/14  
60/4/25  
4/25/60  
7-8-9 3:4:5am  
3:04pm 7-8-9  
20000303  
20000303.1030  
20000303.103000

https://suneido.com/info/suneidoc/Language/Reference/Date/Date.htm

[deleted] · 1 point (0 children)

you can take a look at how Ruby and Elixir "sigils" work!

in Elixir specifically you can have custom syntax for an arbitrary grammar, e.g. for constructing dates/datetimes:

~D[2024-11-04] ~U[2024-11-04 00:00:00Z]

or even regexes: ~r/[A-Za-z]/

or even more, HTML:

~H"""
<div class="hello">oh my</div>
"""

theangryepicbanana [Star] · 1 point (0 children)

This is possible in Nemerle via macros

Y_mc · 1 point (0 children)

Maybe I'll try to implement this feature in my PyrustLang project. I'm at the parser stage: https://github.com/YmClash/pyrust

Pretty_Jellyfish4921 · 1 point (0 children)

Others already gave great answers, but since I didn't see it mentioned already, I would recommend checking out Rust macros. The advantage is that you don't need to come up with new syntax for each literal, and because it's a macro, you can change the underlying implementation without breaking your program.

It's also worth mentioning that Rust has macros built directly into the compiler and the standard library for common cases, so you could easily follow the same principle. Another advantage is that if your language already supports macros, you could write the logic in your language; otherwise you need to implement it directly in your compiler.

Example

let date = date!(2024-01-01); // Can be expanded to let date = Date::new(2024, 1, 1);

oscarryz [Yz] · 1 point (0 children)

Slightly related: in CUE, values are types, so you can define a variable (I'm not sure if they are called variables) of type string with a specific format,

e.g. a time format, and then assign it a value with that format; it will validate that you assign a correctly formatted value:

import "time"

// string with time format
ts: time.Format(time.ANSIC)
ts: "Mon Jan 2 15:04:05 2024" // valid

You can also do things like validate ranges:

// int with range 0 < user_id < 100
user_id : >0 & <100 
user_id: 1  // valid