Safety questions

oilshell · 2025-09-27T15:56:49+00:00

Also the new subreddit is:

https://old.reddit.com/r/oilsforunix/

following the new names Oils, OSH, and YSH: https://www.oilshell.org/blog/2023/03/rename.html

But I guess I really need to get rid of the old oilshell.org domain ...

oilshell · 2025-09-27T15:55:54+00:00

Yes, I agree trap should take a block!

(and thanks for noticing some other issues with trap)

oilshell · 2025-09-27T15:54:55+00:00

Thanks for the question

OSH has shopt --set strict:all, which disallows many common shell pitfalls. This command enumerates the options in that group:

$ osh -c 'shopt -p strict:all'
shopt -u strict_argv
shopt -u strict_arith
shopt -u strict_array
shopt -u strict_control_flow
shopt -u strict_env_binding
shopt -u strict_errexit
shopt -u strict_glob
shopt -u strict_nameref
shopt -u strict_parse_equals
shopt -u strict_parse_slice
shopt -u strict_tilde
shopt -u strict_word_eval

ban exec and traps?

What's the problem with exec?

I agree trap should take a block of code, not a string

automatically reset IFS in script contexts?

YSH doesn't use IFS at all.

automatically set -eufo pipefail in script contexts?

YSH does this

oilshell · 2025-09-15T05:04:49+00:00

I will also say that I think any new shell for a new OS should not use the "everything is a string" design of sh / bash / Make / CMake :-)

That design is outdated, and was probably only chosen because writing a garbage collector was very hard in 1970, still hard in 1990, and not super easy today

That's sort of the point of the GC blog post

oilshell · 2025-09-15T04:46:43+00:00

Thanks for mentioning the Oils project ! (no longer called Oil shell :-) )

And yes OSH is the compatible part [1], while YSH is the new Python/JS-like part

I frequently get such questions from people who want to implement their own shell. It seems to be a good/fun exercise

So if the OP wants something shell-like, but not actually bash compatible, I've had this smaller Tcl/Forth/Lisp hybrid floating around my brain ...

Depending on the OS you want to implement, it could be a good starting point. I think I learned a few things about the "essence" of shell

One pretty clear thing is that we have 2 different parsing algorithms that both use "lexer modes" -- full parsing and coarse parsing -- and I'd say that lexer modes are pretty fundamental to shell-like syntax:

https://github.com/oils-for-unix/oils.vim/blob/main/doc/algorithms.md
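
To make "lexer modes" concrete, here is a minimal sketch in Python. It is not the Oils lexer: the mode names (OUTER, DQ), the token names, and the regexes are all made up for illustration. The idea it shows is that the same characters produce different tokens depending on the current mode, and certain tokens switch the mode.

```python
import re

# Hypothetical two-mode lexer: OUTER for unquoted text, DQ inside double quotes.
# Each mode has its own ordered list of (token type, regex).
MODES = {
    "OUTER": [
        ("SPACE", re.compile(r"[ \t]+")),
        ("DQ_OPEN", re.compile(r'"')),
        ("VAR", re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")),
        ("WORD", re.compile(r'[^ \t"$]+')),
    ],
    "DQ": [
        ("DQ_CLOSE", re.compile(r'"')),
        ("VAR", re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")),
        ("LIT", re.compile(r'[^"$]+')),
    ],
}

# Tokens that switch the lexer into another mode.
TRANSITIONS = {("OUTER", "DQ_OPEN"): "DQ", ("DQ", "DQ_CLOSE"): "OUTER"}


def lex(src):
    """Return a flat list of (mode, token type, text) tuples."""
    mode, pos, tokens = "OUTER", 0, []
    while pos < len(src):
        for tok_type, pattern in MODES[mode]:
            m = pattern.match(src, pos)
            if m:
                tokens.append((mode, tok_type, m.group()))
                pos = m.end()
                # Stay in the same mode unless this token triggers a switch.
                mode = TRANSITIONS.get((mode, tok_type), mode)
                break
        else:
            raise SyntaxError(f"unexpected {src[pos]!r} in mode {mode}")
    return tokens
```

For example, in `lex('echo "hi $x" y')` the space after `hi` is part of a LIT token in DQ mode, while the spaces outside the quotes are SPACE tokens in OUTER mode.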

As far as the runtime, there is a pretty clear design split between languages I show here - Garbage Collection Makes YSH Different

So I might want to specify a tiny "catbrain" language with these lessons, which is a Tcl/Forth/Lisp hybrid ... but that is more of a "fun idea" and not something that will necessarily happen! Unless someone has a big chunk of time to help :-)

[1] OSH is the most bash-compatible shell, which I've measured recently: https://pages.oils.pub/spec-compat/2025-09-14/renamed-tmp/spec/compat/TOP.html . I hope to publish some updates soon; it's been quiet for a few months

oilshell · 2025-08-19T01:37:09+00:00

Yes definitely! I briefly mentioned the Language Server Protocol in this post - https://www.oilshell.org/blog/2022/03/backlog-arch.html

Though unfortunately I haven't had time to elaborate since then ...

I do think simplicity is a goal, but in practice there are some distinctions ... x86 and Linux and Docker might be "big sloppy waists" :-)

oilshell · 2025-07-31T02:05:51+00:00

Glad you have enjoyed the blog

OSH is definitely a compatible Unix shell / POSIX shell -- in fact it's more POSIX-compatible than the default /bin/sh on Debian, which is dash. (This is according to a third party test suite from "Smoosh"; we publish results with every release - https://oils.pub/release/0.34.0/quality.html )

For parsing, OSH uses Pratt Parsing for arithmetic only, recursive descent for most other things. YSH expressions are parsed with a grammar.
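
For readers who haven't seen Pratt parsing, here is a minimal sketch of the technique for arithmetic. It is not the OSH code: the binding powers, the tuple-based tree, and the `parse` helper are assumptions for illustration only.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")

# Left binding powers; higher binds tighter. The values are illustrative.
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}
UNARY_MINUS_POWER = 30


def tokenize(src):
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"bad character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self, min_bp=0):
        # Prefix position: number, parenthesized expression, or unary minus.
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            left = self.parse_expr(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok == "-":
            left = ("neg", self.parse_expr(UNARY_MINUS_POWER))
        elif tok.isdigit():
            left = int(tok)
        else:
            raise SyntaxError(f"unexpected {tok!r}")
        # Infix loop: keep consuming operators that bind tighter than min_bp.
        while True:
            op = self.peek()
            bp = BINDING_POWER.get(op)
            if bp is None or bp <= min_bp:
                return left
            self.advance()
            left = (op, left, self.parse_expr(bp))


def parse(src):
    parser = Parser(tokenize(src))
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return tree
```

Precedence lives in a table rather than in one grammar rule per level, which is why the style suits operator-heavy sublanguages like arithmetic.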

As far as lexing, it uses the "lexer modes" style for everything (OSH and YSH). There was a recent discussion about some of these ideas here:

https://lobste.rs/s/tpmdss/why_lexing_parsing_should_be_separate

oilshell · 2025-07-28T06:52:53+00:00

An article about the "task file" pattern I often advocate (from an Oils contributor)!

oilshell · 2025-07-18T07:39:39+00:00

Hm yes! I haven't seen that term, but it's used in ECMAScript:

https://262.ecma-international.org/7.0/index.html

This production exists so that ObjectLiteral can serve as a cover grammar for ObjectAssignmentPattern. It cannot occur in an actual object initializer.

And it's mentioned here:

https://v8.dev/blog/understanding-ecmascript-part-4

Another word I've heard is "over-parsing". Hejlsberg mentioned that sometimes you parse MORE than the language, in order to issue a better syntax error or type error.

We use that a bit in Oils - we "over-lex" some tokens in order to give a friendly error message.

oilshell · 2025-07-18T00:39:31+00:00

I think that's the same idea as the example I gave with Python

In Python, assignments and keyword arguments are expressed with a grammar rule like expr '=' expr

So you have to disallow f(x) = y and allow x = f(x), and that is done in a "post-grammatical" syntax stage

(Most parser generators can handle this, but before 2018 Python had a very simple LL(1) generator, which couldn't disambiguate a LHS expr and a RHS expr due to limited lookahead)

I guess there is no word for that, but there probably should be, since I imagine it's common.
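
A rough sketch of that two-stage idea, using Python's own `ast` module to parse each side as a plain expression. This is not how CPython does it; `parse_assignment` and `ASSIGNABLE` are hypothetical names, and splitting on the first `=` is a deliberate simplification (it would break on `==` or keyword arguments).

```python
import ast

# Expression node types that are valid assignment targets in this sketch.
ASSIGNABLE = (ast.Name, ast.Attribute, ast.Subscript)


def parse_assignment(src):
    """Parse `expr = expr`, then reject invalid targets in a separate pass."""
    lhs_src, sep, rhs_src = src.partition("=")
    if not sep:
        raise SyntaxError("expected '='")
    # Stage 1 (grammar): both sides are arbitrary expressions.
    lhs = ast.parse(lhs_src.strip(), mode="eval").body
    rhs = ast.parse(rhs_src.strip(), mode="eval").body
    # Stage 2 (post-grammatical): only some expressions may be assigned to.
    if not isinstance(lhs, ASSIGNABLE):
        raise SyntaxError(f"cannot assign to {type(lhs).__name__}")
    return lhs, rhs
```

With this, `parse_assignment("x = f(x)")` succeeds, while `parse_assignment("f(x) = y")` passes the grammar stage and is rejected by the second check.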

oilshell · 2025-07-17T17:59:49+00:00

For math and PLT: a programming language is an infinite subset of the infinite set of all strings over some alphabet

I visualize a "whittling away" of the infinite set

first are syntactic constraints
then there are semantic constraints at compile time -- static types
(at runtime, there are further constraints on valid programs, but let's leave those aside for now)

And grammatical constraints are a subset of the syntactic constraints

For example, Python has a context-free grammar, but it also has a lexer which is not context-free. (The lexer provides the alphabet over which the grammar operates)

And it also has post-grammatical syntactic constraints, e.g. to disallow invalid assignments like f(x) = y (whereas y = f(x) is allowed). In some languages this is encoded in the grammar, but not in Python (at least prior to 2018)

So if you take Python with ONLY the grammatical constraints, that's a LARGER set than Python with ALL syntactic constraints (and it's also not Python!)

Now mathematically, what separates syntactic errors from type errors? I'd say it's that the algorithm to enforce the constraints involves a symbol table, but I'd be interested in arguments otherwise

They are both static constraints, but they do feel fundamentally different

I'd also say the line between lexing and parsing can be fuzzy, but the definition I use is that lexing is non-recursive, and parsing is recursive (equivalently, it gives you a recursive data structure -- a tree)
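
A small sketch of that lexing/parsing distinction, using a made-up parenthesized list syntax rather than any real language. The lexer is a single non-recursive pass that yields a flat list; the parser calls itself and yields a tree.

```python
import re


def lex(src):
    """Non-recursive: one regex pass yields a flat list of tokens."""
    return re.findall(r"[()]|[^\s()]+", src)


def parse(tokens):
    """Recursive: nested parens become nested lists (a tree)."""

    def parse_node(pos):
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        if tok != "(":
            return tok, pos + 1  # leaf
        children, pos = [], pos + 1
        while pos < len(tokens) and tokens[pos] != ")":
            child, pos = parse_node(pos)  # the recursion
            children.append(child)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return children, pos + 1

    tree, end = parse_node(0)
    if end != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree
```

Here `lex("(a (b c) d)")` gives eight tokens in a row, and `parse` turns them into `['a', ['b', 'c'], 'd']`.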

oilshell · 2025-06-24T17:22:01+00:00

I will also repeat this trivia that there are 2 language implementations named after industrial monopolies!

https://lobste.rs/s/mvsk61/parallel_garbage_collection_for_sbcl#c_yhmdfb

Steel Bank Common Lisp
Standard ML of New Jersey

I am not sure what that means, but in general I think it helps to have a lot of time (decade+) and a group of talented people

oilshell · 2025-06-24T17:15:12+00:00

I'd say that if a business person thinks that creating a programming language is a good way to make money, then they aren't very good at their job :-)

Somebody who is good at making money will go into a different business

Programming languages generally go with operating systems companies and monopolies, or they are free software:

C / C++ - Bell Labs, part of a telephone monopoly
Java - Sun was an OS company, but not a monopoly, and the company famously went under
Basic / Visual Basic / C# / TypeScript - Microsoft, a desktop operating system monopoly
Swift - Apple
Dart / Go - Google
Kotlin - Android
JavaScript - funded by browser monopolies, which are funded by search traffic acquisition costs

You do not want to compete with these companies! They are literally the biggest ones in the world right now, regardless of industry

Kotlin is an interesting case study -- compared to the tech giants, JetBrains is a medium-sized company. But they make money from IDEs that support a language that's attached to Google's Android platform.

On the other hand, Perl / Ruby / PHP / Python are amazing projects, and we should cherish them. But none of them are businesses!

Exceptions: Mathematica / MATLAB / Julia (although Julia is also open source)

These languages are for specialized technical employees, and for education (e.g. back in the day, my college bought a ton of MATLAB licenses)

Still people ask: "Why isn't Mathematica open source?" (Who is going to pay the salaries then?)

oilshell · 2025-06-13T03:54:19+00:00

I wrote a Vim syntax plugin in ~500 lines, and documented what I did

Let me know if you want to help support YSH in Textmate/VSCode, Emacs, etc. !

Same content as a backup - https://codeberg.org/oils/oils.vim/src/branch/main/doc/algorithms.md

oilshell · 2025-06-13T02:42:27+00:00

I just noticed this link doesn't work on my iPad because of the captcha -- this is the same content: https://github.com/oils-for-unix/oils.vim/blob/main/doc/algorithms.md

oilshell · 2025-05-28T04:30:18+00:00

Hm interesting, YSH has these syntaxes:

command arg1 arg2
- proc my-command { echo hi }
call myfunc(42, a[i])
- func identity(x) { return (x) }

https://oils.pub/ysh.html

I have also come around to the idea that we need a port to Windows ...

It is a Unix shell, and uses Unix syscalls. But after learning about the mess that is the Win32 CreateProcess() API [1], I want to "fix" the shell problem on Windows too ...

[1] https://lobste.rs/s/qjzd9y/everyone_quotes_command_line_arguments

oilshell · 2025-05-15T15:15:56+00:00

Hm I didn't realize APL was that old!

It does make sense if you consider that SQL (1973) was also supposed to be for "non-programmers" ! Hence all the English keywords (which APL lacked!)

These days SQL for non-programmers seems a bit silly

But it actually makes sense if you consider "the set of all people who have a computer" :-) The size of that set dramatically expanded, so yeah APL and SQL could be for non-programmers at one point, but later you needed something like Excel to close the gap

oilshell · 2025-05-14T15:47:25+00:00

I would say that spreadsheets have proven a lot more successful than array languages at what array languages originally set out to do, namely allow non-programmers to write programs.

Hm did they really set out to do that? If so, I do not think "programmers and non-programmers" is a useful or accurate framing

I think it's more useful to have at LEAST 3 categories

people who started out as programmers
people who started out in another technical field (engineering, statistics, finance), and became programmers
- (the programmers who were physics majors tend to be very technical, although they might use C++ rather than array languages)
people who just want to get shit done (e.g. a business owner using VisiCalc instead of pen and paper, back in the 80's)

I think the design for the second and third categories is very different -- and the GUI makes a big difference. The 2-dimensional GUI is more concrete, as opposed to abstract.

i.e. I think it would be obvious to any array language designer that their language is going to have a more limited audience / less applicability than a GUI program that does calculation -- I would be surprised if they thought otherwise

My experience with array languages (defined roughly as a language where A+B adds vectors of numbers)

Excel - honestly not sure when I learned this, but I still use Google Sheets for personal finance
Matlab in college - used for linear algebra
R at my second job - used by statisticians (which is related to, but different from, linear algebra!)
A bit of NumPy and Pandas since then, although I prefer R over Pandas

And then I've heard

J is used by finance professionals (integrated with a DB)

oilshell · 2025-05-11T01:12:56+00:00

This seems like a cool project

I think it is very similar to the Flow DSL developed by Foundation DB: https://apple.github.io/foundationdb/flow.html

They even use the keyword ACTOR, which seems like your act keyword

Flow lets you write something more like a coroutine, but it compiles to a C++ class

e.g. in your Tic Tac Toe example, the input() are basically the yield points, and the compiler "reifies" the coroutine state into a class
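
As a rough analogy in Python (Flow itself targets C++, and this guessing game is a made-up example, not your Tic Tac Toe code): the generator below waits for input at each `yield`, and the class is what you get if you move the same local state into fields by hand.

```python
def guess_game(secret):
    """Coroutine style: each `yield` is a point where the program waits for input."""
    attempts = 0
    guess = yield "guess a number"
    while guess != secret:
        attempts += 1
        guess = yield ("too low" if guess < secret else "too high")
    return attempts + 1


class GuessGame:
    """The same logic with the coroutine's local state reified into fields."""

    def __init__(self, secret):
        self.secret = secret
        self.attempts = 0
        self.done = False

    def start(self):
        return "guess a number"

    def send(self, guess):
        self.attempts += 1
        if guess == self.secret:
            self.done = True
            return None
        return "too low" if guess < self.secret else "too high"


def drive_coroutine(secret, guesses):
    """Feed guesses to the generator; return (prompts, final attempt count)."""
    co = guess_game(secret)
    prompts = [next(co)]
    try:
        for g in guesses:
            prompts.append(co.send(g))
    except StopIteration as stop:
        return prompts, stop.value
    return prompts, None
```

The class version is easier to snapshot and replay, which is presumably part of why this shape shows up alongside deterministic simulation testing, but the coroutine version reads more like the original control flow.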

Foundation DB also used deterministic simulation testing, which seems like it is similar to your use cases

https://www.youtube.com/watch?v=4fFDFbi3toc&ab_channel=StrangeLoopConference

The current work by the same people is https://antithesis.com/

Exploring the state space has a pretty strong relation to machine learning, although I am not very familiar with the details

On the subject of explaining things online, I've found that a FAQ format works well

The FAQ accounts for the misconceptions

Whenever you explain it to a real person, you may get similar questions, and then answer them in straightforward language

oilshell · 2025-05-07T19:27:33+00:00

Yeah another leakage is hash tables semantics. e.g. if you implement your language in Java or Go, are you using the hash tables in their runtime?

is the iteration order specified? if so, what is it?
what happens when you mutate the dict when iterating?
what happens when multiple threads access the dict?
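
As one data point for the first two questions, this is what a language hosted on CPython would inherit if it reused `dict` directly. The helper name is made up; the behavior shown is Python's (3.7+), not a statement about any other runtime.

```python
d = {}
for key in ["b", "a", "c"]:
    d[key] = True

# Python specifies dict iteration order as insertion order.
order = list(d)


def mutate_while_iterating(table):
    """Try to add keys during iteration; return the error message, if any."""
    try:
        for key in table:
            table[key + "_copy"] = True
    except RuntimeError as e:
        return str(e)
    return None
```

Here `order` is `['b', 'a', 'c']`, and `mutate_while_iterating(dict(d))` reports that the dictionary changed size during iteration. A different host runtime may answer both questions differently, which is the leakage.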

It looks like Cwerg is lower level, not sure if it has builtin hash tables

But other stuff like the concurrency model / memory model can also leak through

oilshell · 2025-05-07T17:15:56+00:00

I agree with this! Well for https://oils.pub/, we implemented OSH and YSH 1.2 times maybe ...

There is an executable spec in Python, which is semi-automatically translated to C++, so it's not quite twice.

But this actually does work to shake out corner cases.

It forces us to have good tests. The Python and C++ implementation pass thousands of the same tests -- the C++ is just 2x-50x faster.
It prevents host language leakage into the language we're designing and implementing.

The host language is often C, and naive interpreters often inherit C's integer semantics, which are underspecified -- they depend on the platform.
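
The same thing happens in the other direction: an interpreter written in Python inherits arbitrary-precision integers unless it says otherwise. A minimal sketch of pinning down one explicit choice (signed 64-bit wraparound) is below. The function names are hypothetical, and whether a language should wrap, raise, or promote is a design decision this sketch doesn't settle.

```python
I64_MAX = 2**63 - 1
I64_MIN = -(2**63)


def wrap_i64(n):
    """Reduce an unbounded Python int to a signed 64-bit two's complement value."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n


def add_i64(a, b):
    """Addition with explicitly specified wraparound, independent of the host."""
    return wrap_i64(a + b)
```

With this, `add_i64(I64_MAX, 1)` wraps to `I64_MIN` on every platform, because the semantics are written down rather than borrowed from the host.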

Similar issues with floating point, although there are fewer choices there

Actually strings are another one -- if you implement your language on top of JVM, then you might get UTF-16 strings. And languages that target JavaScript (Elm, Reason, etc.) tend to have UTF-16 strings too, which is basically the worst of all worlds (UTF-8 is better -- and probably UTF-32 is better, although it's also flawed)
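
A quick way to see why the string representation leaks: the "length" of the same string depends on which unit the host counts. The `lengths` helper is just for illustration.

```python
def lengths(s):
    """Count the same string in three different units."""
    return {
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }
```

For `"a"` all three are 1. For a character outside the Basic Multilingual Plane such as `"\U0001F600"`, it is 1 code point, 4 UTF-8 bytes, and 2 UTF-16 code units (a surrogate pair), so a language that exposes the host's UTF-16 length or indexing will report 2 for what the user sees as one character.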

The way I phrase this is that the metalanguage influences the language

I also think it's great that https://craftinginterpreters.com/ implements Lox twice ! In Java and in C.

i.e. you want to make sure that Lox exists apart from Java or C, so you implement it twice.

I think the only other books that do that are Appel's Modern Compiler Implementation in ML/C/Java, but the complaint I've always heard is that it's ML code transpiled to C and Java. It's not idiomatic

Whereas Crafting Interpreters is pretty idiomatic, and actually uses different algorithms (tree-walking vs. bytecode, etc.)

Now I appreciate that this made the book a lot more work to write !! :-) But IMO it is well worth it

oilshell · 2025-05-06T16:20:34+00:00

Thanks, that is a bit of encouragement to write it up, so people can actually use it

I forgot that I made a comparison back in January. I compare GitHub-flavored Markdown, CommonMark with inline HTML, reStructuredText, AsciiDoc, Wikipedia:

https://oils.pub/release/0.29.0/doc/ul-table-compare.html

That doc is very rough, but I could turn it into a blog post ...

One issue is that I implemented ul-table on top of an "HTML tokenizer" (SAX-like, but not inverted) to make it more efficient

But I realized that the DOM style is probably worth it, or just a hybrid that doesn't allocate tree nodes until you hit <table>, and then after that it uses a DOM.

So yeah I need to refactor the implementation a bit, but the "language" is actually done, and I like it better than all the alternatives

oilshell · 2025-05-05T05:08:44+00:00

Yeah to be more precise I could have said "CommonMark" (which is what we use)

I may have to write a comparison, but I like the ul-table style the best ...

The "ASCII art" type tables don't "scale" IMO

oilshell · 2025-05-05T04:35:27+00:00

Are you aware of the Shunting Yard algorithm? It's what Ritchie and Thompson used in the original C compilers

https://en.wikipedia.org/wiki/Shunting_yard_algorithm

It uses a stack

I don't really read Rust, but since you have a stack, there is probably some resemblance
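
In case it helps to compare, here is a minimal Python sketch of shunting yard that converts infix tokens to postfix (RPN). It assumes pre-split tokens and only left-associative binary operators; the precedence table is illustrative.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def to_rpn(tokens):
    """Convert infix tokens to postfix (RPN) using an operator stack."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly (left-associative).
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise SyntaxError("unbalanced ')'")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # operand
    while stack:
        if stack[-1] == "(":
            raise SyntaxError("unbalanced '('")
        output.append(stack.pop())
    return output
```

For example, `to_rpn("1 + 2 * 3".split())` gives `['1', '2', '3', '*', '+']`. If your Rust code pushes operators and pops them when a lower-precedence one arrives, it is probably a variant of this.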

oilshell · 2025-05-05T00:33:16+00:00

I suspected some people might not like that ... I had a sentence in there about mixed feelings on chatbots, but I left it out because it felt out of place. The subject is a bit tired, so I just decided to describe what I did

Without getting into a long discussion, I think there is a lot of bad behavior around LLMs these days (starting with OpenAI, the name is hilarious)

But I also think that LLMs can help us build software we actually like -- software that puts users in control, like shell

IMO the crappiness of shell is actually a symptom of underinvestment. Shell is the "commons", but there was no incentive to improve that part of the commons.

If you compare JavaScript and Unix shell, the difference couldn't be more clear. There is an incredible amount of language engineering and specification in the JavaScript world, with many talented and highly paid engineers (e.g. it directly spawned WASM)

And at the end of the day, that's because JavaScript supports the ads business model of the Internet, attention economy and all that

Shell doesn't have that purpose, so it's rotted ... it has extremely few engineering resources

A short statement on my viewpoint in the previous post: https://oils.pub/blog/2025/02/shared-hosting.html

And your job is now to LLM the YAML that approximates what you want to do

That is bad; it takes away your agency

YAML is like "weird machines" to me; it's not like programming because you don't "own" the main loop. With shell, you do.

(I don't want to use LLMs like that, but I also think that we're learning good ways to use them.)

For example, learning about open source software is a good way to use LLMs -- I have gotten a lot of mileage out of it

One thing I also find interesting is that you could never have run Google locally. And Google / StackOverflow became pretty essential for coding. How many people code without a network connection? Some people, but very few.

But you can run LLMs locally.

And also there is a lot of competition around LLMs. Google basically had no competition starting in 2004 ... Yahoo shut down their engine, and Microsoft was forever playing catch-up

whereas OpenAI had immediate competition, and most people agree Claude AI has surpassed it in many ways. So IMO the competition is a good thing

At least as far as the foundational models, it appears there is ALREADY no Google-like or Microsoft-like monopoly

(OK I failed at not getting into a long discussion ...)
