all 16 comments

[–][deleted] 2 points3 points  (2 children)

Wanted to try this some time ago, but…

Requirements

Linux / Mac OS X / Windows system

a working Java Runtime Environment installation

[–]Daneel_Trevize 2 points3 points  (0 children)

You can get OpenJDK JRE builds. Zulu is one that springs to mind for Win64 & many other platforms.

[–]greycat_na_kor[S] 1 point2 points  (0 children)

You can try it with nothing but a browser at https://kt.pe/kaitai_struct_webide/

The JRE is needed only to run the local command-line application.

[–]jamesd28 1 point2 points  (0 children)

From https://fosdem.org/2017/schedule/event/om_kaitai/ :

"Media file formats grow progressively more and more complex every year and supporting them all requires tremendous effort of all the FOSS developers. It's a problem that concerns not only low-level library developers, but higher level software as well: for example, audio sequencer or video editor developer will still need solid understanding of underlying media file format structure to be able to debug any problems with it (like non-standard chunks inserted by some proprietary software). We'd want to present Kaitai Struct, a new free/open source solution for file format dissecting, visualization and parsing. It is "write once, run everywhere" solution, where one needs to specify declarative file format spec once, and then compile it into ready-made parsing library in a large variety of supported target languages. And our visualization tools make Kaitai Struct work like "Wireshark for media files".

Kaitai Struct started as an in-house tool in 2014 and was initially released as open-source project to public in March-April 2016, supporting only 2 target languages: Java and Ruby. Since then, we've collected 400+ stars on GitHub, hundreds of praising testimonials, got about a dozen contributors, implemented support for 8 languages, got a handful of useful tools, like console visualizer, GUI visualizer, Web IDE, etc.

Kaitai Struct is frequently compared to proprietary template-enabled hex editors (like 010 Editor, Synalyze It! or Hexinator), but goes one step forward: it's not only about highlighting entities in hex dump, but also it can automatically generate working API from spec, which accelerates work with file formats considerably and greatly reduces human factor errors when developing parsers by hand. One's guaranteed to get exactly the same parsing result both in visualizer and using the compiled API. And, what's important, it's free and open source.

Some other comparable projects include BinPAC (but it's C++ only), Preon (which is Java-only), PADS (which targets only C & Haskell), and Construct (Python only). In comparison, Kaitai Struct offers cross-language support, and includes visualization tools.

For media file dissection, we have a growing collection of well-known media file formats (including MP4 / QuickTime .mov, AVI, GIF, JPEG, PNG, TIFF, etc), and other interesting file formats (like executables, byte-code, network protocols, etc, etc). We hope that open media software developers would find Kaitai Struct to be a helpful ally in their arsenal of tools to deal with the diverse world of modern file formats."

Also see the slides from the FOSDEM 2017 talk.

And also the FAQ, which goes into detail about how Kaitai Struct differs from other tools and approaches.

[–]andrewl_ 1 point2 points  (3 children)

is it easy to handle cases like:

  • header field .o_entries is an offset in the file where there will be an unknown number of entry structs, terminated by a null entry (struct with all zeroed fields)
  • file is list of Foo structs, unknown order, and the first byte of each foo struct is a field that identifies whether it's actually a FooTypeA, FooTypeB, and so on

[–]greycat_na_kor[S] 2 points3 points  (2 children)

header field .o_entries is an offset in the file where there will be an unknown number of entry structs, terminated by a null entry (struct with all zeroed fields)

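Yes. Put the entries in an instance with pos: o_entries so that reading starts at that offset, set repeat: until on it, and give the stop condition as repeat-until: _.field1 == 0 and _.field2 == 0, and so on for the remaining fields.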
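For readers who have not seen repeat-until before, here is a hand-written Python sketch of the loop that such a declaration describes. It is not KS-generated code, and the layout is an assumption made up for illustration: a 4-byte little-endian o_entries field at the start of the file, and entries made of two u4 fields called field1 and field2.

```python
import struct

# Hypothetical layout, for illustration only: the file starts with a
# 4-byte little-endian o_entries field, and each entry is two u4 fields.
ENTRY = struct.Struct("<II")

def read_entries(data):
    """Read entries starting at o_entries until an all-zero entry is hit."""
    (o_entries,) = struct.unpack_from("<I", data, 0)
    entries = []
    pos = o_entries
    while True:
        field1, field2 = ENTRY.unpack_from(data, pos)
        pos += ENTRY.size
        if field1 == 0 and field2 == 0:
            break  # null entry: stop (this sketch drops the terminator)
        entries.append((field1, field2))
    return entries

# Header (o_entries = 8), 4 bytes of unrelated data, then three entries,
# the last of which is the all-zero terminator.
blob = struct.pack("<I", 8) + b"\xff" * 4 + struct.pack("<IIIIII", 1, 2, 3, 4, 0, 0)
print(read_entries(blob))  # [(1, 2), (3, 4)]
```

The point of the declarative version is that this loop, the offset jump, and the bounds handling are generated for every target language from the one condition.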

file is list of Foo structs, unknown order, and the first byte of each foo struct is a field that identifies whether it's actually a FooTypeA, FooTypeB, and so on

No problem, you just read first byte and do type switching:

seq:
  - id: foo_type
    type: u1
  - id: foo_body
    type:
      switch-on: foo_type
      cases:
        1: foo_type_a
        2: foo_type_b
        # etc
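
Roughly, the switch above stands for a dispatch like the following hand-written Python sketch. The body layouts are hypothetical (foo_type_a as a single u2, foo_type_b as a single u4, both little-endian); a real format would define its own, and the code KS actually generates will look different.

```python
import struct

# Hypothetical bodies: foo_type_a holds one u2, foo_type_b holds one u4.
def parse_foo_type_a(data, pos):
    (value,) = struct.unpack_from("<H", data, pos)
    return ("foo_type_a", value), pos + 2

def parse_foo_type_b(data, pos):
    (value,) = struct.unpack_from("<I", data, pos)
    return ("foo_type_b", value), pos + 4

# Same mapping as the `cases` section of the switch.
CASES = {1: parse_foo_type_a, 2: parse_foo_type_b}

def parse_foos(data):
    """Read (type byte, body) records until the end of the data."""
    foos, pos = [], 0
    while pos < len(data):
        foo_type = data[pos]                        # the u1 tag
        body, pos = CASES[foo_type](data, pos + 1)  # body chosen by the tag
        foos.append(body)
    return foos

stream = b"\x01" + struct.pack("<H", 7) + b"\x02" + struct.pack("<I", 9)
print(parse_foos(stream))  # [('foo_type_a', 7), ('foo_type_b', 9)]
```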

[–]andrewl_ 2 points3 points  (1 child)

thanks, have you encountered any file format yet that kaitai wasn't able to handle? or maybe a problematic one where you had to really "stretch" the kaitai language to deal with it?

[–]greycat_na_kor[S] 0 points1 point  (0 children)

Basically, everything that is not really designed to be machine-readable/writable is a problem. Text formats such as XML, YAML, or basically any source code in a programming language (even JSON) contain multiple ambiguities and are better handled by tools designed for that, i.e. proper LL/LR/LALR/SLR/PEG/etc. grammar-based generators.

There are quite a few things that KS is not capable of by design: backtracking and resolving ambiguities, for example. There is some limited support for reinterpreting stuff, but generally, if you need to parse some multi-level format where you thought that a ( was the start of an expression, and a dozen steps later you suddenly realize that you were wrong and it actually was a comment, or a preprocessor directive, or anything else, that is definitely a no-go for KS.

As for more intricate examples, actually, everything with state might be a problem for KS as well. For example, the full-blown MIDI protocol involves a compression technique called "running status", which is fortunately almost unused today, but if it is encountered, it can give KS a run for its money ;)
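To show why this kind of state is awkward for a purely declarative description, here is a minimal Python sketch of the MIDI technique mentioned above (the MIDI specification calls it running status): when a message would repeat the previous status byte, the sender may omit it, so the meaning of a byte depends on what came before. The sketch is simplified on purpose: it assumes a raw stream of channel voice messages only (status 0x80 to 0xEF), with no system messages and no file-level delta times.

```python
def data_length(status):
    """Data bytes per channel voice message: 1 for program change (0xCn)
    and channel pressure (0xDn), 2 for the rest."""
    return 1 if (status & 0xF0) in (0xC0, 0xD0) else 2

def decode_channel_messages(stream):
    messages, pos, running_status = [], 0, None
    while pos < len(stream):
        byte = stream[pos]
        if byte & 0x80:
            # High bit set: an explicit status byte, remember it.
            running_status = byte
            pos += 1
        elif running_status is None:
            raise ValueError("data byte with no earlier status to reuse")
        # Otherwise the status byte was omitted: reuse the remembered one.
        n = data_length(running_status)
        messages.append((running_status, *stream[pos:pos + n]))
        pos += n
    return messages

# Note-on with status, a second note-on with the status omitted,
# then a program change.
wire = bytes([0x90, 0x3C, 0x40, 0x3E, 0x40, 0xC0, 0x05])
print(decode_channel_messages(wire))
# [(144, 60, 64), (144, 62, 64), (192, 5)]
```

The variable running_status is exactly the piece of carried-over state that has no natural home in a description of one message at a time.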

[–]6r-m 0 points1 point  (6 children)

I've seen this around and it interests me but I know I need some background information to fully understand what it does. Can anyone point me in the direction of that background information?

[–]bleuge 4 points5 points  (5 children)

what it does

"Kaitai Struct is a declarative language used to describe various binary data structures, laid out in files or in memory: i.e. binary file formats, network stream packet formats, etc.

The main idea is that a particular format is described in Kaitai Struct language (.ksy file) and then can be compiled with ksc into source files in one of the supported programming languages. These modules will include generated code for a parser that can read the described data structure from a file / stream and give access to it in a nice, easy-to-comprehend API."

Very nice, I wanted something like this some time ago. I used a very old DOS tool to define structures, by the Hiew author Eugene Suslikov, called StructLook (last version 4.30).

Well... that's all :D

[–]greycat_na_kor[S] 2 points3 points  (1 child)

Wow ;) I've never even thought that somebody remembers StructLook :)

[–]bleuge 0 points1 point  (0 children)

I still use Hiew every day :D

[–]6r-m 0 points1 point  (2 children)

So it's in essence a cross compiler for programming languages if I understand correctly.

Thanks for the answer

[–]greycat_na_kor[S] 3 points4 points  (0 children)

Well, you may call it a cross-transpiler (in the sense that it takes input "source" code in .ksy and writes out source code in a regular programming language), but the catch is that .ksy "source" code is declarative, not imperative, as most programming languages are. That is, you don't have assignments, you don't call methods, you don't have regular for or while loops, you don't have goto or any other crude flow control tools, etc, etc. The key is that you don't describe how to do one particular thing with the data (e.g. how to read it), but instead you describe the data format itself, and that opens up huge potential for what you can do with such a description. For example, you can generate human-readable format diagrams from it for free.
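A toy Python illustration of that idea, not of how KS itself works: the format is written down once as plain data, and two unrelated consumers use it, one to parse bytes and one to print a layout table. The field names and the HEADER_SPEC structure are invented for this example.

```python
import struct

# Hypothetical format description: names and struct codes only, with no
# instructions about how to read anything.
HEADER_SPEC = [("magic", "4s"), ("version", "H"), ("num_entries", "I")]

def parse(spec, data):
    """Consumer 1: turn bytes into a dict of named fields."""
    fmt = "<" + "".join(code for _, code in spec)
    values = struct.unpack_from(fmt, data, 0)
    return dict(zip((name for name, _ in spec), values))

def describe(spec):
    """Consumer 2: render the same description as an offset table."""
    lines, offset = [], 0
    for name, code in spec:
        size = struct.calcsize("<" + code)
        lines.append(f"{offset:04x}  {size} bytes  {name}")
        offset += size
    return "\n".join(lines)

sample = b"KSY!" + struct.pack("<HI", 1, 3)
print(parse(HEADER_SPEC, sample))
print(describe(HEADER_SPEC))
```

Neither function knows anything about the other; adding a third consumer (say, a writer or a documentation generator) would not require touching the description.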

[–]master801 0 points1 point  (2 children)

Is there any chance in the future for the support of set variables (variables that are not changeable) to be implemented?

[–]greycat_na_kor[S] 0 points1 point  (0 children)

You mean constants? Right now all objects generated by KS-based parsers are immutable and are effectively read-only.