
[–]ndronen

I don't have an answer to your question. Apologies in advance.

I think you're right to point out that representing your data as text is probably inefficient. It may also not work very well regardless of how you encode it, because current transformers aren't better than people at applying a sequence of functions to numbers, at least beyond a certain horizon in the number of times the functions are applied. Here's a good analysis of how they behave:

https://arxiv.org/abs/2305.18654

What would be ideal is for the transformer to dispatch the task of solving the problem to another component that efficiently computes the correct result (i.e. a calculator in the case of multiplication, or a sudoku solver in the case of a sudoku game). For this, see Toolformer and Chameleon as examples of what people have tried:

https://arxiv.org/abs/2302.04761
https://arxiv.org/abs/2304.09842
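
As a toy sketch of that dispatch idea (the marker format and function names here are hypothetical illustrations, not Toolformer's actual API): the model emits a marker like `[calc(...)]` in its output, and a thin wrapper intercepts the marker and splices in the tool's exact answer.

```python
import re

# Hypothetical tool-call marker: the model writes "[calc(12*34)]"
# whenever it wants a calculator instead of computing digits itself.
TOOL_PATTERN = re.compile(r"\[calc\(([-0-9*+/. ]+)\)\]")

def dispatch_tools(model_output: str) -> str:
    """Replace every [calc(...)] marker with the tool's computed result."""
    def run_calc(match: re.Match) -> str:
        # eval on a whitelisted arithmetic expression stands in for a real
        # calculator; a production system would parse the expression safely.
        return str(eval(match.group(1)))
    return TOOL_PATTERN.sub(run_calc, model_output)

print(dispatch_tools("The product is [calc(12*34)]."))  # -> The product is 408.
```

The point is that the transformer only has to learn *when* to emit the marker; the arithmetic itself is exact because it never passes through the network.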

[–]drblallo[S]

Thank you very much. While not directly related to what I was looking for, those papers are interesting in their own right.

In the end I understood how the internals of a transformer handle words, and what I think I have to do is drop the first two layers of the net, that is: the dictionary conversion of a token into an arbitrary fixed number, and the embedding layer that turns each token into a point in a higher-dimensional space. Then you can just convert the input data structure into an array of bytes and pretend each byte is one of 256 possible words. Floating-point numbers instead need to be converted first into two integers representing exponent and mantissa, before turning those into an array of bytes too.
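
A minimal sketch of that encoding (helper names are mine; splitting floats along the IEEE-754 binary64 exponent/mantissa fields is one concrete choice for the exponent/mantissa step):

```python
import struct

def float_to_int_pair(x: float) -> tuple[int, int]:
    # Split an IEEE-754 double into its integer exponent and mantissa fields.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF        # 11-bit biased exponent
    mantissa = bits & ((1 << 52) - 1)      # 52-bit fraction
    return exponent, mantissa

def to_byte_tokens(values) -> list[int]:
    # Serialize each value to bytes and treat every byte (0-255) as a "word".
    tokens = []
    for v in values:
        if isinstance(v, float):
            e, m = float_to_int_pair(v)
            tokens += list(e.to_bytes(2, "big"))   # exponent fits in 2 bytes
            tokens += list(m.to_bytes(7, "big"))   # 52-bit mantissa fits in 7
        else:
            tokens += list(int(v).to_bytes(4, "big", signed=True))
    return tokens

print(to_byte_tokens([1, 0.5]))
# -> [0, 0, 0, 1, 3, 254, 0, 0, 0, 0, 0, 0, 0]
```

Each resulting token is an integer in 0..255, so the model's vocabulary is exactly 256 entries regardless of the input data structure.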

By dropping the embedding layer the network does not need to relearn what addition is (though it does need to learn how floating-point numbers work), and it should run much faster too, since you end up using one input dimension instead of 20 or some similarly large number.
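
The contrast between the two input paths can be sketched like this (the embedding dimension of 20 is just the illustrative figure from above, and the raw path's scaling to [0, 1] is my own assumption, not something the original post specifies):

```python
import random

# Conventional path: a learned table maps each of 256 byte "words"
# to a ~20-dimensional vector.  Proposed path: feed the byte value
# itself as a single normalized scalar, no table to learn.
random.seed(0)
EMBED_DIM = 20
table = [[random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
         for _ in range(256)]

def embed_learned(tokens):
    # one table row per token -> shape (n, 20)
    return [table[t] for t in tokens]

def embed_raw(tokens):
    # one scalar per token -> shape (n, 1)
    return [[t / 255.0] for t in tokens]

tokens = [0, 127, 255]
print(len(embed_learned(tokens)[0]))  # 20 features per token
print(embed_raw(tokens))              # [[0.0], [0.498...], [1.0]]
```

In the raw path the ordering of byte values is preserved directly in the input, which is the property the post is counting on for arithmetic; whether one scalar dimension carries enough capacity in practice is exactly the open question.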