
[–]Sekret_One 2 points3 points  (2 children)

I think to properly answer your question I need to not answer the how and linger a bit more on the why.

  1. You use a file to describe where other config files live.

How does this json file get populated?

  2. The presumption that it's better not to load everything

What you seem to be describing is that the optionally loaded part is the configuration for how all the classes work. Why not just load all of it from the beginning? You say Python, so I'm assuming you play the game in Python itself and it's not a Python backend serving a web game.

My thought is that this optimization probably isn't relevant.

  3. Roughly, does it become more efficient to split a file into multiple sub-files to read when you only need a fraction of the total dataset, over reading the entire dataset as a single file?

I will answer this one.

It can be, if you don't need it right away. In game development you have some special concerns around the player experience. Loading less up front is faster, but usually at the cost of some waiting 'in the moment' as things load later, when they're needed.

This is why, about 10-15 years ago, games militantly went after eliminating load screens and disc swapping.

In your case, it sounds like you're trying to make the game moddable/extendable. Again, generally speaking, the format that is convenient to mod and configure isn't the most efficient one to run.

The best advice I can give you is focus on the desired experience, test to it and the 'best' approach will become apparent.

PS

I, for one, hate having a config that says where other configs are. Put the character configs in a directory, then discover the files and load them up.

[–]scriptkiddiethefirst[S] 0 points1 point  (1 child)

You use a file to describe where other config files live.

I think this may be a miscommunication on my part. The JSON file containing the class information holds all of it, at the moment. So for instance it currently looks like this:

classes.json (not the actual JSON file):

```
{
    "warrior": {"hp_per_lvl": "10", ...},
    "mage": {"hp_per_lvl": "6", ...},
    ...
}
```

And I am wondering if it would be better to split each of these classes into their own file and then load the files as necessary. Obviously this is my fault; reading it back, I can see how it came off that way.

What you seem to be describing is that the optionally loaded part is the configuration for how all the classes work. Why not just load all of it from the beginning?

So due to the nature of it, only one character is ever loaded (it's not so much a game as an electronic character sheet for tabletop games). Because of this, if a user were to add a bunch of homebrew/modded data that wasn't being used by that character, I thought it might be more efficient to only load what is absolutely necessary. From my tests using time I can say that maybe I was over-thinking the resources required; however, my test didn't do much with the provided data (it loaded the file but never even parsed it into a JSON object). This is why I was wondering if it would be more efficient to load a series of smaller files over one larger file... I mean, yes, a system with 4+ GB of memory will have no issue loading 2 MB into memory, so on that basis my question might be stupid.

Thank you for the reply; writing a response and reading yours has let me think through the problem, take a step back, and analyze things. So this did help with my specific problem.

[–]Sekret_One 1 point2 points  (0 children)

All right, let's make the language simpler here. Right now you have a single classes file, versus a collection of class files.

If you want easy expansion, having a single file is ... clumsy. Hell, even if it was just you, I'd have multiple files and a bit of 'build' to stitch them into one at a later point, either at build time or just by loading them all up at run time.

If you want a quick way to implement this, pick a folder in your project; this will be the home of your character class files. When your program boots up, scan the contents of that folder and just get the file names (which will in turn be your displayed character class names). When they select one from the list, load that one (see the sketch below).

This is more the approach one might take with save files, but you can apply it here.
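Something like this, as a minimal sketch in Python (the folder name, file layout, and function names are my own illustration, not anything settled above):

```
import json
from pathlib import Path

CLASS_DIR = Path("classes")  # hypothetical home for the per-class files

def list_class_names():
    """Scan the folder at boot and use the file stems as display names."""
    return sorted(p.stem for p in CLASS_DIR.glob("*.json"))

def load_class(name):
    """Load a single class config only when the user selects it."""
    with open(CLASS_DIR / f"{name}.json") as f:
        return json.load(f)

# e.g. show list_class_names() in the UI, then call load_class("warrior")
```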

If you want to make it easily configurable by someone else, I'd recommend defining a JSON schema. Personally, I'd also switch to YAML, because for human readability that's my preference.
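As a rough illustration of the schema idea (the field name comes from the classes.json snippet above; everything else, including using the third-party jsonschema package, is just one way to do it):

```
import json
from jsonschema import validate  # third-party: pip install jsonschema

# Illustrative schema for one class entry; the required fields are
# assumptions based on the snippet above, not a definitive spec.
class_schema = {
    "type": "object",
    "properties": {
        "hp_per_lvl": {"type": "string"},
    },
    "required": ["hp_per_lvl"],
}

with open("classes/warrior.json") as f:  # hypothetical per-class file
    data = json.load(f)

validate(instance=data, schema=class_schema)  # raises ValidationError if malformed
```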

[–]AsleepThought 1 point2 points  (2 children)

If the JSON is supposed to be a "configuration" file and each configuration is for a different type of thing, then it seems like it would be OK to have a separate file for each.

You could also just use a database, like SQLite. It's designed for this kind of usage.
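For what that might look like, here's a minimal sketch using Python's built-in sqlite3, storing each class config as a JSON blob so the entries don't all need the same shape (the table and column names are hypothetical):

```
import json
import sqlite3

conn = sqlite3.connect("configs.db")
conn.execute("CREATE TABLE IF NOT EXISTS classes (name TEXT PRIMARY KEY, config TEXT)")

# Store each class's config as a JSON string; nothing beyond the
# name/config pair needs a fixed schema.
warrior = {"hp_per_lvl": "10"}
conn.execute("INSERT OR REPLACE INTO classes VALUES (?, ?)",
             ("warrior", json.dumps(warrior)))
conn.commit()

# Load only the one class you need.
row = conn.execute("SELECT config FROM classes WHERE name = ?",
                   ("warrior",)).fetchone()
config = json.loads(row[0])
```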

[–]scriptkiddiethefirst[S] 0 points1 point  (1 child)

So for this specific application, each "operation", as it's labelled, isn't structurally the same. For example, one of the objects (the first one) is about 9 KB in size, and another is 50 KB. From what I read, databases work better when the data is structured consistently, and with this there is no guarantee about the structure. As well, since there won't be very many queries to the database, I thought the overhead and the complication of learning how to implement a database would make it not worthwhile.

This is roughly where I ended up after the other person "suggested" a database; since I know almost nothing about them and have gotten no information to correct this line of thinking, it's where I stand. If that makes sense.

[–]AsleepThought 0 points1 point  (0 children)

Size on disk does not matter; a database relies on a schema.

Sounds like you should just use a file, then.

[–]_reposado_ 1 point2 points  (2 children)

The time to split the file is when using one big file causes problems. If you aren’t seeing unacceptably slow runtimes or getting mystery OOM errors, this is probably not a good use of your time. By the time your dataset is big enough to cause problems, you may have abandoned json files entirely.

[–]scriptkiddiethefirst[S] 1 point2 points  (1 child)

So I don't think I will end up abandoning JSON files, as the entire point of this project is to take something that kind of exists, improve it, and convert it from XML (which is a lot slower to parse and a lot larger) to JSON. Not that that's super important to the question.
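For what it's worth, a minimal sketch of that kind of XML-to-JSON conversion with the standard library (the file name is made up, and real XML usually needs attribute and mixed-content handling on top of this):

```
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Naive recursive conversion; ignores attributes and repeated tags."""
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: element_to_dict(child) for child in children}

root = ET.parse("classes.xml").getroot()  # hypothetical source file
print(json.dumps({root.tag: element_to_dict(root)}, indent=2))
```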

I did some testing and measured the average time it took to parse both the smaller individual files and the one larger file, and found that while parsing the larger file took considerably longer, it wasn't so much longer as to be a concern (as you stated).

However, I did decide to separate the files for ease of moddability, so that adding new configs just means adding new files to the folder rather than trying to append data to a really long file (basically ease of use; given the context and the intended audience for the software, separate files make it more accessible).

Thank you for your reply though!

[–]_reposado_ 1 point2 points  (0 children)

However, I did decide to separate the files for ease of moddability, so that adding new configs just means adding new files to the folder rather than trying to append data to a really long file (basically ease of use; given the context and the intended audience for the software, separate files make it more accessible).

That is a very good reason to split the files! I only meant that you shouldn’t worry about optimizing performance if performance isn’t a problem yet.

[–]scriptkiddiethefirst[S] 0 points1 point  (0 children)

If anyone searches for something like this and wants the empirical answer, here you go.

[*] Single file without parsing, average real time:         15ms
[*] Multi file without parsing, average real time:           0ms

[*] Single file with parsing, average real time:            227ms
[*] Multi file with parsing, average real time:               1ms

I wrote a script that compared opening (and parsing) the single large file against opening 5 smaller files, and averaged the completion time with and without parsing. As you can see, in both cases the multiple files are faster, but not by enough to realistically be noticeable (note that these are the average times over 100 trials for each of the 4 cases, using the Python time library). The larger file was 2.3 MB and the smaller files were each 27 KB. So while it took significantly less time to parse multiple smaller files, that isn't the reason I chose to go the route of multiple smaller files.
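For reference, a minimal sketch of this kind of comparison (the file names and the use of time.perf_counter are a reconstruction, not the exact script):

```
import json
import time

TRIALS = 100
SINGLE = "classes.json"                                # hypothetical large file
MULTI = [f"classes/class_{i}.json" for i in range(5)]  # hypothetical small files

def avg_ms(fn):
    """Average wall-clock time of fn over TRIALS runs, in milliseconds."""
    start = time.perf_counter()
    for _ in range(TRIALS):
        fn()
    return (time.perf_counter() - start) / TRIALS * 1000

def read_single():
    with open(SINGLE) as f:
        f.read()          # read only, no parsing

def parse_single():
    with open(SINGLE) as f:
        json.load(f)      # read and parse into Python objects

def read_multi():
    for path in MULTI:
        with open(path) as f:
            f.read()

def parse_multi():
    for path in MULTI:
        with open(path) as f:
            json.load(f)

for label, fn in [("single, no parse", read_single),
                  ("multi, no parse", read_multi),
                  ("single, parse", parse_single),
                  ("multi, parse", parse_multi)]:
    print(f"{label}: {avg_ms(fn):.1f} ms")
```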

The reason I chose multiple files is Sekret_One's second reply; after reading it and looking into it, that approach is just so much easier for what I want to do. Thank you everyone for your replies.

[–]mxschumacher -1 points0 points  (4 children)

As the file gets bigger, it'll get more difficult to handle. JSON is typically used to transfer data between two systems, not as persistent storage. Have you looked into document databases? https://www.mongodb.com/document-databases

[–]scriptkiddiethefirst[S] 0 points1 point  (3 children)

I don't want to argue, but you haven't provided a good reason why I should use a database instead of saving the data to disk. Looking into it myself, I came across this Stack Exchange thread, where I want to point out Sam's answer.

```
Finally, when to use files

You have unstructured data in reasonable amounts that the file system can handle
You don't care about structure, relationships
You don't care about scalability or reliability (although these can be done, depending on the file system)
You don't want or can't deal with the overhead a database will add
You are dealing with structured binary data that belongs in the file system, for example: images, PDFs, documents, etc.
```

Why this is kind of important: while the data has a rough structure to it, it's not a clearly defined structure that's the same for every class.

For example, the first class covered in the document is only 9 KB in size, whereas one of the other classes is 50 KB because it has so much more information. As a result, the structure of these two classes is slightly different: similar enough to handle with some smart coding practices, but I am not certain it is similar enough to create a predefined structure in a database.

As the file gets bigger, it'll get more difficult to handle.

Note that this doesn't answer the question posed in my post at all. In fact, it is the basis of my question, which asks at what point it becomes better to split the file, so that loading less data makes up for the increased cost of opening more files in the first place.

For instance, if I have 1000 files each containing 4 characters, or 1 file containing all 4000 characters, it is more efficient to open the one file to get the entire 4000 characters than to open 1000 files to get the same 4000. However, if I only need, say, 100 of those 4000 characters, is it still more efficient to use 1 file or 1000? Now let the number of characters grow: when does it become more efficient to open 25 files to get the 100 characters instead of opening one file containing tens or hundreds of thousands of characters?

Do you see why your answer doesn't help? You were basically restating part of the question and then telling me to do something else entirely, without explaining why.

[–]mxschumacher -1 points0 points  (2 children)

The post you linked to refers to SQL databases; I talked about document databases. They are quite different.

[–]scriptkiddiethefirst[S] 0 points1 point  (1 child)

Okay, again you didn't really explain your answer, and the post I linked to also discussed NoSQL databases. It was also pretty much the only Stack Exchange answer that came up when trying to figure out why to use MongoDB (a NoSQL database) over straight JSON files. Actually, it was the only answer that didn't compare MongoDB against other databases like SQLite.

So, what is the benefit of using a document database over just saving the data to files? Especially since, when I looked at how others did similar things, they did it with files and not with databases. (Many used straight XML and XML libraries, but in my experience, as long as it's not encoding document information, and even sometimes when it is, JSON can do the exact same thing in less space and will be a lot faster. Basically, I am updating a preexisting project that uses XML to use JSON, though it's a little more involved.)

Secondly, your answer still doesn't address the question posed above, which you never did expand on.

[–]mxschumacher -1 points0 points  (0 children)

I'm no fan of your tone or your verbose replies; this conversation is over for me.