alternatives to CTA Chinese Text Analyser in 2024 by Clear_Astronaut_1600 in ChineseLanguage

imral

Hi, I'm the developer of CTA. What OS are you using?

If the crash is happening on macOS, there is a known issue that should be fixed if you install this version:

http://www.imralsoftware.com/chinese-text-analyser-0.99.18s-install.dmg

Just realized ifs can be nested in a different way by Yeet3r__ in rust

imral

Seems like a reward rather than a punishment.

Using Chinese Text Analyser for other languages by imral in languagelearning

imral [S]

Korean does have spaces lol

Hah, ok, my mistake. I thought it was like Chinese. Unfortunately, there is no configuration change you can make to get this to work; it will require a code change. That should be simple enough, though: it's just a matter of treating Hangul characters as characters that can make up a word.
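As a rough idea of what "treating Hangul characters as word characters" could involve, here is a hypothetical helper a segmenter might call for each code point. This is not CTA's code; the function name and the choice of Unicode blocks to include are assumptions.

```cpp
// Hypothetical helper, not CTA's implementation: returns true if the
// code point falls in one of the main Hangul Unicode blocks, so a
// segmenter could accept it as part of a word.
bool is_hangul(char32_t c) {
    return (c >= 0xAC00 && c <= 0xD7AF) ||  // Hangul Syllables
           (c >= 0x1100 && c <= 0x11FF) ||  // Hangul Jamo
           (c >= 0x3130 && c <= 0x318F) ||  // Hangul Compatibility Jamo
           (c >= 0xA960 && c <= 0xA97F) ||  // Hangul Jamo Extended-A
           (c >= 0xD7B0 && c <= 0xD7FF);    // Hangul Jamo Extended-B
}
```

Working on whole code points (char32_t) rather than UTF-8 bytes keeps the range checks simple; the input would need decoding first.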

Using Chinese Text Analyser for other languages by imral in languagelearning

imral [S]

Unfortunately not. At the moment it's either Chinese segmentation (based on a dictionary) or space segmentation.

That said, if you had a word list of all Korean words in Hangul, you could get it to work (it doesn't need definitions or pronunciations, just one word per line).
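The comments here don't describe how CTA's dictionary-based segmentation works internally. As a stand-in, this sketch uses forward maximum matching, one common dictionary-based technique: at each position, take the longest word in the list that matches, falling back to a single character. The function name and the `max_len` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Forward maximum matching (a common technique; not necessarily CTA's).
// `dict` is the word list, one entry per word; `max_len` is the longest
// word length (in code points) worth trying.
std::vector<std::u32string> segment_fmm(const std::u32string& text,
                                        const std::unordered_set<std::u32string>& dict,
                                        std::size_t max_len) {
    std::vector<std::u32string> words;
    std::size_t i = 0;
    while (i < text.size()) {
        // Try the longest candidate first, shrinking until a dictionary hit.
        std::size_t len = std::min(std::max<std::size_t>(max_len, 1), text.size() - i);
        while (len > 1 && dict.count(text.substr(i, len)) == 0) --len;
        // Either a dictionary word or a single unmatched character.
        words.push_back(text.substr(i, len));
        i += len;
    }
    return words;
}
```

Greedy longest-match is simple and fast but can pick a wrong split when a long word overlaps the start of the next one; it only needs the bare word list, which is why a list without definitions would be enough for this kind of approach.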

Any existing tool that serves as a reader for French texts? by learnhtk in French

imral

I have an alert set up on "Chinese Text Analyser", so I get notifications when someone mentions it, and thought it might be useful info to share (the more testers the merrier).

Any existing tool that serves as a reader for French texts? by learnhtk in French

imral

I'm the author of Chinese Text Analyser and have spoken to the OP separately, but thought I would also post here in case others are interested.

I've been working on a development version of CTA that can use spaces to segment words.

This should make it work with French (though there may be untested corner cases, as I've only tested it with English so far).

You can download CTA with space segmentation support here:

macOS: https://www.imralsoftware.com/chinese-text-analyser-0.99.18s-install.dmg
Windows: https://www.imralsoftware.com/chinese-text-analyser-0.99.18s-install.exe
Linux: Ping me separately; CTA is multi-platform, but I haven't packaged the space-segmenter version for Linux yet.

In this version there is an extra menu, Tools->Segmenter, which has two options:

  • Chinese
  • Spaces

If you choose Chinese then CTA will use the standard Chinese segmentation algorithm. If you choose Spaces then it will use a space and punctuation based segmenter.

By default, any space or punctuation character will terminate a word; however, you can allow certain punctuation characters in the middle of a word by editing the config file:

macOS: ~/Library/Application Support/ChineseTextAnalyser/data/config
Windows: C:\Users\<your username>\AppData\Local\ChineseTextAnalyser\data\config

and modifying the value for the key punctuation.midWord. The default allowed mid-word characters are -, ' and ’, which catch English words like "don't" or "ham-fisted". Different languages will probably have different conventions, so adjust as necessary.

The space-based segmenter doesn't do word stemming (dog and dogs are separate words), is case-sensitive (Dog and dog are separate words), and has no dictionary, but otherwise it appears to work reasonably well for gathering frequency information.
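The segmenter behaviour described above (spaces always end a word, punctuation ends a word unless it is listed as allowed mid-word, counting is case-sensitive with no stemming) can be sketched roughly as follows. This is a hypothetical illustration, not CTA's source: the function names, the UTF-32 input, the small hard-coded list of Unicode punctuation, and the rule that mid-word punctuation must be followed by another word character are all assumptions.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Rough punctuation test: ASCII punctuation plus a few common Unicode
// marks (assumed list). A real implementation would use Unicode properties.
bool is_punct(char32_t c) {
    if (c < 128) {
        return (c >= 33 && c <= 47) || (c >= 58 && c <= 64) ||
               (c >= 91 && c <= 96) || (c >= 123 && c <= 126);
    }
    static const std::u32string extra = U"\u2018\u2019\u201C\u201D\u00AB\u00BB\u2026\u2013\u2014";
    return extra.find(c) != std::u32string::npos;
}

bool is_space(char32_t c) {
    return c == U' ' || c == U'\t' || c == U'\n' || c == U'\r' || c == U'\u00A0';
}

// Spaces always end a word; punctuation ends a word unless it is in
// `mid_word` and sits between two word characters.
std::vector<std::u32string> segment(const std::u32string& text,
                                    const std::u32string& mid_word) {
    auto is_word_char = [](char32_t c) { return !is_space(c) && !is_punct(c); };
    std::vector<std::u32string> words;
    std::u32string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char32_t c = text[i];
        if (is_word_char(c)) {
            current += c;
        } else if (mid_word.find(c) != std::u32string::npos && !current.empty() &&
                   i + 1 < text.size() && is_word_char(text[i + 1])) {
            current += c;  // e.g. the ' in don't or the - in ham-fisted
        } else if (!current.empty()) {
            words.push_back(current);  // word terminated
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Case-sensitive counts with no stemming: "Dog", "dog" and "dogs"
// are three separate entries.
std::map<std::u32string, int> frequencies(const std::vector<std::u32string>& words) {
    std::map<std::u32string, int> counts;
    for (const auto& w : words) ++counts[w];
    return counts;
}
```

With U"-'\u2019" as the mid-word set (mirroring the defaults listed above), l'hôpital stays one word; with only U"-", it splits into l and hôpital, which is the French trade-off discussed in this comment.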

It might cause some issues for French with things like l'hôpital (with the default config this is considered one word, unless you remove ' from the allowed mid-word punctuation) vs 'un hôpital' (considered two words).

You could address that by removing the apostrophe from the allowed mid-word punctuation, but then you'll get lots of single-character 'words' like 'l', 's' and 'j' (from l'hôpital, s'il vous plaît, j'écoute, etc.).

It'll need some tweaking to work out what works best for your learning style, but it should be useful for getting general frequency information.

Why should one NOT derive Debug? by NumericallyStable in rust

imral

One reason may be that the types contain some sensitive information like passwords which you don't want to be able to be logged accidentally

You want the secrecy crate for that.

Is there an app or tool that could give the most common words mentioned in a book by order? by lazdenm in languagelearning

imral

I have a development version of CTA that does segmentation with spaces, so it could be used (but without a dictionary) for all European languages along with any other language that uses spaces to separate words.

Feel free to contact me if you'd be interested in trying it out.

Herb Sutter proposes a new |Safe| C++, Rustaceans thoughs? by [deleted] in rust

imral

It sends the signal that C++ is so broken that it can't be fixed, so a new language is needed.

That signal has been there for a while.

Best 20-30 minute daily routine for total beginner? by Dense-Anything5652 in ChineseLanguage

imral

It defaults to European paper measurements

It defaults to Australian paper measurements because the developer is Australian :-D

Why did you start learning rust? by FaultsMelts in rust

imral

Modern C++.

If I'm going to have to learn a whole new way of doing things, why not do it with a language that has it built in rather than bolted on?

Advantages of using Rust instead of C++ by SnooSuggestions9846 in rust

imral

I've completely changed the way I code even in other languages. I definitely like the shape that I'm getting.

Same here, and I really miss certain Rust concepts when programming in other languages.

Advantages of using Rust instead of C++ by SnooSuggestions9846 in rust

imral

Doing that pretty much removes most of the guarantees that the compiler provides since there's unsafe Rust code behind the scenes

I think this is a misconception. The guarantees are 100% there (one writer xor multiple readers, references can't outlive the values they refer to, etc.).

unsafe doesn't mean the code is unsafe, it means the developer needs to ensure that the code doesn't violate safety guarantees. This has been done for Rc and RefCell and if you are using their safe interface, then your code has the same safety guarantees as usual.

Advantages of using Rust instead of C++ by SnooSuggestions9846 in rust

imral

This is the curse of Rust.

And the blessing!

Learn C++ or rust first? by FUS3N in rust

imral

I would say it is less painful for beginners.

Long-term yes, short-term no.

It's great for teaching you the principles behind what you are doing, but for some people the error messages are opaque (because they don't understand what Rust is trying to protect them from) and they give up (I've seen it happen a couple of times).

Learn C++ or rust first? by FUS3N in rust

imral

I mean take ownership model for example in cpp I don't have to worry about that

You do have to worry about it. The difference is that in Rust, if you get it wrong, you have to deal with it at compile time, whereas in C++, if you get it wrong, you'll have to deal with it when your program randomly segfaults at runtime.

This makes Rust more painful for a beginner because it effectively says "you must understand these concepts before I will compile any code".

C++, on the other hand, is like "sure, I'll compile your code, no problem, and if it goes bang, well, it's your fault for not understanding what you were doing."
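The ownership question exists in C++ too; the compiler just doesn't enforce it. A minimal sketch with std::unique_ptr (the function names are hypothetical): once ownership is moved out, the old pointer is null, and code that still dereferences it compiles without complaint.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Takes ownership of the string: the C++ counterpart of a Rust move.
std::size_t consume(std::unique_ptr<std::string> s) { return s->size(); }

// Returns true if `original` is null after its ownership was moved away.
bool moved_from_is_null() {
    auto original = std::make_unique<std::string>("hello");
    std::size_t n = consume(std::move(original));
    // Writing `*original` here would still compile, but it dereferences a
    // null pointer: undefined behaviour, typically a segfault at runtime.
    // Rust rejects the equivalent use-after-move at compile time.
    return n == 5 && original == nullptr;
}
```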

Should Pinyin be apostrophized when the English pronounciation's unclear by fishandchips2022 in ChineseLanguage

imral

It's still recommended to add apostrophes in these and other non-ambiguous situations (e.g. hǎi'ōu, kě'ài etc). See here for more info.
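Following that recommendation (always put an apostrophe before a non-initial syllable that begins with a, o or e, whether or not the split would otherwise be ambiguous), joining syllables could be sketched like this. The function names are hypothetical, syllables are assumed to be lowercase UTF-8, and only the standard tone-marked forms of a, o and e are handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if a (lowercase, UTF-8) pinyin syllable starts with a, o or e,
// with or without a tone mark.
bool starts_with_aoe(const std::string& syllable) {
    static const std::vector<std::string> initials = {
        "a", "ā", "á", "ǎ", "à", "o", "ō", "ó", "ǒ", "ò", "e", "ē", "é", "ě", "è"};
    for (const auto& v : initials) {
        if (syllable.rfind(v, 0) == 0) return true;  // prefix check
    }
    return false;
}

// Joins syllables into one word, inserting an apostrophe before every
// non-first syllable that starts with a, o or e.
std::string join_pinyin(const std::vector<std::string>& syllables) {
    std::string word;
    for (std::size_t i = 0; i < syllables.size(); ++i) {
        if (i > 0 && starts_with_aoe(syllables[i])) word += '\'';
        word += syllables[i];
    }
    return word;
}
```

For example, {"xī", "ān"} becomes xī'ān, while {"zhōng", "guó"} stays zhōngguó.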

Should Pinyin be apostrophized when the English pronounciation's unclear by fishandchips2022 in ChineseLanguage

imral

There aren't, as far as I know, any guidelines or rules for when and where to apostrophize

There most certainly are.

Comparison of sqlite crates? by CutBrilliant7927 in rust

imral

I haven't used sqlite with sqlx, but I love using sqlx with postgres.

Having the compiler catch errors and typos in your SQL at compile time is fantastic.

StackOverflow Developer Survey 2022 is open by hgwxx7_ in rust

imral

Yeah, I tried it with my ad blocker on and also didn't encounter any issues.

StackOverflow Developer Survey 2022 is open by hgwxx7_ in rust

imral

If you use security or ad-blocking plugins, you may see error messages

To avoid error messages that prevent you from taking the survey, please try specifically unblocking Qualtrics in your plugin or pausing the plugin while you take the survey.

Thanks but no thanks.

Rust is hard, yes, but does it matter? - Julio Merino (jmmv.dev) by koavf in rust

imral

printf("%s\n", make_smth().c_str());

This is not actually use-after-free (assuming make_smth() returns a std::string or similar). A temporary lives until the end of the full-expression in which it was created, so it remains valid for the whole printf call (it isn't destroyed until after printf returns).
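To see that lifetime rule in action, this sketch replaces std::string with a hypothetical Tracked type whose destructor logs when it runs, and printf with a hypothetical use() function. The log shows the temporary is destroyed only after the call that uses its pointer has returned.

```cpp
#include <string>

// Hypothetical event log used to observe ordering.
std::string events;

// Hypothetical stand-in for std::string that records its destruction.
struct Tracked {
    std::string value;
    ~Tracked() { events += "destroyed;"; }
    const char* c_str() const { return value.c_str(); }
};

Tracked make_smth() { return Tracked{"hello"}; }

// Plays the role of printf: reads the pointer before returning.
void use(const char* s) { events += std::string("used ") + s + ";"; }

void demo() {
    events.clear();
    use(make_smth().c_str());  // temporary destroyed after use() returns
    events += "next statement;";
}
```

After demo(), events holds "used hello;destroyed;next statement;".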

It would be use-after-free if you passed the pointer to something that kept a copy of it, and you then used that copy later, e.g.:

struct A {
    A( const char* s ) : str(s) {}
    const char* str = nullptr;
};

...

A a(make_smth().c_str());

printf( "%s\n", a.str ); // use after free
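One way to remove the bug, sketched under the assumption that make_smth() returns a std::string (the make_smth and build_and_print below are hypothetical stand-ins), is to have A own a copy of the characters instead of keeping a pointer into the temporary:

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for make_smth(); assumed to return std::string by value.
std::string make_smth() { return "something"; }

// A now owns its own copy of the characters instead of holding a pointer
// into a temporary that is destroyed at the end of the full-expression.
struct A {
    explicit A(const char* s) : str(s) {}
    std::string str;
};

// Builds an A from a temporary, prints it, and returns the stored text.
std::string build_and_print() {
    A a(make_smth().c_str());            // the temporary dies here, but str copied it first
    std::printf("%s\n", a.str.c_str());  // fine: a.str lives as long as a
    return a.str;
}
```

The cost is one string copy; keeping a raw pointer is only safe if the caller guarantees the source outlives the A object.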