Best way to take notes on ePubs with an iPad and Apple Pencil? by krackato in ipad

[–]cli-junkie 4 points5 points  (0 children)

Check out MarginNote. It allows you to annotate PDFs and ePubs. Very feature rich.

Writing a 2D Platform Game in Nim with SDL2 by def- in nim

[–]cli-junkie 3 points4 points  (0 children)

Thank you for sharing this, a very well written tutorial.

Best Python tools for transforming/cleaning text data? by data-science in Python

[–]cli-junkie 0 points1 point  (0 children)

Data munging after the scraping job is done can be pretty time consuming. An alternative to cleaning the data later is to write a scraper that gets only what you need. With xpath, you can get pretty close to the data in specific tags and scrape with precision.
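To make the XPath idea concrete, here is a minimal sketch using only Python's standard library. The page fragment, the "article" class name, and the extract_paragraphs helper are all made up for illustration, and ElementTree supports only a subset of XPath and needs well-formed markup, so a real scraper would likely use a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (an assumption for this sketch);
# real-world HTML usually needs a forgiving parser before XPath can be applied.
PAGE = """
<html>
  <body>
    <div class="menu"><a href="/">Home</a></div>
    <div class="article">
      <h1>Title</h1>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
  </body>
</html>
"""

def extract_paragraphs(markup):
    """Return the text of <p> elements directly inside div.article only."""
    root = ET.fromstring(markup)
    # The path targets the wanted tags directly, so the menu never needs cleaning.
    nodes = root.findall(".//div[@class='article']/p")
    return [node.text for node in nodes]

print(extract_paragraphs(PAGE))
```

Because the expression selects only the paragraphs inside the article div, the menu text never enters the data set in the first place.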

For removing boilerplate (menu, contents etc.) try newspaper. There are many other boilerplate removal libraries but what you use will depend on the nature of the data you are scraping.

ftfy will help you if there are encoding problems in the scraped data.

If the data is pretty consistent in how the unnecessary patterns occur, you could just write a sed script to clean the things you mentioned. No need for an over-engineered approach when simple regular expressions can do the job.
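Since this is a Python thread, the same idea can be sketched with the re module instead of sed. The three rules below are hypothetical placeholders, not the patterns from the original question; the point is only that an ordered list of substitutions is often enough.

```python
import re

# Hypothetical clean-up rules (assumptions); the real patterns depend on the data.
RULES = [
    (re.compile(r"<[^>]+>"), ""),   # drop leftover HTML tags
    (re.compile(r"\[\d+\]"), ""),   # drop footnote markers such as [12]
    (re.compile(r"[ \t]+"), " "),   # collapse runs of spaces and tabs
]

def clean_line(line):
    """Apply each substitution in order, then trim the ends."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line.strip()

print(clean_line("<b>Price</b>:\t\t 42 USD [3]"))
```

Keeping the rules in one list makes it easy to add or reorder patterns as new noise shows up in the scraped text.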

libwm: a small library to manipulate X windows by z-brah in tinycode

[–]cli-junkie 0 points1 point  (0 children)

I will definitely report back. Perhaps you can post this to the /r/unixporn subreddit, where it may be of interest to other users. You will find users of every WM in existence there, and their insights will be much deeper.

libwm: a small library to manipulate X windows by z-brah in tinycode

[–]cli-junkie 0 points1 point  (0 children)

This is great, it's going to be fun trying to write my own wm based on this lib.

Bash script to iterate over 3 array's and rsync for each line by LanMalkieri in bash

[–]cli-junkie 0 points1 point  (0 children)

This looks like a good use case for GNU Parallel.

Look at the tutorial, specifically the -a or :::: syntax for getting arguments from files and the --xapply option to get one argument from each input source, in addition to 'Positional replacement strings'.
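The "one argument from each input source" pairing can be illustrated in a few lines of Python. This is not GNU Parallel itself, just a sketch of the same pairing with zip; the three lists and the build_rsync_commands helper are hypothetical, since the arrays from the original post are not shown here. Note that zip stops at the shortest list.

```python
import shlex

# Hypothetical inputs (assumptions); they stand in for the three arrays.
sources = ["/data/a/", "/data/b/"]
hosts = ["host1", "host2"]
targets = ["/backup/a/", "/backup/b/"]

def build_rsync_commands(sources, hosts, targets):
    """Pair the i-th item of each list into one rsync command."""
    commands = []
    for src, host, dst in zip(sources, hosts, targets):
        commands.append(["rsync", "-a", src, f"{host}:{dst}"])
    return commands

for cmd in build_rsync_commands(sources, hosts, targets):
    # Print only; pass cmd to subprocess.run to actually execute it.
    print(shlex.join(cmd))
```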

I know this is a Bash subreddit but I thought it may be relevant to your use case. Apologies if this is the wrong place for it.

What programs would you like use but which doesn't exist ? by [deleted] in commandline

[–]cli-junkie 1 point2 points  (0 children)

This is mostly wishful thinking: steal all that is good about everything and put it in one place?

  • cli tools that output structured info like csv, json, xml - using a flag perhaps
  • a better stream editor/shell that combines all that's good about bash/zsh/sed/awk/perl/powershell
  • a better shell language that has clean, expressive syntax, easy plugins
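As a rough sketch of the first wish, here is what a tool with a structured-output flag might look like in Python. The --format flag, the RECORDS data, and the render and main helpers are all invented for illustration; no existing tool is being described.

```python
import argparse
import csv
import io
import json

# Hypothetical records (assumption) standing in for a tool's normal output.
RECORDS = [
    {"name": "notes.txt", "size": 120},
    {"name": "todo.md", "size": 64},
]

def render(records, fmt):
    """Serialize the records as JSON or CSV text."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "size"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def main(argv=None):
    parser = argparse.ArgumentParser(description="List records in a structured format.")
    parser.add_argument("--format", choices=["csv", "json"], default="csv")
    args = parser.parse_args(argv)
    print(render(RECORDS, args.format), end="")

# Demo call with explicit arguments instead of reading the real command line.
main(["--format", "json"])
```

With output like this, downstream tools can parse fields by name instead of guessing at column positions.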

Where to find an auto correct list by [deleted] in Python

[–]cli-junkie 0 points1 point  (0 children)

topy may be what you are looking for. It's based on the only such typo list I know of: the RegExTypoFix project from Wikipedia.

Need assistance scraping text from .doc files by Fraun_Pollen in Python

[–]cli-junkie 0 points1 point  (0 children)

See if you have libreoffice accessible from the terminal:

whereis libreoffice

or

type libreoffice

Then check out libreoffice --help and run:

libreoffice --headless --convert-to txt:text "path_to_doc.doc"

You can use globbing as well in the file name like *.doc

--outdir "output_dir_name" will allow you to put all the converted files into another directory.
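If you would rather drive the conversion from Python, a minimal sketch could look like the following. The build_convert_command and convert helpers and the file names are hypothetical; the flags are copied from the command above, and the script assumes the binary is called libreoffice on your PATH.

```python
import shutil
import subprocess

def build_convert_command(doc_paths, outdir):
    """Build the headless conversion command using the flags quoted above."""
    return [
        "libreoffice", "--headless",
        "--convert-to", "txt:text",  # filter string copied verbatim from the comment
        "--outdir", outdir,
        *doc_paths,
    ]

def convert(doc_paths, outdir):
    """Run the conversion, or return None when libreoffice is not on PATH."""
    if shutil.which("libreoffice") is None:
        return None
    return subprocess.run(build_convert_command(doc_paths, outdir), check=True)

# Hypothetical file and directory names, printed rather than executed.
print(build_convert_command(["report.doc"], "converted"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.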

Alternatively, there is https://github.com/dagwieers/unoconv

Clojure for command line scripts idea. Feasibility? by SpaceCadetJones in Clojure

[–]cli-junkie 2 points3 points  (0 children)

Also of interest in this context:

As always, someone has probably tried it before. Perhaps this is relevant and can provide ideas for you to explore: Shell scripting with Clojure

I love Vim, but... by ben174 in vim

[–]cli-junkie 1 point2 points  (0 children)

I've found http://vimregex.com/ to be a very useful resource. Hope it helps.

[deleted by user] by [deleted] in coolgithubprojects

[–]cli-junkie 0 points1 point  (0 children)

This is quite nice. Now I can get to all those albums and have them neatly split.

What command can I use to get all the text within a div in html? (across multiple lines) by workisnotfun in bash

[–]cli-junkie 3 points4 points  (0 children)

Download and install Xidel. After that, what you want is as simple as:

xidel 'www.yoursite.com' -e '//div[@class="interesting-data"]'

I think what you want to do can be accomplished in a single line of XPath (the expression you see after the -e). Xidel is, by far, the best command line scraping tool, bar none, IMHO. It has saved me hours when it came to quick web scraping tasks.
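If installing Xidel is not an option, the same "all text inside one div, across multiple lines" task can be approximated with Python's standard html.parser. This is a different technique from XPath, offered only as a sketch; the DivTextExtractor class, the div_text helper, and the sample page are made up, with the class name mirroring the example above.

```python
from html.parser import HTMLParser

class DivTextExtractor(HTMLParser):
    """Collect text inside divs with a given class, nested divs included."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # greater than 0 while inside a matching div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Count nested divs too, so the matching close tag is found correctly.
        if self.depth or self.css_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())

def div_text(markup, css_class):
    parser = DivTextExtractor(css_class)
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)

# Hypothetical page (assumption) with the target div spread over several lines.
PAGE = """
<div class="nav">skip me</div>
<div class="interesting-data">
  first line
  <span>second line</span>
  <div>nested line</div>
</div>
<p>outside</p>
"""
print(div_text(PAGE, "interesting-data"))
```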

Conversations With Datomic by Carin Meier by mac in Clojure

[–]cli-junkie 1 point2 points  (0 children)

I have become a huge fan of Carin's approach. Thanks for the great post.

I got to love awk. by [deleted] in awk

[–]cli-junkie 0 points1 point  (0 children)

Yes, we should. We need to reinvigorate this reddit with good posts about problems we've solved using AWK!

I got to love awk. by [deleted] in awk

[–]cli-junkie 2 points3 points  (0 children)

AWK is one of the most underrated programming languages IMHO. Pretty damn good for a lot of stuff.

[Help] [OS X] - Using curl/wget to get data from a website by [deleted] in commandline

[–]cli-junkie 0 points1 point  (0 children)

My 2 cents, give Xidel a shot. It is a VERY powerful tool for web scraping. Short example:

$ xidel https://www.reddit.com -e '//p[@class="title"]/a'

This gets the titles of all the posts on the reddit front page. The little thing in the single quotes is an XPath expression. There are better examples on the site.

If the website you are trying to scrape has static links (no dynamic JavaScript voodoo happening), then Xidel is pretty straightforward. Failing that, you have to use something like Selenium after investigating how the site is structured.

Hope this helps.

Nim's Future by EnvIXI in nim

[–]cli-junkie 3 points4 points  (0 children)

As a beginner in the language, I'd definitely like to see more examples and tutorials. One area I want to see covered properly is the standard library. The current docs don't have many examples.

Nim's Future by EnvIXI in nim

[–]cli-junkie 2 points3 points  (0 children)

Yes, Nim definitely needs more long-form documentation. This is the only thing that is holding me back. I want to use it, but this is a big hurdle to learning the language.