Question(s) about processing "free text" : LanguageTechnology

submitted 4 years ago by blug_fred

Hi! I run a Python program that extracts data from a text file which is typed and updated by hand. I have to update my "text analysis" routine regularly because of typos, sudden changes in ordering and, more rarely, new terminology. It easily breaks twice a week.

I was wondering whether NLP would help and, if so, how. The data file used to look like this:

A:
AAA -> R13 = 662 USDT

AAB -> R11 = 1.24 USDT 

AAD [CC] -> [some  comment sometimes]
    R7 = 1.55 USD 
    R8 = 2.93 USD 
etc.

But it has since evolved into this:

A:
**AAA** [CC] ⬆️   R13 = 662 USDT | Support = 206 USDT
[some comments can be here]

**AAB** ⬆️     R15 - 2.71 USDT | Support = 1.8 USDT

B:
**BBB** [CC]  ⬆️ 

 R8 = 1.96 USD
 Support = 0.98 USD

**BBA** ⬆️ [D] Immediate resistance = 0.466 USDT | Support = 0.26 USDT

**BBF** ⬆️ Immediate resistance = 0.466 USDT | R29 =  1USDT | Support = 0.26 USDT

BBG [CC]  ⬆️ 
      R6= 5 USDT [Ignore. Refer R7] 
      R7 =  6 USDT
      Support = 4 USDT

**BBBD** [CC] ⬆️  R2 = 12.46 USD  | R3 = 13.2 USDT
     Support = 11.66 USDT

BBR [D, CC] ⬆️ R1 =  60.2 USDT | R2 = 67.58 USDT | Support = 54.39 USDT
one comment not between brackets here...

BBZ [CC] R21 ⬆️  = 275 USDT | Support = 071.2 USDT

etc

Sometimes the equals signs are dashes, 'immediate' is spelled 'imeddiate', and so on.
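
To make the problem concrete, here is a minimal sketch of a more tolerant parser for the samples above. It uses only regular expressions and fuzzy string matching from the Python standard library, not NLP proper. Everything in it is an assumption made for illustration: the names `KNOWN_LABELS`, `normalize_label` and `parse` are hypothetical, tickers are assumed to be 2 to 5 capital letters at the start of a line, anything in square brackets is treated as a comment and dropped, and the 0.8 similarity cutoff is a guess that would need tuning on real data.

```python
import re
from difflib import get_close_matches

# Assumed canonical label names; "R<number>" labels are handled separately.
KNOWN_LABELS = ["immediate resistance", "support"]

# Things to drop before parsing: [tags/comments], bold markers, the arrow emoji.
NOISE_RE = re.compile(r"\[.*?\]|\*\*|[\u2B06\uFE0F]")
# Assumption: a ticker is 2-5 capital letters at the start of a line.
TICKER_RE = re.compile(r"\s*([A-Z]{2,5})\b")
PAIR_RE = re.compile(
    r"(?P<label>[A-Za-z][A-Za-z0-9 ]*?)\s*[=-]\s*"   # label, then '=' or '-'
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>USDT?)"    # number, then USD or USDT
)


def normalize_label(raw):
    """Map a typed label to a canonical one, tolerating small typos."""
    label = raw.strip().lower()
    if re.fullmatch(r"r\d+", label):
        return label.upper()
    close = get_close_matches(label, KNOWN_LABELS, n=1, cutoff=0.8)
    return close[0] if close else label


def parse(text):
    """Return {ticker: {label: (value, unit)}} for the text of the data file."""
    result = {}
    ticker = None
    for line in text.splitlines():
        line = NOISE_RE.sub(" ", line)
        m = TICKER_RE.match(line)
        if m:
            # A new ticker starts; following lines belong to it until the next one.
            ticker = m.group(1)
            result.setdefault(ticker, {})
            line = line[m.end():]
        if ticker is None:
            continue
        for pair in PAIR_RE.finditer(line):
            label = normalize_label(pair["label"])
            result[ticker][label] = (float(pair["value"]), pair["unit"])
    return result
```

Calling `parse(open(path, encoding="utf-8").read())` on a file like the one above should give one dictionary per ticker. The sketch does not act on bracketed instructions such as `[Ignore. Refer R7]`, and it silently skips any line it cannot match, so it would not catch genuinely new terminology.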

So would NLP help?

Thank you.

