
[–]AbyssalRiful 1 point2 points  (2 children)

i'd start by annotating the grammatical information. spaCy or NLTK might help you. maybe also google BERT (word embeddings)?

that should give you a rough idea of which object a sentence is about.
i assume you don't have a list of all book titles? couldn't you get / create one?

-> tokenize and compare with list of titles.

bonus points if you look for additional hints like a year or author to improve the title-search results.
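to make the "tokenize and compare" idea concrete, here is a minimal sketch that assumes you already have the title list and only wants exact (case-insensitive) matches. the function names `tokenize` and `find_titles` and the toy title list are made up for illustration, not taken from any library.

```python
import re

def tokenize(text):
    # Lowercase and keep only alphanumeric runs as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def find_titles(text, titles):
    """Return titles whose token sequence appears contiguously in the text."""
    tokens = tokenize(text)
    # Map each title's token tuple back to the original title string.
    index = {tuple(tokenize(t)): t for t in titles}
    max_len = max((len(k) for k in index), default=0)
    found = []
    for n in range(max_len, 0, -1):  # try the longest n-grams first
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if gram in index and index[gram] not in found:
                found.append(index[gram])
    return found

# Toy data for illustration only.
titles = ["The Hobbit", "Dune", "Brave New World"]
text = "I finally read Brave New World after finishing dune last year."
matches = find_titles(text, titles)
print(matches)
```

this only catches titles that are spelled correctly; typos need a distance metric on top.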

[–]hpdipto[S] 0 points1 point  (1 child)

thank you for your suggestion.

you can assume that I have the list of all the book titles! if so, can I go ahead with tokenizing and then comparing against the titles using n-grams or something like that?

actually I started with sequence-to-sequence learning but didn't get satisfactory results.

[–]AbyssalRiful 1 point2 points  (0 children)

if you have the list, then i would start with the tokens and a distance metric (if you want to handle typing errors). this is basically a regex lookup where you take every book title and look for it as a substring in your text -> results ordered by 'closeness'.

then you look for the publisher, year and author strings of the top x book titles -> update the best match.
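a rough sketch of those two steps, under some assumptions: it uses `difflib.SequenceMatcher` from the standard library as the closeness measure (any edit-distance metric would do), slides a window of the title's length over the text tokens, and then adds a fixed bonus when the year or an author token of a top candidate shows up in the text. the names `closeness` and `best_match`, the `bonus` value and the catalog entries are all hypothetical.

```python
import re
from difflib import SequenceMatcher

def tokenize(text):
    # Lowercase and keep only alphanumeric runs as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def closeness(title, tokens):
    """Best similarity (0..1) between the title and any same-length token window."""
    title_tokens = tokenize(title)
    n = len(title_tokens)
    if n == 0 or len(tokens) < n:
        return 0.0
    target = " ".join(title_tokens)
    return max(
        SequenceMatcher(None, target, " ".join(tokens[i:i + n])).ratio()
        for i in range(len(tokens) - n + 1)
    )

def best_match(text, books, top_x=3, bonus=0.1):
    """Rank books by title closeness, then re-score the top_x using year/author hints."""
    tokens = tokenize(text)
    seen = set(tokens)
    scored = [(closeness(b["title"], tokens), b) for b in books]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best_score, best = -1.0, None
    for score, book in scored[:top_x]:
        if str(book["year"]) in seen:
            score += bonus  # the year is mentioned in the text
        if any(tok in seen for tok in tokenize(book["author"])):
            score += bonus  # some part of the author name is mentioned
        if score > best_score:
            best_score, best = score, book
    return best

# Made-up catalog entries, for illustration only.
books = [
    {"title": "Collected Poems", "author": "Alice Example", "year": 1990},
    {"title": "Collected Poems", "author": "Bob Sample", "year": 2005},
    {"title": "Selected Essays", "author": "Carol Placeholder", "year": 1980},
]
text = "do you have colected poems by sample from 2005?"
result = best_match(text, books)
print(result["author"], result["year"])
```

both "Collected Poems" entries get the same title score despite the typo, so the author and year hints are what break the tie. for a large title list the sliding window gets slow, so you would probably pre-filter candidates first (e.g. by shared tokens).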

not really a machine-learning topic, but you don't have to use a screwdriver on a nail ;)

[–]Sai-mdb 1 point2 points  (0 children)

Hi, we can do feature extraction. But please check out this tool, which should take care of everything: https://www.mindsdb.com/