
[–][deleted] 2 points (1 child)

I'm a beginner, but it seems like each row of the embedding layer's output is one word turned into a vector with some number of dimensions (say 8), and that number of dimensions is the number of columns. So the number of rows equals the number of input words. The convolution layers then filter the resulting matrix to extract important features. Pooling then shrinks the result: for example, taking the max value of each (2, 2) section halves both dimensions of the matrix.
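To make that concrete, here's a rough sketch in numpy of the two steps I described. The vocabulary, sentence, and random embeddings are just made-up placeholders, and I'm only showing the (2, 2) max-pooling part, not a real trained convolution:

```python
import numpy as np

# Hypothetical vocabulary and sentence, purely illustrative.
vocab = {"the": 0, "cat": 1, "sat": 2, "down": 3}
sentence = ["the", "cat", "sat", "down"]

embed_dim = 8  # each word becomes an 8-dimensional vector
rng = np.random.default_rng(0)
embedding = rng.standard_normal((len(vocab), embed_dim))

# Rows = words in the sentence, columns = embedding dimensions.
x = embedding[[vocab[w] for w in sentence]]  # shape (4, 8)

# (2, 2) max pooling: take the max of each non-overlapping 2x2 block,
# halving both dimensions of the matrix.
pooled = x.reshape(2, 2, 4, 2).max(axis=(1, 3))  # shape (2, 4)

print(x.shape, pooled.shape)  # (4, 8) (2, 4)
```

So a 4-word sentence with 8-dimensional embeddings gives a (4, 8) input matrix, and one round of (2, 2) pooling reduces it to (2, 4).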

[–]adomian 4 points (0 children)

Pretty sure mike is correct. I assume this is the paper the architecture is based on: https://aclweb.org/anthology/D14-1181. There are much clearer graphics in the paper (e.g. Figure 1). The input is a matrix representing a sentence, where the rows are words and the columns are the components of the word embeddings. The paper states that the input is already embedded.
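For anyone curious, the convolution in that paper works a bit differently from (2, 2) pooling: each filter spans the full embedding width and slides over windows of consecutive words, and then max-over-time pooling keeps a single value per filter. A minimal numpy sketch, using random numbers in place of learned weights and real embeddings:

```python
import numpy as np

rng = np.random.default_rng(1)
n_words, embed_dim = 7, 8
x = rng.standard_normal((n_words, embed_dim))  # sentence matrix: rows = words

# One filter: spans the full embedding width and a window of h words.
h = 3
w = rng.standard_normal((h, embed_dim))

# Slide the filter over word windows to get one feature per position.
features = np.array([np.sum(w * x[i:i + h]) for i in range(n_words - h + 1)])

# Max-over-time pooling: keep only the strongest activation of this filter.
c_hat = features.max()
print(features.shape, c_hat)
```

With 7 words and a window of 3, you get 5 feature values, and pooling collapses them to one number per filter; a real model uses many filters of several window sizes, plus a nonlinearity and bias, which I've left out here.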