3.25-3.31
A Structured Self-attentive Sentence Embedding.
Matrix (multi-hop) self-attention plus a penalty term that encourages the attention hops to be diverse.
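From memory (notation may differ slightly from the paper), the annotation matrix and penalty are roughly:
$$A = \mathrm{softmax}\big(W_{s2}\tanh(W_{s1} H^{\top})\big), \qquad M = AH, \qquad P = \big\lVert AA^{\top} - I \big\rVert_F^2$$
where $H$ stacks the BiLSTM hidden states, each row of $A$ is one attention hop, $M$ is the matrix sentence embedding, and $P$ is added to the loss to keep the hops different from each other.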
Learning Hierarchical Structures On-The-Fly with a Recurrent-Recursive Model for Sequences
Discrete structure, learned with reinforcement learning.
Neural Language Modeling by Jointly Learning Syntax and Lexicon
Soft tree induced via syntactic distance, trained as a language model; can also be used for grammar induction.
Straight to the Tree: Constituency Parsing with Neural Syntactic Distance
Uses syntactic distance to do the (supervised) constituency parsing task.
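My understanding of how the tree is recovered from the predicted distances (sketch, indices may be off by one): split at the largest distance and recurse,
$$\mathrm{tree}(i,j) = \big(\mathrm{tree}(i,k),\ \mathrm{tree}(k{+}1,j)\big), \qquad k = \arg\max_{i \le k' < j} d_{k'}, \qquad \mathrm{tree}(i,i) = w_i$$
where $d_{k'}$ is the syntactic distance predicted between adjacent words $w_{k'}$ and $w_{k'+1}$.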
Max-Margin Markov Networks
The Markov network objective is changed to a margin-based one, but it differs from the structured SVM loss.
TODO: not fully understood, should read again.
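My current (possibly wrong) reading of the margin-rescaled objective, to verify on the next pass:
$$\min_{w,\xi}\ \tfrac{1}{2}\lVert w\rVert^2 + C\sum_i \xi_i \quad \text{s.t.} \quad w^{\top}\big(f(x_i, y_i) - f(x_i, y)\big) \ge \Delta(y_i, y) - \xi_i \quad \forall i,\ \forall y$$
where $\Delta(y_i, y)$ is the Hamming loss; the exponentially many constraints are handled by exploiting the Markov network structure.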
4.1-4.7
Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction
Objective: mutual information maximization. It can also be seen as a generalization of Brown clustering. It might be usable for parsing as well.
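Roughly, as I understand it, the objective maximizes the mutual information between the tag assigned to a word and the tag assigned to its context, made tractable with a variational lower bound (my sketch):
$$\max_\theta\ I\big(T_{\mathrm{word}};\ T_{\mathrm{context}}\big), \qquad T_{\mathrm{word}} \sim p_\theta(t \mid w), \quad T_{\mathrm{context}} \sim p_\theta(t \mid c)$$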
Learning Deep Compositional Grammatical Architectures for Visual Recognition
Uses the AND-OR grammar idea to build the network: an AND node composes its children, an OR node adds (sums over) alternatives, and terminal nodes are CNN operations. The network is not adaptive, i.e., the structure is fixed.
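A minimal sketch of how I read the node types (my notation, not from the paper):
$$\mathrm{AND}(h_1,\dots,h_k) = g\big([h_1; \dots; h_k]\big), \qquad \mathrm{OR}(h_1,\dots,h_k) = \sum_j h_j, \qquad \mathrm{Terminal}(x) = \mathrm{CNN}(x)$$
an AND node composes its children (e.g., concatenate then transform), an OR node aggregates alternative decompositions by addition, and terminals are convolutional blocks; the AND-OR structure itself is fixed rather than learned per input.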
4.8-4.21
Learning deep representations by mutual information estimation and maximization
Mutual information maximization, additionally using local structure information. Shows a large improvement over the current SOTA on image classification tasks.
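If I remember correctly, the MI is estimated with a neural critic and a JSD-style lower bound is maximized both globally and averaged over local feature-map positions (sketch, my notation):
$$\widehat{I}\big(X; E(X)\big) \ \ge\ \mathbb{E}_{p(x)}\big[-\mathrm{sp}\big(-T(x, E(x))\big)\big] - \mathbb{E}_{p(x)p(x')}\big[\mathrm{sp}\big(T(x', E(x))\big)\big]$$
where $\mathrm{sp}(z) = \log(1 + e^z)$ and $T$ is the critic; the local objective applies the same bound with features from intermediate layers.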
Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders
Constituency grammar induction with an autoencoder. Computes inside and outside scores using a TreeLSTM composition and a bilinear projection.
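My sketch of the inside recursion (outside is the analogous top-down pass; notation is mine):
$$\bar{a}(i,j) = \sum_{k} p(i,k,j)\,\mathrm{compose}\big(\bar{a}(i,k), \bar{a}(k{+}1,j)\big), \qquad p(i,k,j) = \frac{\exp s(i,k,j)}{\sum_{k'} \exp s(i,k',j)}$$
so the inside vector of a span is a softmax-weighted sum over split points, with $\mathrm{compose}$ the TreeLSTM and $s$ the bilinear compatibility score.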
Unsupervised Recurrent Neural Network Grammars
Unsupervised version of RNNG, trained with amortized variational inference. The generative network is the RNNG generator; the inference network is a CRF parser. Trees sampled from the inference network are fed to the generative network during training.
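The training objective, as I understand it, is the ELBO with the CRF parser as the variational posterior over trees:
$$\log p_\theta(x) \ \ge\ \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x, z)\big] + \mathbb{H}\big[q_\phi(z \mid x)\big]$$
where $z$ is a binary tree, $p_\theta$ is the RNNG generative model, $q_\phi$ is the CRF parser, and the expectation is approximated by sampling trees from $q_\phi$.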
4.21-5.19
A Structural Probe for Finding Syntax in Word Representations
Finds syntactic information in word representations such as ELMo or BERT via a linear transformation matrix.
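Concretely (sketch from memory), the probe is a matrix $B$ chosen so that squared distances in the transformed space match parse-tree distances:
$$d_B(h_i, h_j)^2 = \big(B(h_i - h_j)\big)^{\top}\big(B(h_i - h_j)\big), \qquad \min_B \sum_{i,j} \Big|\, d_{\mathrm{tree}}(w_i, w_j) - d_B(h_i, h_j)^2 \,\Big|$$
an analogous probe uses $\lVert B h_i \rVert^2$ to predict word depth in the tree.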
Text Generation from Knowledge Graphs with Graph Transformers
Proposes a dataset and generates multi-sentence text from the output of automatic information-extraction systems. Shows the usefulness of the structured data format.
Learning What and Where to Transfer
Interesting work. It considers the problem of what information should be transferred from a source network to a target network, and where. They propose a meta-learning method that learns a matrix of weights $W_{m, n}$ from source layer $m$ to target layer $n$. Performance improves significantly.
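My rough sketch of the transfer loss with the meta-learned weights (my notation; details likely differ from the paper):
$$\mathcal{L}_{\mathrm{transfer}}(x) = \sum_{(m,n)} w_{m,n}\, \big\lVert r_n\big(T_n(x)\big) - S_m(x) \big\rVert^2$$
where $S_m$ and $T_n$ are source and target feature maps, $r_n$ is a small adapter, and the pair weights $w_{m,n}$ come from a meta-network, so useful source features (what) are matched to the right target layers (where).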
Latent Variable Model for Multi-modal Translation
A continuous latent variable $z$ with a Gaussian distribution is generated from the source sentence $x$. From $z$ the model generates the image $v$; from $x$ and $z$ it generates the target sentence $y$. This separates $v$ from $x$ and $y$, so the model can be trained with images but still run inference without them. The model is trained with amortized variational inference.
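The ELBO I would expect for this factorization (my sketch):
$$\log p(y, v \mid x) \ \ge\ \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(y \mid x, z) + \log p_\theta(v \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x, y) \,\big\Vert\, p_\theta(z \mid x)\big)$$
at test time the image term is simply dropped, since $v$ depends only on $z$.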