## 3.25-3.31

• A Structured Self-attentive Sentence Embedding.

matrix attention (multiple attention rows per sentence) + a penalty term that discourages redundant rows
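
The mechanism is compact enough to sketch: the paper computes an attention matrix $A = \mathrm{softmax}(W_2 \tanh(W_1 H^\top))$ over the LSTM states $H$, takes $M = AH$ as the sentence embedding, and penalizes $\|AA^\top - I\|_F^2$ so the $r$ attention rows focus on different parts of the sentence. A minimal numpy sketch (dimensions and variable names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sentence_embedding(H, W1, W2):
    """H: (n, d) LSTM hidden states; W1: (da, d); W2: (r, da).
    Returns the (r, d) matrix embedding M and the redundancy penalty."""
    A = softmax(W2 @ np.tanh(W1 @ H.T), axis=1)   # (r, n): r attention rows over n tokens
    M = A @ H                                     # r weighted views of the sentence
    P = np.linalg.norm(A @ A.T - np.eye(A.shape[0]), 'fro') ** 2  # ||AA^T - I||_F^2
    return M, P

rng = np.random.default_rng(0)
n, d, da, r = 7, 16, 8, 4
H = rng.normal(size=(n, d))
M, P = sentence_embedding(H, rng.normal(size=(da, d)), rng.normal(size=(r, da)))
```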

• Learning Hierarchical Structures On-The-Fly with a Recurrent-Recursive Model for Sequences

Discrete structure, learned by reinforcement learning.

• Neural Language Modeling by Jointly Learning Syntax and Lexicon

Soft tree, syntactic distance, language model; can also be used for grammar induction.

• Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

Uses syntactic distance for the constituency parsing task.
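
Decoding a tree from distances is a simple recursion: split each span at the position with the largest syntactic distance, then recurse on the two halves. A sketch (the tie-breaking rule and nested-tuple tree format are my own choices):

```python
def distances_to_tree(words, dists):
    """words: n tokens; dists: n-1 syntactic distances between adjacent tokens.
    Recursively split at the largest distance (leftmost on ties)."""
    if len(words) == 1:
        return words[0]
    k = max(range(len(dists)), key=lambda i: dists[i])   # argmax split point
    left = distances_to_tree(words[:k + 1], dists[:k])
    right = distances_to_tree(words[k + 1:], dists[k + 1:])
    return (left, right)

# largest distance sits between "cat" and "sat", so that split happens first
tree = distances_to_tree(["the", "cat", "sat"], [1.0, 2.5])
```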

• Max-Margin Markov Networks

The Markov network objective is changed to a margin-based one, but it differs from the structured SVM loss.

TODO: not fully understood; read again.

## 4.1-4.7

• Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction

Objective: mutual information maximization. It can also be seen as a generalization of Brown clustering, and might be applicable to parsing.
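
The Brown-clustering connection comes from the objective: Brown clustering greedily merges word clusters to maximize the mutual information between adjacent cluster labels. A small stdlib sketch of that empirical quantity (in nats):

```python
from collections import Counter
from math import log

def adjacent_mi(tags):
    """Empirical mutual information I(C_t; C_{t+1}) of adjacent labels,
    the quantity Brown clustering greedily maximizes."""
    pairs = list(zip(tags, tags[1:]))
    joint = Counter(pairs)
    left = Counter(t for t, _ in pairs)    # marginal of the first position
    right = Counter(t for _, t in pairs)   # marginal of the second position
    n = len(pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * log(p_ab * n * n / (left[a] * right[b]))
    return mi

# a perfectly predictable alternation has high MI; a constant sequence has zero
```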

• Learning Deep Compositional Grammatical Architectures for Visual Recognition

Uses the AND-OR grammar idea to build the network: AND nodes compose, OR nodes add, and terminal nodes are CNNs. The network is not adaptive; its structure is fixed.
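
The three node types can be sketched abstractly. This is a toy vector version, not the paper's actual CNN blocks: AND nodes compose child features (concatenation here), OR nodes take a softmax-weighted sum over alternatives, and terminals apply a learned transform standing in for a small CNN.

```python
import numpy as np

def terminal(x, W):
    """Terminal node: stand-in for a small CNN applied to the input."""
    return np.maximum(W @ x, 0.0)

def and_node(children):
    """AND node: compose part representations (concatenation here)."""
    return np.concatenate(children)

def or_node(children, alpha):
    """OR node: softmax-weighted sum over alternative sub-structures."""
    w = np.exp(alpha - alpha.max())
    w /= w.sum()
    return sum(wi * c for wi, c in zip(w, children))

rng = np.random.default_rng(0)
x = rng.normal(size=8)
t1 = terminal(x, rng.normal(size=(4, 8)))
t2 = terminal(x, rng.normal(size=(4, 8)))
root = and_node([or_node([t1, t2], np.zeros(2)), t1])  # fixed, non-adaptive structure
```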

## 4.8-4.21

• Learning deep representations by mutual information estimation and maximization

Mutual information maximization, with local structure information added. Shows a large improvement over the current SOTA on image classification.

• Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders

Constituency grammar induction with an autoencoder. Inside and outside scores are computed with a TreeLSTM and bilinear projections.

• Unsupervised Recurrent Neural Network Grammars

Unsupervised version of RNNG, trained with amortized variational inference. The generative network is the RNNG generator; the inference network is a CRF parser. Trees sampled from the inference network are fed to the generative network for training.

## 4.21-5.19

• A Structural Probe for Finding Syntax in Word Representations

Finds syntactic information in word representations such as ELMo or BERT via a learned linear transformation matrix.
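
The probe itself is just a linear map $B$ under which squared distances $\|B(h_i - h_j)\|^2$ are trained to match parse-tree distances between words. A numpy sketch of the distance computation (dimensions are illustrative):

```python
import numpy as np

def probe_distances(H, B):
    """H: (n, d) contextual word vectors; B: (k, d) learned probe matrix.
    Returns the (n, n) matrix of squared distances ||B(h_i - h_j)||^2."""
    T = H @ B.T                            # project each vector through B
    diff = T[:, None, :] - T[None, :, :]   # pairwise differences
    return (diff ** 2).sum(-1)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 12))               # e.g. 5 contextual vectors of width 12
D = probe_distances(H, rng.normal(size=(6, 12)))
```

Training then regresses these squared distances onto gold tree distances; a low-rank $B$ that succeeds is the evidence that syntax is linearly recoverable.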

• Text Generation from Knowledge Graphs with Graph Transformers

Proposes a dataset and generates multi-sentence text from the output of automatic information-extraction systems; shows the usefulness of a structured data format.

• Learning What and Where to Transfer

Interesting work. It considers the problem of what and where information should be transferred from a source network to a target network. They propose a meta-learning method that learns a matrix of weights $W_{m, n}$ from source layer $m$ to target layer $n$. Performance improves significantly.
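
The shape of the objective can be sketched as a weighted feature-matching loss, where $W_{m, n}$ (produced by the meta-learner) says how much source layer $m$ should guide target layer $n$. This is only a schematic under simplifying assumptions; the actual method also handles mismatched feature shapes and finer-grained (e.g. channel-wise) weighting:

```python
import numpy as np

def transfer_loss(source_feats, target_feats, W):
    """Sum over layer pairs of W[m, n] * distance(source_m, target_n).
    Features are assumed already projected to a common shape (a simplification)."""
    loss = 0.0
    for m, s in enumerate(source_feats):
        for n, t in enumerate(target_feats):
            loss += W[m, n] * float(np.mean((s - t) ** 2))
    return loss

rng = np.random.default_rng(0)
src = [rng.normal(size=(3, 4)) for _ in range(2)]   # 2 source-layer feature maps
tgt = [rng.normal(size=(3, 4)) for _ in range(3)]   # 3 target-layer feature maps
W = np.abs(rng.normal(size=(2, 3)))                 # nonnegative transfer weights
L = transfer_loss(src, tgt, W)
```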

• Latent Variable Model for Multi-modal Translation

A hidden continuous variable $z$, following a Gaussian distribution, is generated from the source sentence $x$. The image $v$ is generated from $z$, and the target sentence $y$ is generated from $x$ and $z$. This separates $v$ from $x$ and $y$, so the model can be trained with images but still work (at inference time) without them. The model is trained with amortized variational inference.