## 3.25-3.31

A Structured Self-attentive Sentence Embedding.

Matrix attention (multiple attention hops) plus a penalty term that encourages the hops to attend to different positions.
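A minimal NumPy sketch of the idea, with shapes and parameter names (`W1`, `W2`, hop count `r`) chosen for illustration, following the paper's `A = softmax(W2 tanh(W1 H^T))` and its Frobenius-norm penalty:

```python
import numpy as np

def self_attentive_embedding(H, W1, W2):
    # H: (n, d) hidden states; W1: (da, d), W2: (r, da) attention params.
    # Shapes are illustrative, not taken from the paper's experiments.
    scores = W2 @ np.tanh(W1 @ H.T)               # (r, n)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # row-wise softmax
    return A @ H, A                               # M: (r, d), A: (r, n)

def redundancy_penalty(A):
    # ||A A^T - I||_F^2: pushes the r attention rows toward
    # attending to different tokens.
    r = A.shape[0]
    diff = A @ A.T - np.eye(r)
    return float((diff ** 2).sum())
```

The penalty is added to the task loss with a small coefficient; it is zero only when the attention rows are orthonormal.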

Learning Hierarchical Structures On-The-Fly with a Recurrent-Recursive Model for Sequences

Discrete structure, learned with reinforcement learning.

Neural Language Modeling by Jointly Learning Syntax and Lexicon

Soft trees via syntactic distance in a language model; can also be used for grammar induction.

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

Uses syntactic distance directly for the constituency parsing task.
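The decoding step can be sketched as a greedy top-down split (a simplified illustration, not the paper's full model, which predicts the distances with a neural network):

```python
def distances_to_tree(words, dists):
    # dists[k] is the syntactic distance at the gap between words[k]
    # and words[k+1]. Split the span at the largest distance, then
    # recurse on both halves to build a binary tree.
    if len(words) == 1:
        return words[0]
    i = max(range(len(dists)), key=lambda k: dists[k])
    left = distances_to_tree(words[:i + 1], dists[:i])
    right = distances_to_tree(words[i + 1:], dists[i + 1:])
    return (left, right)
```

For example, `distances_to_tree(["the", "cat", "sat"], [1, 3])` splits first at the larger gap, grouping `"the"` and `"cat"` together.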

Max-Margin Markov Networks

Changes the Markov network objective to a margin-based one, though the loss differs from the structured SVM loss.

TODO: not fully understood; read again.

## 4.1-4.7

Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction

Objective: mutual information maximization. It can also be seen as a generalization of Brown clustering, and might be applicable to parsing.
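For reference, the quantity being maximized over a discrete joint distribution (here computed exactly from a given joint matrix; the paper instead maximizes a variational lower bound on it):

```python
import numpy as np

def mutual_information(joint):
    # joint: matrix P(x, y) over two discrete variables, entries sum to 1.
    # MI = sum_{x,y} P(x,y) log( P(x,y) / (P(x) P(y)) ).
    px = joint.sum(axis=1, keepdims=True)   # marginal P(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal P(y)
    mask = joint > 0                        # skip zero-probability cells
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())
```

MI is zero for independent variables and reaches `log 2` for two perfectly correlated binary variables.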

Learning Deep Compositional Grammatical Architectures for Visual Recognition

Uses the AND-OR grammar idea to build the network: AND nodes are composition, OR nodes are addition, and terminal nodes are CNNs. The network is not adaptive; its structure is fixed.

## 4.8-4.21

Learning deep representations by mutual information estimation and maximization

Mutual information maximization, with local structure information added. Shows large improvements over the current SOTA on image classification.

Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders

Constituency grammar induction via an autoencoder; inside and outside scores are computed with a TreeLSTM and bilinear projections.

Unsupervised Recurrent Neural Network Grammars

Unsupervised version of RNNG, trained with amortized variational inference: the generative network is RNNG's generator, the inference network is a CRF parser, and trees sampled from the inference network are fed to the generative network during training.

## 4.21-5.19

A Structural Probe for Finding Syntax in Word Representations

Finds syntactic information in word representations such as ELMo or BERT via a learned linear transformation: squared distances between transformed word vectors approximate parse-tree distances.
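The probe's distance function is simple enough to sketch directly (here `B` is a random stand-in for the learned transformation matrix):

```python
import numpy as np

def probe_distance(h_i, h_j, B):
    # Structural-probe distance: squared L2 norm of B(h_i - h_j).
    # In the paper, B (k, d) is trained so this approximates the
    # parse-tree distance between words i and j.
    diff = B @ (h_i - h_j)
    return float(diff @ diff)
```

By construction this is a squared metric: nonnegative, symmetric, and zero for a word compared with itself.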

Text Generation from Knowledge Graphs with Graph Transformers

Proposes a dataset and generates multi-sentence text from the output of automatic information extraction systems, showing the usefulness of structured data formats.

Learning What and Where to Transfer

Interesting work. It considers the problem of what information should transfer from a source network to a target network, and where. They propose a meta-learning method that learns a weight matrix $W_{m, n}$ from source layer $m$ to target layer $n$. Performance improves significantly.

Latent Variable Model for Multi-modal Translation

A continuous latent variable $z$, following a Gaussian distribution, is generated from the source sentence $x$. From $z$ the model generates the image $v$; from $x$ and $z$ it generates the target sentence $y$. Because $v$ is separated from $x$ and $y$, the model can be trained with images yet still work at inference time without them. Training uses amortized variational inference.
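Under that factorization, the amortized variational objective would be a standard ELBO of roughly this shape (the exact conditioning set of the inference network $q$ is my assumption, not stated in the note):

```latex
\log p(y, v \mid x) \;\ge\;
\mathbb{E}_{q(z \mid x, y)}\!\big[\log p(v \mid z) + \log p(y \mid x, z)\big]
\;-\; \mathrm{KL}\!\big(q(z \mid x, y) \,\|\, p(z \mid x)\big)
```

Dropping the $\log p(v \mid z)$ term recovers a model that can be run without images at inference time.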