Weekly reading summary


  • A Structured Self-attentive Sentence Embedding

    matrix attention + orthogonality penalty on the attention rows
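For reference, a minimal NumPy sketch of the matrix attention and its orthogonality penalty (all sizes and weight names here are hypothetical stand-ins, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes: n tokens, d hidden units, da attention units, r heads.
n, d, da, r = 6, 10, 8, 3
H = rng.normal(size=(n, d))    # token hidden states
W1 = rng.normal(size=(da, d))  # first projection
W2 = rng.normal(size=(r, da))  # second projection

# A is an r x n attention matrix: r weighted views of the sentence.
A = softmax(W2 @ np.tanh(W1 @ H.T), axis=1)
M = A @ H                      # r x d sentence embedding matrix

# Penalty pushes the r attention rows toward orthogonality:
# P = || A A^T - I ||_F^2
P = np.linalg.norm(A @ A.T - np.eye(r), ord="fro") ** 2
```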

  • Learning Hierarchical Structures On-The-Fly with a Recurrent-Recursive Model for Sequences

    discrete structure decisions, learned with reinforcement learning

  • Neural Language Modeling by Jointly Learning Syntax and Lexicon

    soft tree via syntactic distance, trained as a language model; can also be used for grammar induction

  • Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

    uses syntactic distance for supervised constituency parsing
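The conversion from distances to a tree is simple: recursively split the sentence at the gap with the largest predicted distance. A minimal sketch of that step (greedy top-down, labels ignored, toy numbers):

```python
def distance_to_tree(words, dists):
    """Recursively split at the largest syntactic distance.

    words: list of n tokens; dists: list of n-1 gap distances.
    Returns a nested tuple binary tree.
    """
    if len(words) == 1:
        return words[0]
    # Split point = gap with the maximal distance (ties: leftmost).
    i = max(range(len(dists)), key=lambda j: dists[j])
    left = distance_to_tree(words[: i + 1], dists[:i])
    right = distance_to_tree(words[i + 1 :], dists[i + 1 :])
    return (left, right)

# Largest gap (3.0) splits the sentence after "cat".
tree = distance_to_tree(["the", "cat", "sat", "down"], [1.0, 3.0, 2.0])
```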

  • Max-Margin Markov Networks

    changes the Markov network objective to a margin-based one, but the loss differs from the structured SVM loss

    TODO: not fully understood; read again.


  • Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction

    Objective: mutual information maximization. It can be seen as a generalization of Brown clustering, and might also be applicable to parsing.
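For intuition on the Brown-clustering connection: Brown clustering greedily maximizes the mutual information between adjacent cluster labels. A small sketch of that quantity (the corpus and clustering below are toy examples of mine):

```python
import math
from collections import Counter

def adjacent_cluster_mi(tokens, cluster):
    """I(C(w_t); C(w_{t+1})) in nats, for a hard clustering `cluster`."""
    pairs = [(cluster[a], cluster[b]) for a, b in zip(tokens, tokens[1:])]
    n = len(pairs)
    joint = Counter(pairs)                 # counts of adjacent label pairs
    left = Counter(a for a, _ in pairs)    # marginal of the left label
    right = Counter(b for _, b in pairs)   # marginal of the right label
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) )
        mi += (c / n) * math.log(c * n / (left[a] * right[b]))
    return mi

corpus = "the cat sat the dog sat the cat ran".split()
clus = {"the": 0, "cat": 1, "dog": 1, "sat": 2, "ran": 2}
mi = adjacent_cluster_mi(corpus, clus)
```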

  • Learning Deep Compositional Grammatical Architectures for Visual Recognition

    Builds the network from the AND-OR grammar idea: AND nodes compose, OR nodes sum, and terminal nodes are CNNs. The network is not adaptive, i.e., its structure is fixed.

4.8 - 4.21

  • Learning deep representations by mutual information estimation and maximization

    mutual information maximization, plus local structure information; shows large improvements over the current SOTA on image classification

  • Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders

    Constituency grammar induction with an autoencoder; inside and outside scores are computed with a TreeLSTM and bilinear projections.

  • Unsupervised Recurrent Neural Network Grammars

    Unsupervised version of RNNG, trained with amortized variational inference. The generative network is the RNNG generator; the inference network is a CRF parser. Trees sampled from the inference network are fed to the generative network for training.


  • A Structural Probe for Finding Syntax in Word Representations

    Finds syntactic information in word representations such as ELMo or BERT via a learned linear transformation matrix.
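A sketch of the probe's distance computation, with random stand-ins for the learned quantities (in the actual method the matrix $B$ is trained so these squared distances match parse-tree distances between words):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, n = 16, 4, 5           # embedding dim, probe rank, sentence length
H = rng.normal(size=(n, d))  # contextual word vectors (stand-ins for ELMo/BERT)
B = rng.normal(size=(k, d))  # the linear probe (learned in the real method)

def probe_distance(B, hi, hj):
    """Squared distance under the probe: ||B (h_i - h_j)||^2."""
    diff = B @ (hi - hj)
    return float(diff @ diff)

# Pairwise distance matrix over the sentence.
D = np.array([[probe_distance(B, H[i], H[j]) for j in range(n)]
              for i in range(n)])
```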

  • Text Generation from Knowledge Graphs with Graph Transformers

    Proposes a dataset and generates multi-sentence text from the output of automatic information-extraction systems, showing the usefulness of a structured data format.

  • Learning What and Where to Transfer

    Interesting work. It considers the problem of what and where information should be transferred from a source network to a target network, and proposes a meta-learning method that learns a matrix of weights $W_{m, n}$ from source layer $m$ to target layer $n$. Performance improves significantly.
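A toy sketch of the weighted feature-matching idea behind those $W_{m, n}$ weights (the flattened shapes, omitted adapter networks, and all names here are my simplifications, not the paper's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vectors: 2 source layers, 3 target layers, flattened
# to a common size for simplicity.
S = [rng.normal(size=8) for _ in range(2)]  # source-layer features
T = [rng.normal(size=8) for _ in range(3)]  # target-layer features

# w[m, n]: weight for transferring from source layer m to target layer n
# (meta-learned in the paper; random non-negative values here).
w = np.abs(rng.normal(size=(2, 3)))

def transfer_loss(S, T, w):
    """Weighted feature-matching loss: sum_{m,n} w[m,n] * ||S_m - T_n||^2."""
    return sum(w[m, n] * float(np.sum((S[m] - T[n]) ** 2))
               for m in range(len(S)) for n in range(len(T)))

loss = transfer_loss(S, T, w)
```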

  • Latent Variable Model for Multi-modal Translation

    A continuous latent variable $z$ with a Gaussian distribution is generated from the source sentence $x$; an image $v$ is generated from $z$, and the target sentence $y$ is generated from $x$ and $z$. Separating $v$ from $x, y$ means the model can be trained with images yet still run (inference) without them. Training uses amortized variational inference.
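Under that generative story, one plausible form of the training objective (my reconstruction of the ELBO from the factorization above, not necessarily the paper's exact formulation) is:

```latex
\log p(y, v \mid x) \;\ge\;
  \mathbb{E}_{q(z \mid x, y, v)}\!\left[ \log p(y \mid x, z) + \log p(v \mid z) \right]
  \;-\; \mathrm{KL}\!\left( q(z \mid x, y, v) \,\|\, p(z \mid x) \right)
```

At test time the $\log p(v \mid z)$ term is simply dropped and $z$ is drawn from the prior $p(z \mid x)$, which is what lets inference run without images.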