
Hierarchical softmax

All content is adapted from the CS224N materials. For details, please see CS224N! 1. Intro In practice, hierarchical softmax tends to be better for infrequent words, while negative sampling works better for frequent words and lower-dimensional vectors. Hierarchical softmax uses a binary tree to represent all words in the vocabulary. 2...
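To make the tree idea concrete, here is a minimal NumPy sketch of how hierarchical softmax scores one word: the probability is a product of sigmoid decisions along the word's root-to-leaf path. The toy tree, the path encoding, and the `hs_probability` helper are all made up for illustration, not from the post.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hs_probability(h, path_nodes, path_directions, node_vectors):
    """P(word | context): a product of binary left/right decisions at the
    inner nodes on the word's root-to-leaf path. Each direction is +1
    (left) or -1 (right)."""
    p = 1.0
    for node, d in zip(path_nodes, path_directions):
        p *= sigmoid(d * (node_vectors[node] @ h))
    return p

# Toy setup: a 4-word vocabulary -> 3 inner nodes, 2 decisions per word.
rng = np.random.default_rng(0)
dim = 5
node_vectors = rng.normal(size=(3, dim))  # one vector per inner node
h = rng.normal(size=dim)                  # context representation
# Path to one leaf: left at the root (node 0), then right at node 1.
print(hs_probability(h, [0, 1], [+1, -1], node_vectors))
```

Scoring a word this way costs O(log |V|) sigmoid evaluations instead of a softmax over the whole vocabulary, which is why it pays off for large vocabularies.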

Read more

GloVe (Global Vectors for Word Representation)

All content is adapted from the CS224N materials. For details, please see CS224N! 1. Intro Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014 2. Previous Methods Count-based methods rely on matrix factorization (e.g. Latent Semantic Analysis (LSA), Hyperspace Analogue to Language (HAL)) Effectively leverage gl...
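For a feel of what GloVe actually optimizes, here is a hedged NumPy sketch of its weighted least-squares loss over co-occurrence counts. The `glove_loss` name and the toy setup are mine; the formula and the default hyperparameters (x_max = 100, alpha = 0.75) follow the paper.

```python
import numpy as np

def glove_loss(W, W_tilde, b, b_tilde, X, x_max=100.0, alpha=0.75):
    """GloVe objective: sum over nonzero co-occurrence counts X[i, j] of
    f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2, where the weight f
    caps the influence of very frequent pairs."""
    loss = 0.0
    for i, j in zip(*np.nonzero(X)):
        f = (X[i, j] / x_max) ** alpha if X[i, j] < x_max else 1.0
        diff = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(X[i, j])
        loss += f * diff ** 2
    return loss

rng = np.random.default_rng(0)
V, d = 5, 3
X = rng.integers(0, 4, size=(V, V)).astype(float)  # toy co-occurrence counts
W, Wt = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, bt = np.zeros(V), np.zeros(V)
print(glove_loss(W, Wt, b, bt, X))
```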

Read more

Evaluation of Word Vectors

All content is adapted from the CS224N materials. For details, please see CS224N! 1. Intrinsic and Extrinsic Evaluation $\checkmark$ Intrinsic evaluation Evaluation on a specific, intermediate task Fast to compute performance Helps understand the subsystem Needs positive correlation with the real task to determine usefulness To train a ...
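For a flavor of intrinsic evaluation, below is a small sketch of the standard word-analogy test ("a is to b as c is to ?") via cosine similarity. The `analogy` helper and the dict-of-vectors format are assumptions made for the example.

```python
import numpy as np

def analogy(a, b, c, vectors):
    """Return the word whose vector is most cosine-similar to
    (x_b - x_a + x_c), excluding the three query words themselves."""
    target = vectors[b] - vectors[a] + vectors[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for word, v in vectors.items():
        if word in (a, b, c):
            continue
        sim = (v @ target) / np.linalg.norm(v)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Tiny hand-made vectors where the gender direction is the second axis.
vectors = {
    "king": np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man": np.array([0.9, 1.1]),
    "woman": np.array([0.9, -0.9]),
}
print(analogy("man", "woman", "king", vectors))  # -> "queen"
```

Accuracy over a list of such analogies gives a fast intrinsic score, which is only useful insofar as it correlates with the real downstream task.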

Read more

Continuous Bag-of-Words (CBOW)

All content is adapted from the CS224N materials. For details, please see CS224N! 1. Intro How can we predict a center word from the surrounding context in terms of word vectors? One approach: if we treat {“The”, “cat”, “over”, “the”, “puddle”} as a context, then from these words the model should be able to predict or generate the center word “jum...
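Here is a minimal NumPy sketch of the CBOW forward pass under the usual two-matrix setup (the names `V_in`/`V_out` are mine): average the context word vectors, score every vocabulary word, and softmax.

```python
import numpy as np

def cbow_forward(context_ids, V_in, V_out):
    """P(center | context): average the input vectors of the context
    words, score each vocabulary word, and normalize with softmax."""
    h = V_in[context_ids].mean(axis=0)      # averaged context vector
    scores = V_out @ h                      # one score per vocab word
    e = np.exp(scores - scores.max())       # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
vocab_size, dim = 6, 4
V_in = rng.normal(size=(vocab_size, dim))
V_out = rng.normal(size=(vocab_size, dim))
probs = cbow_forward([0, 1, 3, 4], V_in, V_out)  # ids of the 4 context words
print(probs.argmax(), round(probs.sum(), 6))     # predicted center id, 1.0
```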

Read more

CS224N W5. Self-Attention and Transformers

All content is adapted from the CS224N materials. For details, please see CS224N! 1. Issues with recurrent models $\checkmark$ Linear interaction distance RNNs take O(sequence length) steps for distant word pairs to interact. Reference: Stanford CS224n, 2021. Before the word ‘was’, the information from ‘chef’ has passed through O(sequenc...
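To see why attention fixes this interaction-distance issue, here is a bare-bones scaled dot-product self-attention sketch in NumPy (single head, no masking; the weight names are illustrative):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Every position attends to every other position in a single step,
    so distant words interact over an O(1) path rather than an RNN's
    O(sequence length) chain."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # (seq, seq) pairwise scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)          # row-wise softmax
    return w @ V                                   # weighted mix of values

rng = np.random.default_rng(0)
seq, d = 4, 8
X = rng.normal(size=(seq, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```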

Read more

CS224N W4. Machine Translation, Sequence to Sequence, and Attention

All content is adapted from the CS224N materials. For details, please see CS224N! 1. Statistical Machine Translation, SMT (1990s-2010s) “Learn a probabilistic model from data”: we want to find the best English sentence y, given a French sentence x. \[\arg\max_y P(y \mid x)\] Use Bayes' rule to break this down into two components to be learned sep...
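The Bayes-rule decomposition, $\arg\max_y P(y \mid x) = \arg\max_y P(x \mid y)\,P(y)$, splits the problem into a translation model $P(x \mid y)$ and a language model $P(y)$; $P(x)$ drops out because it does not depend on y. A toy numeric illustration of the decision rule (all probabilities invented):

```python
# Noisy-channel decomposition: argmax_y P(y|x) = argmax_y P(x|y) * P(y).
# P(x|y) is the translation model, P(y) the language model.
# The candidate sentences and probabilities below are made up.
candidates = {
    "the cat": {"p_x_given_y": 0.30, "p_y": 0.50},
    "cat the": {"p_x_given_y": 0.35, "p_y": 0.05},
}
best = max(candidates,
           key=lambda y: candidates[y]["p_x_given_y"] * candidates[y]["p_y"])
print(best)  # "the cat": the language model P(y) favors the fluent ordering
```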

Read more

CS224N W3. RNN, Bi-RNN, GRU, and LSTM in dependency parsing

All content is adapted from the CS224N materials. For details, please see CS224N! 1. Language Model Language modeling is the task of predicting what word comes next. the students opened their [——] → books? laptops? exams? minds? A system that does this is called a Language Model. More formally: given a sequence of words $x^{(1)}, x^{(2)},...
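As a concrete baseline, here is a tiny count-based bigram language model that answers exactly this "what comes next" question (the ten-word corpus is invented for the example):

```python
from collections import Counter, defaultdict

corpus = ("the students opened their books "
          "the students opened their laptops").split()

# Count bigrams: how often does `nxt` follow `prev`?
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def p_next(prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev) from the counts."""
    total = sum(bigrams[prev].values())
    return bigrams[prev][nxt] / total if total else 0.0

print(p_next("their", "books"))    # 0.5
print(p_next("their", "laptops"))  # 0.5
```

A neural language model replaces these sparse counts with $P(x^{(t+1)} \mid x^{(t)}, \dots, x^{(1)})$ computed from learned word vectors, which is where the RNNs in this post come in.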

Read more