@rohanpaul_ai @GoogleResearch It's a special case of meta

Maria Sandu

@rohanpaul_ai @GoogleResearch It's a special case of meta learning. See, e.g., the 2001 paper by @HochreiterSepp [1]: an LSTM uses gradient descent to learn a test time algorithm for quadratic functions that is much faster than gradient descent. No weight changes :-) Gradient descent can be used to learn a

din zilele anterioare