

i bought an original cartridge and played it on the VCS i inherited from dad
overactors trying to out-overact each other. Love it!
i still enjoyed the crap out of it. Sometimes zoning out and just running around collecting stuff is just what I need.
localhost is “this device”.
connecting to localhost means connecting to something running on the same machine.
Browsers generally block connections to other domains (e.g. if you’re on google.com, the browser won’t simply let the site contact amazon.com willy-nilly).
But localhost is your own machine, so it is usually “trusted”. Facebook exploited this fact to exfiltrate data from the browser to the other apps running on your own phone, which would, in turn, be free to do with it as they please, because they’re not the browser.
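To make “connecting to localhost” concrete, here’s a rough Python sketch. This is not what Facebook actually did; the listener, the payload and the function names are all made up for illustration. One piece of code listens on 127.0.0.1, another connects to it, and the data never leaves the machine:

```python
import socket
import threading


def serve_once(server: socket.socket) -> None:
    """Hypothetical 'other app': accept one local connection and echo what it got."""
    conn, _ = server.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"got: " + data)


def demo() -> bytes:
    # Listen on the loopback address only; port 0 lets the OS pick a free port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    listener = threading.Thread(target=serve_once, args=(server,))
    listener.start()

    # Hypothetical 'page in the browser': connect to this same device and hand over data.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"some id")
        reply = client.recv(1024)

    listener.join()
    server.close()
    return reply
```

Calling `demo()` should return whatever the listener sent back. The point is only that whoever is listening on that port receives the data and can then do anything with it.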
he was forced to release it quickly to coincide with the film’s release. For comparison, it used to take a team of devs a couple of months to make a game. He had 6 weeks.
Also, if you read the manual, this essentially never happened to you. It was easy to avoid.
You also needed to read the manual. The game did stuff that other games at the time didn’t, for example, a contextual button. You couldn’t know what would happen unless you read the manual to learn what the icons meant. A lot of people never did and so decided that the game was bad.
I don’t see a ball in any of the nets. So there are zero goals in that image
when climbing out of the pit, it was very easy to immediately fall back down (due to the pixel-perfect collision detection).
And here is an excerpt from the manual: “Even experienced extraterrestrials sometimes have difficulty levitating out of wells. Start to levitate E.T. by first pressing the controller button and then pushing your Joystick forward. E.T.'s neck will stretch as he rises to the top of the well (see E.T. levitating in Figure 1). Just when he reaches the top of the well and the scene changes to the planet surface (see Figure 2), STOP! Do not try to keep moving up. Instead, move your Joystick right, left, or to the bottom. Do not try to move up, or E.T. might fall back into the well.”
it was actually way ahead of its time, for a game. One small bug (the workaround for which was in the manual) ruined its reputation. But I genuinely think it was a good game.
Also written in 6 weeks by one guy. Freaking impressive
I’m partial to this one
you don’t check your brain’s file system regularly?
you wouldn’t be “freezing” anything. Each possible combination of input tokens maps to one output probability distribution. Those values are fixed and they are what they are whether you compute them or not, or when, or how many times.
Now you can either precompute the whole table (theory), or somehow compute each cell value every time you need it (practice). In either case, the resulting function (table lookup vs matrix multiplications) takes in only the context, and produces a probability distribution. And the mapping they generate is the same for all possible inputs. So they are the same function. A function can be implemented in multiple ways, but the implementation is not the function itself. The only difference between the two in this case is the implementation, or more specifically, whether you precompute a table or not. But the function itself is the same.
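A toy sketch of that in Python, if it helps. `toy_model` is a made-up stand-in for a trained network (the 3-token alphabet, the context length of 2 and the scoring formula are all invented for illustration), and `TABLE` is the fully precomputed version of it:

```python
import math
from itertools import product

VOCAB = ("a", "b", "c")  # hypothetical token alphabet
CONTEXT_LENGTH = 2       # hypothetical context size


def toy_model(context):
    """Stand-in for inference: fixed 'weights', no state, deterministic."""
    ids = [VOCAB.index(tok) for tok in context]
    logits = [
        sum((pos + 1) * (tid + 1) * (out + 1) for pos, tid in enumerate(ids)) % 7
        for out in range(len(VOCAB))
    ]
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return tuple(e / total for e in exps)  # softmax over the toy logits


# Precompute every cell once: one row per possible context.
TABLE = {ctx: toy_model(ctx) for ctx in product(VOCAB, repeat=CONTEXT_LENGTH)}


def table_lookup(context):
    """Same mapping as toy_model, implemented as a lookup."""
    return TABLE[tuple(context)]


def same_function():
    """True if both implementations agree on every possible input."""
    return all(
        toy_model(ctx) == table_lookup(ctx)
        for ctx in product(VOCAB, repeat=CONTEXT_LENGTH)
    )
```

From the outside you can’t tell `toy_model` and `table_lookup` apart: same inputs, same outputs, different implementation.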
You are somehow saying that your choice of implementation for that function will somehow change the function. Which means that according to you, if you do precompute (or possibly cache, full precomputation is just an infinite cache size) individual mappings it somehow magically makes some magic happen that gains some deep insight. It does not. We have already established that it is the same function.
the fact that it is a fixed function that depends only on the context, AND that there are a finite number of discrete inputs possible, does make it equivalent to a huge, finite table. You really don’t want this to be true. And again, you are describing training. Once training finishes anything you said does not apply anymore and you are left with fixed, unchanging matrices, which in turn means that it is a mathematical function of the context (by the mathematical definition of “function”: stateless and deterministic) which also has the property that the set of all possible inputs is finite. So the set of possible outputs is also finite, and no larger than the set of possible inputs. This means the actual function that the tokens are passed through CAN be precomputed in full (in theory), making it equivalent to a conventional state transition table.
This is true whether you’d like it to be or not. The training process builds a markov chain.
no, not any computer program is a markov chain. only those that depend only on the current state and ignore prior history. Which fits llms perfectly.
Those sophisticated methods you talk about are just a couple of matrix multiplications. Those matrices are what’s learned. Anything sophisticated happens during training. Inference is so not sophisticated. It’s just multiplying some matrices together and taking the rightmost column of the result. That’s it.
yes you can enumerate all inputs, because they are not continuous. You just raise the finite number of different tokens to the finite context size and that’s exactly the size of the table you would need. finite^finite=finite. You are describing training, i.e. how the function is generated. Yes correlations are found there and encoded in a couple of matrices. Those matrices are what are used in the llm and none of what you said applies. Inference is purely a markov chain by definition.
i let the wife do it. She enjoys it, I don’t
“lacks internal computation” is not part of the definition of markov chains. Only that the output depends only on the current state (the whole context, not just the last token) and no previous history, just like llms do. They do not consider tokens that slid out of the current context, because they are not part of the state anymore.
And it wouldn’t be a cache unless you decide to start invalidating entries, which you could just, not do… it would be a table with token_alphabet_size^context_length entries, with each entry being a vector of size token_alphabet_size. Because that would be too big to realistically store, we do not precompute the whole thing, and just approximate what each table entry should be using a neural network.
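Back-of-the-envelope version in Python. The 50,000-token alphabet and 2,048-token context below are hypothetical round numbers, not any particular model’s:

```python
import math


def table_entries(token_alphabet_size: int, context_length: int) -> int:
    """Number of rows: one per possible context."""
    return token_alphabet_size ** context_length


def table_values(token_alphabet_size: int, context_length: int) -> int:
    """Each row holds one probability per token in the alphabet."""
    return table_entries(token_alphabet_size, context_length) * token_alphabet_size


def entry_count_digits(token_alphabet_size: int, context_length: int) -> int:
    """Decimal digits in the row count, via logarithms so the number is never printed."""
    return math.floor(context_length * math.log10(token_alphabet_size)) + 1


small = table_entries(3, 2)               # 3 tokens, context of 2: 9 rows
huge = entry_count_digits(50_000, 2_048)  # hypothetical sizes: digit count of the row count
```

With the hypothetical sizes, the row count alone is a number with thousands of decimal digits. Still finite, just nowhere near storable, which is why nobody precomputes it.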
The pi example was just to show that how you implement a function (any function) does not matter, as long as the inputs and outputs are the same. Or to put it another way if you give me an index, then you wouldn’t know whether I got the result by doing some computations or using a precomputed table.
Likewise, if you give me a sequence of tokens and I give you a probability distribution, you can’t tell whether I used a NN or just consulted a precomputed table. The point is that given the same input, the table will always give the same result, and crucially, so will an llm. A table is just one type of implementation for an arbitrary function.
There is also no requirement for the state transition function (a table is a special type of function) to be understandable by humans. Just because it’s big enough to be beyond human comprehension, doesn’t change its nature.
yes, the matrix and several levels are the “decompression”. At the end you get one probability distribution, deterministically. And the state is the whole context, not just the previous token. Yes, if we were to build the table manually with only available data, lots of cells would just be 0. That’s why the compression is lossy. There would actually be nothing stopping anyone from filling those 0 cells out, it’s just infeasible. you could still put states you never actually saw, but are theoretically possible in the table. And there’s nothing stopping someone from putting thought into it and filling them out.
Also you seem obsessed by the word table. A table is just one type of function mapping a fixed input to a fixed output. If you replaced it with a function that gives the same outputs for all inputs, then it’s functionally equivalent. It being a table or some code in a function is just an implementation detail.
As a thought exercise imagine setting temperature to 0, passing in every possible combination of input tokens, and recording the output for every single one of them. Put them all in a “table” (assuming you have practically infinite space) and you have a markov chain that is 100% functionally equivalent to the neural network with all its layers and complexity. But it does it without the neural network, and gives 100% identical results every single time in O(1). Because we don’t have infinite time and space, we had to come up with a mapping function to replace the table. And because we have no idea how to make a good approximation of such a huge function, we use machine learning to come up with a suitable function for us, given tons of data. You can introduce some randomness in the sampling of that, and you now have nonzero temperature again.
Ex. A table containing the digits of pi, in order, could be transparently replaced with a spigot algorithm that calculates the nth digit on-demand. Output would be exactly the same
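Rough Python sketch of the pi example. The generator is the commonly circulated form of Gibbons’ unbounded spigot algorithm. It streams digits in order rather than jumping straight to the nth one, so `digit_from_spigot` just walks the stream up to index n. The function names and the 10-digit table are only for illustration:

```python
from itertools import islice

PI_TABLE = "3141592653"  # first 10 decimal digits of pi, stored up front


def pi_digit_stream():
    """Yield the decimal digits of pi one at a time (Gibbons-style spigot)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            nr = 10 * (r - n * t)
            n = ((10 * (3 * q + r)) // t) - 10 * n
            q *= 10
            r = nr
        else:
            nr = (2 * q + r) * l
            nn = (q * (7 * k) + 2 + (r * l)) // (t * l)
            q *= k
            t *= l
            l += 2
            k += 1
            n = nn
            r = nr


def digit_from_table(index: int) -> int:
    """Implementation 1: look the digit up."""
    return int(PI_TABLE[index])


def digit_from_spigot(index: int) -> int:
    """Implementation 2: compute the digit on demand."""
    return next(islice(pi_digit_stream(), index, None))
```

Give either function an index and you get the same digit back. Nothing in the answer tells you which implementation produced it.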
the probabilities are also fixed after training. You seem to be conflating running the llm with different input to the model somehow adapting. The new context goes into the same fixed model. And yes, it can be reduced to fixed transition logic, you just need to have all possible token combinations in the table. This is obviously intractable due to space issues, so we came up with a lossy compression scheme for it. The table itself is learned once, then it’s fixed. The training goes into generating a huge markov chain. Just because the table is learned from data, doesn’t change what it actually is.
an llm works the same way! Once it’s trained, none of what you said applies anymore. The same model can respond differently with the same inputs specifically because after the llm does its job, sometimes we intentionally don’t pick the most likely token, but choose a different one instead. RANDOMLY. Set the temperature to 0 and it will always reply with the same answer. And llms also have a fixed order state transition. Just because you only typed one word doesn’t mean that that token is not preceded by n-1 null tokens. The llm always receives the same number of tokens. It cannot work with an arbitrary number of tokens.
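Sketch of both points in Python. The `<null>` pad token, the function names and the example distribution are made up; raising each probability to 1/temperature is one common way to write temperature scaling (same thing as dividing the logits by the temperature before the softmax):

```python
import random


def pad_context(tokens, context_length, pad="<null>"):
    """Left-pad so the model always sees exactly context_length tokens."""
    tokens = list(tokens)
    if len(tokens) > context_length:
        raise ValueError("more tokens than the context can hold")
    return [pad] * (context_length - len(tokens)) + tokens


def pick_token(distribution, temperature, rng):
    """Pick an index from a fixed probability distribution."""
    if temperature == 0:
        # Greedy: always the most likely token, no randomness at all.
        return max(range(len(distribution)), key=distribution.__getitem__)
    # Temperature reshapes the distribution; the randomness only enters here.
    weights = [p ** (1.0 / temperature) for p in distribution]
    return rng.choices(range(len(distribution)), weights=weights, k=1)[0]


fixed_output = [0.1, 0.7, 0.2]  # pretend this came out of the model
greedy = [pick_token(fixed_output, 0, random.Random()) for _ in range(5)]
```

`greedy` is the same index five times over. Any variation between runs comes from `rng`, not from the distribution the model produced.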
all relevant information “remains in the prompt” only until it slides out of the context window, just like any markov chain.
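Tiny Python illustration of the sliding part, with a made-up context length of 4 tokens:

```python
from collections import deque

CONTEXT_LENGTH = 4  # hypothetical, tiny on purpose

# A deque with maxlen drops the oldest item whenever a new one is appended.
window = deque(maxlen=CONTEXT_LENGTH)
for token in ["my", "name", "is", "bob", "what", "is", "my", "name"]:
    window.append(token)

state = tuple(window)  # this is all the next step gets to see
```

By the time the question is complete, “bob” has slid out of `state`, so nothing downstream can depend on it.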
writing code that doesn’t need a browser to run on