Is it all transferred through the router?
Yes.
Or do the client and the server establish a direct connection and stream?
No - at least not in Wi-Fi terms. Of course the source and sink devices establish a connection with each other, but the actual data packets travel (for example) Source---Router---Sink.
Bear in mind that in Wi-Fi networks, "only one thing at a time can transmit" so, in the example I've cited, the Source---Router and Router---Sink transmissions cannot occur simultaneously. And that's competing with all your other Wi-Fi devices (and any neighbours on the same radio channel if they are close enough) too.
The more Wi-Fi devices you have in your locale, the more "competition" there is for "air time." (It's anything but "fair," though the protocols try to give everything an equal chance.)
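To put rough numbers on that, here is a back-of-envelope sketch. It assumes both hops (Source---Router and Router---Sink) run at the same link rate on the same channel and it ignores protocol overhead and retries, so treat the result as an optimistic ceiling rather than a prediction. The function name and the `other_share` parameter are mine, purely for illustration.

```python
def relayed_throughput(link_rate_mbps: float, hops: int = 2,
                       other_share: float = 0.0) -> float:
    """Optimistic ceiling on end-to-end throughput over a shared channel.

    link_rate_mbps: usable rate of one hop when it has the channel to itself.
    hops:           transmissions needed per packet (2 for Source-Router-Sink).
    other_share:    fraction of air time taken by other devices (0.0 to <1.0).
    """
    if hops < 1:
        raise ValueError("hops must be at least 1")
    if not 0.0 <= other_share < 1.0:
        raise ValueError("other_share must be in [0.0, 1.0)")
    # Air time left for this stream, split across hops that cannot overlap.
    available = link_rate_mbps * (1.0 - other_share)
    return available / hops


print(relayed_throughput(100.0))                   # 50.0
print(relayed_throughput(100.0, other_share=0.5))  # 25.0
```

So even on an otherwise idle channel, a Wi-Fi to Wi-Fi stream through the router gets at best half of what a single hop could carry.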
Another thing: where is the list of PCs and shared files stored? Is it centralised in the router or in a PC, or each time a scan is requested do all nodes receive the question and answer?
A bit of both. In a Windows network, something called a "Master Browser" (MB) is established on one of the hosts, and it maintains a list of all the Windows hosts connected to the network. Which host functions as the MB is determined by an "election," held whenever one can't be found (for example, because it's been turned off). Over time the MB almost inevitably ends up being hosted on something that is "on" all the time, such as a server. Some SOHO routers can also function as the MB.
I don't recall whether it's mandatory for hosts to consult the MB and it's possible that later versions of Windows employ other mechanisms (I haven't "kept up" with the evolution of the technology.)
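As a toy illustration of the "keep the MB if it's still there, otherwise elect a new one" idea, something like the sketch below. To be clear, this is not the real Windows election: the `priority` field and the "highest priority, then longest uptime" rule are invented here just to show why an always-on box tends to end up with the job.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    online: bool
    priority: int   # hypothetical: higher for servers and always-on boxes
    uptime_s: int   # seconds since the host came up


def find_master(hosts: List[Host], current: Optional[str]) -> str:
    """Return the current master if it is still online, else elect a new one."""
    for h in hosts:
        if h.name == current and h.online:
            return h.name  # existing master is reachable, no election needed
    candidates = [h for h in hosts if h.online]
    if not candidates:
        raise RuntimeError("no hosts online")
    # Invented tie-break: highest priority wins, then longest uptime.
    winner = max(candidates, key=lambda h: (h.priority, h.uptime_s))
    return winner.name


hosts = [
    Host("LAPTOP", True, 1, 3_600),
    Host("NAS", True, 2, 900_000),
    Host("DESKTOP", False, 1, 0),
]
print(find_master(hosts, "DESKTOP"))  # NAS (old master is off, so elect)
print(find_master(hosts, "LAPTOP"))   # LAPTOP (still online, keeps the role)
```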
Other (non-Windows) hosts might "advertise" their presence by other means - my media streamer "shouts" about itself once a second or so. IIRC Apple's "Bonjour" protocol, Skype hosts and UPnP (to name just a few) do something similar.
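If you want to see this sort of thing for yourself, UPnP's discovery protocol (SSDP) is simple enough to poke at by hand: devices announce themselves to the multicast group 239.255.255.250 on UDP port 1900, and they also answer "M-SEARCH" queries sent to that group. Below is a minimal sketch of the query side. The helper names are my own, and whether anything answers depends entirely on what's on your network and on your firewall.

```python
import socket
import time

SSDP_ADDR = ("239.255.255.250", 1900)  # standard SSDP multicast group and port


def build_msearch(search_target: str = "ssdp:all", mx: int = 2) -> bytes:
    """Build an SSDP M-SEARCH request asking devices to identify themselves."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR[0]}:{SSDP_ADDR[1]}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",            # devices delay their reply by up to MX seconds
        f"ST: {search_target}",
        "",
        "",                     # blank line terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_headers(datagram: bytes) -> dict:
    """Turn an HTTP-style SSDP datagram into a dict of upper-cased headers."""
    text = datagram.decode("utf-8", errors="replace")
    headers = {}
    for line in text.split("\r\n")[1:]:   # skip the status/request line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().upper()] = value.strip()
    return headers


def discover(timeout: float = 3.0) -> list:
    """Send one M-SEARCH and collect (ip, headers) replies until the timeout."""
    found = []
    deadline = time.monotonic() + timeout
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_msearch(), SSDP_ADDR)
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                data, addr = sock.recvfrom(65507)
            except socket.timeout:
                break
            found.append((addr[0], parse_headers(data)))
    return found
```

Calling `discover()` and printing each reply's `SERVER` and `LOCATION` headers is usually enough to see which boxes on the LAN are "shouting."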
There's nothing that catalogues the files stored within the hosts; each host maintains that for itself and responds whenever it's interrogated. Some OSes, especially on servers, cache the indexes in RAM for faster response times.
There's no reason why a client device couldn't "scan" the shares and compile its own catalogue. My NetGear media streamer can do this (not that I ever use the feature) - it scans all the shares it "knows" about (has learned or been taught) once a day. But again, such behaviour is not mandatory.
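For what it's worth, a client-side scan of that kind needn't be anything clever. Here is a minimal sketch that assumes the shares are already mounted or otherwise reachable as ordinary paths (the paths in the usage note are made up); I have no idea how the NetGear box actually does it.

```python
import os


def catalogue_shares(share_roots):
    """Walk each share root and return {root: sorted list of relative paths}."""
    catalogue = {}
    for root in share_roots:
        files = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for fname in filenames:
                full = os.path.join(dirpath, fname)
                # Store paths relative to the share so entries stay portable.
                files.append(os.path.relpath(full, root))
        catalogue[root] = sorted(files)
    return catalogue
```

You'd call it with whatever shares the client "knows" about, for example `catalogue_shares(["/mnt/nas/music", "/mnt/pc/videos"])`, and re-run it on a timer (once a day, say) to keep the catalogue fresh. Note that `os.walk` quietly skips directories it can't read, so an unreachable share simply comes back with an empty list.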