It would work exactly how it does now. When a player performs an action (shovels a hole, shoots a magic wall, opens a door, etc.), the client notifies the server. The server determines if the action is permitted, then notifies the player (and any surrounding players/creatures if needed). You can edit the map in the client's memory right now and make it think there's a hole where there isn't, but you can't go down that hole because the server knows there isn't a hole there. How is that any different?
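To make that concrete, here's a rough sketch of the idea. None of these names come from CipSoft's code; `server_holes`, `handle_descend`, and the coordinates are made up for illustration, and I'm assuming floor numbers increase as you go down.

```python
# Hypothetical sketch: the server owns the real map, the client only has a copy.
server_holes = {(100, 200, 7)}  # positions where a hole really exists


def handle_descend(player_pos, target):
    """Server-side check for a 'go down this hole' request (made-up name)."""
    if target not in server_holes:
        # The server never trusts the client's view; the player stays put.
        return ("rejected", player_pos)
    x, y, z = target
    # Assumption: floor numbers increase going down, so z + 1 is one level deeper.
    return ("ok", (x, y, z + 1))


# A tampered client believes there is also a hole at (101, 200, 7).
fake_hole = (101, 200, 7)
client_holes = set(server_holes) | {fake_hole}

print(fake_hole in client_holes)                   # the client thinks it exists
print(handle_descend((101, 201, 7), fake_hole))    # the server rejects it
print(handle_descend((100, 201, 7), (100, 200, 7)))  # a real hole is allowed
```

Whatever the client's copy of the map says, the only state that decides the outcome is the server's.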
Yes, but just because they can see these areas doesn't mean they can figure out how to access them. The map that is distributed with the client doesn't have to contain the same information that the master map on the server does. The client map doesn't have to store whether this tile is pickable, or that this lever controls that object, or where a teleport goes to. Besides, the map data from the 7.72 server files are public now, and map trackers have been around for quite some time. CipSoft's map isn't a surprise anymore.
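As a sketch of what I mean by the two maps holding different information: the field names below (`pickable`, `lever_target`, `teleport_dest`) are hypothetical and not taken from any real server file format.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ServerTile:
    """Hypothetical master-map tile; only item_id is needed for drawing."""
    item_id: int
    pickable: bool = False                 # server-only: can this tile be picked?
    lever_target: Optional[tuple] = None   # server-only: what a lever here controls
    teleport_dest: Optional[tuple] = None  # server-only: where a teleport leads


# Assumption: the client only needs enough to render the tile.
CLIENT_FIELDS = ("item_id",)


def to_client_tile(tile):
    """Keep only the fields the distributed client map would contain."""
    full = asdict(tile)
    return {name: full[name] for name in CLIENT_FIELDS}


secret = ServerTile(item_id=1234, pickable=True, teleport_dest=(50, 60, 9))
print(to_client_tile(secret))
```

The client copy shows what the tile looks like and nothing about what it does.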
Have you ever done any type of programming before? Loading the entire map into memory at once is terrible programming. CipSoft doesn't load the whole sprite file; they open it and load information as needed. Why wouldn't they do the same thing with the map? How do you think games like World of Warcraft, which keep their map files locally, work? Do you actually think they load the whole map file? In case you don't know, the answer is no. No, they do not, because that would be terrible and inefficient programming, and probably not even possible with the size of their map.
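Here's a minimal sketch of loading on demand. The file layout is entirely made up (fixed-size chunks of 2-byte item ids); it is not CipSoft's or Blizzard's format, just an illustration of seeking to the part you need instead of reading the whole file.

```python
import os
import struct
import tempfile

# Hypothetical layout: the map is a grid of fixed-size chunks stored back to back.
CHUNK_TILES = 16 * 16          # 16x16 tiles per chunk (assumption)
CHUNK_BYTES = CHUNK_TILES * 2  # one 2-byte item id per tile (assumption)
CHUNKS_PER_ROW = 4             # tiny demo map of 4x4 chunks
CHUNK_FORMAT = "<%dH" % CHUNK_TILES


def write_demo_map(path):
    """Write a demo file where every tile in chunk N holds the value N."""
    with open(path, "wb") as f:
        for index in range(CHUNKS_PER_ROW * CHUNKS_PER_ROW):
            f.write(struct.pack(CHUNK_FORMAT, *([index] * CHUNK_TILES)))


def load_chunk(path, cx, cy):
    """Seek straight to one chunk and read only its bytes."""
    index = cy * CHUNKS_PER_ROW + cx
    with open(path, "rb") as f:
        f.seek(index * CHUNK_BYTES)
        data = f.read(CHUNK_BYTES)
    return struct.unpack(CHUNK_FORMAT, data)


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_demo_map(path)
        return load_chunk(path, 2, 1)  # chunk index 1 * 4 + 2 = 6
    finally:
        os.remove(path)


chunk = demo()
print(len(chunk), chunk[0])
```

Only 512 bytes are read for that chunk, no matter how large the whole file is.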
I have seen login packets from CipSoft's game servers contain almost 10,000 bytes of data (mostly depending on what's on screen where you log in). That's quite a bit of data, and the client has to parse all of it. 16 tiles? Actually, it's 18. Yes, it's hard to believe, but if you look at a map packet, they send up to 18 tiles along the x-axis and 14 along the y-axis. Also, it doesn't just send data about the floor you're on. If you're on the ground floor or higher, it sends data for the ground floor all the way up to the top floor (8 floors of data; floors 0-7). If you're underground, it sends data for the two floors below you, your floor, and the two floors above you. Sure, a single step could be as little as 18 tiles (one row on one floor), but it could be as many as 144 tiles (one row on each of 8 floors). Each item takes as little as 3 bytes (id [2 bytes] and mark [1 byte]), or up to 5 bytes (id, mark, count/data [1 byte], and animation phase [1 byte]). That means you're looking at anywhere from 54 bytes (18 x 3) to 720 bytes (144 x 5) for that one movement (assuming each tile contained only one item). Now let's imagine a full map packet: 18 tiles wide, by 14 tiles high, by 8 floors; that's 2016 tiles. If every tile contained just one item object at the full 5 bytes, that would come out to 10,080 bytes of data.
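If you want to check the arithmetic yourself, here it is spelled out. The constants are the figures from this post; the function and variable names are just mine.

```python
VIEW_WIDTH = 18      # tiles sent along the x-axis
VIEW_HEIGHT = 14     # tiles sent along the y-axis
SURFACE_FLOORS = 8   # floors 0-7 when you are on the ground floor or higher
MIN_ITEM_BYTES = 3   # id (2 bytes) + mark (1 byte)
MAX_ITEM_BYTES = 5   # id + mark + count/data (1 byte) + animation phase (1 byte)


def payload_bytes(tiles, bytes_per_item, items_per_tile=1):
    """Item bytes only; ignores any packet headers or other overhead."""
    return tiles * items_per_tile * bytes_per_item


# One step: a single row of tiles, on one floor or on all eight.
step_min_tiles = VIEW_WIDTH                   # 18
step_max_tiles = VIEW_WIDTH * SURFACE_FLOORS  # 144
print(payload_bytes(step_min_tiles, MIN_ITEM_BYTES))  # 54
print(payload_bytes(step_max_tiles, MAX_ITEM_BYTES))  # 720

# Full map packet: the whole viewport on all eight floors.
full_tiles = VIEW_WIDTH * VIEW_HEIGHT * SURFACE_FLOORS  # 2016
print(payload_bytes(full_tiles, MIN_ITEM_BYTES))  # 6048
print(payload_bytes(full_tiles, MAX_ITEM_BYTES))  # 10080
```

So the 10,080 figure is the 5-byte case; with 3-byte items the same packet would be 6,048 bytes.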