The first thing to keep in mind, which you seem to be catching on to, is that the formula used to calculate a creature’s default CR is quite wonky. It tends to produce inaccurate results that often bear little relation to how challenging the creature actually is. So if you rely on CRs for anything important in your module, it is generally a good idea to set creatures’ CRs manually to something more appropriate than the default.
In addition to accurate CRs, encounters also need a pool of creatures covering a wide range of CRs in order to pick an appropriate set and scale reasonably. To take an extreme example, if the only creatures an encounter could pick were a CR 40 and a CR 1 creature, it would have a hard time coming up with an appropriate spawn for a level 10 adventurer. What you would want (in an encounter aimed at level 10 players) is a variety of creatures with CRs spanning something like 8-12.
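To make the granularity point concrete, here is a toy model in Python. It is purely hypothetical and is not how the game’s encounter system actually chooses creatures: it just picks the single available CR closest to the party level (the names `closest_cr` and `mismatch` are made up for illustration). Even this crude picker shows why a sparse pool fails and a varied one works.

```python
def closest_cr(available_crs, target):
    """Hypothetical picker: return the available CR nearest the target.

    Ties go to the lower CR. This is an illustration only, not the
    engine's real encounter algorithm.
    """
    return min(available_crs, key=lambda cr: (abs(cr - target), cr))


def mismatch(available_crs, target):
    """How far the best available pick is from the target CR."""
    return abs(closest_cr(available_crs, target) - target)


sparse_pool = [1, 40]            # the extreme example: only CR 1 and CR 40
varied_pool = [8, 9, 10, 11, 12]  # CRs spanning roughly 8-12
party_level = 10

print(closest_cr(sparse_pool, party_level), mismatch(sparse_pool, party_level))  # 1 9
print(closest_cr(varied_pool, party_level), mismatch(varied_pool, party_level))  # 10 0
```

With the sparse pool the best the picker can do is a CR 1 creature, nine levels off target; with the varied pool it lands exactly on CR 10.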
Assuming CRs have been set to something reasonably close to each creature’s actual challenge, and that the available creatures cover a wide range of CRs, the encounter system can actually do a pretty good job of scaling encounters. Unfortunately, those preconditions are not always easy to meet.
For your purposes, it is not clear that you actually need the encounter system, since its purpose is to scale encounters to the number of PCs (in multiplayer) and their levels. For a single-player module linear enough that the PC’s level at any given point is predictable, non-scaling encounters, with creatures either pre-placed or spawned by a simple custom script, could work as well or better.
The default CR calculation does take items into account, but like almost everything else it does so in a wonky way, so you typically need to give creatures very powerful magic items before it makes much difference. Among other oddities, a single item with lots of magical properties tends to raise a creature’s default CR much more than the same properties spread across multiple items. Encounters, on the other hand, I believe take only the number and level of PCs into account, not their gear, which could be an important consideration. For example, in a high-magic module where PCs have access to very powerful gear, the encounters spawned will tend to be too weak.
Ultimately, to have well-designed encounters, you need to do a lot of playtesting, ideally with a wide variety of builds, to find out how hard they are in practice, and then tweak the number and stats of the creatures spawned until the challenge is where you want it.