I took 101 photos with my phone of a Pinus pinea (Stone Pine) cone on a rotating table, and I’m surprised by the appearance of the point cloud vs the textured model.
Suggestions as to why the PC is so messy?
It doesn’t, and as you’ve found out, you can get a reconstruction regardless. But a few factors make it non-ideal (and are probably part of the reason for the strange point cloud, I’m guessing):
- Varying lighting/shadows due to the light source in the room being stationary while the subject rotates (this will confuse point detection and matching)
- Lack of scale variation (your camera is kept at a fixed distance from the subject, which is not ideal for camera optimization)
This does not mean you cannot use a turntable, but you have to be careful with your lighting (best would be to have the light source follow the subject), and you’re missing out on scale variation (so hopefully our database has a good initial estimate for the focal length, but that’s not always the case).
It’s probably not quite that bad here, since I was hand-holding the camera rather than using a tripod, so the distance does vary a bit.
I’ve just taken 61 photos of it now that it has opened, this time with a fixed pine cone and a moving camera. I also used the phone’s camera flash this time, for more even lighting. I’m still getting a weird point cloud, although not as bad. With textures added there are a few floating bits around the top: some perhaps made up of out-of-focus grass and ground from about half a metre below the pine cone, and some that look like they might be parts of the cone, on the left side.
I would add to this: a static background that gets reconstructed in strange ways. You can compensate for that with masking, and I’m guessing it’s your main challenge in the above. The Python library REMBG works moderately well if you want to be lazy about masking.