Between Pixels and Prompts: Exploring Nano Banana Pro in ComfyUI Through an Archviz Lens
- Vladyslav Alyeksyenko
Lately, I've been thinking a lot about the evolving space between art and automation - that fascinating tension between craftsmanship and computation. In so many fields, artists have felt the ripples of change coming toward them with slow but steady inevitability. Some faced it with rejection, some embraced it completely. My workflow in architectural visualization has been comfortably traditional for years: modeling in 3ds Max, rendering with Corona or V-Ray, and polishing in Photoshop. Every step has been deliberate - each light particle, reflection, and shadow painstakingly tuned to create realism and emotion. It's a rhythm I know well, one that carries a satisfying sense of control, a feeling of complete connection between me and the 3D space I am trying to simulate.
Then came Midjourney, which was imaginative and impressive. But it was only usable for concept generation - architectural visualization demands precision, and that is exactly what Gen AIs were really bad at. Were... Later we got a chance to use Firefly AI in Photoshop, and it certainly added a new dimension to our workflow, but only a bit - it was far from precise, and it took several generations until you got something that blended well into your scene. It is still that way nowadays, just slightly better. I use it all the time to create difficult elements in a scene - animals or small details - and after about 15 trials I get what I want.
So when I first heard about Nano Banana Pro, I was skeptical but curious. The promises were big: great precision when following complex prompts, quality image refinement and, most importantly, something every AI before it really sucked at - the ability to preserve most of the structure of the image you are adjusting. It all sounded impressive, maybe a little too impressive...
My suspicion gauge was going up, and the hype this tool generated didn't help. You see, I am a contrarian; I revel in opposing the majority's perception, but I rarely do so without proper research and a goal of approaching the topic scientifically: tune out the noise you hear on the internet and just conduct an experiment. After all, I wanted to give it a fair shot - not to confirm or debunk the hype, but to see if it could honestly find a place inside an established archviz workflow.
After some trials I put this tool on the shelf, waiting for its time. A few days later it came up again through my friend - a great multidisciplinary 3D artist and, above all, an amazing friend. He also dips his toes into archviz once in a while, and when he told me I should definitely take a look at this, I met his suggestion with a healthy dose of "we will see...". Still, I took his recommendation seriously, and I wanted to put both of us on the same page about the capabilities of this tool and whether it lives up to the hype. The only way to do that is through rigorous experimentation. That is when I came up with the idea to test Nano Banana Pro against my own skills. I think of myself as a reasonably experienced archviz artist and thought it would be a fair challenge. In fact, I thought I would completely crush it in terms of quality - not speed, that's a given. Little did I know that the task wasn't as easy as I thought.
Experiment
Test 1: Take an old render and change the season with Nano Banana Pro (Gemini) in Photoshop. I began by taking a render I did about 2 years ago and asking Gemini in Photoshop to change the season from Winter to Autumn.


Right away, I was genuinely impressed - it understood the assignment better than I expected. The ice had melted into water, the trees and bushes kept their original positions from the old render, and there were plenty of leaves - maybe too many, but fair enough since the prompt wasn't very detailed. I especially liked the lovely piles of leaves gathered in the boat. Of course, the first-impressions honeymoon period was over when I started zooming in on the building's details and interior. Suffice it to say, I was not impressed by those, but I knew that if the tool was reasonably precise with the other parts of the image, I could just mask them out. Then I noticed another problem that got my blood boiling: images generated by Gemini, for some reason, change proportions - slightly, but they do. It doesn't blend newly generated information into the original image nearly as well as Midjourney and Firefly do. This is not a complete deal breaker, but it makes integrating certain parts of the generated image a messier and more time-consuming process than I would have liked. Also, over time I felt the image was looking too kitschy; the previous winter scene was more serene and subtle, while this one was overly dramatized and exaggerated - though I am sure you can adjust that with a prompt.
At the end of this first test I came away with the sense that this tool is still too raw to be properly applied to a very precision- and feedback-driven workflow - just as I had thought about Midjourney. I strongly believed that Nano Banana's place was in concept generation and nothing more, not at the top of Gen AI for precision-focused work. After all, the mood and atmosphere of the image are great and can guide you through the necessary steps in your traditional 3D workflow. That was my conclusion at the end of Test 1, but no good scientist does just one experiment and walks away. Just because it confirmed my previous observations doesn't mean this is where it ends; in fact, knowing my own bias, I had to do more tests. And I did...
**Conclusions based on Test 1:**
Messy integration for post-production; better to use Firefly, especially for small parts of the image
Amazing for concept generation
Now it was time to get serious...
Test 2: Produce an Autumn-themed render in 3D based on the Winter scene, maintaining precision and context awareness and aiming to avoid kitschy depictions of Autumn; then try to replace a large part of the image with content generated by Gemini.
Yes, I decided there wouldn't be a better opportunity to test my skills, practice them, and create content for my blog and socials than doing it this way. I began by cleaning the scene of the previous assets.

I did want to maintain the same positions and shapes of the trees as in the original, but naturally my assets for wintry trees and autumn trees are different, and adding leaves to and removing snow from the existing 15-25 different models of trees and bushes would have been quite laborious, so I decided to just keep the positions. I also gathered a bunch of materials and assets that would fit the theme and cracked on. I used the previously generated image as a reference for some things, and it inspired me to change the roofing material. Almost right away I locked onto this sky - I thought that a slight golden hour, with bits of blue sky and denser clouds, would work well and distance the image from the tacky depictions of Autumn I had gotten from AI.

Like in any rendering process, or even painting, once the composition is set, it is time to get to the details. What I was missing were leaves in the boat, more vegetation in the foreground, better work on the background and, of course, switching all the interior back on. I had to keep going, and this is one of my favorite parts of making renders.
But... before I proceeded with the fun stuff, I had to conclude Test 2. A bit early, you might say, but if we are to find ways for AI to augment our workflow and make it faster, it is good to see how it behaves when I give it little definite visual information. Naturally, AI will do a great job when you have done most of the thinking for it, but I needed to see at which point I can use AI without losing the quality I was looking for. So I tried to replace the sky - to make it warmer yet subtle, and to take into account the reflection in the water.

Naturally, it did just what I asked, but I thought the sky was too much, too warm - it reads as a full sunset. Generating more skies is going to cost you, and you might cycle through maybe 6 until you find the one you like - that would cost almost $1. So perhaps this is not the best application of the tool. Also, the previous problems still plagued it: matching the position and masking things out along precise edges is a pain. At this point I was really starting to think that the usefulness of Nano Banana exhausts itself at concept creation; there is no way I would want this kind of output - blurry edges on masked trees and a strange, unrealistic sky - in my portfolio. I would not send that to a client. But we still had to continue; we have renders to finish, and I had an idea where I could leverage the strengths of this generative tool to my advantage.
**Conclusions based on Test 2:**
Still messy integration for post-production
Good to test the mood
Naturally, I put that image into a folder to save it for this blog and continued my rendering process.

At this point I was still not exactly happy with the environment and depth of field, so I was experimenting - moving things around, adding and removing stuff. But I could already "feel" the autumn.

Here I decided to look back at my old winter render and think about what made it work well. I tweaked the facade materials to be darker, since that created the weight and framing needed to emphasize the subject. I also decided to go with more lakeside vegetation, but overdid it a bit, so I tweaked it later. I was getting really happy with how it was shaping up.

The final tweaks were mostly about the leaves on the trees - they needed more translucency - and the leaves in the boat that I scattered myself: not so great, but a basis I can build on later in Photoshop. Going forward, I knew I needed to adjust a bunch of things in post-production, primarily a few parts of the building and the vegetation.


Solid stuff, right? I could send it to the client and probably (you never know) make them very happy with the result. But here is where I figured it was time to have some fun. I set up Nano Banana Pro within the ComfyUI ecosystem and went on experimenting, seeing what it can do and, more importantly, what I can use in my final image. I began with the things I needed and had already planned to add to my image: fog and leaves scattered in the boat.
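(If you prefer scripting to node graphs, the same model is also reachable directly through the Gemini API. Below is a minimal sketch using the google-genai Python SDK; the model id, prompt and file names are my assumptions for illustration - my actual tests ran through ComfyUI nodes, so treat this as a sketch of the idea rather than my exact setup.)

```python
# Minimal sketch: an image-to-image edit through the Gemini API (google-genai SDK).
# The model id below is my assumption for "Nano Banana Pro"; the prompt and file
# names are placeholders. My actual tests ran through ComfyUI nodes instead.
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")

source = Image.open("autumn_render.png")  # the render you want to edit

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",  # assumed model id, check the current docs
    contents=[
        "Add a thin layer of morning fog over the lake. "
        "Keep the building, trees, boat and reflections exactly as they are.",
        source,
    ],
)

# Edited images come back as inline image parts next to any text parts.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("autumn_render_fog.png")
```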
Test 3: Use the autumn render made in 3D, add elements like fog or leaves to it, and integrate them seamlessly into the final image. The result should look as little like Gen AI as possible.
Here I actively used ComfyUI, and it is a very nice tool - quite easy to use. I generated all images in 2K, but keep in mind that my render resolution was 3K: another great opportunity to test how AI-generated content behaves and looks when you don't work at native resolution. I could have used the 4K option and scaled down, but why bother and spend twice the money?
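Because of that resolution gap, every generated pass has to be upscaled to 3K and masked back into the render before any further polish. In practice I do this by hand in Photoshop, but here is a minimal sketch of the same step with Pillow - the file names and the mask are hypothetical:

```python
# Minimal sketch: blend a 2K AI-generated pass back into a 3K render.
# File names and the mask are hypothetical; in practice the mask is painted by hand.
from PIL import Image

base = Image.open("render_3k.png").convert("RGB")                # original 3K render
generated = Image.open("nano_banana_fog_2k.png").convert("RGB")  # 2K generated pass
mask = Image.open("fog_mask_3k.png").convert("L")                # white = keep generated pixels

# Upscale the generated pass to the render resolution before blending.
generated_up = generated.resize(base.size, Image.Resampling.LANCZOS)

# Where the mask is white, take pixels from the upscaled generated pass.
result = Image.composite(generated_up, base, mask)
result.save("render_3k_with_fog.png")
```

The point is only that the 2K pass gets resampled up to 3K before blending; whether the upscaled detail survives close inspection is exactly what this test is about.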



Now, with the fog, you can see how it changed based on the prompt. One large problem I realized while working with any generative AI - and Gemini is no exception - is that quantitative words in a prompt are hit or miss. How should the AI interpret "a bit of", "some", "a medium amount", "a few" and so on? If I know the exact number of things, I can write it and maybe the AI will get it right, but if I need to "feel" my way into getting it right, the procedure becomes more difficult and has a lot of variation. Arguably, it is similar in 3D, but at least there you can put a number on the amount of leaves you want to scatter, their density, collisions, etc.; with fog, you can control where it ends, how dense it is, its color and more. In 3D software it feels like alchemy, where the outcome is more or less predictable from the numbers and their relation to each other, and you can replicate it; in Gen AI it feels like wild magic, where everything you do has only some degree of replicability. And that is what makes Gen AI less attractive in an archviz workflow. If you submit this image to the client and he loves it, wants to keep the fog as is, but also has a few corrections to the building structure, you have to generate the fog again - and he may not like it the next time.
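To make the point about quantitative words concrete, here is a hypothetical before/after - not one of the exact prompts from these tests:

```python
# Hypothetical prompt rewrite: swap vague quantifiers for explicit, checkable values.
vague_prompt = "Add some fog over the lake."

explicit_prompt = (
    "Add a thin layer of ground fog over the lake, roughly 1 meter tall and about "
    "30 percent opacity, densest near the far shore and fully gone before the boat. "
    "Do not change the building, trees, boat or their reflections."
)
```

It is still not a slider you can dial in like a volume material, but explicit numbers and boundaries at least give you something concrete to iterate on.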
Ok, next test, now I need this boat with leaves, but not a crazy amount. It has to look unkempt.



Here I aimed at precision that is contextually appropriate. The project is in Toyama, Japan, so it has to have vegetation that is at least cultivated there or can grow there. Japanese Maple, for instance, has leaves that are distinct from the maples we might see in other corners of the world. Without that distinction being spelled out, the AI naturally would not even consider the importance of the context, because it doesn't know it. I will make you pals here if you want. But even with the instruction, it failed at the specificity of the maple leaves and made the Gleditsia leaves too large. Not what I was looking for, but the overall look and the way it blends with the environment are great. So I used the first image with many leaves and made some of my own adjustments to reduce the density and remove strange leaves. By the way, note the same problem as with the fog: how many is "too many" leaves from Gemini's perspective? Apparently the number is somewhere between 10 and 20, because it added about 5 more leaves with the last prompt. And next time I give it the same prompt, I will get a different number.
So, now that I had what I was looking for, I started thinking about what else I could contribute to the image, and I thought it would be great to add some fish to the lake.

Oh boy, that's a catch! If this kind of fish swam in this lake, I would not come near it - it could eat a dog. But hey, that ripple on the left is pretty neat; I will take it.
Ok, I was happy. I did not need any more elements in my image that I could not make with Firefly, so I just started messing around.

Then I ran the same sky test again, this time with the (almost) finalized image.
The results were better this time, but it didn't really save me any time - in fact, I spent more of it. Unless the client asks you for a different sky; in that case I see great potential for saving yourself re-rendering and multiple tests.

This one turned out well aesthetically, but it filled in trees over the area where the access to the house is, so nope, we are skipping that.

This one I liked; simulating it in 3D would certainly have taken me an absurd amount of time, while this did it in 30 seconds.
Final result:
Alrighty, time to take a look at what I ended up with after merging some generated elements and my render.




The goal of this experiment was to find a way to integrate Gen AI into my workflow seamlessly. At all costs I must avoid the image looking like AI slop; I will not sink so low as to deliver an image that was heavily modified by AI. In the end, this image is 95% my work, of which ~80% is what I did myself in 3D or Photoshop; the rest is additional tweaks and handy solutions to complex problems that happen to use AI. And a quality image must evoke a feeling of presence and emotion.
"You have to feel that slight cold breeze, hear those crows and smell the humid air as you are looking at a quality autumn render. Perhaps you will sense a bit of nostalgia or sudden comfort, or maybe you hate Autumn and you feel disgust, rage - really anything, but the image can't leave you indifferent."
© Vladyslav, ZenViz render mage
**Conclusions based on Test 3:**
You can integrate Gemini-generated elements into your image even if they are only at 2/3 of your final resolution.
With patience you can live with Nano Banana Pro's strange proportion problems, but keep in mind that it will take time to mask things well, and it may simply not work in the end.
Changing the lighting, sky or atmosphere of a finished image is easier and faster than ever, but it is far from perfect, and you should probably never give your image a 100% AI treatment - you will lose details and make it look more like AI slop. Mask in the parts that work and mask away the garbage.
If you're unsure whether to make some element in 3D or with Gen AI, stop wherever you are in the project, render what you have, and shove it into the Nano Banana furnace. Test and find out while working on the image.
You still have to know a lot about composition and lighting to give AI something to work off of. Otherwise, it will not do a good job.
Be aware that things that are hard to describe in words will be interpreted by the AI in unpredictable ways. Factor that in when planning what to do in your project.
Generating small parts of the image is not as efficient; if you already own Photoshop and use Firefly, it will be faster, cheaper, and while not necessarily great from the get-go, you know you will get there.
Will every project benefit or lose from using AI? Neither is guaranteed - it depends on your conditions, deadline, client and the quality of the outcome you need.
Is Gemini's Nano Banana Pro significantly better than what we had before in the field of Gen AI? Sort of - I can't say a definitive yes until the proportion problem is solved. But it is certainly a step towards more precise generative images, which is what the field was lacking.
Will I, ZenViz Studio's 3D artist, integrate it into my workflow? It will depend on the project, but I will be considering it the next time I do renders for a client.
How will it affect the pricing of architectural visualization images? I don't know about others, but I will price as I always have - based on the amount of time I spend on the project. It is a direct and fair way to be rewarded for the time worked.
As you can see, my takeaways are far more informed than they were after the previous 2 tests. Perhaps that is because those tests were too shallow and needed much more than what I did, but I am sure I did my best on this 3rd test - and yet, I am only beginning. I am looking forward to test 4, test 15 and so on. I do hope this tool gets better and does not turn out to be the end of our civilization.
My recommendation to all 3D artists: ignore the hype - it is always shallow, sometimes manipulative, and may lead to your downfall as a professional artist. But completely ignoring a hyped tool will not do you any good either. Your peers, even those who made worse images than you in the past, will fully embrace this technology and pump out tons of cheap, unimaginative AI slop that their clients will eat up, because it is 3x cheaper than what you do. But you are different: you don't want to even touch these tools, because you think they compromise your work, make it less creative or fun, or because you feel you'd be contributing to Gen AI using (some would even say stealing) the artworks of real people (I disagree, by the way; the closest thing I can compare it to is digital piracy - no physical object was stolen, only its digital copy). Ok, cool - I don't doubt your ability to create amazing images, but don't be surprised when people outcompete you in the market, take your clients and leave you with no money and a sense of resentment in a few years. You can't afford to be a Luddite in the heavily commercialized world of archviz. If you are in the top 1% of archviz artists, you probably have nothing to worry about and can stick to your workflow; but if you aren't, you may want to at least consider looking at what this tool can do for you, or seek out a more artistic, less commercial field within the 3D world.
Cheers to everyone and have a lovely day!