We propose MCOW (Multi-subjects Cyclic-One-Way Diffusion), a training-free framework for compositional text-to-image generation that overcomes the limitations of conventional one-shot diffusion models. MCOW follows a Generate-then-Compose paradigm: it first generates an image for each individual object and then composes these images according to a spatial layout. Unlike prior layout-to-image methods, MCOW decouples content generation from spatial arrangement, yielding stronger attribute binding, object localization, and numeracy. Experimental results on the T2I-CompBench benchmark demonstrate that MCOW outperforms existing baselines in shape and texture attribute binding. We further show that MCOW transfers seamlessly to domain-specific diffusion models such as DiffusionSat for satellite image synthesis. Despite its effectiveness, MCOW is limited by its reliance on DDIM-based diffusion models and by its assumption of context independence during subject generation. We discuss these challenges and highlight directions for further improving compositional generalization in generative models.
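To make the Generate-then-Compose paradigm concrete, the following is a minimal sketch of its two stages, assuming the Hugging Face diffusers library and a hypothetical layout of (prompt, bounding-box) pairs. It only illustrates the overall structure of independent subject generation followed by layout-based composition; it is not MCOW's actual blending procedure, which relies on cyclic one-way diffusion rather than naive pasting.

```python
# Illustrative Generate-then-Compose sketch (not the MCOW blending method).
# Assumes diffusers and Pillow are installed; model name and layout are hypothetical.
from diffusers import StableDiffusionPipeline
from PIL import Image

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Hypothetical spatial layout: one prompt and one target box per subject.
layout = [
    ("a red apple", (40, 300, 220, 480)),        # (left, top, right, bottom)
    ("a blue ceramic mug", (300, 280, 480, 480)),
]

canvas = Image.new("RGB", (512, 512), "white")
for prompt, (left, top, right, bottom) in layout:
    # Stage 1 (Generate): each subject is synthesized independently,
    # reflecting the context-independence assumption noted above.
    subject = pipe(prompt).images[0]
    # Stage 2 (Compose): place the subject at its layout position.
    canvas.paste(subject.resize((right - left, bottom - top)), (left, top))

canvas.save("composed.png")
```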