With the advances of contemporary computer technology, complexity has grown significantly in both hardware architectures and software applications. To meet the performance requirements of target applications, increasing emphasis is placed on compiler techniques that exploit both hardware and software parallelism. The scheduler, the compiler component that allocates operations to hardware resources, is crucial to the success of a computing system. This thesis presents several novel scheduling optimization techniques that address the challenges faced by existing computing architectures and applications.

The first target architecture is a system with a memory hierarchy and a processor comprising multiple processing and memory units. A loop partition scheduling technique is proposed to take advantage of the memory hierarchy and effectively hide memory access latency for loop-intensive applications. The concept of a balanced partition schedule is introduced to achieve the best memory access latency toleration and hardware resource utilization. Various extensions of the base problem are studied in depth, and solutions are presented for system models with a multiple-level memory hierarchy and a memory size constraint, and for loop models with initial data and multiple nested loops.

Clustered architectures are becoming increasingly popular owing to their superiority over centralized architectures. Inter-cluster communication, achieved by explicit register-to-register moves, is compiler-controlled and invisible to the programmer. The thesis proposes an efficient scheduling algorithm that takes into account instruction-level parallelism (ILP), register file size, and inter-cluster communication constraints. Furthermore, the solution is completed by considering the effects of distributed caches: data spilling, cache conflicts, and cache communication are integrated into the algorithm.
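The latency-hiding idea behind a balanced partition schedule can be sketched with a simple, hypothetical cost model (not the thesis's actual algorithm): the prefetch of one partition is overlapped with the computation of the previous partition, and all latency parameters below are illustrative assumptions.

```python
# Hypothetical cost model for loop partition scheduling (illustrative only):
# the memory unit prefetches partition i+1 while the processing unit
# computes partition i (double buffering).

def schedule_length(n_iters, part_size, compute_per_iter,
                    fetch_startup, fetch_per_iter):
    """Total cycles when each partition's data is prefetched while the
    previous partition is being computed."""
    n_parts = -(-n_iters // part_size)                   # ceiling division
    compute = part_size * compute_per_iter               # compute per partition
    fetch = fetch_startup + part_size * fetch_per_iter   # prefetch per partition
    # The first partition must be fetched up front; afterwards each step
    # costs max(compute, fetch) because the two units run in parallel.
    return fetch + (n_parts - 1) * max(compute, fetch) + compute

# A "balanced" partition makes compute and fetch times equal, so neither
# the processing unit nor the memory unit sits idle:
print(schedule_length(100, 5, 4, 20, 2))    # too small: prefetch dominates
print(schedule_length(100, 10, 4, 20, 2))   # balanced: compute == fetch
print(schedule_length(100, 20, 4, 20, 2))   # too large: long boundary partitions
```

With these illustrative numbers, a partition of 10 iterations yields the shortest schedule (440 cycles versus 620 and 460), showing why a balanced schedule achieves both latency toleration and full resource utilization.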
Another target architecture is the multi-bank memory architecture, which introduces the scheduling complexity and difficulty of variable partitioning. The approach in this thesis not only improves on existing techniques in exploiting parallelism, but also considers serialism to take advantage of the multiple operating modes of the memory banks. By identifying the best tradeoff between parallelism and serialism, the goals of both performance and energy saving can be achieved. A novel memory access graph model, which captures information about both parallelism and serialism, forms the basis of this scheduling approach.
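As an illustration of how a memory access graph might drive variable partitioning, the sketch below (a hypothetical simplification, not the thesis's actual model) weights an edge between two variables by how often they are accessed in the same step. Heavily connected variables are placed in different banks to expose parallelism, while rarely co-accessed variables can share a bank, leaving other banks free to enter low-power operating modes (serialism).

```python
from collections import defaultdict
from itertools import combinations

def build_access_graph(trace):
    """trace: list of sets of variables accessed in the same step.
    Edge weight = number of steps in which the two variables are accessed
    together (a proxy for the parallelism gained by separating them)."""
    w = defaultdict(int)
    for step in trace:
        for a, b in combinations(sorted(step), 2):
            w[(a, b)] += 1
    return w

def greedy_two_bank_partition(variables, w):
    """Place each variable in the bank minimizing co-access conflicts with
    variables already there; heavily co-accessed pairs end up in different
    banks so they can be fetched in parallel, and variables never accessed
    together may share a bank (serialized accesses, enabling bank idle modes)."""
    banks = ([], [])
    def conflict(v, bank):
        return sum(w.get((min(v, u), max(v, u)), 0) for u in bank)
    for v in variables:
        c0, c1 = conflict(v, banks[0]), conflict(v, banks[1])
        banks[0 if c0 <= c1 else 1].append(v)
    return banks

w = build_access_graph([{'a', 'b'}, {'a', 'b'}, {'c', 'd'}, {'a', 'c'}])
print(greedy_two_bank_partition(['a', 'b', 'c', 'd'], w))
```

In this toy trace, `a` and `b` are frequently accessed together and land in different banks, while pairs with no co-accesses may share one; a real algorithm would also weigh the energy saved by serializing accesses against the cycles lost.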