This thesis explores a hybrid processing approach for application-specific, memory-intensive processing. The hybrid system uses a general purpose processor (GPP) in conjunction with an FPGA-based Image Processor (FIMP) to improve the performance of image processing applications. The hybrid architecture implements an image registration algorithm that is partitioned into separate functions, each executing on either the GPP or the FIMP system. The thesis examines the trade-offs among different configurations of the FIMP architecture, using the flexibility of reconfigurable hardware to maximize performance. The FIMP system was designed in Verilog HDL and implemented on the Xtremedata XD1000 system, which uses an Opteron main processor and an Altera Stratix II FPGA co-processor. Multi-core systems of up to 32 nodes were implemented using three network topologies: a bus, a ring, and a fully connected mesh. Benchmark results for the FIMP system are compared to software-only execution. For an image registration algorithm operating on 256x256 grayscale images, a 16-node fully connected FIMP system used as a co-processor produced a 1.65 times speedup over the GPP alone.