FASTCUDA

Open Source FPGA Accelerator & Hardware Software Codesign Toolset for CUDA Kernels

  • Funding: European Commission
  • Project Code: FASTCUDA
  • Programme: SEVENTH FRAMEWORK PROGRAMME (FP7-SME)
  • Budget: €1,603,596.00 (overall)
  • Start Date: 1 November 2011
  • Duration: 24 months
  • Website(s): www.fastcuda.eu – CORDIS

Information

Short Description

Scientific applications in areas such as graphics, biological modeling and molecular dynamics are usually highly parallel and can benefit from specialized hardware to accelerate their execution. For this reason, highly parallel Graphics Processing Units (GPUs) have traditionally been favored over general-purpose processors for running such applications. FPGAs can potentially provide even higher speedups at lower power consumption than GPUs. However, their use is still limited because the path to porting an application onto an FPGA’s custom hardware is often prohibitively cumbersome. FASTCUDA facilitates this path by providing a novel methodology, architecture and toolset to automatically port already-parallelized algorithms to reconfigurable hardware and run them there. For this purpose, the FASTCUDA methodology uses CUDA, a GPU programming model that exposes parallelism at the source-code level.

With minimal user intervention, the FASTCUDA toolset splits an application’s code into two parts: one that is compiled and executed as parallel software on an embedded multi-core processor, and another consisting of multiple special-purpose accelerators that are synthesized and implemented in hardware. A latest-generation low-power FPGA provides the processing power and the logic capacity to implement and execute both parts.

In particular, FASTCUDA is a design methodology and accompanying toolset that allows CUDA programs to be executed efficiently on a shared-memory multi-core CPU communicating with an FPGA-based accelerator. A multi-core processor, consisting of multiple embedded cores (small configurable processors), runs the host program serially and the SW CUDA kernels in parallel. Threads belonging to the same CUDA thread-block are executed by the same core. The HW CUDA kernels are partitioned into thread-blocks, which are synthesized and implemented inside an “Accelerator” block. Following the philosophy of the CUDA model, each thread-block has a local private memory, while the global shared memory can be accessed by any thread.
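
The software side of this execution model can be illustrated with a minimal Python sketch. It is not FASTCUDA code: the function names (`vector_add_kernel`, `run_block`, `launch`), the dictionary standing in for global memory, and the use of a thread pool to stand in for the embedded cores are all assumptions made for illustration. The sketch only shows the mapping described above: all threads of one thread-block run on the same worker, each block gets its own private memory, and every thread can reach the global memory. Barrier synchronization between threads of a block is deliberately left out.

```python
from concurrent.futures import ThreadPoolExecutor

def vector_add_kernel(block_idx, thread_idx, block_dim, local_mem, global_mem):
    """Body of one CUDA-style thread: c[i] = a[i] + b[i]."""
    i = block_idx * block_dim + thread_idx
    if i < len(global_mem["c"]):
        # Stage the result in the block's private memory, then write it back.
        local_mem[thread_idx] = global_mem["a"][i] + global_mem["b"][i]
        global_mem["c"][i] = local_mem[thread_idx]

def run_block(kernel, block_idx, block_dim, global_mem):
    """Run every thread of one block serially, as a single core would."""
    local_mem = [0] * block_dim  # private to this thread-block
    for thread_idx in range(block_dim):
        kernel(block_idx, thread_idx, block_dim, local_mem, global_mem)

def launch(kernel, grid_dim, block_dim, global_mem, num_cores=4):
    """Distribute thread-blocks over workers that stand in for the cores."""
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        futures = [pool.submit(run_block, kernel, b, block_dim, global_mem)
                   for b in range(grid_dim)]
        for future in futures:
            future.result()  # re-raise any exception from a worker

n = 10
global_mem = {"a": list(range(n)), "b": [2 * x for x in range(n)], "c": [0] * n}
launch(vector_add_kernel, grid_dim=3, block_dim=4, global_mem=global_mem)
```

Because each thread writes a distinct element of `c`, the blocks can run in any order on any worker without changing the result.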

For our prototype version, we used a Xilinx Virtex-6 FPGA with 500 MB of external DDR memory on a Xilinx ML605 evaluation board, and the multi-core processor consists of an array of Xilinx MicroBlaze CPUs. However, real products designed with FASTCUDA may also use faster embedded processors such as the ARM Cortex-A9 MPCore.

Project Objectives

In recent years, an observable trend in High Performance Computing (HPC) architectures has been the inclusion of accelerators, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), to improve the performance of scientific applications. Several applications, ranging from graphics to biological modeling, molecular dynamics, physics and others, have been successfully ported to GPUs, taking advantage of highly parallel hardware to accelerate their execution. Porting to GPUs, hard as it may be, requires only software skills to code the specific algorithm as parallel multi-threaded software. The path to FPGA development, on the other hand, is notoriously more difficult, since porting an algorithm to custom hardware is less straightforward and the simulation-verification-debugging cycle can be orders of magnitude longer. For this reason, even though FPGAs’ custom hardware can potentially provide higher speedups at lower power consumption than GPUs, GPU-based solutions dominate the scientific world.

FASTCUDA aims to bridge this gap by taking advantage of the software parallelization effort that has gone into porting scientific applications to GPUs and reusing it to implement FPGA-based systems. FASTCUDA focuses on CUDA, a GPU architecture and programming model initially developed by Nvidia for its line of GPUs, and provides a novel methodology, architecture and toolset to automatically port CUDA programs to FPGA hardware and run them there.

Execution starts with the CUDA host program running single-threaded on the host CPU. Whenever a CUDA kernel is invoked, the host CPU dispatches the execution of the kernel to an accelerator (a separate device) that supports parallel execution of multiple threads. Traditionally, such accelerators are Nvidia’s GPUs or other multi-core platforms. However, we show that even higher performance, as well as lower power and energy consumption, can be obtained if a computationally intensive CUDA kernel is synthesized into hardware and mapped onto an FPGA for execution. FASTCUDA therefore employs a hybrid approach: it uses an FPGA-based accelerator to execute the time-critical CUDA kernels and a multi-core processor to execute the CUDA kernels that do not fit in the FPGA fabric.
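
The hybrid dispatch described above can be pictured with a small Python sketch. Everything in it is hypothetical: the kernel names (`saxpy`, `block_sum`), the two lookup tables and the `launch_kernel` function are illustrative stand-ins, not part of the FASTCUDA API. The sketch assumes that partitioning has already decided which kernels live in hardware, and it only shows the host routing each kernel launch to the corresponding side.

```python
def saxpy_hw(a, x, y):
    """Stand-in for a kernel synthesized into the FPGA accelerator."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def block_sum_sw(values):
    """Stand-in for a kernel compiled for the multi-core processor."""
    return sum(values)

# Hypothetical outcome of hardware/software partitioning.
HW_KERNELS = {"saxpy": saxpy_hw}
SW_KERNELS = {"block_sum": block_sum_sw}

def launch_kernel(name, *args):
    """Route a kernel launch from the host to the FPGA or multi-core side."""
    if name in HW_KERNELS:
        return "fpga", HW_KERNELS[name](*args)
    if name in SW_KERNELS:
        return "multicore", SW_KERNELS[name](*args)
    raise KeyError(f"unknown kernel: {name}")

# Host program: runs serially and dispatches each kernel as it is invoked.
target1, y = launch_kernel("saxpy", 2, [1, 2, 3], [10, 20, 30])
target2, total = launch_kernel("block_sum", y)
```

In a real system the two tables would be replaced by the SW-HW communication layer; the point here is only that the host code does not need to know where a kernel ended up.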

FASTCUDA is a design methodology and accompanying toolset that allows CUDA programs to be executed efficiently on a shared-memory multi-core CPU communicating with an FPGA-based accelerator. A modern FPGA provides all the required resources: multiple embedded micro-CPUs for the CUDA host program and for the CUDA kernels that will be executed on the multi-core processor, as well as a large logic capacity for the CUDA kernels that will be accelerated in hardware. To this end, FASTCUDA did not develop everything from scratch; instead, it joined numerous ongoing efforts in industry and academia to create a unified, efficient open-source framework.

The objectives of FASTCUDA were twofold:

  1. create an innovative embedded system design flow by designing highly efficient components and by taking advantage of numerous open-source ongoing efforts in codesign of embedded systems, both at the academic and at the industrial level
  2. enable an easier transition from research results to industrial exploitation, i.e. standardization of codesign usage

FASTCUDA has successfully defined the new design flow and has provided the related toolset to the open-source community. The objectives have been achieved by defining, implementing and disseminating a publicly available platform that takes as input a description of the system in the CUDA programming model and produces an efficient FPGA-based embedded design. This design executes certain CUDA kernels in software and implements the rest in hardware, according to a hardware/software partitioning algorithm that was developed during the project.

To fulfill these objectives, we built the FASTCUDA platform, which comprises the following sub-systems:

  • A novel reconfigurable computing (RC) architecture composed of a multi-processor system, shared memory and reconfigurable fabric in order to run the multi-threaded CUDA applications.
  • An advanced high-level synthesis tool which efficiently maps the coarse and fine grained parallelism exposed in CUDA kernels onto the reconfigurable fabric.
  • A compiler framework in order to port the CUDA programming model to the FASTCUDA multi-processor environment.
  • A design space exploration strategy based on profiling, user-driven block partitioning, and analysis by simulation, compilation and high-level synthesis of the quality of each point in the design space.
  • A central on-chip processor that coordinates the execution of the CUDA kernels and executes the main code (referred to as host code in the CUDA programming model) of the CUDA application.
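
As a rough illustration of the design space exploration item above, the following Python sketch enumerates every hardware/software partition of a small kernel set and keeps the fastest one that fits an area budget. It is a toy model, not the FASTCUDA strategy: the kernel names, the time and area figures, the `explore` function and the assumption that kernels run one after another (so their times simply add up) are all invented for the example. In the real flow, such figures would come from profiling, simulation, compilation and high-level synthesis.

```python
from itertools import combinations

# Hypothetical per-kernel estimates: (software time, hardware time, hardware area).
KERNELS = {
    "fft": (90.0, 10.0, 60),
    "filter": (40.0, 8.0, 30),
    "hash": (30.0, 5.0, 50),
}

def explore(kernels, area_budget):
    """Try every subset of kernels in hardware; return the fastest that fits."""
    names = sorted(kernels)
    best = None
    for size in range(len(names) + 1):
        for hw in combinations(names, size):
            area = sum(kernels[k][2] for k in hw)
            if area > area_budget:
                continue  # this point of the design space does not fit
            # Kernels in hardware use their hardware time, the rest run in software.
            time = sum(kernels[k][1] if k in hw else kernels[k][0] for k in names)
            if best is None or time < best[0]:
                best = (time, area, set(hw))
    return best

best_time, best_area, in_hw = explore(KERNELS, area_budget=100)
```

Exhaustive enumeration grows as 2^n in the number of kernels, so it is only practical for small kernel sets; it is used here because it makes the notion of "a point in the design space" explicit.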

The FASTCUDA platform is designed to be relatively easy to use through a graphical user interface (GUI), in order to gain wide acceptance in the embedded design community. Because the tool targets designers who program at a high level, and shortening their design time is critical, a user-friendly environment is of major importance and can play an important role in the wide adoption of the tool.

Results

FASTCUDA’s main target was to derive a high-level synthesis toolset that efficiently runs a CUDA application on an FPGA-based hybrid platform consisting of a multi-core processor and an FPGA accelerator. Several tools were developed during the project. A brief description of the main results follows:

  1. High Level Synthesis tool: A complete software package that takes as input a CUDA kernel, which describes a part of the application, and provides as output synthesizable multi-threaded SystemC code and RTL code that implement exactly the same functionality as the input.
  2. CUDA to multi-threaded C Compiler: A complete software package that (a) takes as input a CUDA kernel, which describes a part of the application, and (b) provides as output CPU-based code performing exactly the same function.
  3. Multi-core processor: A hardware package that provides a multi-core CPU platform customized for the execution of CUDA kernels.
  4. Estimation tools: Software packages that, given a CUDA description of an application, provide performance estimates.
  5. Exploration tool: A complete software package that takes as input a description of an application in CUDA (including the parts that will be implemented both in hardware and in software) as well as the characteristics of the FPGA-based platform, and gives the designer the necessary performance and power estimates for various hardware-software partitioning alternatives, so that he/she can choose the optimal underlying architecture.
  6. SW-HW bridge and system API: A hardware package that provides the SW-HW bridge between the multi-core processor and the FPGA accelerator, and a software package that includes the SW-HW communication API library.
  7. Cache coherency blocks: Since no Xilinx IP core was available that could provide cache coherency for the FASTCUDA multi-core processor, FASTCUDA built its own HW blocks for this purpose.
  8. Numerous CUDA applications have been developed addressing different application domains from security to bioinformatics.
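
The kind of choice that the exploration tool (item 5 above) leaves to the designer can be sketched in Python as a Pareto filter over partitioning alternatives. The alternative names and the time and power figures below are made up for illustration, and `pareto_front` is a hypothetical helper rather than part of the toolset. The sketch assumes that each alternative comes with one estimated execution time and one estimated power figure, and it discards every alternative that another one beats or matches on both.

```python
def pareto_front(alternatives):
    """Return the names of alternatives not dominated in both time and power."""
    front = []
    for name, time, power in alternatives:
        dominated = any(
            t <= time and p <= power and (t < time or p < power)
            for _, t, p in alternatives
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical estimates: (partitioning alternative, time, power).
ALTERNATIVES = [
    ("all-software", 160.0, 2.0),
    ("fft-in-hw", 80.0, 2.5),
    ("fft+filter-in-hw", 48.0, 3.5),
    ("hash-in-hw", 135.0, 3.0),
]

front = pareto_front(ALTERNATIVES)
```

Here `hash-in-hw` drops out because `fft-in-hw` is both faster and lower in power; the remaining three alternatives are genuine trade-offs that only the designer can rank.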

Partners

  • Ingenieria de Sistemas Intensivos en SW (Coordinator)
  • Politecnico di Torino – Italy
  • Universidad Politecnica de Madrid – Spain
  • Telecommunication Systems Institute – Greece
  • Ardoran OU – Estonia
  • FSRESULT GMBH – Germany
