MPI (Message Passing Interface) is the standard way to run a program across multiple nodes. It requires more extensive changes to code than OpenMP. It is available primarily for C and Fortran, but wrappers exist for C++, Python, R, Octave and other languages. The Java implementation seems to be moribund.
It can be used for multicore programming as well as multinode, and you can simply let it assign one process (known as a 'rank') to each core. Similarly, if you have two nodes, it can use every core on each node.
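As a minimal sketch of what a rank looks like in practice, the following uses the Python wrapper mpi4py (an assumption; any of the wrappers mentioned above would do). Every process runs the same script and reports its own rank number, the total number of ranks, and the node it is running on. The helper name `describe` and the single-rank fallback when mpi4py is missing are illustrative choices, not part of MPI.

```python
try:
    from mpi4py import MPI  # assumption: the mpi4py wrapper is installed
except ImportError:
    MPI = None  # fall back to a single rank so the sketch still runs


def describe(rank, size, host):
    """Format one line identifying this rank (hypothetical helper)."""
    return f"rank {rank} of {size} on {host}"


def main():
    if MPI is None:
        # No MPI available: behave like a one-rank job.
        rank, size, host = 0, 1, "localhost"
    else:
        comm = MPI.COMM_WORLD            # communicator containing every rank
        rank = comm.Get_rank()           # this process's number, 0..size-1
        size = comm.Get_size()           # total number of ranks in the job
        host = MPI.Get_processor_name()  # name of the node this rank runs on
    print(describe(rank, size, host))


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as `ranks.py`, this would typically be launched with something like `mpiexec -n 4 python ranks.py`, which starts four ranks. How ranks are spread over cores and nodes is decided by the launcher and the scheduler, not by the script.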
It relies on sending messages containing signals or data from one parallel process (rank) to another. All ranks run the same code, so care must be taken to separate work into master and slave sections, typically by branching on the rank number. Care must also be taken to avoid deadlocks (every rank waits to receive data before sending its own, so no rank ever sends), to avoid too many small communications, to ensure that ranks have work to do whilst waiting for other ranks to send data, and to ensure that there are points where all ranks wait for the others to catch up (e.g. at the end of an iteration).
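The points above can be sketched in one short program, again assuming the mpi4py wrapper. Each rank works on its own block of a range, swaps its partial result with its neighbours using a combined send/receive (which sidesteps the receive-before-send deadlock), contributes to a single collective reduction rather than many small messages, and then waits at a barrier. The helper names `block_range` and `partial_sum`, the problem size, and the sum-of-squares workload are all illustrative assumptions.

```python
try:
    from mpi4py import MPI  # assumption: the mpi4py wrapper is installed
except ImportError:
    MPI = None  # the pure helpers below still work without it


def block_range(rank, size, n):
    """Return the half-open range [start, stop) of n items owned by rank."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def partial_sum(rank, size, n):
    """Sum the squares of the integers in this rank's block."""
    start, stop = block_range(rank, size, n)
    return sum(i * i for i in range(start, stop))


def main(n=1000):
    if MPI is None:
        # No MPI available: behave like a one-rank job.
        print("total:", partial_sum(0, 1, n))
        return

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Every rank runs this line, but each works on a different block.
    local = partial_sum(rank, size, n)

    # Ring exchange. If every rank called recv first and send second,
    # all of them would wait forever; sendrecv pairs the two operations.
    right = (rank + 1) % size
    left = (rank - 1) % size
    from_left = comm.sendrecv(local, dest=right, source=left)
    assert from_left == partial_sum(left, size, n)

    # One collective call instead of many small point-to-point messages.
    total = comm.reduce(local, op=MPI.SUM, root=0)

    # All ranks wait here, e.g. at the end of an iteration.
    comm.Barrier()

    if rank == 0:
        # Section that only rank 0 executes; only it holds the reduced total.
        print("total:", total)


if __name__ == "__main__":
    main()
```

The `if rank == 0` branch is the usual way to separate the coordinating section from the rest when every rank runs the same code. The barrier is shown only to illustrate a synchronisation point; in this sketch the reduction already ensures rank 0 has every contribution before it prints.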
Within a single node MPI uses fast communication (typically via shared memory), but it still carries some overhead, so on a single node OpenMP may outperform MPI for some operations.