Understanding Remote Procedure Calls (RPC) in Distributed Systems

Posted in distributed-systems by Christopher R. Wirz on Wed Aug 21 2024

What is RPC?

Remote Procedure Call (RPC) is a protocol that allows a program to execute a procedure on another computer as if it were a local procedure call. This abstraction makes it easier for developers to create distributed applications without worrying about the underlying network details.

The Challenges of Client-Server Architecture

There are several challenges of traditional client-server architectures:
  • Server discovery and connection establishment
  • Agreeing on operation identifiers and parameters
  • Data representation and serialization
  • Handling network delays and failures

Goals of RPC Systems

RPC systems aim to address these challenges by:
  • Hiding the complexity of distributed programming
  • Making distributed systems programming similar to single-node programming
  • Providing support for service registration and discovery
  • Handling data serialization and deserialization
  • Managing connections and protocols
  • Dealing with failures and retries

RPC System Architecture

An RPC system typically consists of:
  • Programming interface for client and server applications
  • Stub layers for marshaling and unmarshaling data
  • RPC runtime for connection management and data transmission
  • Interface Definition Language (IDL) for service specification
  • RPC compiler for generating stub code
  • Registry service for service discovery

How RPC Works

A simple example of an RPC call:
  1. The client calls a procedure (e.g., add(i, j))
  2. The call is intercepted by the client stub
  3. The stub marshals the parameters into a message
  4. The RPC runtime sends the message to the server
  5. The server stub unmarshals the message
  6. The server executes the actual procedure
  7. The result follows the reverse path back to the client

Types of RPC Operations

RPC operations can be:
  • Synchronous: The client waits for the response
  • Asynchronous: The client continues execution and checks for the response later

RPC Semantics

RPC systems can provide different guarantees:
  • Exactly-once semantics (ideal but challenging in distributed systems)
  • At-most-once semantics (more practical)
  • At-least-once semantics (with potential for duplicates)

Popular RPC Systems

Several RPC systems have been developed over the years:
  • Sun RPC (early 1980s)
  • SOAP and CORBA (enterprise solutions)
  • Apache Thrift and gRPC (modern internet services)
  • Specialized systems for high-performance data centers or embedded environments

gRPC: A Modern RPC Implementation

gRPC, released by Google in 2016, is a popular modern RPC system. It uses Protocol Buffers for interface description and data serialization. Key features include:
  • Language-agnostic interface definition
  • Support for multiple programming languages
  • Efficient serialization
  • Strong typing

Key Concepts

Remote Procedure Call (RPC): A protocol that allows a program to execute a procedure on another computer as if it were a local procedure call.

Client-Server Architecture: A distributed application structure that partitions tasks between providers of a resource or service (servers) and service requesters (clients).

Stub: A piece of code that converts parameters passed between client and server during a remote procedure call.

Marshalling: The process of transforming memory representations of data structures into a format suitable for transmission over a network.

Unmarshalling: The reverse process of marshalling, where transmitted data is converted back into native data structures.

Interface Definition Language (IDL): A language used to describe the interface between client and server in a way that is independent of any particular programming language.

RPC Compiler: A tool that generates stub code from IDL specifications.

Registry Service: A component that allows servers to register their available services and clients to discover these services.

Synchronous RPC: An RPC operation where the client waits for the server's response before continuing execution.

Asynchronous RPC: An RPC operation where the client continues execution after making the call and checks for the response later.

At-most-once Semantics: A guarantee that an RPC operation will be executed at most one time, even in the presence of network failures.

At-least-once Semantics: A scenario where an RPC operation may be executed more than once due to retries, potentially leading to duplicate operations.

Exactly-once Semantics: The ideal (but challenging to implement) guarantee that an RPC operation will be executed exactly one time.

gRPC: A modern, open-source RPC framework developed by Google.

Protocol Buffers: A language-agnostic data serialization format used by gRPC for defining service interfaces and message structures.

Serialization: The process of converting a data structure or object into a format that can be stored or transmitted across a network.

Deserialization: The process of reconstructing a data structure or object from the serialized format.

RPC Runtime: The system software that manages the execution of remote procedure calls, including network communication and error handling.

Service Discovery: The process by which clients locate and identify available services in a distributed system.

Callback: A function that is passed as an argument to another function and is executed after the completion of that function, often used in asynchronous RPCs.

Review Questions

Can you identify and describe the main elements and steps involved in a distributed RPC system?

The main elements of a distributed RPC system include:

  • Client and Server applications
  • Programming interface
  • Stub layer (both client-side and server-side)
  • RPC runtime
  • Interface Definition Language (IDL)
  • RPC compiler
  • Registry service

The steps involved in an RPC call typically include:

  1. Client makes an RPC call that looks like a local procedure call
  2. Call is intercepted by the client-side stub
  3. Client stub marshals (serializes) the parameters
  4. RPC runtime sends the message to the server
  5. Server-side stub receives the message
  6. Server stub unmarshals (deserializes) the parameters
  7. Server stub calls the actual procedure implementation
  8. Server performs the requested operation
  9. Results are marshaled by the server stub
  10. Server RPC runtime sends the response back to the client
  11. Client-side stub receives and unmarshals the results
  12. Results are returned to the client application

Contrast exactly once, at most once, at least once, invocation semantics

Exactly once semantics

  • Requires that the operation is executed only once, regardless of failures.
  • RPC runtime would need to implement:
    • Automatic retransmission on no response
    • Mechanism for servers to detect and ignore duplicate requests
  • Feasible under assumptions of:
    • Reliable network with bounded delays
    • No permanent server failures
    • Ability to maintain state about executed operations

At most once semantics

  • Guarantees that the operation is executed either once or not at all.
  • RPC runtime implements:
    • Limited retransmission attempts
    • Mechanism to inform the client of uncertainty
  • More practical in real-world scenarios where network partitions or server failures can occur.
  • Feasible under most network conditions, but requires the client application to handle potential non-execution.

At least once semantics

  • Ensures the operation is executed one or more times.
  • RPC runtime implements:
    • Continuous retransmission until acknowledgment
    • No duplicate detection on the server side
  • Feasible in most network conditions
  • Suitable for idempotent operations (operations that can be repeated without changing the result beyond the initial application)
  • Not suitable for non-idempotent operations (e.g., incrementing a counter)

Under which assumptions would such invocation semantics be feasible

  • Exactly once semantics are the most challenging to implement in distributed systems due to the possibility of network partitions, server failures, and message losses. They require strong consistency guarantees and are often not practical in large-scale distributed systems.
  • At most once semantics are more feasible and are often the default in many RPC systems. They work well when the client can handle potential non-execution of the operation.
  • At least once semantics are the easiest to implement but require careful consideration of the operation's idempotence. They are feasible in most network conditions but may lead to unintended side effects for non-idempotent operations.

What would be required from the RPC runtime

The choice of invocation semantics depends on the specific requirements of the application, the nature of the operations being performed, and the trade-offs between consistency, availability, and partition tolerance in the distributed system.