Bulk Operations

Fetching one gem at a time is slow when you need dozens or hundreds. rubygems-skills ships a concurrent worker pool for bulk reads — BulkGet* methods fan out in parallel and collect typed results.

The bulk methods

| Method | Returns per gem |
| --- | --- |
| BulkGetPackages(ctx, names, opts) | *models.PackageInformation |
| BulkGetVersions(ctx, names, opts) | []*models.Version |
| BulkGetDependencies(ctx, names, opts) | []*models.DependencyInfo |
| BulkGetReverseDependencies(ctx, names, opts) | []string |

Each returns a []*BulkResult[T] aligned to your input slice — index i of the result corresponds to names[i].

Quick start

go
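ctx := context.Background() // from the standard "context" package; use a timeout or deadline in real code
repo := repository.NewRepository()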

names := []string{"rails", "puma", "sidekiq", "redis", "rake"}

results := repo.BulkGetPackages(ctx, names, nil) // nil = default options
for i, r := range results {
    if r.Error != nil {
        fmt.Printf("%s: error — %v\n", names[i], r.Error)
        continue
    }
    fmt.Printf("%s: %s (%d downloads)\n", r.Value.Name, r.Value.Version, r.Value.Downloads)
}

Options

go
opts := repository.NewBulkOptions().
    WithMaxConcurrency(8).       // parallel workers
    WithContinueOnError(true)    // keep going if one gem fails

results := repo.BulkGetPackages(ctx, names, opts)

| Method | Default | Purpose |
| --- | --- | --- |
| WithMaxConcurrency(n) | 10 | Number of in-flight requests. |
| WithContinueOnError(bool) | true | If false, the pool stops on the first error. |

The result type

go
type BulkResult[T any] struct {
    Key   string // the request key, usually the gem name
    Value T      // operation result
    Error error  // possible error during the operation
}

Because errors are per-item rather than a single fatal error, one missing gem doesn't sink the whole batch. Always check r.Error before using r.Value; r.Key lets you correlate a result back to its input name without relying on slice index.
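
As a rough illustration of that check-then-use pattern, the sketch below splits a batch into successes and failures keyed by r.Key. It redefines BulkResult locally so it compiles on its own; splitResults and sampleResults are hypothetical helpers written for this example, not part of rubygems-skills.

```go
package main

import (
	"errors"
	"fmt"
)

// BulkResult is copied from the struct shown above so the sketch is self-contained.
type BulkResult[T any] struct {
	Key   string
	Value T
	Error error
}

// splitResults is a hypothetical helper: it separates successful values from
// per-item errors, using Key (not the slice index) to identify each gem.
func splitResults[T any](results []*BulkResult[T]) (map[string]T, map[string]error) {
	ok := make(map[string]T)
	failed := make(map[string]error)
	for _, r := range results {
		if r == nil {
			continue // defensive: skip empty slots (an assumption, not documented behavior)
		}
		if r.Error != nil {
			failed[r.Key] = r.Error
			continue
		}
		ok[r.Key] = r.Value
	}
	return ok, failed
}

// sampleResults builds made-up data standing in for a real bulk call.
func sampleResults() []*BulkResult[int] {
	return []*BulkResult[int]{
		{Key: "rails", Value: 3},
		{Key: "no-such-gem", Error: errors.New("not found")},
		{Key: "puma", Value: 7},
	}
}

func main() {
	ok, failed := splitResults(sampleResults())
	fmt.Println(len(ok), "succeeded,", len(failed), "failed")
	for name, err := range failed {
		fmt.Printf("%s: %v\n", name, err)
	}
}
```

Keying by r.Key keeps the caller independent of input order, which helps if the results are later filtered or merged with another batch.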

Concurrency limits

RubyGems.org rate-limits aggressively. Set MaxConcurrency conservatively:

  • Unauthenticated: 2–5 workers. Higher values risk HTTP 429 (Too Many Requests) responses.
  • Authenticated (token): 5–10 workers is usually safe.
  • With retry enabled: you can go somewhat higher, because transient 429s are retried automatically.

If you see frequent IsRateLimited errors, lower MaxConcurrency or add a token via Options.SetToken.

With caching

Bulk methods bypass the CachedRepository decorator and hit the underlying repo directly (the per-item concurrency makes the cache layer redundant for bulk reads). For single-gem repeat reads, use the wrapped methods on CachedRepository instead.

Implementation note

The pool is generic — runWorkerPool[T] — so all four bulk methods share one concurrency implementation.

How it stays correct without locks:

  • The dispatcher sends indices (not gem names) onto a buffered channel — each index i corresponds to names[i].
  • Each worker pulls an index, calls the underlying method for names[i], and writes its BulkResult[T] only into results[i]. Because every write targets a distinct slice slot, there's no data race and no mutex needed on the result slice.
  • The result slice is pre-sized to len(names), so order matches the input exactly — results[i] always answers names[i].

With WithContinueOnError(false), the pool stops dispatching new indices after the first error. Workers finish their in-flight item and then exit, so a single failure short-circuits the rest of the batch. With the default (true), every item runs regardless, and each failure lands in its slot's Error field.
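
The design described above can be sketched roughly as follows. This is not the library's runWorkerPool[T]; runPool, demoFetch, and runDemo are hypothetical names invented for this example. It assumes that slots never processed after a stop are left nil, which the library may handle differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// BulkResult is copied from the struct shown earlier so the sketch is self-contained.
type BulkResult[T any] struct {
	Key   string
	Value T
	Error error
}

// runPool is a hypothetical stand-in for the index-dispatching pool.
func runPool[T any](ctx context.Context, names []string, workers int, continueOnError bool,
	fetch func(context.Context, string) (T, error)) []*BulkResult[T] {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise the dispatcher would block forever
	}
	results := make([]*BulkResult[T], len(names)) // pre-sized: results[i] answers names[i]
	indices := make(chan int, workers)            // buffered channel of indices, not names
	var failed atomic.Bool
	var wg sync.WaitGroup
	stop := func() bool { return !continueOnError && failed.Load() }

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range indices {
				if stop() {
					continue // drain already-queued indices without fetching
				}
				v, err := fetch(ctx, names[i])
				// Each worker writes only to its own slot, so no mutex is needed.
				results[i] = &BulkResult[T]{Key: names[i], Value: v, Error: err}
				if err != nil {
					failed.Store(true)
				}
			}
		}()
	}

	for i := range names {
		if stop() {
			break // stop dispatching after the first error
		}
		indices <- i
	}
	close(indices)
	wg.Wait() // all slot writes happen before the slice is read by the caller
	return results
}

// demoFetch fakes a lookup: "missing" fails, anything else returns its length.
func demoFetch(_ context.Context, name string) (int, error) {
	if name == "missing" {
		return 0, errors.New("gem not found")
	}
	return len(name), nil
}

// runDemo runs the pool over a fixed, made-up list of names.
func runDemo(workers int, continueOnError bool) []*BulkResult[int] {
	names := []string{"rails", "missing", "puma", "rake"}
	return runPool(context.Background(), names, workers, continueOnError, demoFetch)
}

func main() {
	for i, r := range runDemo(3, true) {
		fmt.Println(i, r.Key, r.Value, r.Error)
	}
}
```

The sync.WaitGroup is what makes the unlocked slot writes safe to read: wg.Wait() returns only after every worker has finished, so the caller never observes a partially written slice.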


Next: Error Handling.

Released under the MIT License.