Weaviate: A Practical Guide for Full-Stack Developers

Weaviate is an open-source vector database that lets you build AI-powered search, recommendations, and classification directly into your applications.

If you're building features that need to understand semantic meaning—like searching with natural language or finding similar images—you've likely hit the limits of traditional databases. Weaviate solves this by storing data objects alongside their vector embeddings, enabling similarity-based retrieval. I've integrated it into several projects at my agency, Anjeer Labs, and it consistently simplifies adding AI search capabilities. This guide will walk through the practical steps and decisions involved in using Weaviate as a full-stack developer.

Why Weaviate Matters (and When to Skip It)

Weaviate matters because it takes on the heavy lifting of vector search. Instead of wiring an embedding pipeline to a separate vector index yourself, you can let Weaviate generate embeddings through its integrated modules (such as text2vec-openai or text2vec-transformers) and serve approximate nearest neighbor (ANN) queries from its built-in HNSW index. It's a batteries-included system designed for production use.
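
To make similarity-based retrieval concrete, here is a minimal TypeScript sketch of exact nearest-neighbor search by cosine distance. It is illustrative only: the three-dimensional vectors and the nearest helper are made up for this example, and Weaviate answers such queries with an approximate index rather than the linear scan shown here.

```typescript
// Toy stored objects with hand-written 3-dimensional "embeddings".
// Real embeddings come from a model and have hundreds of dimensions.
interface StoredObject {
  title: string;
  vector: number[];
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);
}

// Cosine distance = 1 - cosine similarity; 0 means "same direction".
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  return 1 - dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Exact k-nearest-neighbor search: score every object, sort, keep the top k.
function nearest(query: number[], objects: StoredObject[], k: number): StoredObject[] {
  return [...objects]
    .sort((x, y) => cosineDistance(query, x.vector) - cosineDistance(query, y.vector))
    .slice(0, k);
}

const catalog: StoredObject[] = [
  { title: 'Espresso machine', vector: [0.0, 0.1, 0.9] },
  { title: 'Trail sneakers', vector: [0.8, 0.2, 0.1] },
  { title: 'Running shoes', vector: [0.9, 0.1, 0.0] },
];

// A query vector pointing in the "footwear" direction of this toy space.
const closestTitles = nearest([1.0, 0.0, 0.0], catalog, 2).map((o) => o.title);
```

The linear scan costs one distance computation per stored object, which is the work an ANN index is designed to avoid at scale.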

However, be opinionated about when to use it. Skip Weaviate if your search needs are purely keyword-based or if your dataset is tiny (under 10,000 records). A well-indexed PostgreSQL or Elasticsearch instance will be simpler and cheaper. The complexity of managing a vector database is only justified when you need semantic, multimodal, or hybrid search. For simple autocomplete, it's overkill.

Getting Started with Weaviate

The fastest way to run Weaviate locally is with Docker Compose. This configuration includes Weaviate and the text2vec-transformers module for local embedding generation.

# docker-compose.yml
version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.24.1
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    ports:
      - "8080:8080"
    environment:
      TRANSFORMERS_INFERENCE_API: 'http://t2v-transformers:8080'
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'text2vec-transformers'
      ENABLE_MODULES: 'text2vec-transformers'
      CLUSTER_HOSTNAME: 'node1'
  t2v-transformers:
    image: semitechnologies/transformers-inference:sentence-transformers-multi-qa-MiniLM-L6-cos-v1
    environment:
      ENABLE_CUDA: '0'

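Run docker-compose up -d, and your Weaviate instance will be available at http://localhost:8080. To talk to it from a Node.js/TypeScript project, install the client used in the examples below: npm install weaviate-ts-client. (The newer weaviate-client package exposes a different API, so the snippets in this guide will not work with it unchanged.)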
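
Before sending requests, it can help to confirm the container has finished starting. The sketch below probes Weaviate's readiness endpoint (/v1/.well-known/ready). The HttpGet type and the isReady helper are assumptions made for this example so that it does not depend on a particular HTTP library.

```typescript
// Minimal HTTP abstraction for this sketch. In Node 18+ you could pass:
//   (url) => fetch(url).then((r) => ({ status: r.status }))
type HttpGet = (url: string) => Promise<{ status: number }>;

type Scheme = 'http' | 'https';

// Builds the readiness URL for a Weaviate instance.
function readinessUrl(scheme: Scheme, host: string): string {
  return `${scheme}://${host}/v1/.well-known/ready`;
}

// Resolves to true when the readiness endpoint answers with HTTP 200.
// Connection errors (container still starting) count as "not ready".
async function isReady(get: HttpGet, scheme: Scheme, host: string): Promise<boolean> {
  try {
    const response = await get(readinessUrl(scheme, host));
    return response.status === 200;
  } catch {
    return false;
  }
}
```

Calling isReady in a short retry loop before creating the schema avoids confusing connection errors on a cold start.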

Core Weaviate Concepts Every Developer Should Know

1. Schema as a Contract

Unlike schemaless NoSQL databases, Weaviate requires you to define a schema for your data classes. This schema defines the properties, their data types, and crucially, how each class should be vectorized.

import weaviate from 'weaviate-ts-client';

const client = weaviate.client({
  scheme: 'http',
  host: 'localhost:8080',
});

const classObj = {
  class: 'Article',
  vectorizer: 'text2vec-transformers', // Use the module from Docker setup
  moduleConfig: {
    'text2vec-transformers': {
      vectorizeClassName: false, // Good practice: don't vectorize the class name itself
    },
  },
  properties: [
    {
      name: 'title',
      dataType: ['text'],
      moduleConfig: {
        'text2vec-transformers': {
          skip: false, // This property will be included for vectorization
          vectorizePropertyName: false,
        },
      },
    },
    {
      name: 'wordCount',
      dataType: ['int'],
      moduleConfig: {
        'text2vec-transformers': {
          skip: true, // Skip vectorizing pure numbers
        },
      },
    },
  ],
};

async function createSchema() {
  await client.schema.classCreator().withClass(classObj).do();
  console.log('Schema created for Article class');
}
createSchema().catch(console.error); // Surface connection or schema errors

2. Data Objects and Vectorization

When you add data, Weaviate automatically creates a vector for each object based on the schema's vectorizer and module config. The vector becomes the object's address in high-dimensional space.

async function addArticle() {
  const articleData = {
    title: 'Weaviate simplifies vector search for developers',
    wordCount: 750,
  };

  // The vector for 'articleData' is created automatically
  const result = await client.data
    .creator()
    .withClassName('Article')
    .withProperties(articleData)
    .do();

  console.log(`Object created with ID: ${result.id}`);
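  // The create response may omit the vector; to be sure, fetch the object
  // with client.data.getterById().withVector().
  console.log(`Vector in response: ${result.vector ? 'Yes' : 'No'}`);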
}
addArticle().catch(console.error);

3. Querying with GraphQL and Vector Search

You retrieve data using GraphQL. The true power is the nearText search, which finds objects semantically close to your query concepts.

async function semanticSearch() {
  const result = await client.graphql
    .get()
    .withClassName('Article')
    .withFields('title wordCount _additional { id distance }')
    .withNearText({ concepts: ['developer tools for AI databases'] }) // Concept, not keyword
    .withLimit(2)
    .do();

  console.log(JSON.stringify(result, null, 2));
  // Returns articles whose *meaning* is close to the concepts, sorted by distance.
}
semanticSearch().catch(console.error);
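
The result of a Get query is nested under data.Get.<ClassName>. As a rough sketch, the helper below unwraps that structure and drops hits that are too far from the query. The interfaces are declared by hand for the fields requested above, and the sample response, IDs, and the 0.3 cutoff are made up for illustration.

```typescript
// One hit, limited to the fields requested in the query.
interface ArticleHit {
  title: string;
  wordCount: number;
  _additional: { id: string; distance: number };
}

// Envelope of a Get response for the Article class.
interface ArticleGetResponse {
  data: { Get: { Article: ArticleHit[] } };
}

// Keeps only hits at or below maxDistance (lower distance = closer in meaning).
function filterByDistance(response: ArticleGetResponse, maxDistance: number): ArticleHit[] {
  return response.data.Get.Article.filter((hit) => hit._additional.distance <= maxDistance);
}

// Hand-written sample response; IDs and distances are invented.
const sampleResponse: ArticleGetResponse = {
  data: {
    Get: {
      Article: [
        {
          title: 'Weaviate simplifies vector search for developers',
          wordCount: 750,
          _additional: { id: 'example-id-1', distance: 0.18 },
        },
        {
          title: 'Ten sourdough tips',
          wordCount: 1200,
          _additional: { id: 'example-id-2', distance: 0.74 },
        },
      ],
    },
  },
};

const relevantTitles = filterByDistance(sampleResponse, 0.3).map((hit) => hit.title);
```

A sensible cutoff depends on the embedding model and the data, so treat any fixed threshold as something to tune against real queries. The nearText operator also accepts a distance argument if you prefer to filter on the server.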

Common Weaviate Mistakes and How to Fix Them

Mistake 1: Vectorizing Everything. Developers often set skip: false for all properties, including IDs, dates, and enums. This pollutes the vector space. Fix: Only vectorize meaningful text fields. For properties like status or categoryId, set skip: true in the module config.
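
One way to make this the default is to generate the property definitions from a short field list, so a field is only vectorized when it is explicitly marked as meaningful text. The FieldSpec shape, the semantic flag, and the buildProperties helper are hypothetical names for this sketch; the output follows the property format used in the Article schema above.

```typescript
// A subset of Weaviate data types, enough for this example.
type WeaviateDataType = 'text' | 'int' | 'number' | 'boolean' | 'date' | 'uuid';

// Hypothetical input shape: "semantic" marks text worth embedding.
interface FieldSpec {
  name: string;
  dataType: WeaviateDataType;
  semantic?: boolean;
}

// Produces property definitions where everything is skipped unless it is
// a text field explicitly flagged as semantic.
function buildProperties(fields: FieldSpec[]) {
  return fields.map((field) => ({
    name: field.name,
    dataType: [field.dataType],
    moduleConfig: {
      'text2vec-transformers': {
        skip: !(field.dataType === 'text' && field.semantic === true),
        vectorizePropertyName: false,
      },
    },
  }));
}

const articleProperties = buildProperties([
  { name: 'title', dataType: 'text', semantic: true },
  { name: 'status', dataType: 'text' }, // enum-like text: not vectorized
  { name: 'wordCount', dataType: 'int' },
  { name: 'publishedAt', dataType: 'date' },
]);
```

Making skip the default means a newly added field cannot change the embeddings unless someone opts it in deliberately.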

Mistake 2: Ignoring Hybrid Search. Relying solely on nearText can miss exact keyword matches. Fix: Use Weaviate's hybrid search, which combines BM25 (keyword) and vector scores. It's a single GraphQL query using the hybrid operator, giving you the best of both worlds.
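
As a minimal sketch of what such a query looks like, the helper below builds the raw GraphQL text for a hybrid search. The buildHybridQuery function is a made-up name and the returned fields are hard-coded for the Article class. The alpha argument weights the two signals: 0 is pure keyword (BM25), 1 is pure vector search. With weaviate-ts-client, the query builder's withHybrid({ query, alpha }) option should express the same search without hand-written GraphQL.

```typescript
// Builds a raw GraphQL hybrid query for a class that has a "title" property.
function buildHybridQuery(className: string, query: string, alpha: number, limit: number): string {
  if (alpha < 0 || alpha > 1) {
    throw new Error('alpha must be between 0 (keyword only) and 1 (vector only)');
  }
  // JSON.stringify quotes and escapes the user-supplied search text.
  const quotedQuery = JSON.stringify(query);
  return `{
  Get {
    ${className}(hybrid: { query: ${quotedQuery}, alpha: ${alpha} }, limit: ${limit}) {
      title
      _additional { score }
    }
  }
}`;
}

// Equal weight for keyword and vector relevance.
const hybridQuery = buildHybridQuery('Article', 'vector database for developers', 0.5, 3);
```

Starting around alpha 0.5 and adjusting based on how often users search for exact terms such as product codes or names is a reasonable first experiment.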

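Mistake 3: Assuming Vectors are Immutable. A vector only reflects the text it was generated from, so if an object's text changes and its vector does not, search results silently drift. Fix: Send changes through Weaviate's update (merge) or replace operations so the configured vectorizer module generates a new embedding for the changed data. If you supply your own vectors instead of using a module, recompute the vector and send it along with the updated properties.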
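
If you manage vectors yourself, a small guard can flag which updates require a fresh embedding. This is a sketch under that assumption; needsNewVector is a hypothetical helper, and the list of vectorized properties mirrors the Article schema above, where only title feeds the embedding.

```typescript
// Returns true when an update changes at least one property that
// contributes to the object's embedding.
function needsNewVector(changes: Record<string, unknown>, vectorizedProperties: string[]): boolean {
  const vectorized = new Set(vectorizedProperties);
  return Object.keys(changes).some((key) => vectorized.has(key));
}

// In the Article schema, only "title" is vectorized.
const articleVectorizedProperties = ['title'];

const titleEditNeedsVector = needsNewVector({ title: 'A new headline' }, articleVectorizedProperties);
const countEditNeedsVector = needsNewVector({ wordCount: 800 }, articleVectorizedProperties);
```

Here the title edit calls for a new embedding while the wordCount edit does not, which also saves embedding calls on metadata-only updates.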

When Should You Use Weaviate?

Use Weaviate when you have a clear need for semantic or multimodal search over a dataset large enough to justify it, which by the threshold above means roughly tens of thousands of records and up. Classic use cases include e-commerce product discovery using natural language ("comfortable summer shoes"), content recommendation engines, or deduplicating similar support tickets. It's also an excellent choice when you want a single system to handle both your traditional metadata filters and your AI-powered similarity searches, simplifying your backend architecture.

Weaviate in Production

First, never use the anonymous access setting (AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true') in production. Configure API key or OIDC authentication from the start. Second, plan your backup strategy. Weaviate persists data to disk under PERSISTENCE_DATA_PATH, but the Compose file above mounts no volume, so that data disappears when the container is removed. Mount a persistent volume and set up a tested backup and restore process, either with volume snapshots or with Weaviate's backup modules (for example backup-filesystem or backup-s3). Third, monitor your import and query performance using the built-in Prometheus metrics, paying close attention to query latency and vectorization throughput as your data scales.
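
On the application side, one way to enforce the first point is to refuse to start without credentials. The sketch below reads connection settings from an environment-like map. The variable names (WEAVIATE_SCHEME, WEAVIATE_HOST, WEAVIATE_API_KEY) and the loadConnectionSettings helper are assumptions chosen for this example, not Weaviate conventions.

```typescript
// Hypothetical settings shape for this sketch; not part of any Weaviate API.
interface ConnectionSettings {
  scheme: 'http' | 'https';
  host: string;
  apiKey: string | undefined;
}

// Reads connection settings from an environment-like map, for example
// process.env in Node.js.
function loadConnectionSettings(env: Record<string, string | undefined>): ConnectionSettings {
  const scheme = env['WEAVIATE_SCHEME'] === 'https' ? 'https' : 'http';
  const host = env['WEAVIATE_HOST'] ?? 'localhost:8080';
  const apiKey = env['WEAVIATE_API_KEY'];

  // Fail fast in production so the app never relies on anonymous access.
  if (env['NODE_ENV'] === 'production' && !apiKey) {
    throw new Error('WEAVIATE_API_KEY is required in production');
  }
  return { scheme, host, apiKey };
}

// Local development: falls back to the Docker Compose defaults.
const devSettings = loadConnectionSettings({});
```

With weaviate-ts-client, the key would typically be passed as apiKey: new ApiKey(settings.apiKey) in the weaviate.client options, alongside scheme and host.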

Start your next project by defining the semantic search question it needs to answer—that will tell you if you need Weaviate.