Implementing Google Cloud Vision in Node.js

Sebastian Gomez

Jun 24, 2026

Updated: Jun 25, 2026

Continuing with Google's Cloud Vision theme, in this post we'll implement it in Node.js to detect features in images.

The first thing to understand is that Google's Cloud Vision SDK is meant to run on a server, which is why we need Node.js. If you have some experience with Node.js, you know that you usually reach for a library to handle HTTP requests such as POST, GET, and PUT. In this post we'll use Express, since it's the best-known library for this task. If you have questions about the fundamentals of Google's Cloud Vision or about the free tier, you can review them in the introductory post of this series.

Note: this post is updated for 2026. The @google-cloud/vision SDK is now on v5.x, and the way you initialize the client has changed since the early versions; below you'll see the current pattern.

First, let's review the features of Google's Vision API that we can use from Node.js:

Face detection.
Image attributes.
Label annotation.
Adult content detection.
Logo detection.
Object localization in the image.
Optical character recognition.
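
As a rough orientation, each feature in this list corresponds to a helper method on the SDK's ImageAnnotatorClient and to a field in the response. The lookup table below is only a sketch for illustration: the names VISION_FEATURES and describeFeature are made up for this post, and the argument called image stands for whatever image you pass in.

```javascript
// Illustrative lookup: feature -> ImageAnnotatorClient helper method and
// the response field where its annotations arrive.
// VISION_FEATURES and describeFeature are hypothetical names for this sketch.
const VISION_FEATURES = {
  "Face detection": { method: "faceDetection", field: "faceAnnotations" },
  "Image attributes": { method: "imageProperties", field: "imagePropertiesAnnotation" },
  "Label annotation": { method: "labelDetection", field: "labelAnnotations" },
  "Adult content detection": { method: "safeSearchDetection", field: "safeSearchAnnotation" },
  "Logo detection": { method: "logoDetection", field: "logoAnnotations" },
  "Object localization": { method: "objectLocalization", field: "localizedObjectAnnotations" },
  "Optical character recognition": { method: "textDetection", field: "textAnnotations" },
};

// Build a one-line reminder of how a feature is called and where to read it.
function describeFeature(name) {
  const entry = VISION_FEATURES[name];
  if (!entry) throw new Error(`Unknown feature: ${name}`);
  return `client.${entry.method}(image) -> results[0].${entry.field}`;
}

console.log(describeFeature("Label annotation"));
```

In this post we only use the label and face entries; the rest follow the same call pattern.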

The architecture

Let's review the responsibilities of each component in the architecture.

Client Browser:

Present information.
Take photos or videos.
Preprocess images.
Send the images to the server.

Express:

Filter the client's requests.
Receive the images and store them on the server.
Send the responses back to the client.

Google Vision SDK (*):

Process the images.
Mathematical operations.
Pixel detection.
Filters.
Transformations.
Image encoding.
Request preparation.

Google Vision API:

Compare the images against the image set.
Apply machine learning routines with the images as input.
Connect with other technologies such as TensorFlow.
Analyze the results and grow the knowledge base.

(*) An SDK (Software Development Kit) is a set of tools that help build applications for a particular technology environment.

Getting the credentials

Before we start, we need to obtain a key and the project credentials to use the API and the SDK. We can get these directly from the console at https://cloud.google.com/ by following these steps:

Create the project at https://console.cloud.google.com/projectcreate.
Access the project.
Select service accounts.
Create a service account and fill in the information.
Choose to provide a new private key in JSON format.
Save the JSON file that gets generated.

Note on authentication: downloading a service account JSON key still works in 2026, and it's the simplest path for learning. However, Google now recommends using Application Default Credentials (ADC) or Workload Identity, and discourages downloading long-lived JSON keys. If you deploy to production, consider ADC instead of a credentials file on disk.
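
To make the difference concrete, here is a minimal sketch of how the client options could be chosen for each case. It assumes a made-up environment variable, VISION_KEY_FILE, that points to the JSON key while you are learning; when it is not set, we pass no options and let the client library look up Application Default Credentials on its own. The helper name visionClientOptions is also hypothetical.

```javascript
// Sketch: pick the options for the Vision client from the environment.
// VISION_KEY_FILE is a hypothetical variable name invented for this example.
function visionClientOptions(env = process.env) {
  if (env.VISION_KEY_FILE) {
    // Learning setup: an explicit service account key stored on disk.
    return { keyFilename: env.VISION_KEY_FILE };
  }
  // No options: the client library falls back to Application Default Credentials.
  return {};
}

// Usage (requires @google-cloud/vision to be installed):
// const vision = require("@google-cloud/vision");
// const client = new vision.ImageAnnotatorClient(visionClientOptions());
console.log(visionClientOptions({ VISION_KEY_FILE: "./cloud-credentials.json" }));
```

On your own machine, running gcloud auth application-default login is one way to make ADC available without keeping a key file in the project.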

Let's get to work

Let's start by creating a Node project from scratch using the command line.

# Create a directory and move into it
mkdir my-directory
cd my-directory

# Initialize git
git init

# Initialize node
npm init

# Install the dependencies
npm install @google-cloud/vision
npm install express
npm install multer

We configure our npm start command in package.json so it starts our server:

{
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1",
    "start": "node server.js"
  }
}

This creates the base structure of the project, and you'll need to add a few more files. The only files we'll actually write code in are server.js, index.html, and app.js. Since the server serves static files from the public folder, index.html and app.js go there.

How do we take photos with JavaScript?

Let's start with a basic feature: taking photos from the browser camera. You can see the full example in this CodePen:

https://codepen.io/seagomezar/pen/QayorL

As you'll see, by creating a stream with getUserMedia and accessing a few DOM elements, we get the simplest possible feature: capturing images from your computer's camera and displaying them in an img tag.
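
If you can't open the CodePen, the capture step can be sketched roughly like this. This is not the CodePen's code: snapFrame and its parameters are hypothetical names, and the sketch assumes you already have a video element playing the camera stream and a canvas to draw on.

```javascript
// Sketch of a capture helper. The CodePen's snap() may differ; snapFrame and
// its parameters are hypothetical names used only in this example.
function snapFrame(video, canvas, type = "image/png") {
  // Match the canvas size to the current video frame.
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d").drawImage(video, 0, 0, canvas.width, canvas.height);
  // canvas.toBlob is callback based, so wrap it in a Promise.
  return new Promise((resolve, reject) => {
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error("Could not encode frame"))),
      type
    );
  });
}

// Browser usage, assuming video, canvas and img elements already exist:
// const stream = await navigator.mediaDevices.getUserMedia({ video: true });
// video.srcObject = stream;
// await video.play();
// const blob = await snapFrame(video, canvas);
// img.src = URL.createObjectURL(blob);
```

Returning a Promise that resolves to a Blob is what lets the rest of the code chain a .then() onto the capture.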

Uploading the image to the server

Now we'll see how to upload the image we captured to our server. Once we capture the image and set it as the src of the img tag, we need to send it to the server in an HTTP request. To do this, open your app.js file and add the upload function:

function upload() {
  const http = new XMLHttpRequest();
  const url = "upload";
  snap().then((blob) => {
    http.open("POST", url, true);
    http.setRequestHeader("X-Requested-With", "XMLHttpRequest");
    http.onreadystatechange = () => {
      // Log the server response once the request has finished successfully.
      if (http.readyState === XMLHttpRequest.DONE && http.status === 200) {
        console.log(http.response);
      }
    };
    const formData = new FormData();
    formData.append("uploads", blob);
    http.send(formData);
  });
}

As you'll see, in this function we call the snap() function we defined to capture images from the browser camera, and we send the image content through an XMLHttpRequest. Then we need to create our server so it can receive and process that image.

Note: in 2026 the idiomatic choice would be to use fetch instead of XMLHttpRequest, but we keep XMLHttpRequest here so the example stays easy to follow. Rewriting it with fetch is a good exercise.
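
If you want something to compare your own attempt against, here is one possible fetch version. Treat it as a sketch: uploadBlob and its parameters are hypothetical names, and the fetchImpl argument exists only so the function can be tried outside the browser.

```javascript
// Sketch: fetch-based variant of upload(). uploadBlob and its parameters are
// hypothetical; fetchImpl is injectable so the function can be tested.
async function uploadBlob(blob, url = "upload", fetchImpl = fetch) {
  const formData = new FormData();
  // The field name must match what the server expects ("uploads").
  formData.append("uploads", blob, "photo.png");
  // Do not set Content-Type by hand: the runtime adds the multipart boundary.
  const response = await fetchImpl(url, {
    method: "POST",
    headers: { "X-Requested-With": "XMLHttpRequest" },
    body: formData,
  });
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response.text();
}

// Browser usage with the snap() function from this post:
// snap().then((blob) => uploadBlob(blob)).then(console.log);
```

Unlike the XMLHttpRequest version, this one rejects on any non-2xx status, so errors surface in a catch instead of being silently ignored.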

This is where we create and configure a basic Express server that accepts the image upload. The first part of your server.js file should look like this. It defines upload, the multer middleware that saves each image in the uploads folder; every file name is prefixed with the current timestamp to avoid duplicates.

const express = require("express");
const multer = require("multer");

const app = express();

// We save each file with a unique name based on the timestamp.
const storage = multer.diskStorage({
  destination: "uploads/",
  filename: (req, file, cb) => cb(null, `${Date.now()}-${file.originalname}`),
});

const upload = multer({ storage });

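app.use(express.static("public"));

// Assumed handler: app.js posts the photo to "upload", so we store it here.
app.post("/upload", upload.single("uploads"), (req, res) => {
  if (!req.file) return res.status(400).send("No image received");
  res.send(`Saved as ${req.file.filename}`);
});

app.listen(3000, () => console.log("Server listening on port 3000"));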
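
Since this server accepts whatever the browser sends, it can be worth restricting uploads to images of a reasonable size. The following is a minimal sketch using multer's fileFilter and limits options; the names imageOnlyFilter and uploadLimits and the 5 MB cap are assumptions for this example, not part of the original project.

```javascript
// Sketch: extra options that could be passed to multer alongside storage.
// imageOnlyFilter, uploadLimits and the 5 MB cap are assumptions for this example.
function imageOnlyFilter(req, file, cb) {
  // multer expects cb(null, true) to accept a file and cb(null, false) to skip it.
  const isImage =
    typeof file.mimetype === "string" && file.mimetype.startsWith("image/");
  cb(null, isImage);
}

const uploadLimits = {
  fileFilter: imageOnlyFilter,
  limits: { fileSize: 5 * 1024 * 1024 }, // reject files larger than 5 MB
};

// Usage in server.js:
// const upload = multer({ storage, ...uploadLimits });
```

When a file is skipped by the filter, req.file is undefined in the route handler, so handlers that read req.file.path should check for that case first.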

Detecting features with Cloud Vision

So far we've only created utility functions in server.js to start the server and to upload files, but we haven't run any processing on the image. Now we'll learn how to detect features in the image using the Cloud Vision SDK.

First we initialize the Cloud Vision client. Note the key change compared to old versions: the package is no longer invoked as a function (require('@google-cloud/vision')({...})); it now exports classes. We create an instance of ImageAnnotatorClient:

const vision = require("@google-cloud/vision");

// Since v1.0 the package exports classes, not a callable factory.
// The projectId is inferred from the credentials file, no need to declare it.
const client = new vision.ImageAnnotatorClient({
  keyFilename: "./cloud-credentials.json",
});

Then we create an Express endpoint in charge of receiving an image and getting all the labels we can extract from it, to return them as the response and present them in the frontend.

app.post("/labels", upload.single("uploads"), function (req, res) {
  // For a local file you can pass the path as a string. For an image in
  // Cloud Storage, pass its gs:// URI instead.
  const currentFile = req.file.path;
  client
    .labelDetection(currentFile)
    .then((results) => {
      const labels = results[0].labelAnnotations;
      console.log("Labels:");
      labels.forEach((label) => console.log(label.description));
      res.send(labels);
    })
    .catch((err) => {
      console.error("ERROR:", err);
      // Answer with an error status so the client can tell the call failed.
      res.status(500).send("BAD");
    });
});

Note: in the current SDK (@google-cloud/vision, ImageAnnotatorClient) client.labelDetection(currentFile) accepts a local file path. If you prefer to be explicit, use { image: { content: fs.readFileSync(currentFile) } }.
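
Each entry in labelAnnotations carries a description and a score between 0 and 1, so you may want to trim the list before returning it. Here is a small sketch of that idea; pickLabels and the 0.7 threshold are assumptions for this example, and the sample values are invented.

```javascript
// Sketch: trim label annotations before sending them to the browser.
// pickLabels and the 0.7 default threshold are assumptions for this example.
function pickLabels(labelAnnotations, minScore = 0.7) {
  return (labelAnnotations || [])
    .filter((label) => label.score >= minScore)
    .map((label) => ({
      description: label.description,
      percent: Math.round(label.score * 100),
    }));
}

// Sample data shaped like results[0].labelAnnotations (values invented).
const sample = [
  { description: "Dog", score: 0.97 },
  { description: "Grass", score: 0.82 },
  { description: "Toy", score: 0.41 },
];
console.log(pickLabels(sample));
```

Inside the endpoint, this would replace res.send(labels) with res.send(pickLabels(labels)).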

Finally, we can use another feature that finds the faces in an image, returning the outline of each face and the position of each facial landmark:

app.post("/faces", upload.single("uploads"), (req, res) => {
  const currentFile = req.file.path;
  client
    .faceDetection(currentFile)
    .then((results) => {
      const faces = results[0].faceAnnotations;
      res.send(faces);
    })
    .catch((err) => {
      console.error("ERROR:", err);
      // Answer with an error status so the client can tell the call failed.
      res.status(500).send("BAD");
    });
});
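
The face annotations are fairly large objects, so it helps to reduce them to what you need. The sketch below turns one entry into a simple bounding box plus the list of landmark names; summarizeFace is a hypothetical helper and the sample face uses invented coordinates.

```javascript
// Sketch: reduce one entry of results[0].faceAnnotations to a bounding box
// and the names of its landmarks. summarizeFace is a hypothetical helper.
function summarizeFace(face) {
  const vertices = (face.boundingPoly && face.boundingPoly.vertices) || [];
  // Default missing coordinates to 0 so Math.min/Math.max stay numeric.
  const xs = vertices.map((v) => v.x || 0);
  const ys = vertices.map((v) => v.y || 0);
  const box = vertices.length
    ? {
        left: Math.min(...xs),
        top: Math.min(...ys),
        width: Math.max(...xs) - Math.min(...xs),
        height: Math.max(...ys) - Math.min(...ys),
      }
    : null;
  return {
    box,
    landmarks: (face.landmarks || []).map((landmark) => landmark.type),
    confidence: face.detectionConfidence,
  };
}

// Sample face with invented values, shaped like a faceAnnotations entry.
const sampleFace = {
  boundingPoly: {
    vertices: [
      { x: 10, y: 20 },
      { x: 110, y: 20 },
      { x: 110, y: 140 },
      { x: 10, y: 140 },
    ],
  },
  landmarks: [
    { type: "LEFT_EYE", position: { x: 40, y: 60, z: 0 } },
    { type: "RIGHT_EYE", position: { x: 80, y: 60, z: 0 } },
  ],
  detectionConfidence: 0.98,
};
console.log(summarizeFace(sampleFace));
```

A box in this shape is convenient if you later want to draw a rectangle over the photo in the browser.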

Showing the results in the frontend

Once our server.js file is complete, we need to tweak our index.html and app.js a bit to display our findings:

<!-- index.html -->
<button class="shutter" onclick="upload()">
  Take and upload a photo
</button>
<button class="shutter" onclick="sendToLabelDetection()">
  Analyze image
</button>
<button class="shutter" onclick="sendToFaceDetection()">
  Detect faces and features
</button>

<div>
  <h2>Image features</h2>
  <div id="labels"></div>
</div>

Putting all of this together, the labels detected in the image appear under "Image features". For face detection we only log the result to the console: you'll see one array element per detected face and, inside each face, the landmarks that mark the position of each facial element.
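
The rendering step for the labels can be sketched as a small function that turns the JSON array returned by /labels into markup for the labels container. The helpers escapeHtml and labelsToHtml are hypothetical and not taken from the repository.

```javascript
// Sketch: turn the /labels response into markup for the labels container.
// escapeHtml and labelsToHtml are hypothetical helpers, not the repository's code.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function labelsToHtml(labels) {
  if (!labels.length) return "<p>No labels found.</p>";
  const items = labels.map(
    (label) =>
      `<li>${escapeHtml(label.description)} (${Math.round(label.score * 100)}%)</li>`
  );
  return `<ul>${items.join("")}</ul>`;
}

// Browser usage, after parsing the JSON response from POST /labels:
// document.getElementById("labels").innerHTML = labelsToHtml(labels);
```

Escaping the descriptions is a small precaution, since the text ends up in innerHTML.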

You can combine these functions to build features of your own for your application or product. The code is available in this repository, where you'll also find other functions, such as detecting happy or sad people in a photo:

https://github.com/seagomezar/devfest-vision

Note: the example code lives in the public seagomezar/devfest-vision repository and uses the ImageAnnotatorClient client from @google-cloud/vision.
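
To give an idea of how happy or sad detection can work, each face annotation includes likelihood fields such as joyLikelihood and sorrowLikelihood. The sketch below is not the repository's implementation: moodOf and the LIKELY cutoff are assumptions, and it also assumes the likelihoods arrive as enum names like "VERY_LIKELY".

```javascript
// Sketch: classify a face as happy, sad or neutral from its likelihood fields.
// The LIKELY cutoff and the moodOf helper are assumptions, not the repository's code.
const LIKELIHOOD_RANK = {
  UNKNOWN: 0,
  VERY_UNLIKELY: 1,
  UNLIKELY: 2,
  POSSIBLE: 3,
  LIKELY: 4,
  VERY_LIKELY: 5,
};

function moodOf(face, cutoff = "LIKELY") {
  // Unrecognized or missing values rank as 0, the same as UNKNOWN.
  const rank = (value) => LIKELIHOOD_RANK[value] || 0;
  if (rank(face.joyLikelihood) >= rank(cutoff)) return "happy";
  if (rank(face.sorrowLikelihood) >= rank(cutoff)) return "sad";
  return "neutral";
}

// Sample faces with invented values.
console.log(moodOf({ joyLikelihood: "VERY_LIKELY", sorrowLikelihood: "VERY_UNLIKELY" }));
console.log(moodOf({ joyLikelihood: "UNLIKELY", sorrowLikelihood: "LIKELY" }));
```

Lowering the cutoff to "POSSIBLE" makes the classification more permissive, at the cost of more false positives.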

For more information about the operations of the Cloud Vision SDK and API for Node.js, you can check the official samples repository:

https://github.com/googleapis/nodejs-vision/tree/main/samples

Suggested exercises

Rewrite the client's upload function using fetch instead of XMLHttpRequest.
Add a new endpoint that uses textDetection (OCR) to extract text from an image.
Switch authentication from a JSON file to Application Default Credentials and deploy the server in a Google Cloud environment.

3-point summary

The Cloud Vision SDK runs on the server; with Express we receive the image and with multer we save it before analyzing it.
The client is initialized with new vision.ImageAnnotatorClient(...); for local files you pass the path or a buffer, and for images in Cloud Storage a gs:// URI.
With labelDetection and faceDetection we get labels and faces, and return them to the frontend to display them.

That's all. I hope this post is useful to you and that you can apply it to a project you have in mind. Leave me a comment if you managed to implement it, if you want to add another feature, or if you have any questions. And remember, if you liked it, you can also share it using the social links below.