Introduction to Google Cloud Operations Suite

Hi readers. Once we have deployed our code to production and the application is in use by a large number of users, we want to make sure that it stays reliable and highly available.

Whatever we try, there will always be some chance of failure. We can’t remove failures entirely, but we can follow several practices to minimise them.

In this blog we will look at Google’s Cloud Operations suite, which provides several services to handle these situations.

What is Cloud Operations?

  • Cloud Operations, formerly known as Stackdriver, is a suite of tools that can be used for monitoring, logging, error reporting, tracing and debugging our application in the cloud.
  • It lets us look into the health, performance and availability of applications in the cloud.
  • It also helps us find and fix issues faster.

Why do we need Cloud Operations?

  • To know how the cloud deployment is behaving.
  • To find out when something is broken in production and is impacting customers.

Cloud Operations Capabilities

Monitoring

  • Cloud Monitoring gives us the ability to keep an eye on the performance, availability and health of our applications and infrastructure.
  • For instance, it helps us identify trends in previous data, and on that basis it helps us prevent issues before they happen.
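
To make the idea of spotting a trend in previous data more concrete, here is a minimal sketch in plain Python. It is not how Cloud Monitoring works internally. It simply fits a straight line to past samples and estimates when a metric would cross a threshold. The function names, the sample disk usage values and the 90% threshold are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    Assumes at least two samples taken at distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_threshold(samples, threshold):
    """Estimate hours from the last sample until `threshold` is reached.

    `samples` is a list of (hour, value) pairs, oldest first.
    Returns None when the fitted trend is flat or falling.
    """
    hours = [h for h, _ in samples]
    values = [v for _, v in samples]
    slope, intercept = fit_line(hours, values)
    if slope <= 0:
        return None  # metric is not growing, so no crossing is predicted
    crossing_hour = (threshold - intercept) / slope
    return max(0.0, crossing_hour - hours[-1])


# Hypothetical disk usage (%) sampled once per hour.
disk_usage = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]
remaining = hours_until_threshold(disk_usage, threshold=90.0)
print(f"Disk may reach 90% in about {remaining:.0f} hours")
```

An estimate like this could drive an early warning so that we act before the threshold is actually reached, which is the kind of prevention the point above describes.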

Logging

  • Cloud Logging is a fully managed, highly scalable, real-time log management service with storage, search, analysis and alerting at scale.
  • For instance, it aggregates log data from all our infrastructure and applications in a single location.
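
Aggregated logs are much easier to search when each entry is structured. The sketch below, in plain Python, writes one JSON object per line to standard output. The field names `severity`, `message` and `time` follow the structured logging convention that, as far as I know, Google Cloud's logging agents recognise on platforms such as GKE. The helper names and the `service` and `order_id` fields are hypothetical and used only for illustration.

```python
import json
import sys
from datetime import datetime, timezone


def format_log(severity, message, **fields):
    """Return one JSON log line with a severity, a message and extra fields."""
    entry = {
        "severity": severity,
        "message": message,
        "time": datetime.now(timezone.utc).isoformat(),
        **fields,
    }
    return json.dumps(entry)


def log(severity, message, **fields):
    """Write the JSON line to stdout, where a logging agent can collect it."""
    print(format_log(severity, message, **fields), file=sys.stdout)


# Hypothetical service name and order identifier.
log("ERROR", "payment failed", service="checkout", order_id="A-1001")
```

With entries in this shape we can filter by severity or by any extra field once the logs from all applications are in one place.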

Observability

  • Trace
  • Debugger
  • Profiler

For capturing signals we have

  • Metrics
  • Logs
  • Trace

To visualise and analyse we have

  • Dashboard
  • Metrics explorer
  • Log viewer
  • Service monitoring
  • Health checks
  • Debugger
  • Profiler

To manage incidents

  • Alerts
  • Error reporting
  • SLO
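
An SLO (service level objective) becomes actionable once we turn it into an error budget. The arithmetic can be sketched in plain Python as below. The function names and the example numbers (a 99.9% target, one million requests, 250 failures, a 30-day window) are assumptions chosen for illustration and do not come from any Cloud Operations API.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Return (allowed_failures, remaining_fraction) for a request-based SLO.

    `slo_target` is a fraction strictly between 0 and 1, e.g. 0.999.
    `remaining_fraction` goes negative once the budget is overspent.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    allowed = (1.0 - slo_target) * total_requests
    remaining = 1.0 - failed_requests / allowed
    return allowed, remaining


def allowed_downtime_minutes(slo_target, window_days):
    """Minutes of downtime an availability SLO permits over a window."""
    return (1.0 - slo_target) * window_days * 24 * 60


allowed, remaining = error_budget(0.999, 1_000_000, 250)
print(f"Allowed failures: {allowed:.0f}, budget left: {remaining:.0%}")
print(f"Downtime budget: {allowed_downtime_minutes(0.999, 30):.1f} min per 30 days")
```

An alert can then fire when the remaining budget drops below a level we choose, rather than on every single error.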

Use Case Scenarios

Scenario 1

Consider a scenario and requirement: you want to keep your application up and running with minimum downtime in order to keep customers happy. What should the game plan be?

Based on the concepts learnt above, we can add Cloud Operations services to our GCP project. We can follow a flow something like the one shown below.

Scenario 2

Consider another scenario where you are supporting a Node.js application running on Google Kubernetes Engine (GKE) in production. The application makes several HTTP requests to dependent applications. You want to anticipate which dependent applications might cause performance issues.

In this scenario we can instrument all applications with Cloud Trace and review the inter-service HTTP requests.
Cloud Trace lets us visualise and analyse request flow, service topology and latency issues in our application.
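
To show the kind of analysis that trace data makes possible, here is a small plain-Python sketch. It does not call Cloud Trace. It takes a simplified list of (service, duration) pairs standing in for recorded spans and ranks the dependent services by their 95th percentile latency. The function names, service names and durations are all hypothetical, and Python is used only for illustration even though the application in this scenario is written in Node.js.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def slowest_dependencies(spans, pct=95):
    """Rank dependent services by their `pct` latency, slowest first.

    `spans` is a list of (service_name, duration_ms) pairs, a simplified
    stand-in for the spans a tracing tool would record.
    """
    by_service = defaultdict(list)
    for service, duration_ms in spans:
        by_service[service].append(duration_ms)
    ranked = [(svc, percentile(durs, pct)) for svc, durs in by_service.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical spans for outbound HTTP calls made by one application.
spans = [
    ("inventory", 40), ("inventory", 45), ("inventory", 300),
    ("pricing", 20), ("pricing", 25), ("pricing", 22),
    ("auth", 60), ("auth", 65), ("auth", 70),
]
for service, latency in slowest_dependencies(spans):
    print(f"{service}: p95 = {latency} ms")
```

Ranking by a high percentile rather than the average highlights dependencies that are usually fast but occasionally very slow, which are often the ones that cause performance issues.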

That’s all for this blog. We discussed the basics of Google’s Cloud Operations suite. For more reading you can follow this blog as well. In the next blog we will see a demo of it and set up some monitoring dashboards, alerts and policies.

Thank you for following this blog till the end. If you found it helpful, do share it with your colleagues. In case of any feedback, suggestions or questions, reach out to me at nitin.mishra@knoldus.com.

Written by 

Nitin Mishra is a Software Consultant at Knoldus Software LLP. He holds an MCA from GGSIPU and a Bachelor of Science in Computer Science from Delhi University. He is a tech enthusiast with good knowledge of Java and is mainly focused on DevOps practice. On the personal front, he loves to travel in the mountains and write poetry.