You need scaling, huh? Maybe it’s just ego

I’ve just come back from OracleCode Singapore. It was a great event – the venue was awesome and the attendees were engaged and interested in the content. But there was one thing I found amusing (disturbing, perhaps?): the number of times people approached me on the topic of scaling. Conversations would typically run along the lines of:

“What is your recommendation for scaling?”

which almost suggests that scaling is, in and of itself, the end goal. Not “Here is function X, and I need it to scale”, or “My business requirement is X, and it needs to scale”, but just “I need to scale”.

So I’d push back and ask more questions:

  • Scale what?
  • What data are you capturing?
  • Where is it coming from?
  • What speed? What volume?
  • What are your plans for the data you are capturing?
  • How do you intend to process the data?
  • Is it transient? Are you planning on storing it forever? Is it sensitive information?

And the scary thing is – more often than not, those were questions they did not have answers to (yet?). I’d ask the questions, and very quickly the conversation would return to:

“Do I need sharding?”
“Should I use a NoSQL solution?”
“What ‘aaS’ option should I be using to achieve my scaling needs?”
“How many nodes do I need?”
“What server configuration is best?”

I’m seeing this more and more: the technological approach to achieving a business requirement is seen AS the business requirement. I hate to be brutal (well… that’s a lie, I like being brutal), but here’s the thing – stop being so damn focussed on scaling until you have an idea of what your true performance requirements are!

Don’t get me wrong – there are systems out there that need to be architected from the ground up to deal with scaling challenges that have perhaps never been tackled before. But read those last few words again: “never been tackled before”. Do you know what that also means? It means it applies to an itsy-bitsy, tiny percentage of IT systems. If those challenges were common, then surprise, surprise – they would already have been tackled. So why does everyone in the IT industry think that the system they are about to build will need the same architectural treatment as the 0.00001% of systems in the world that truly need it?

Because in almost all of my time in IT, for the other 99.999% of systems out there, the two critical solutions for scaling a system to meet (and well and truly exceed) the performance requirements of the business are pretty simple:

1) don’t write crappy code,

2) don’t store data in a crappy way

That’s it.  When you can definitively demonstrate that

a) your code is well written,

b) your data is being stored in a way that best serves the business requirements,

and your application still cannot meet its performance needs, then yes, it’s time to talk about architectural options for scaling. But more and more I see folks ignoring (a) and (b), or worse, just assuming that they are implicit and guaranteed to happen. They then leap straight into “I need a 10,000-node, geo-dispersed, NoSQL, cached, compressed, mem-optimized, column-based, non-ACID, distributed blah blah blah for my system to work.”

Here’s a reality check – you don’t. Save yourself a lot of hassle: start simple and focus on quality. You’ll probably find things scale just fine.

If you’ve made it this far through the post and you think I’m just ranting… well, that’s true, but let me also answer the next obvious question:

“So how do we make sure we write good code? How do we make sure we store our data intelligently?”

That’s why we (developer advocates) are here. We’re here to help you succeed. So check out our resources, reach out to us via social media channels, and we’ll help you every step of the journey.

When you screw up … make it positive for your users

Yesterday I was caught up in an interesting SNAFU at my local supermarket. All of the checkout registers shut down, making it impossible to pay for groceries. Later, the company apologized on Twitter, and we discovered it was actually a nationwide outage!

News of the outage spread like wildfire through the media:

http://www.news.com.au/finance/business/retail/woolworths-checkouts-hit-by-national-outage/news-story/5611943156249d4ecc6427ef0b447c18

https://www.smh.com.au/business/consumer-affairs/woolworths-meltdown-closes-stores-across-australia-20180416-p4z9y4.html

http://www.abc.net.au/news/2018-04-16/woolworths-checkouts-across-australia-down/9663904

https://www.sbs.com.au/news/woolworths-hit-by-nationwide-technical-outage

https://www.9news.com.au/national/2018/04/16/16/29/woolworths-outage-stops-trading-across-australia

https://www.lifehacker.com.au/2018/04/the-woolworths-outage-is-a-lesson-in-business-continuity-planning/

TL;DR – people were forced to abandon their shopping trolleys and had to leave the stores. 

Needless to say, consumers vented loudly and forcefully at Woolworths. The negative press naturally struck a chord with the general public, because we all feel empathy for the parent in front of the television cameras lamenting their inability to feed their family that night. (For the record, my boys had meatballs and salad last night, made with the leftovers we had in the fridge.)

In a perfect world, IT system updates would never cause pain for their users. But no matter how careful the testing and planning, I think it is reasonable to assert that we can never totally eliminate the chance of a major problem during an upgrade; our aim is always to shrink that probability as close to zero as possible.

That brings me to the point of this post, and to this perhaps slightly controversial stance: I don’t think IT outages really matter that much from the customer’s perspective. For example, a while back Amazon had a huge outage here in Australia due to storms in Sydney, and Delta Air Lines had a big outage in August 2016. But last time I checked, people are still flying Delta and still buying stuff they didn’t need from Amazon. Customers will forgive an outage, but only if you prioritize their needs over yours during the crisis. People are still ripping into Woolworths today because a Twitter apology doesn’t really get consumers any closer to taking groceries home.

So this is what I would have done if I were Woolworths. Make an announcement in each store that the store needs to close unexpectedly, and ask customers to take their trolleys to the nearest checkout (even though the checkouts are not working). At that point, simply let people take whatever they have accumulated in their trolleys free of charge. The news articles above mentioned that the stores had security staff on hand to assist with closing the stores – so there is protection against a “looting mentality” developing. Yes, there will still be some negative press from those customers who could not get into the stores once they closed, but I contend that ultimately this would have turned into a positive result for Woolworths. Yes, you take a hit on the bottom line for yesterday’s revenue. But the media story becomes the mums and dads walking out of the stores smiling about the free shop they just got, rather than swearing they’ll never shop at Woolworths again.

Outages don’t matter.  Meeting the customer need is what matters.

Don’t get me wrong – I’m not claiming that any and every company I have ever worked for, or worked with, has a glowing record of understanding how to meet customer needs during times of crisis.  My point is that it should be something to always strive for – when you inflict pain on your customers due to the information technology solutions you build, then do your best to own the problem, and bust a gut trying to make the experience as bearable as possible for your customers, or even a win for them. 

Whether you turn bad into good, or bad into worse, rest assured your customers will remember you for it.

TO_DOG_YEAR

Some members of the Oracle community got well and truly into the April Fools’ Day spirit this year.

There were plenty of very earnest-looking blog posts about a new 18c function – “TO_DOG_YEAR”. You can read their posts here:

http://www.oralytics.com/2018/04/predicting-ibs-in-dogs-using-oracle-18c.html 
https://blog.dbi-services.com/after-iot-iop-makes-its-way-to-the-database/
http://berxblog.blogspot.ie/2018/04/more-fun-with-ages.html
http://vanpupi.stepi.net/2018/04/01/exploring-18c-exadata-functions/

They even enlisted the help of the team here at AskTOM, posting a question asking for more details (here).

But naturally, it was important to get as many puns and hints into our answer as possible – did you spot them all?

It’s not about ego … it’s about knowledge

Take a quick look at this blog post by Jonathan Lewis

https://jonathanlewis.wordpress.com/2017/12/30/nvarchar2/

Anyone who has been working with Oracle for any length of time probably knows that Jonathan has a great depth of knowledge of the Oracle database and is a regular blogger. But this post is a good example to inspire anyone working with Oracle (or any technology, for that matter) to start blogging and sharing their experiences with the community, whatever their level of experience.

If you read the post, you’ll see that Jonathan built a well-crafted test case and presented a hypothesis about NVARCHAR2 and the potential side effects of adding columns of this data type to an existing table.

It turns out the hypothesis was wrong, and the observations had nothing to do with NVARCHAR2 at all. A comment from a reader pointed out the true cause of the side effect.

But here’s the important thing.

Has the blog post been deleted? No.

Has the comment been deleted? No.

Publishing information for the community to digest is not (as we say in Australia) a pissing contest (https://en.wikipedia.org/wiki/Pissing_contest) to show who is the smartest, the fastest or the cleverest. It is about collectively growing the knowledge of oneself and of the community.

So don’t be afraid to publish your experiences so that all may benefit.  If your findings or claims are incorrect, then good people in the community will correct you gently and professionally.  And those not-so-good people that choose to point out errors in a condescending or derogatory tone…well….they’ll be doing a lot more damage to their online reputations than they could ever possibly do to yours.

Happy New Year!

2017–what grabbed your attention

Here are the blog posts you hit on most this year. Thanks for supporting the blog, and as always, there will be more content next year!

Buffer cache hit ratio–blast from the past

I was perusing some old content during a hard drive “spring clean” the other day, and I found an old gem from way back in 2001. That was a time when the database community was trying to dispel the myth that all database performance issues could be traced back to, and solved via, the database buffer cache hit ratio. Thankfully, much of that folklore has now passed into the realm of fiction. But I remember that at the time, as a means of showing how silly some of the claims were, I published a routine that would generate any buffer cache hit ratio you desired. It simply ran a query to burn through logical I/Os (and burn a hole in your CPU!) until the extra operations bumped the buffer cache hit ratio up to whatever number you liked.

Less performance, more work done…. all to get a nice summary number.

The kinds of statistics the database collects, and what each one represents, have changed over the years and across versions of Oracle. But I figured I’d present the routine in its original form, as a nostalgic reminder that statistics without an understanding behind them are as good as no statistics at all.

Enjoy!


create or replace
procedure choose_a_hit_ratio(p_ratio number default 99,p_show_only boolean default false) is
  v_phy                number;
  v_db                 number;
  v_con                number;
  v_count              number;
  v_additional_congets number;
  v_hit number;
  
  procedure show_hit is
  begin
    select p.value, d.value, c.value
    into v_phy, v_db, v_con
    from 
      ( select value from v$sysstat where name = 'physical reads' ) p,
      ( select value from v$sysstat where name = 'db block gets' ) d,
      ( select value from v$sysstat where name = 'consistent gets' ) c;
    v_hit := 1-(v_phy/(v_db+v_con));
    dbms_output.put_line('Current ratio is: '||round(v_hit*100,5));
  end;
begin
--
-- First we work out the ratio in the normal fashion
--
  show_hit;

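--
-- Extra logical I/O can only push the ratio up, and 100% is unreachable
--
  if p_ratio/100 < v_hit or p_ratio > 99.9999999 then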
    dbms_output.put_line('Sorry - I cannot help you');
    return;
  end if;
--
-- Flipping the formula we can work out how many more consistent gets
-- we need to increase the hit ratio
--
  v_additional_congets := trunc(v_phy/(1-p_ratio/100)-v_db - v_con);

  dbms_output.put_line('Another '||v_additional_congets||' consistent gets needed...');

  if p_show_only then return; end if;
--
-- Create a simple table to hold 200 rows in a single block
--
  begin
    execute immediate 'drop table dummy';
  exception 
    when others then null;
  end;

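  execute immediate 'create table dummy (n primary key) organization index as '||
                    'select rownum n from all_objects where rownum <= 200';
--
-- Grind away: this runaway hierarchical query keeps re-reading the
-- single cached block, so each row it produces adds logical I/O.
-- The ROWNUM predicate stops it once enough rows have been produced.
--
  execute immediate '
    select count(*)
    from (
      select n
      from dummy
      connect by n > prior n
      start with n = 1 )
    where rownum < :extra_gets' into v_count using v_additional_congets;
--
-- Show off the new (and utterly meaningless) ratio
--
  show_hit;
end;
/

SQL> exec choose_a_hit_ratio(85,true);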
Current ratio is: 82.30833
Another 29385 consistent gets needed...

PL/SQL procedure successfully completed.

SQL> exec choose_a_hit_ratio(85);
Current ratio is: 82.30833
Another 29385 consistent gets needed...
Current ratio is: 86.24548

PL/SQL procedure successfully completed.

SQL> exec choose_a_hit_ratio(90,true);
Current ratio is: 86.24731
Another 79053 consistent gets needed...

PL/SQL procedure successfully completed.

SQL> exec choose_a_hit_ratio(90);
Current ratio is: 86.24731
Another 79053 consistent gets needed...
Current ratio is: 90.5702

PL/SQL procedure successfully completed.

SQL> exec choose_a_hit_ratio(98,true);
Current ratio is: 90.5709
Another 1141299 consistent gets needed...

PL/SQL procedure successfully completed.

SQL> exec choose_a_hit_ratio(98);
Current ratio is: 90.5709
Another 1141299 consistent gets needed...
Current ratio is: 98.02386

PL/SQL procedure successfully completed.
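For anyone who wants to check the “Another N consistent gets needed” figure by hand, the “flipping the formula” step in the routine is just a rearrangement of the ratio that show_hit computes. The symbols below are introduced here for illustration and do not appear in the code:

- P is 'physical reads'.
- D is 'db block gets'.
- C is 'consistent gets'.
- r is the target ratio (p_ratio/100).
- X is the number of extra consistent gets required.

Like the routine, this assumes the extra work adds only logical reads and no physical reads:

```latex
\[
r = 1 - \frac{P}{D + C + X}
\;\Longrightarrow\;
D + C + X = \frac{P}{1 - r}
\;\Longrightarrow\;
X = \frac{P}{1 - r} - D - C
\]
```

As r approaches 1, X grows without bound whenever P is non-zero, which is why the routine refuses any target above 99.9999999. The assumption only holds approximately. Each row of the runaway query may cost more than one consistent get, and the rest of the instance keeps doing its own I/O. That is presumably why the sample runs finish slightly above the requested ratio.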