Hi Friends,

Even as I launch this today ( my 80th Birthday ) , I realize that there is yet so much to say and do. There is just no time to look back, no time to wonder, "Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Now as I approach my 90th birthday ( 27 June 2023 ) , I invite you to visit my Digital Avatar ( www.hemenparekh.ai ) – and continue chatting with me , even when I am no longer here physically

Thursday 3 November 2016

Artificial Resume Deciphering Intelligent Software ( ARDIS )


Artificial  Resume  Generating   Intelligent  Software   (  ARGIS  )

-------------------------------------------------------------------------------------------------------

Note : Most of what I envisaged in my following notes , 20 years ago , must have materialized by now !

          But , it did lead to the launch of www.3pJobs.com on 14 Nov 1997 !



Date written  :   01  Dec  1996

Uploaded      :    03  Nov  2016

--------------------------------------------------------------------------------------------------------------------------------

What are these software packages ? What will they do ? How will they help us ? How will they help our clients / candidates ?

ARDIS :

This software will break up / dissect a Resume into its different constituents such as ,

#   Physical information ( data ) about a candidate ( Executive )

#   Academic information about a candidate

#   Employment Record ( Industry / Function / Products / Services - wise )

#   Salary

#   Achievements / Contributions

#   Attitudes  / Attributes  /  Skills  /  Knowledge

#   His preferences with respect to Industry / Function / Location



In fact , if every candidate were to fill in our EDS ( Executive Data Sheet ) , the info would automatically fall into " proper " slots / fields , since our EDS forces a candidate to " dissect " himself into various compartments
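
( Looking at this from today : a minimal sketch , in Python , of what those " proper slots / fields " amount to . The field names are my illustrative assumptions , not the actual EDS layout )

```python
from dataclasses import dataclass, field

# A purely illustrative sketch of the EDS "compartments" described above;
# the field names are assumptions, not the actual EDS layout.
@dataclass
class ExecutiveDataSheet:
    pen: str = ""                                           # Permanent Executive Number
    physical_data: dict = field(default_factory=dict)       # age, location, etc.
    academics: list = field(default_factory=list)
    employment_record: list = field(default_factory=list)   # industry / function / products
    salary: str = ""
    achievements: list = field(default_factory=list)
    skills: list = field(default_factory=list)               # attitudes / attributes / knowledge too
    preferences: dict = field(default_factory=dict)          # industry / function / location
```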

But ,

Getting every applicant / executive to fill in our standardized EDS is next to impossible - and it may not even be necessary


Executives ( who have already spent a lot of time and energy preparing / typing their bio-data ) are most reluctant to sit down once more and spend a lot of time again , furnishing us the SAME information / data in the neatly arranged blocks of our EDS . For them , this duplication is a WASTE of TIME !


EDS is designed for our ( information handling / processing / retrieving ) convenience , and that is the way he perceives it ! Even if he is vaguely conscious that this ( filling in of EDS ) would help him in the long run , he does NOT see any IMMEDIATE BENEFIT from filling it in - hence , he is reluctant to do so

We , too , have a problem - a " COST / TIME / EFFORT " problem


If we are receiving 100 bio-data each day ( this should happen soon ) , to whom should we send our EDS , and to whom NOT ?


This can be decided only by a SENIOR executive / consultant , who goes through each and every bio data , DAILY , and reaches a conclusion as to ,

*   which resumes are of " interest " & call for sending an EDS

*   which resumes are " marginal " or not of immediate interest , where we need not spend the time / money / energy of sending an EDS

We may not be able to employ a number of Senior / Competent Consultants who can scrutinize all incoming bio-data and take this decision on a DAILY basis ! That , in itself , would be a costly proposition

So ,

On the one hand  >  we have the time / cost / energy / effort of sending an EDS to everyone ,

On the other hand  >  we have the time / cost of several Senior Consultants to separate out the " chaff " from the " wheat "


NEITHER  IS   DESIRABLE  !

But ,

from each bio data received daily , we still need to DECIPHER , and drop into relevant slots / fields , RELEVANT DATA / INFORMATION , which would enable us to ,

#   Match a candidate's profile with " Client Requirement Profile " against specific requests

#   Match a candidate's profile against " Specific Vacancies " that any Corporation ( client or not ) , may post on
     our VACANCY  BULLETIN   BOARD  ( un-advertized vacancies )

#   Match a candidate's profile against " Most Likely Companies who are likely to hire / need such an executive " ,
     using our CORPORATE  DATABASE , which will contain info such as , PRODUCTS / SERVICES of each and every
     Company


#   Convert each bio data received into a RE-CONSTITUTED BIO DATA ( Converted Bio data ) , to enable us to send it out to any client / non-client organization , at the click of a mouse


#   Generate ( for commercial / profitable exploitation ) , such by-product services as ,

     *  Compensation Trends

     *  Organization Charts

     *  Job Descriptions....etc

#    Permit a candidate to log into our DATABASE and remotely modify / alter his bio data


#    Permit a client ( or a non-client ) , to log into our DATABASE and remotely conduct a SEARCH


ARDIS  is required on the assumption that , for a long time to come , " TYPED BIO DATA " would form a major source of our database


Other sources , such as ,
*   Duly filled in EDS ( hard copy )

*   EDS  on a floppy

*   Downloading EDS over the Internet ( or Dial-Up phone lines ) , and uploading after filling in ( like Intellimatch ) ,


will continue to play a minor role in the foreseeable future



HOW   WILL   ARDIS   WORK   ?


Step # 1 

Receive typed Bio Data



Step # 2

Scan bio data



Step # 3

Create BIT-MAP image



Step # 4

Using OCR , convert to ASCII ( using PageMaker )

Convert to English characters ( by comparison )



Step # 5

OWR / Optical Word Reader

Convert to English language WORDS , to create a Directory of Keywords ( using ISYS )

Compare with KEY-WORDS , stored in WORD DIRECTORY of " Most Frequently Used " WORDS in 3,500 converted bio-data ( ISYS analysis )
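
( A present-day sketch , in Python , of this keyword-directory step . ISYS was the actual tool in 1996 ; the folder name and the frequency cut-off here are my assumptions )

```python
import re
from collections import Counter
from pathlib import Path

def build_word_directory(folder, min_occurrences=10):
    """Build a directory of 'Most Frequently Used' words from a folder of
    converted bio-data text files (the role ISYS played in these notes)."""
    counts = Counter()
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        counts.update(re.findall(r"[a-z]+", text))
    # keep only words frequent enough to serve as KEY-WORDS
    return Counter({w: n for w, n in counts.items() if n >= min_occurrences})

# word_directory = build_word_directory("converted_biodata")
```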



Step # 6

OPR / Optical Phrase Reader

Pick out " Phrases " and create DIRECTORY of " Key Phrases "  ( ARDIS )

*  Detect " Pre-fixes " & " Suffixes " used with each KEY WORD that go to make up " Most Frequently Used
   PHRASES "

*  Calculate " Occurrence  Frequency "

*  Calculate " Probability " of each Occurrence

*  Create " Phrase Directories " for comparison



Step # 7

OSR / Optical Sentence Reader

Pick out " Sentences " & create , Directory of " KEY SENTENCES "

Most commonly used VERBS / ADVERBS / PREPOSITIONS  , with each " Key Phrase " to create Directory of KEY SENTENCES



TO   RECAPITULATE :


ARDIS will ,

*   Recognize " Characters "

*   Convert to " WORDS "

*   Compare with the 6,258 key words which we have found in 3,500 converted Bio Data ( using ISYS )

If a " Word " has not already appeared ( > 10 times ) in those 3,500 bio data , then its " chance " ( probability ) of occurring in the next bio data is very , very small indeed



But even then ,

ARDIS software will store in memory each " Occurrence " of each Word ( old or new / first time or a thousandth time ) ,

And ,

will continuously calculate its " Probability of Occurrence " as :


P  =  ( No of occurrences of the given word so far )  divided by  ( Total no of occurrences of ALL the words in the entire population so far )


So that ,

By the time we have SCANNED 10,000 bio data , we would have literally covered ALL the words that have even a small PROBABILITY of OCCURRENCE !

So , with each new bio data " scanned " , the " probability of occurrence " of each word is getting more and more accurate !
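
( A minimal streaming sketch , in Python , of this running calculation ; the class and method names are my own )

```python
from collections import Counter

class RunningWordProbability:
    """Streaming form of the formula above: after every scanned bio data,
    P(word) = occurrences of the word so far / occurrences of ALL words so far."""
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, words):
        self.counts.update(words)
        self.total += len(words)

    def probability(self, word):
        return self.counts[word] / self.total if self.total else 0.0
```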

The same logic will hold for ,

*  KEY  PHRASES

*  KEY  SENTENCES

The " Name of the Game " is : Probability of Occurrence

As someone once said :


If you allow 1000 monkeys to keep on hammering the keys of 1000 type-writers for 1000 years , you will , at the end , find that between them , they have " re-produced " the entire literary works of Shakespeare !


But today , if you store into a Super Computer ,

*   all the words appearing in the English language ( incl Verbs / Adverbs / Adjectives ..etc )

*   the " Logic " behind the construction of the English language ,

then ,

I am sure , the Super Computer could reproduce the entire works of Shakespeare , in 3 MONTHS !


And , as you would have noticed , ARDIS is a " SELF  LEARNING " type of software !

The more it reads ( scans ) , the more it learns ( memorizes words , phrases & even sentences )

Because of its SELF LEARNING / SELF CORRECTING / SELF IMPROVING capability , ARDIS gets better & better equipped to detect , in a scanned bio data ,


*   Spelling  Mistakes  (  wrong WORD )

*   Context  Mistakes  ( wrong Prefix or Suffix )

*   Preposition  Mistakes  ( wrong PHRASE )

*   Verb / Adverb  Mistakes ( wrong SENTENCE ),


With minor variations ,

-  ALL Thoughts , Words ( written ) , Speech ( spoken ) and Actions , keep on " repeating " again and again and again


It is this REPETITIVENESS of Words , Phrases , and Sentences in Resumes , that we plan to exploit


In fact ,

by examining & memorizing the several hundred ( or thousand ) " Sequences " in which the words appear , it should be possible to " Construct " the " Grammar " ie: the logic behind the sequences


I suppose , this is the manner in which the experts were able to unravel the " meaning " of hieroglyphic inscriptions on Egyptian tombs .

They learned a completely strange / obscure language by studying the " Repetitiveness " & " Sequential " occurrence of unknown characters

===============================================================

Added  on    11  JULY  2022 :

LaMDA: our breakthrough conversation technology


(  18  May  2021  )


Extract :

LaMDA’s conversational skills have been years in the making. Like many recent language models, including BERT and GPT-3, it’s built on Transformer, a neural network architecture that Google Research invented and open-sourced in 2017. That architecture produces a model that can be trained to read many words (a sentence or paragraph, for example), pay attention to how those words relate to one another and then predict what words it thinks will come next. 


But unlike most other language models, LaMDA was trained on dialogue. During its training, it picked up on several of the nuances that distinguish open-ended conversation from other forms of language. One of those nuances is sensibleness. Basically: Does the response to a given conversational context make sense? For instance, if someone says:


“I just started taking guitar lessons.”

You might expect another person to respond with something like: 

“How exciting! My mom has a vintage Martin that she loves to play.”


That response makes sense, given the initial statement. But sensibleness isn’t the only thing that makes a good response. After all, the phrase “that’s nice” is a sensible response to nearly any statement, much in the way “I don’t know” is a sensible response to most questions. Satisfying responses also tend to be specific, by relating clearly to the context of the conversation. In the example above, the response is sensible and specific.


LaMDA builds on earlier Google research, published in 2020, that showed Transformer-based language models trained on dialogue could learn to talk about virtually anything. Since then, we’ve also found that, once trained, LaMDA can be fine-tuned to significantly improve the sensibleness and specificity of its responses.



==============================================================



HOW  TO   BUILD  DIRECTORIES  OF  "  PHRASES  "  ?


From 6252 words , let us pick any word , say :  ACHIEVEMENT

Now we ask the software to scan the Directory containing the 3,500 converted Bio Data , with the instruction that every time the word " Achievement " is spotted , the software will immediately spot / record the " prefix " .

The software will record ALL the words that appeared before " Achievement " , as also the " Number of times " each of these prefixes appeared


Word  = ACHIEVEMENT


Prefix found................................. No of times found ( Occurrence ).......... Probability of Occurrence

--------------------------------------------------------------------------------------------------------------------------------
*   Major.................................... 10........................................ 10 / 55  =  0.182
*   Minor....................................  9.........................................  9 / 55  =  0.164
*   Significant..............................  8.........................................  8 / 55  =  0.145
*   Relevant.................................  7.........................................  7 / 55  =  0.127
*   True.....................................  6.........................................  6 / 55  =  0.109
*   Factual..................................  5.........................................  5 / 55  =  0.091
*   My.......................................  4.........................................  4 / 55  =  0.073
*   Typical..................................  3.........................................  3 / 55  =  0.055
*   Collective...............................  2.........................................  2 / 55  =  0.036
*   Approximate..............................  1.........................................  1 / 55  =  0.018
--------------------------------------------------------------------------------------------------------------------------------
    TOTAL NO OF OCCURRENCES.................. 55........................................ ( Total Probability )  1.000
--------------------------------------------------------------------------------------------------------------------------------
As more and more bio data are scanned ,

*   The Number of " Prefixes " will go on increasing

*   The Number of " Occurrences " of each prefix will also go on increasing

*   The overall " population size " will also go on increasing

*   The " Probability  of Occurrence "  of each prefix will go on getting more and more accurate ie; more and more
     representative

This process can go on and on and on ( as long as we keep on scanning bio data )

But " Accuracy Improvements " will decline / taper off , once a sufficiently large number of prefixes ( to the word , ACHIEVEMENT ), have been accumulated . Saturation will take place !


The whole process can be repeated with the WORDS that appear as " SUFFIXES " to the word " ACHIEVEMENT "


And the probability of occurrence of each " Suffix " can likewise be determined


Word = ACHIEVEMENT
--------------------------------------------------------------------------------------------------------------------------------

Suffix....................................... No of Times Found......................... Probability of Occurrence

--------------------------------------------------------------------------------------------------------------------------------
*   Attained................................. 20........................................ 20 / 54
*   Reached.................................. 15........................................ 15 / 54
*   Planned.................................. 10........................................ 10 / 54
*   Targeted.................................  5.........................................  5 / 54
*   Arrived..................................  3.........................................  3 / 54
*   Recorded.................................  1.........................................  1 / 54
--------------------------------------------------------------------------------------------------------------------------------
    TOTAL OF ALL OCCURRENCES................. 54 ( Population Size )..................... ( Total Probability )  1.000
--------------------------------------------------------------------------------------------------------------------------------

Having figured out the " Probabilities of Occurrence " of each of the prefixes and each of the suffixes ( to a given word - in this case , ACHIEVEMENT ) , we could next tackle the issue of " a given combination of prefix and suffix "


eg :

What is the probability of :

*   Prefix  =  " Major "  /   Word  =  ACHIEVEMENT   /  Suffix = " Attained "  ?
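
( If prefix and suffix were chosen independently of each other , the two tables above would give roughly ( 10 / 55 ) x ( 20 / 54 ) = 0.182 x 0.370 , ie: about 0.067 . A safer route - my assumption , not something spelt out in these notes - is to count the ( prefix , word , suffix ) triples directly , as in this Python sketch : )

```python
from collections import Counter

triples = Counter()    # (prefix, word, suffix) -> no of occurrences

def observe(prefix, word, suffix):
    triples[(prefix, word, suffix)] += 1

def triple_probability(prefix, word, suffix):
    """Joint probability of the combination, counted directly rather than
    by multiplying the two marginals (which assumes independence)."""
    total = sum(triples.values())
    return triples[(prefix, word, suffix)] / total if total else 0.0

# after scanning many bio data:
# triple_probability("major", "achievement", "attained")
```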

Why is all of this Statistical exercise required ?

If we wish to stop at merely " Deciphering " a resume , then I don't think we need to go through this

For mere " Deciphering " , all we need is to create a KNOWLEDGE  BASE of :

*   Skills

*   Knowledge

*   Attitudes

*   Attributes

*   Industries

*   Companies

*   Functions

*   Edu Qualifications

*   Products / Services

*   Names ...etc


Having created the " knowledge base " , simply scan a bio data , recognize " words " , compare with the words contained in the " knowledge base " , find CORRESPONDENCE / EQUIVALENCE , and allot / file each scanned word into respective " Fields " against each PEN ( Permanent Executive Number )


PRESTO  !


You have dissected and stored the MAN in appropriate boxes !

Our EDS has these " boxes " . The problem is manual data entry

The data entry operator ,

-  searches out the appropriate " word " from the appropriate " EDS Box " and transfers it to the appropriate screen


To eliminate this manual ( time consuming ) operation , we need ARDIS

We already have a DATA BASE of 6500 words

All we need to do is to write down against each word , whether it is a ,

*  Skill

*  Attribute

*  Knowledge

*  Edu

*  Product

*  Company

*  Location

*  Industry

*  Function   etc


The moment we do this , what was a mere " Data base " , becomes a " Knowledge Base " , ready to serve as a " COMPARATOR "

And as each NEW bio data is scanned , it will throw up words for which there is no " Clue "

Each such NEW word will have to be manually " Categorized " and added to the " Knowledge base "
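
( A minimal Python sketch of this " COMPARATOR " . The category names come from the lists above ; the sample entries themselves are invented for illustration )

```python
# Category names are from the lists above; the sample entries are invented.
KNOWLEDGE_BASE = {
    "fortran":     "Skill",
    "leadership":  "Attribute",
    "metallurgy":  "Knowledge",
    "b.e.":        "Edu",
    "compressors": "Product",
    "mumbai":      "Location",
}

def file_into_fields(scanned_words, pen):
    """Allot each recognized word to its field against the candidate's PEN;
    set aside unknown words for manual categorization, as noted above."""
    record = {"PEN": pen}
    unknown = []
    for w in scanned_words:
        category = KNOWLEDGE_BASE.get(w.lower())
        if category:
            record.setdefault(category, []).append(w)
        else:
            unknown.append(w)    # to be categorized by hand and added to the base
    return record, unknown
```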

Then what is the advantage of calculating for ,

*  each  WORD

*  each  SUFFIX

*  each  PREFIX

*   each  PHRASE

*  each  SENTENCE  ,

- its probability of occurrence ?



The  ADVANTAGES  are :

# 1

 Detect " unlikely " prefixes / suffixes

Suppose ARDIS detects " Manor Achievement "

ARDIS finds that the " probability " of ,

*  " Manor " as prefix to ACHIEVEMENT , is 0.00009 ( say , NIL )

hence , the CORRECT prefix has to be ,

*  " Major " ( and not " Manor " ) , for which , the probability is ( say ) ... 0.4056



# 2

ARDIS detects the words " Mr HANOVAR "

It recognizes this as a spelling mistake and corrects it automatically to " Mr HONAVAR "

OR,

it reads , place of birth as " KOLHAPURE "

It recognizes it as " KOLHAPUR " - or vice versa , if it says " My name is KOLHAPUR "



# 3

Today , while scanning ( using OCR ) , when a mistake is detected , it gets highlighted on the screen or an asterisk / underline starts blinking

This draws the attention of the operator , who manually corrects the " mistake " after consulting a dictionary or his own knowledge base

Once ARDIS has calculated the probabilities of lakhs of words and even the probabilities of their " Most likely sequence of occurrences " , then , hopefully the OCR can " self - correct " any word or phrase , without operator intervention


So the scanning accuracy of OCR should eventually become 100 % and not 75 % - 85 % as at present



# 4

Eventually , we want that ,

-  a bio data is scanned , and

-  it automatically re-constitutes itself into our converted BIO DATA FORMAT



This is the concept of ARGIS ( Artificial Resume Generating Intelligent Software )


Here again , the idea is to eliminate the manual data entry of the entire bio data - our Ultimate Goal

But ARGIS is not possible without first installing ARDIS ,

and that too with the calculation of the " Probability of Occurrence " as THE MAIN FEATURE of the software

By studying and memorizing and calculating the " Probability of Occurrence " of lakhs of words / phrases / sentences , ARDIS actually " learns " English grammar through " Frequency of Usage "


And it is this Knowledge Base which enables ARGIS to re-constitute a bio data ( in our format ) , in a GRAMMATICALLY CORRECT way
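
( Finally , a toy Python sketch of the ARGIS idea . The output template is invented for illustration ; the actual converted bio data format is not reproduced in these notes )

```python
# Once ARDIS has filed the words into fields, re-constitute them into a
# standard format. This template is invented; the real format is not shown here.
TEMPLATE = """PEN        : {PEN}
EDUCATION  : {Edu}
SKILLS     : {Skill}
KNOWLEDGE  : {Knowledge}
LOCATION   : {Location}"""

def reconstitute(record):
    filled = {k: ", ".join(v) if isinstance(v, list) else v
              for k, v in record.items()}
    for key in ("PEN", "Edu", "Skill", "Knowledge", "Location"):
        filled.setdefault(key, "______")    # visible gap for unfilled fields
    return TEMPLATE.format(**filled)
```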


 









 

