XM2VTSDB: The Extended M2VTS Database

K. Messer, J. Matas, J. Kittler, University of Surrey, Guildford, Surrey, GU2 5XH, UK.
J. Luettin, G. Maitre, IDIAP, CP 592, 1920 Martigny, Switzerland.

Second International Conference on Audio and Video-based Biometric Person Authentication (AVBPA'99), Washington D.C., 1999
Abstract

In this paper we describe the acquisition and content of a large multi-modal database intended for training and testing of multi-modal verification systems. The XM2VTSDB database offers synchronised video and speech data as well as image sequences allowing multiple views of the face. It consists of digital video recordings of 295 subjects, taken at one-month intervals over a period of five months. We also describe a protocol for evaluating verification algorithms on the database. The database has been made available to anyone on request to the University of Surrey through http://www.ee.surrey.ac.uk/Research/VSSP/xm2vtsdb.

1 Introduction

The use of biometric measurements in security applications is becoming common to a level where a dedicated journal [1] monitors the developments in the area. Extremely reliable methods of biometric personal identification exist, e.g. fingerprint analysis, retinal or iris scans. But most of these methods are considered unacceptable by users in all but high-security scenarios. Personal identification systems based on the analysis of speech, or of frontal or profile images of the face, are non-intrusive and therefore user-friendly. Moreover, personal identity can often be ascertained without the client's assistance. However, speech- and image-based systems are less robust to impostor attack, especially if the impostor possesses information about a client, e.g. a photograph or a recording of the client's speech. Multi-modal personal verification is one of the most promising approaches to user-friendly (hence acceptable) highly secure personal verification systems [6].
Recognitionandvericationsystemneedtraining; thelargerthetrainingset,thebettertheperform- anceachieved8].Thevolumeofdatarequiredfor trainingamulti-modalsystembasedontheanalysis ofvideoandaudiosignalsisintheorderofTBytes (1000GBytes);technologyallowingmanipulationand eectiveuseofsuchamountsofdatahasonlyrecently becomeavailableintheformofdigitalvideo. TheM2VTSprojectwassetuptoaddresstheprob- lemofsecuredaccesstobuildingsormulti-mediaser- vicesbytheuseofautomaticpersonvericationbased onsuchmulti-modalstrategies.Inorderforthepro-  jectpartnerstoreliablyandrobustlytrain,testand comparetheiralgorithmsalargemulti-modaldata- basewasrequired.Weareatpresentawareofonlytwo publiclyavailablemediumorlargescalemulti-modal databases,thedatabasecollectedwithintheM2VTS project,comprising37subjects4]andtheDAVID-BT database2].Asurveyofaudiovisualdatabasespre- paredbyChibelushiet.al.listmanyothers,butthese areeithermono-modalorsmall7].Fromthepoint ofviewofthedatabasesizeDAVID-BTiscompar- ablewiththeM2VTSdatabase:31clients-5sessions. However,thespeechpartofDAVID-BTissignicantly largerthanthatofM2VTSDB.Ontheotherhand- thequalityandreproducibilityofthedataavailable onanSVHStapeislow.Itisforthesereasonsthatit wasdecidedtocapturealargeaudio-visualdatabase, theXM2VTSDB,usinghighqualitydigitalvideo. Thepaperisorganisedasfollows.Inthenextsec- tionwedenethedatabasespecication.Thedata- baseacquisitionsystemusedisdescribedinsection 3.InSection4thecontentofthespeechshotisde- scribed.Thenthecontentoftheheadrotationshotis presented.InformationabouttheXM2VTSDBpro- tocoldesignedfortrainingandtestingpersonaliden- tityvericationalgorithmsisgiveninSection6.In Section7wegiveinformationonhowthedatabaseis distributedbeforereachingsomeconclusions.  
2 Database Specification

The design of XM2VTSDB was based on the experience gained as a result of recording and experimenting with the M2VTS database. The database is primarily intended for research and development of personal identity verification systems where it is reasonable to assume that the client will be cooperative. The biometric data to be recorded is of the type that would normally be easily acquired during a normal access claim interaction between an access point system and the client. The system may request from the client some client-specific information. Generally it will engage the user in a simple dialogue and request simple tasks to be performed, which will introduce some subject dynamics into the interaction session from which useful image sequence information can be extracted.

The scenario adopted reflected the above considerations. We assumed that a dialogue of some 30 seconds' duration would be perfectly acceptable in practical situations. The dialogue was simulated by asking the subjects to utter a predefined sentence. Each subject was also asked to move his/her head as might be necessary, for instance, to read some notice or instructions. Our objective was to induce the subjects to make extreme head rotation movements, so that we can also extract the head side profile. In an operational scenario a side view camera could be used to capture this kind of biometric information instead.

In order to capture natural variability of clients caused by changes in physical condition, hairstyle, dress, and mood, subjects were recorded in four separate sessions uniformly distributed over a period of 5 months.

A continuous video recording was made of each subject, rather than a few snapshots from each recording session, as video data not only facilitates certain image processing tasks such as head segmentation and eye detection but, most importantly, is a source of multiple biometric modalities. These include lip dynamics and face 3D surface modelling. Continuous video also supports verification of speech/lip shape correlation and speech/lip signal synchronisation.
The subjects were selected to include adults of both sexes and of different ages. As people wearing glasses may be interested in gaining access to services with glasses on or off, both instances would have to be present to develop robust algorithms.

A good quality consumer market digital camcorder was used to record the database. This particular choice had been made on the grounds that the camera system in a final product would have to be of low cost. The state of the art consumer products today will be low cost products tomorrow. With good quality recordings one can easily perform experiments on lower quality video, which can be obtained by various processes of degradation (blurring, noise contamination, colour distortion, decrease in spatial and temporal resolution, reduced dynamic range and grey level resolution).

These design principles are similar to those adopted for the M2VTS database. The main difference between XM2VTSDB and M2VTS is in the size of the database and in the number of recordings taken for each subject during each session. The size of the M2VTS database (37 subjects) was reasonably representative of some application scenarios. However, even for client populations of such moderate size, any impostor tests should be carried out on a significantly larger database. There are also applications where the database of clients would be of the order of hundreds rather than tens. These considerations led to the conclusion that an order of magnitude increase in size of the M2VTS database was warranted. In view of the huge quantity of data, a target of some 300 subjects was thus aimed at.

Whereas the M2VTS database comprised 5 different shots at distinct sessions recorded over a period of three months, the new database consists of 8 shots recorded in four distinct sessions. Thus each session contains two repetitions of the specified sentence.
The main motivation for this recording policy was to increase the number of speech records for each subject to facilitate verification algorithm development and fusion. The design and training of an algorithm requires data from several records to encapsulate intra-client variability. Further independent data may be required for feature selection and for the design of the supervisor in multiple expert fusion. It is assumed that the repetitions of the speech utterance in each session would be sufficiently different for them to be considered as independent records for experimental purposes. The difference would result not only from the natural variability that might be exhibited even under identical conditions, but also from different emotional states of the subjects during the two consecutive attempts.

The database acquisition commenced with a population of 360 volunteers but through natural wastage only 295 completed the four sessions. On each visit (session) two recordings were made: a speech shot and a head rotation shot. The speech shot consisted of a frontal face recording of each subject during the dialogue. The second part consisted of a head rotation shot. The data acquired during these two shots will be described in more detail in Sections 4 and 5 respectively.

3 The Database Acquisition System

The entire database was acquired using a Sony VX1000E digital camcorder and DHR1000UX digital VCR. This captures video at a colour sampling resolution of 4:2:0 and 16-bit audio at a frequency of 32 kHz. The video data is compressed at the fixed ratio of 5:1 in the proprietary DV format. This format also defines a frame-accurate time code which is stored on the cassette along with the audio-visual data.

This video hardware can be interfaced to a computer via a FireWire (IEEE 1394) [3] port. We used an Intel-based 586 PC running Windows 95 and connected it to the digital video equipment using an Adaptec AHA8940 FireWire card. Software utilities were then written that enable a user to remotely control the VCR to frame accuracy by searching through the stored time codes on the cassettes. Routines were also written that allowed the capture of both video and audio data in real time to the computer hard disk.
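To give a feel for what real-time capture to disk demands, the following minimal sketch estimates the sustained data rate implied by the figures above (4:2:0 sampling, 5:1 compression, 16-bit audio at 32 kHz). The frame geometry (720 x 576 pixels at 25 frames per second) and the 8-bit sample depth are assumptions typical of PAL equipment and are not stated in the paper; the function names are purely illustrative.

```python
# Rough data-rate estimate for the capture chain described in Section 3.
# ASSUMPTIONS (not from the paper): PAL frame size 720x576 at 25 fps and
# 8-bit luma/chroma samples. From the paper: 4:2:0 sampling, a fixed 5:1
# compression ratio, and 16-bit audio sampled at 32 kHz.

WIDTH, HEIGHT, FPS = 720, 576, 25      # assumed PAL geometry
BITS_PER_PIXEL_420 = 12                # 8-bit luma + 2 chroma planes at 1/4 size
COMPRESSION_RATIO = 5                  # fixed 5:1 ratio (from the paper)
AUDIO_RATE_HZ = 32_000                 # from the paper
AUDIO_BYTES_PER_SAMPLE = 2             # 16-bit audio (from the paper)


def video_bytes_per_second(width=WIDTH, height=HEIGHT, fps=FPS):
    """Compressed video bytes per second under the assumptions above."""
    raw_bytes_per_frame = width * height * BITS_PER_PIXEL_420 // 8
    return raw_bytes_per_frame * fps // COMPRESSION_RATIO


def audio_bytes_per_second(channels=1):
    """Uncompressed audio bytes per second for the given channel count."""
    return AUDIO_RATE_HZ * AUDIO_BYTES_PER_SAMPLE * channels


video_rate = video_bytes_per_second()
audio_rate = audio_bytes_per_second(channels=1)
print(f"video: {video_rate / 1e6:.2f} MB/s, audio: {audio_rate / 1e3:.0f} kB/s per channel")
```

Under these assumptions the video stream dominates the load on the disk; the audio contribution is small by comparison.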
When capturing the database the camera settings were kept constant across all four sessions. The head was illuminated from both left and right sides, with diffusion gel sheets being used to keep this illumination as uniform as possible. A blue background was used to allow the head to be easily segmented out using a technique such as chroma key. A high-quality clip-on microphone was used to record the speech.

Before each video shot was recorded, a short clipper board sequence was taken that uniquely identified that shot. This clipper board contained the subject's unique identification number, the subject's name, shot type and session number. Also on the clipper board was a colour test chart and resolution checker chart. This enables a check to be made that the quality of the recordings is consistent across the whole database and could help resolve any potential errors.

The raw database contains approximately 30 hours of digital video recordings. This has all been manually annotated. Every subject has an index file for each of the four recording sessions, which contains the tape number and time codes for selected key points in the speech and video data.

Using the information in these index files, the written software enables us to index into the database, automatically retrieve any subset of it, and automatically produce edited versions of the database.

4 The Speech Shot

After a short clipper board sequence was recorded, the subject was asked to sit in a chair and a microphone was clipped onto their shirt. He/she was then asked to read three sentences which were written on a board positioned just below the camera. The subjects were asked to read at their normal pace, to pause briefly at the end of each sentence and to read through the three sentences twice. The three sentences remained the same throughout all four recording sessions and were

1. 0 1 2 3 4 5 6 7 8 9
2. 5 0 6 9 2 8 1 3 7 4
3. Joe took fathers green shoe bench out

The digits in the second sentence are in the same order as in another large speech database, whilst the third sentence was chosen because it is phonetically balanced.
Figures 2(a)-(d) show an image grabbed for a subject from each session. This image data can be used to train and test algorithms for frontal view authentication. Figures 2(e)-(h) show a sequence of images grabbed from the video taken at the first session during the speech shot. These sequences can be used to train and test lip-tracking systems. All the audio data from this shot have been grabbed and placed into audio files, with each file containing a single sentence. This data can be used to train and test speaker verification and recognition algorithms.

5 The Head Rotation Shot

The next shot consisted of a sequence of rotating head movements. After the clipper board shot the subject was asked to rotate his/her head from the centre to the left, to the right, then up, then down, finally returning it to the centre. They were told that a full side-profile was required and asked to run through the entire sequence twice.

Figures 3(a)-(h) show selected frames from this sequence. This sequence was kept constant for all four sessions. These images can be used for profile or 3D based authentication.

Next, if the subject was wearing glasses he/she was asked to remove them and a short front profile video sequence was filmed. In total about 1.5 minutes of digital video was taken per subject, per session.

6 Evaluation Protocol

We have defined a protocol that may be used to evaluate the performance of vision- and speech-based person authentication systems on the XM2VTSDB. The protocol is defined for the task of person verification, where an individual asserts his identity. The verification system compares the features of that person with stored features corresponding to the claimed identity and computes their similarity, which is referred to as a score. Depending on the score, the system decides whether the identity claim is true or not. This authentication task corresponds to an open test set scenario where persons, unknown to the system, might claim access. The subjects whose features are stored in the system's database are called clients, whereas persons claiming a false identity are referred to as impostors.
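The verification decision described above (compare the claimant's features with the stored model of the claimed identity, compute a score, accept or reject) can be sketched as follows. The protocol does not prescribe a feature type or a similarity measure; cosine similarity over plain feature vectors is used here purely as an assumed stand-in, and the names `verify`, `cosine_similarity` and `client_001` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero feature vectors.

    ASSUMPTION: the paper does not specify a similarity measure; cosine
    similarity is only an illustrative choice.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def verify(claimed_id, features, client_models, threshold):
    """Return (score, accepted) for an identity claim.

    The claim is accepted when the score reaches the threshold. An unknown
    person can only claim the identity of an enrolled client, which is what
    makes this an open test set scenario.
    """
    model = client_models[claimed_id]      # stored features of the claimed client
    score = cosine_similarity(features, model)
    return score, score >= threshold


# Hypothetical two-dimensional features, for illustration only.
client_models = {"client_001": [1.0, 0.0]}
client_claim = verify("client_001", [0.9, 0.1], client_models, threshold=0.8)
impostor_claim = verify("client_001", [0.0, 1.0], client_models, threshold=0.8)
print(client_claim, impostor_claim)
```

How the threshold is chosen is the subject of the evaluation protocol; here it is simply passed in as a parameter.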
The database was divided into three sets: training set, evaluation set, and test set (see Fig. 1). The training set is used to build client models. The evaluation set is selected to produce client and impostor access scores which are used to find a threshold that determines if a person is accepted or rejected. The threshold can be set to satisfy certain performance levels on the evaluation set. In the case of multi-modal classifiers, the evaluation set might also be used to optimally combine the outputs of several classifiers. The test set is selected to simulate real authentication tests. The three sets can also be classified with respect to subject identities into client set, impostor evaluation set, and impostor test set. For this description, each subject appears only in one set. This ensures the realistic evaluation of claims by impostors whose identity is unknown to the system.

The protocol is based on 295 subjects, 4 recording sessions, and two shots (repetitions) per recording session. The database was randomly divided into 200 clients, 25 evaluation impostors, and 70 test impostors (see [9] for the subjects' IDs of the three groups). Two different evaluation configurations were defined. They differ in the distribution of client training and client evaluation data, as can be seen in Fig. 1.

6.1 Performance Measures

Two error measures of a verification system are the False Acceptance rate (FA) and the False Rejection rate (FR). False acceptance is the case where an impostor, claiming the identity of a client, is accepted. False rejection is the case where a client, claiming his true identity, is rejected. FA and FR are given by

$$\mathrm{FA} = \frac{EI}{I} \times 100\%, \qquad \mathrm{FR} = \frac{EC}{C} \times 100\% \tag{1}$$

where EI is the number of impostor acceptances, I the number of impostor claims, EC the number of client rejections, and C the number of client claims. Both FA and FR can be influenced by the threshold. There is a trade-off between the two error rates, i.e. it is possible to reduce either of them with the risk of increasing the other one. For the test sets of both protocol configurations, I is 112,000 (70 impostors × 8 shots × 200 clients) and C is 400 (200 clients × 2 shots).
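The claim counts and the error rates of Eq. (1) are simple enough to check in a few lines. The sketch below uses only the numbers given in the text; the function and constant names are illustrative, and the error counts passed to `error_rates` in the usage line are made-up values, not results from the paper.

```python
# Subject split and test-set claim counts, as stated in the protocol.
N_CLIENTS = 200
N_EVAL_IMPOSTORS = 25
N_TEST_IMPOSTORS = 70
SHOTS_PER_IMPOSTOR = 8        # 4 sessions x 2 shots per session
CLIENT_TEST_SHOTS = 2         # client shots used in the test set


def impostor_claims(n_impostors, shots, n_clients):
    """Every impostor shot is tested against every client identity."""
    return n_impostors * shots * n_clients


def client_claims(n_clients, shots):
    """Every client test shot is a claim to the client's own identity."""
    return n_clients * shots


def error_rates(accepted_impostors, n_impostor_claims,
                rejected_clients, n_client_claims):
    """FA and FR in percent, following Eq. (1)."""
    fa = 100.0 * accepted_impostors / n_impostor_claims
    fr = 100.0 * rejected_clients / n_client_claims
    return fa, fr


I = impostor_claims(N_TEST_IMPOSTORS, SHOTS_PER_IMPOSTOR, N_CLIENTS)
C = client_claims(N_CLIENTS, CLIENT_TEST_SHOTS)
# Hypothetical error counts, chosen only to exercise the formula.
fa, fr = error_rates(accepted_impostors=1120, n_impostor_claims=I,
                     rejected_clients=8, n_client_claims=C)
print(I, C, fa, fr)
```

Note the imbalance between the two denominators: a single rejected client moves FR by 0.25 percentage points, whereas a single accepted impostor barely changes FA.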
Vericationsystemperformanceisoftenquotedin  EqualErrorRate  (EER).TheEERcanbeobtained afterafullauthenticationexperimenthasbeenper- formed.Thetrueidentitiesofthetestsubjectsare thenusedtocalculatethethresholdforwhichtheFA andFRareequal.TheEERdoesthereforenotcor- respondtoarealauthenticationscenarioandmight notwellpredicttheexpectedsystemperformance.In practicalapplicationsthethresholdneedstobeseta priori.Animportantmeasurefortheperformanceof asystemisthereforethedeviationoftheFA/FRdis- tributiononatestsetfromanevaluationset.Thisis particularlythecaseforapplicationswheretheFAor FRareconstrainedtolaywithincertainlimits.Itis thereforenotonlyimportanthowlargethesumofFA andFRis,butalsohowtheyaredistributed. Weareinterestedinsimulatingrealapplications andthereforesetthethresholdonthe  Evaluation Data  toobtaincertainfalseacceptance(FAE)and falserejection(FRE)values.FAEandFREcor- respondstoFAandFRobtainedontheevaluation set,respectively.Thesamethresholdwillthenbe usedonthetestset.Sinceapplicationrequirements mightconstraintheFAorFRtostaywithincer- tainlimits,thesystemisevaluatedforthreedierent thresholds  T  correspondingto  FAE  =0,  FRE  =0, and  FAE  =  FRE  :  T  FAE  =0  =argmin  T  (  FRE  j  FAE  =0)  T  FAE  =  FRE  =(  T  j  FAE  =  FRE  )  T  FRE  =0  =argmin  T  (  FAE  j  FRE  =0) (2) Onetestthusconsistsofatotalof6scores:  FA  FAE  =0  FR  FAE  =0  FA  FAE  =  FRE  FR  FAE  =  FRE  FA  FRE  =0  FR  FRE  =0  (3) Foreachgiventhreshold,the  TotalErrorRate  (TER)canbeobtainedasthesumofFAandFR:  TER  FAE  =0  =  FA  FAE  =0  +  FR  FAE  =0  TER  FAE  =  FRE  =  FA  FAE  =  FRE  +  FR  FAE  =  FRE  TER  FRE  =0  =  FA  FRE  =0  +  FR  FRE  =0  (4)  7Distribution  Initiallyaneditedversionofthedatabasewillbe madeavailableasaset201-hourMini-DVcassettes. 
One speech shot (3 sentences) and one head rotation shot for all 295 subjects across all 4 sessions will be on this set of tapes. The relevant index files will also be provided. To use the database in this form the user will require a digital video recorder/camera and a computer with appropriate software.
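Returning to the evaluation protocol of Section 6.1, the procedure of fixing three thresholds on the evaluation set and reporting FA, FR and TER on the test set can be sketched as below. Several details are assumptions rather than part of the protocol: a higher score is taken to mean a more client-like claim, a claim is accepted when its score is at least T, candidate thresholds are drawn from the observed evaluation scores (plus one value above the maximum), and the FAE = FRE condition is approximated by the candidate minimising |FAE - FRE|, since exact equality may not be reachable with finitely many scores. All function names are hypothetical.

```python
def rates(threshold, client_scores, impostor_scores):
    """FA and FR in percent for one threshold (accept if score >= threshold)."""
    fa = 100.0 * sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    fr = 100.0 * sum(s < threshold for s in client_scores) / len(client_scores)
    return fa, fr


def select_thresholds(eval_client_scores, eval_impostor_scores):
    """Pick the three thresholds of Eq. (2) on the evaluation set."""
    observed = sorted(set(eval_client_scores) | set(eval_impostor_scores))
    # One extra candidate above every score guarantees that FAE = 0 is reachable.
    candidates = observed + [observed[-1] + 1.0]
    table = [(t, *rates(t, eval_client_scores, eval_impostor_scores))
             for t in candidates]                     # rows of (T, FAE, FRE)
    t_fae0 = min((r for r in table if r[1] == 0.0), key=lambda r: r[2])[0]
    t_fre0 = min((r for r in table if r[2] == 0.0), key=lambda r: r[1])[0]
    t_equal = min(table, key=lambda r: abs(r[1] - r[2]))[0]
    return {"FAE=0": t_fae0, "FAE=FRE": t_equal, "FRE=0": t_fre0}


def evaluate(thresholds, test_client_scores, test_impostor_scores):
    """Apply the fixed thresholds to the test set: (FA, FR, TER) per threshold."""
    report = {}
    for name, t in thresholds.items():
        fa, fr = rates(t, test_client_scores, test_impostor_scores)
        report[name] = (fa, fr, fa + fr)              # TER as in Eq. (4)
    return report


# Made-up scores, for illustration only.
thresholds = select_thresholds([0.9, 0.8, 0.7, 0.4], [0.5, 0.3, 0.2, 0.1])
report = evaluate(thresholds, [0.85, 0.6, 0.45, 0.75], [0.55, 0.72, 0.1, 0.2])
for name, (fa, fr, ter) in report.items():
    print(f"{name}: FA={fa:.1f}% FR={fr:.1f}% TER={ter:.1f}%")
```

Because the thresholds are frozen before the test scores are seen, the FA and FR measured on the test set generally drift away from the values targeted on the evaluation set, which is the deviation the protocol is designed to expose.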