Protein Sequence and Structure of Organic Acid

12  Download (0)

Full text

(1)

INFECTION ANDIMMUNITY,Apr. 1992, p. 1302-1313 0019-9567/92/041302-12$02.00/0

Copyright C)1992, AmericanSociety forMicrobiology

Cloning, Expression, and DNA Sequence Analysis of Genes Encoding Nontypeable Haemophilus

influenzae

High-Molecular-

Weight Surface-Exposed Proteins Related to Filamentous Hemagglutinin of Bordetella pertussis

STEPHEN J. BARENKAMPl* ANDELIZABETH LEININGER2

Edward MallinckrodtDepartment of Pediatrics, Washington UniversitySchoolof Medicine, and Divisionof InfectiousDiseases, St. Louis Children's Hospital, St. Louis, Missouri 63110,1 andLaboratory of

CellularPhysiology, Division of Bacterial Products, Center for Biologics Evaluation and Research, Food and Drug Administration, Bethesda, Maryland2

Received 4 November 1991/Accepted 10January1992

A group of high-molecular-weight surface-exposed proteins of nontypeable Haemophilus influenzae are

majortargetsof humanserumantibody (S. J. Barenkampand F. F.Bodor,Pediatr. Infect. Dis.J. 9:333-337, 1990). To further characterize these proteins, we cloned and sequenced genes encoding two related high-molecular-weight proteins from a prototype nontypeable Haemophilus strain. The gene encoding a

120-kDaHaemophilus protein consisted ofa 4.4-kbpopen readingframe, and the geneencodinga 125-kDa protein consisted ofa 4.6-kbp open reading frame. The first 1,259 bp of the two genes were identical.

Thereafter, thesequences begantodiverge,but overalltheywere80%identical, and the derived amino acid

sequences showed 70% identity. A protein sequence homologysearch demonstrated similarity between the derivedamino acidsequences of both clonedgenesand the derived amino acidsequenceofthegeneencoding filamentous hemagglutinin, a surface protein produced by the gram-negative pathogen Bordetella pertussis.

Antiserum raised againstarecombinantprotein encodedbythe4.6-kbpopenreadingframerecognizedboth the 120- and the 125-kDaproteinsintheprototype strainaswellasantigenicallyrelatedhigh-molecular-weight proteins in 75% ofa collection of 125 epidemiologically unrelated nontypeableH. influenzae strains. The antiserum directed againstthe recombinant protein also recognized purified filamentous hemagglutinin. A murine monoclonal antibody to filamentous hemagglutinin recognized both the 120-kDa and the 125-kDa protein inthe prototype strain as well as proteins identical to thoserecognized by therecombinant-protein antiserum in35% of the nontypeableH. influenzae strain collection. Thus,we have identifiedand partially characterizedagroupofhighly immunogenic surface-exposed proteinsofnontypeableH.influenzaewhichare

relatedtothefilamentoushemagglutinin ofB.pertussis.

NontypeableHaemophilusinfluenzaeareunencapsulated organisms which commonly inhabit the upper respiratory tract of humans. These organisms are a frequent cause of mucosal surface infections, such as otitis media, sinusitis, and bronchitis, and occasionally cause invasive diseases such as bacteremia and meningitis (22). The interactions between theorganismand the host whichdetermine whether colonization will occur and whether disease will develop followingcolonizationarenotcompletely understood. How-

ever,hostimmunity is thoughtto playan important role in this sequence of events, and antibodies directed against surfaceantigens of theseorganismsarethoughttobe central tohostprotection (9).

Thesurface antigensofnontypeableH. influenzaeagainst which protective antibodies are directed have not been completely defined. Bacterial outer membrane proteins ap- peartobetargetsofbactericidal antibody (9), andtwoofthe major outermembrane proteins ofnontypeable H. influen-

zae,designated P2 and P6, have beenidentifiedastargetsof human serum bactericidal antibody (23, 24). Other outer

membrane antigens are also likely to be targetsof bacteri- cidal antibodies. We previously identified a group of high- molecular-weight surface-exposed proteinsas majortargets of human serum antibody and observed a correlation be-

*Corresponding author.

tween the development ofserumbactericidal antibody and the appearance ofantibody directed against these proteins (1). The structure and function of these high-molecular- weight proteins were unclear fromthese earlier studies. In

an effortto further characterize these proteins, wecloned, expressed, andsequencedthegenesencodingtwoimmuno- dominant high-molecular-weight proteins from a prototype nontypeable Haemophilus strain. The nucleotidesequences

of the twocloned genes were different but closely related, and theproteinsencoded bythesegenes wereantigenically relatedtofilamentoushemagglutinin,asurfaceprotein found

on Bordetellapertussis. Furthermore, antigenically related high-molecular-weight proteinswerepresentinthe majority ofheterologous nontypeableH. influenzae strains.

MATERIALSANDMETHODS

Bacterial strains, plasmids, and phages. Nontypeable H.

influenzaestrain 12wasthe clinical isolatechosen for study.

Theorganismwas isolated in pure culture from the middle

ear fluid ofa child with acute otitis media. The strainwas

identifiedasH. influenzae by standardmethods (11) andwas

classified asnontypeable by its failure toagglutinate with a

panel of typing antisera for H. influenzae types a to f (Burroughs Wellcome Co., Research Triangle Park, N.C.) andfailuretoshowlines ofprecipitation with these antisera incounterimmunoelectrophoresis assays(42). Furthermore,

1302

Vol.60,No. 4

(2)

HAEMOPHILUS HIGH-MOLECULAR-WEIGHT PROTEIN GENES 1303 genomic DNA purified from strain 12 failed to hybridize with

a plasmid probe,pUO38,which contains DNA fromthecap region of type b H. influenzae (7).

An additional 125 epidemiologically unrelated nontype- able H. influenzae strains were chosen for the studies of strain heterogeneity. These isolates were recovered during episodes of clinical disease from children and adults in Canada, Massachusetts, Maryland, Connecticut, Missouri, New York, Ohio, Texas, and Alabama. Of these isolates, 100 were cultured from middle ear effusions collected by tympanocentesis, 10 were from blood, 10 were from sputum, and 5 were from cerebrospinal fluid. Approximatelyhalf of these isolates were described previously (25). The bulk of the remainder were isolated from middle earaspirates from children with acute otitis media, either by oneof theauthors (S.J.B.) or by his collaborators.

M13mp18 and M13mp19 were obtained from New En- gland BioLabs, Inc. (Beverly, Mass.). XEMBL3 arms and Escherichia coli LE392 were obtained from Stratagene (La Jolla, Calif.). pT7-7 was the kind gift ofStanleyTabor (37).

E. coli BL21(DE3)/pLysS was a gift from F.WilliamStudier.

Strain BL21(DE3) contains a single copy of the T7 RNA polymerase gene under the control of the lac regulatory system (36). Plasmid pLysS contains the T7lysozyme gene.

T7lysozyme binds to the T7 RNApolymerase in vitro, and pLysS stabilizes many toxic T7expression constructs, pre- sumably by binding and inactivatingthe lowquantitiesofT7 RNA polymerase producedbyBL21(DE3)inthe absence of isopropyl-3-D-thiogalactopyranoside (IPTG) (21).

Molecular cloning and plasmid subcloning. Chromosomal DNA from strain 12 was preparedby a modification ofthe method of Marmur (18). Sau3A partial restriction digests of chromosomal DNA were prepared and fractionated on su- crose gradients. Fractions containing DNA fragmentsinthe 9- to 20-kbp range were pooled, and a library was prepared by ligation into XEMBL3 arms. Ligation mixtures were packaged in vitro with Gigapack (Stratagene) and plate- amplified in a P2 lysogen of E. coli LE392. Lambda plaque immunological screening was performed as described by Maniatis et al. (17). For plasmid subcloning studies, DNA from representative recombinant phage was subcloned into the T7 expression plasmid pT7-7 (37). This vector contains the T7 RNA polymerase promoter ,10, a ribosome-binding site, and thetranslationalstart site for the T7 gene 10protein upstreamfrom amultiplecloning site (see Fig. 2B).Standard DNA methods for manipulation of cloned DNA were as describedbyManiatis et al. (17) or Silhavyet al. (32).

DNA sequenceanalysis. DNAsequence analysiswas per- formed by the dideoxy method with the U.S. Biochemicals Sequenasekit assuggestedby themanufacturer. [35S]dATP waspurchased from New England Nuclear (Boston, Mass.).

Data were analyzed with Compugene software (2) and the Genetics Computer Group program from the University of Wisconsin (5) on a Digital VAX 8530 computer. Several 21-meroligonucleotide primers weregeneratedas necessary tocompletethesequences. Both strandsofthe HMW1 gene and a singlestrand oftheHMW2 gene were sequenced.

Analytical techniques. Western immunoblot analysis was performed to identify the recombinant proteins being pro- duced by reactive phage clones. Phage lysates grown in LE392 cells or plaques picked directly from a lawn of LE392 cells on YT plates were solubilized in gel electrophoresis sample buffer prior to electrophoresis. Sodium dodecyl sulfate (SDS)-polyacrylamide gel electrophoresis was per- formed on 7.5% or 11% polyacrylamide modified Laemmli gels asdescribed byLugtenberget al. (16). After transfer of

theproteinstonitrocellulose sheets, the sheets were probed sequentially with an E. coli-absorbed human serum sample containing high-titer antibody to the high-molecular-weight proteins and then with alkalinephosphatase-conjugated goat anti-humanimmunoglobulinG(IgG) second antibody (Tago, Burlingame, Calif.). In previous studies,we noted that sera from healthy adults contain high-titer antibody directed against surface-exposed high-molecular-weight proteins of nontypeable H. influenzae (1). Serum from one of these adults was used as the screening antiserumafterhavingbeen extensively absorbed with LE392 cells.

To identify recombinant proteins being produced by E.

coli transformedwithrecombinant plasmids, theplasmids of interest were used to transform E. coli BL21(DE3)/pLysS.

The transformed strains were grown to anA6w of0.5 in L brothcontaining 50,ug of ampicillin per ml. IPTG wasthen addedto 1 mM. One hour later, cells were harvested, and a sonicate of the cells was prepared. The protein concentra- tions of the sampleswere determined bythe bicinchoninic acid method (33) (BCA protein assay kit; Pierce Chemical Co., Rockford, Ill.) in accordance with the manufacturer's instructions. Cell sonicates containing100 ,ug of total protein were solubilized inelectrophoresissamplebuffer, subjected to SDS-polyacrylamidegelelectrophoresis, andtransferred to nitrocellulose. The nitrocellulose was then probed se- quentially with theE. coli-absorbedadult serumsample and then with alkaline phosphatase-conjugated goat anti-human IgG second antibody.

Western immunoblot analysis was also performed to de- termine whether homologous andheterologous nontypeable H. influenzae strains expressed high-molecular-weight pro- teins antigenically related to the protein encoded by the cloned HMW1 gene (rHMW1). Cell sonicates of bacterial cells were solubilized inelectrophoresis samplebuffer, sub- jected toSDS-polyacrylamidegelelectrophoresis,and trans- ferred to nitrocellulose. Nitrocellulose was probed sequen- tially with polyclonal rabbit rHMW1 antiserum (see below) and then with alkaline phosphatase-conjugated goat anti- rabbit IgG second antibody (Bio-Rad, Gaithersburg, Md.).

Finally, Western immunoblot analysis was performed to determine whether nontypeable Haemophilus strains ex- pressed proteins antigenically related to the filamentous hemagglutinin protein of Bordetellapertussis. Monoclonal antibodyX3C, a murine immunoglobulin G (IgG) antibody which recognizes filamentous hemagglutinin (13), wasused to probe cell sonicates byWestern blot. An alkaline phos- phatase-conjugated goat anti-mouse IgG second antibody (Tago) was used for detection.

Preparation of HMW1 recombinant-protein antiserum. E.

coli BL21(DE3)/pLysS was transformed with pHMW1-4, and expression of recombinant protein was induced with IPTG, as described above. A cell sonicate of the bacterial cells was prepared and separated into a supernatant and pelletfraction bycentrifugationat10,000xg for 30 min. The recombinant protein fractionated with the pellet fraction. A rabbit wassubcutaneously immunized on abiweekly sched- ule with 1 mg ofprotein from the pellet fraction, the first dose given with Freund's completeadjuvantandsubsequent doses with Freund's incomplete adjuvant. Following the fourth injection, the rabbit was bled. Prior to use in the Western blotassay, theantiserum was absorbedextensively with sonicates of the host E. coli strain transformed with the cloning vector alone.

ELISA with rHMW1 antiserum and purified filamentous hemagglutinin. Purified filamentoushemagglutinin wasagift from Pasteur Merieux Serums et Vaccines, Marcy l'Etoile,

VOL. 60, 1992

(3)

1304 BARENKAMP AND LEININGER

France. Enzyme-linked immunosorbent assay (ELISA) plates(Costar, Cambridge, Mass.)werecoated with60 ,ul of a 4-,g/ml solution offilamentous hemagglutinin in Dulbec- co's phosphate-buffered saline per well for 2 h at room temperature. Wells were blocked for 1 h with 1% bovine serum albumin in Dulbecco's phosphate-buffered saline prior toaddition ofserumdilutions. rHMW1 antiserumwas serially diluted in 0.1% Brij (Sigma, St. Louis, Mo.) in Dulbecco's phosphate-bufferedsaline and incubated for 3 h at room temperature. After beingwashed, the plateswere incubated with peroxidase-conjugated goat anti-rabbit IgG antibody(Bio-Rad) for2 hat room temperatureand subse- quentlydeveloped with2,2'-azino-bis(3-ethylbenzthiazoline- 6-sulfonicacid) (Sigma)at a concentration of 0.54mg/mlin 0.1 Msodium citratebuffer, pH 4.2,containing0.03%H202.

Absorbances were read on an automated ELISA reader (Dynatech Laboratories, Chantilly, Va.).

Nucleotide sequence accession numbers. The sequences of the HMW1 and HMW2 genes have been submitted to GenBank. The accession numbers are M84616 for the HMW1 gene andM84615 for the HMW2 gene.

RESULTS

Isolation and restriction mapping of recombinant phage expressing HMW1orHMW2high-molecular-weightrecombi- nant proteins. The nontypeable H. influenzae strain 12 genomic library was screened for clones expressing high- molecular-weight proteins with anE. coli-absorbed human serum sample containing a high titer of antibodies directed againstthehigh-molecular-weightproteins(1).Althoughthis human serumsample contained antibodiesagainstanumber of otherHaemophilus proteins, the undefined natureof the high-molecular-weight proteins and ourlack ofspecific an- tiseraprecluded theuseofa morespecific screening antise- rum.

Numerous strongly reactive clones were identified along with moreweakly reactive ones. Twentystrongly reactive cloneswere plaque-purified and examined byWestern blot forexpressionofrecombinantproteins.Each of thestrongly reactive clonesexpressed one of two types ofhigh-molecu- lar-weightproteins,designated HMW1 and HMW2 (Fig. 1).

Themajor immunoreactiveprotein bands in the HMW1 and HMW2lysatesmigratedwith apparent molecular masses of 125 and 120 kDa, respectively. In addition to the major bands,each lysate contained minor protein bands of higher apparentmolecular weight. Protein bands seen in the HMW2 lysates atmolecular masses of less than 120 kDa were not regularly observed and presumably represent proteolytic degradation products. Lysates of LE392 infected with the XEMBL3 cloningvector alone were nonreactive when im- munologically screened with the same serum sample (data not shown). Thus, the observed activity was not due to cross-reactive E. coli proteins or XEMBL3-encoded pro- teins. Furthermore, the recombinant proteins were not simply binding immunoglobulin nonspecifically, since the proteins were not reactive with the goat anti-human IgG conjugatealone, with normal rabbit sera, or with sera from a number ofhealthy young infants.

Representative clones expressing either the HMW1 or HMW2 recombinant proteins were characterized further.

Therestriction maps ofthe two phage types were -different from eachother,includingtheregions encoding the HMW1 and HMW2 structural genes. Figure 2A shows restriction mapsof representative recombinant phage which contained

kDa200

116

94 -

67

43 HMW1

HMW2

FIG. 1. Westernimmunoblot assayofphagelysates containing eitherthe HMW1 or HMW2 recombinant proteins. Lysateswere probed withanE. coli-absorbed adultserumsamplewithhigh-titer antibodyagainst high-molecular-weight proteins. The arrows indi- catethemajor immunoreactiveproteinbands of 125 and 120 kDain the HMW1and HMW2lysates, respectively.

the HMW1 or HMW2 structural genes. The locations of the structural genesareindicatedby the shaded bars.

Subcloning and expression of the HMW1 structural gene in recombinantplasmids. HMW1plasmid subcloneswerecon- structed byusing the T7 expression plasmid pT7-7 (Fig. 2A and B). HMW2 plasmid subclones were also constructed, but the results with these latter subclones were similar to those observed with the HMW1 constructs; so for the purposesofdiscussion, only the HMW1subcloning data will bepresented.

Theapproximatelocation anddirectionoftranscriptionof the HMW1 structural gene were initially determined by using plasmid pHMW1 (Fig. 2A). This plasmid was con- structed byinserting the8.5-kbBamHI-SalI fragmentfrom XHMW1 into BamHI- and SalI-cut pT7-7. E. coli trans- formed withpHMW1expressed animmunoreactive recom- binantproteinwithanapparentmolecularmassof 115 kDa, whichwas stronglyinducible with IPTG (data notshown).

This protein was significantly smaller than the 125-kDa major protein expressed bythe parentphage, suggestingthat it either was being expressed as a fusion protein or was truncated atthecarboxy terminus.

To more precisely localize the 3' end of the structural gene, we constructed additional plasmidswith progressive deletions from the 3' end ofthepHMW1construct. Plasmid pHMW1-1 was constructed by digestion of pHMW1 with PstI, isolation of the resulting 8.8-kb fragment, and religa- tion. Plasmid pHMW1-2 was constructed by digestion of pHMW1 with HindIll, isolation of the resulting 7.5-kb fragment, and religation. E. coli transformed with either plasmidpHMW1-1 or pHMW1-2 also expressed an immu- noreactive recombinant protein with an apparent molecular massof 115 kDa. These results suggested that the 3' end of the structural gene was 5' of the HindIII site. Figure 3 demonstrates the Western blot results with pHMW1-2-trans- formed cells before and after IPTG induction (lanes 3 and 4, respectively). The 115-kDa recombinant protein is indicated bythearrow. Transformants also demonstrated cross-reac- tive bands of lower apparent molecular weight, which pre- sumablyrepresent partial degradation products. Shown for INFECT. IMMUN.

(4)

HAEMOPHILUS HIGH-MOLECULAR-WEIGHT PROTEIN GENES 1305

A s

A HMW1 3

E Sp B E E M HP S

I I_ '

pHMW1

pHMWI-1

pHMW1-2

pHMW1-4 >-

pHMW1-7

pHMW1-14 >-

B BamHI EEcoRI HHindIlI MMlul PPstI S Sall Sp Spel

D T7Promoter

iiqlkb

S E E E H P H H P E S

IHMW2 III

B bla

llinc 11

pT7-

ConIEI

o. gin

origin 2473 bp

4j Sma I EcoRI Nde I

010 XbaI

/ 1~~~~~~~~~~~~~~~~7promotcr ~ ~

, ~~~~~~~~~~~I~ rI

CGATTCGAACTrCTCGATICGAACI-CTCGATAGAC1TCGAAAITAATACGACTCACTATAGGGAGA

llet aIJ arg ieR

CCACAACGG7'TCCCrCTA(TAA%ATAATlIGlTlAACTITAAGAAGGAGATATACAT+GGCTX AGA4AT-,I

rbs NSdeI E.;oRI arg aLa arg gly ser ser arg wal a.sp lea gin pro lys lek ilt ie asp ...

CGC GCCCfGGGATtTCTAIJLGTCGo, tGCAGCCCAAGC-jrATC ATC(jAT...

SMiaI Bandmi XbaI SaI Pst I HindII ClJI

FIG. 2. (A) Partial restrictionmapsofrepresentative HMW1 and HMW2 recombinant phage and of HMW1 plasmid subclones. The shadedboxesindicatethelocations of thestructural genes. In therecombinantphage,transcription proceedsfromlefttorightfor the HMW1 geneand fromrightto leftforthe HMW2gene. The methods used for construction of theplasmidsshown aredescribed in the text. (B) Restrictionmapof theT7expressionvectorpT7-7.Thisvectorcontains theT7 RNApolymerasepromoter+10,aribosome-bindingsite(rbs), andthetranslationalstartsitefor theT7 gene 10proteinupstreamfromamultiplecloningsite(37).

comparison are the results forE. coli transformed with the pT7-7cloningvectoralone (Fig. 3, lanes 1 and2).

Tomorepreciselylocalize the 5' end of the gene,plasmids pHMW1-4 and pHMW1-7 were constructed. Plasmid pHMW1-4 was constructed by cloning the 5.1-kb BamHI- HindIII fragment from XHMW1 into a pT7-7-derived plas- midcontainingthe upstream3.8-kbEcoRI-BamHIfragment.

E. coli transformed with pHMW1-4 expressed an immuno- reactiveproteinwithanapparentmolecularmassof approx- imately 160 kDa(Fig. 3,lane6). Although proteinproduction

was inducible with IPTG, the levels ofprotein production

in these transformantswere substantially lower than those with thepHMW1-2transformantsdescribed above. Plasmid pHMW1-7 was constructed by digesting pHMW1-4 with NdeI and SpeI. The 9.0-kbp fragment generated by this doubledigestionwasisolated,bluntended, andreligated.E.

coli transformed with pHMW1-7 also expressed an immu- noreactive proteinwith an apparent molecularmass of 160 kDa (data not shown), a protein identical in size to that expressed bythepHMW1-4transformants. This result sug- VOL. 60, 1992

(5)

1306 BARENKAMP AND LEININGER

200K 116K 94K 67K

1 2 3 4 5 6 7 8

pm -I

43K

U I U I U I U I

pT7-7 p HMW-2 pHMW1-14 pHMW1-4

FIG. 3. Western immunoblot assay of cell sonicates prepared from E. coli transformed with plasmid pT7-7 (lanes 1 and 2), pHMW1-2(lanes 3 and4), pHMW1-4(lanes 5 and6),orpHMW1-14 (lanes7and8). The sonicates were probed withanE. coli-absorbed adultserumsample withhigh-titerantibody againsthigh-molecular- weight proteins.Laneslabeled Uand Irepresent sonicatesprepared before and after induction of the growing samples with IPTG, respectively. The arrows indicate protein bands of interest as described in thetext.

gested that the initiation codon for the HMW1 structural gene was 3' of the SpeI site. DNA sequence analysis (describedbelow) confirmed this conclusion.

As noted above, the XHMW1 phage clones expressed a major immunoreactiveband of 125kDa,whereas the HMW1 plasmid clonespHMW1-4 and pHMW1-7,which contained what was believed to be the full-lengthgene, expressed an immunoreactiveprotein ofapproximately160 kDa. Thissize discrepancy was disconcerting. One possible explanation was that an additional gene or genes necessary for correct processingofthe HMW1 geneproductwere deleted inthe process of subcloning. To address thispossibility, plasmid pHMW1-14 was constructed. This construct wasgenerated bydigestingpHMW1withNdeI andMluI andinsertingthe 7.6-kbp NdeI-MluI fragmentisolated frompHMW1-4. Such aconstruct wouldcontainthefull-lengthHMW1gene as well asthe DNA3' ofthe HMW1 gene which was present in the originalHMW1phage.E.coli transformedwiththisplasmid expressed major immunoreactive proteins with apparent molecular masses of 125 and 160 kDaas well as additional degradation products (Fig. 3, lanes 7 and8). The 125- and 160-kDabandswereidenticalto themajorandminorimmu- noreactive bands detected in the HMW1 phage lysates (compare Fig. 1 and 3). Interestingly, the pHMW1-14con- struct also expressed significant amounts of protein in the uninduced condition, a situation not observed with the earlier constructs.

The relationship between the 125- and 160-kDa proteins remains somewhat unclear. Sequence analysis, described below, reveals that theHMW1 gene would be predicted to

encode a protein of 159 kDa. Our speculation is that the 160-kDa protein is a precursor form of the mature 125-kDa protein,with the conversion from one protein to the other beingdependent on the products of a downstream gene or genes.

Sequence analysis of the HMW1 and HMW2 genes. To further characterize the nature of the HMW1 and HMW2 gene products, the sequences of both genes were deter- mined.

HMW1 gene.The nucleotide sequences of both strands of the HMW1 gene and flanking DNA were determined by subcloning restriction fragments intophages M13mpl8and M13mpl9. Figure 4 shows the entire region sequenced and compares the HMW1 sequence with the HMW2 sequence by using the GAP program (5). Also indicatedareseveral of the restriction sites used for restrictionmapping and forgener- ation ofplasmid subclones. Sequenceanalysis of the HMW1 gene revealed a4,608-bpopenreadingframe(ORF), begin- ning with an ATG codon at nucleotide 351 andendingwith a TAG stop codon at nucleotide 4959. A putative ribosome- binding site with the sequence AGGAG begins 10 bp up- streamof theputativeinitiation codon. Five other in-frame ATG codons are located within 250 bp of thebeginningofthe ORF, but none of these ispreceded bya typicalribosome- binding site. The 5'-flanking region of the ORF contains a series of direct tandem repeats, with the 7-bp sequence ATCT7TTC repeated 16 times. These tandem repeats stop 100bp5' of theputativeinitiation codon. An 8-bpinverted repeat characteristic of a rho-independent transcriptional terminator(3, 5)ispresent,beginningatnucleotide4983,25 bp 3' of the presumedtranslational stop. Multiple termina- tion codons are present in all three reading frames both upstreamand downstream of the ORF.The derived amino acidsequence of theproteinencodedbythe HMW1 gene has a molecularweightof159,000, in good agreement with the apparent molecularweightsof theproteinsexpressed bythe HMW1-4 andHMW1-7transformants (Fig. 3). Thederived amino acidsequence of the amino terminus does not dem- onstratethecharacteristicsof atypical signalsequence(43).

The BamHI site used ingenerationofpHMW1comprisesbp 1743 through 1748 of the nucleotide sequence. The ORF downstream ofthe BamHI site would bepredictedtoencode aproteinof 111kDa,ingoodagreement with the 115kDawe estimated for the apparentmolecularmass of the pHMW1- encoded fusionprotein.

HMW2gene. Thenucleotidesequence of the HMW2 gene was similarly determined by subcloning restriction frag- ments from a representative HMW2 phage into M13mpl8 and M13mpl9 phages. The sequence of the HMW2 gene consists of a4,431-bpORF, beginning withanATG codon at nucleotide 352 and ending with a TAG stop codon at nucleotide 4783(Fig.4). The first 1,259bpof theORF ofthe HMW2 gene are identical to those of the HMW1 gene.

Thereafter, the sequences begin to diverge but are 80%

identical overall. With theexception of a single base addition at nucleotide 93 of the HMW2 sequence, the 5'-flanking

FIG. 4. Complete nucleotide sequences of the HMW1 and HMW2 genes and DNA sequences 5' and 3' to the coding regions. The sequences were aligned by usingthe GAP program to identify regions of identity and nonidentity. The putative ribosome-bindingsites correspondtonucleotides341 to 345 in HMW1 and 342 to 346 in HMW2. A single ORF is present in the HMW1 sequence from the initiation codonatnucleotide351tothe termination codonatnucleotide 4959. A single ORF is present in HMW2 from the initiation codon at nucleotide 352tothe termination codonatnucleotide 4783. The 5'-flanking regions of the two genes contain an identical set of direct tandem repeats from positions 129 through 240, with the sequenceATCTlTTCrepeated 16 times. Restriction sites of relevance used in restriction mapping, subcloning,and sequencingareshown in boldface.

INFECT. IMMUN.

(6)

Spel

1 ACAGCGTTCTCTTAATACTAGTACAAACCCACAATAAAATATGACAAACAACAATTACAACACCTTTTTTGCAGTCTATATGCAAATATT 9 0

II l 1111111111111111111111111

1 TAAATATACAAGATAATAAAAATAAATCAAGATTTTTGTGATGACAAACAACAATTACAACACCTTTTTTGCAGTCTATATGCAAATATT 90 91 TT AAAAAATAGTATAAATCCGCCATATAAAATGGTATAATCTTYCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCmTCA 179

II~~~~~~~~~~~~~~~~~~~~I1111I1111I1111Iii1111 1111 1111II11111 11

91 TTAAAAAAATAGTATAAATCCGCCATATAAAATGGTATAATCTTTCATCTTTCA?CTYCA&TCTTTCATCTTTCATCTTTCATCTTTCAT 180

180 CTTTCATCTTYCATCTTTCATCTTTCATCTTTCJLTCTTTC!TCT?TCTCTTTCATC?TTCACATGAAATGATGAACCGAGGGAAGGGAG 269

111 I11111111 1111 111111111111111 11111111111111111111111 111111 11111 1111111

181 CTTTCATC.TTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCACATGAAATGATGAACCGAGGGAAGGGAG 270

270 GGAGGGGCAAGAATGAAGAGGGAGCTGAACG.ACGCAAATGATAAAGTAATTTAATTGTTCAACTAACCTTAG&GAAAATATGAACAAG 3 59 271 GGAGGGGCAAGAATGAAGAGGGAGCTGAACGAACGCAAATGATAAAGTAATTTAATTGTTCAACTAACCTTAGGAGAAAATATGAACAAG 36 0

3 60 ATATATCGTCTCAAATTCAGCAAACGCCTGAATGCTTTGGTTGCTGTGTCTGAATTGGCACGGGGTTGTGACCATTCCACAGAAAAAGGC 4 4 9

111111111111111111I11111111111111111111111111111111111111111111111111111111111111111ii,1111111

361 ATATATCGTCTCAAATTCAGCAAACGCCTGAATGCTTTGGTTGCTGTGTCTGAATTGGCACGGGGTTGTGACCATTCCACAGAAAAAGGC 450

450 AGCGAAAAACCTGCTCGCATGAAAGTGCGTCACTTAGCGTTAAAGCCACTTTCCGCTATGTTACTATCTTTAGGTGTAACATCTATTCCA 539

1111111111111111111111111111111111111111111111111111111111111111111111111111l

451 AGCGAAAAACCTGCTCGCATGAAAGTGCGTCACTTAGCGTTAAAGCCACTTTCCGCTATGTTACTATCTTTAGGTGTAACATCTATTCCA 540

540 CAATCTGTTTTAGCAAGCGGCTTACAAGGAATGGATGTAGTACACGGCACAGCCACTATGCAAGTAGATGGTAATAAAACCATTATCCGC 629 541 CAATCTGTTTTAGCAAGCGGCTTACAAGGAATGGATGTAGTACACGGCACAGCCACTATGCAAGTAGATGGTAATAAAACCATTATCCGC 6 30

630 AACAGTGTTGACGCTATCATTAATTGGAAACAATTTAACATCGACCAAAATGAAATGGTGCAGTTTTTACAAGAAAACAACAACTCCGCC 719

111111111111 11111111111111111 11111111111111111111111111111111111lllllllllllll1111111111111111

631 AACAGTGTTGACGCTATCATTAATTGGAAACAATTTAACATCGACCAAAATGAAATGGTGCAGTTTTTACAAGAAAACAACAACTCCGCC 720

720 GTATTCAACCGTGTTACATCTAACCAAATCTCCCAATTAAAAGGGATTTTAGATTCTAACGGACAAGTCTTTTTAATCAACCCAAATGGT 809

tillllll 111111111 11111111111111111 1111111111111111111111III 1111111 11111111 111111 721 GTATTCAACCGTGTTACATCTAACCAAATCTCCCAATTAAAAGGGATTTTAGATTCTAACGGACAAGTCTTTTTAATCAACCCAAATGGI 810

310 ATCACAATAGGTAAAGACGCAATTATTAACACTAATGGCTTTACGGCTTCTACGCTAGACATTTCTAACGAAAACATCAAGGCGCGTAAT 899 811 ATCACAATAGGTAAAGACGCAATTATTAACACTAATGGCTTTACGGCTTCTACGCTAGACATTTCTAACGAAAACATCAAGGCGCGTAAT 900

900 TTCACCTTCGAGCAAACCAAAGATAAAGCGCTCGCTGAAATTGTGAATCACGGTTTAATTACTGTCGGTAAAGACGGCAGTGTAA&TCTT 989 901 TTCACCTTCGAGCAAACCAAAGATAAAGCGCTCGCTGAAATTGTGAATCACGGTTTAATTACTGTCGGTAAAGACGGCAGTGTAAATCT0 990

990 ATTGGTGGCAAAGTGAAAAACGAGGGTGTGATTAGCGTAAATGGTGGCAGCATTTCTTTACTCGCAGGGCAAAAATCACCATCAGCGAT 1079 Ill ll ll Ill III1111111! 111111111 IIIll llllII I I Il IIlIlIlII I I IIll

991 ATTGGTGGCAAAGTGAAAAACGAGGGTGTGATTAGCGTAAATGGTGGCAGCATTTCTTTACTCGCAGGGCAAAAAATCACCATCAGCGAT 1080

1080 ATAATAAACCCAACCATTACTTACAGCATTGCCGCGCCTGAAAATGAAGCGGTCAATCTGGGCGATATTTTTGCCAAAGGCGGTAACATT 1169

1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111

1081 ATAATAAACCCAACCATTACTTACAGCATTGCCGCGCCTGAAAATGAAGCGGTCAATCTGGGCGATATTTTTGCCAAAGGCGGTAACATT 1170

1170 AATGTCCGTGCTGCCACTATTCGAAACCAAGGTAAACTTTCTGCTGATTCTGTAAGCAAAGATAAAAGCGGCAATATTGTTCTTTCCGCC 1259

11111111111111111111111111111111111111111111111111111111111111111111111111111ll1111111111111I

1171 AATGTCCGTGCTGCCACTATTCGAAACCCAAGGTAAACTTTCTGCTGATTCTGTAAGCAAAGATAAAAGCGGCAATATTGTTCTTTCCGCC 1260

1260 AAAGAGGGTGAAGCGGAAATTGGCGGTGTAATTTCCGCTCAAAATCAGCAAGCTAAAGGCGGCAAGCTGATGATTACAGGCGATAAAGTC 1349

111111111111111111111111111111111111111111111111111111111111111111111111111111111II1111111111

1261 AAAGAGGGTGAAGCGGAAATTGGCGGTGTAATTTCCGCTCAAAATCAGCAAGCTAAAGGCGGCAAGCTGATGATTACAGGCGATAAAGTC 1350

1350 ACATTAAAAACAGGTGCAGTTATCGACCTTTCAGGTAAAGAAGGGGGAGAAACTTACCTTGGCGGTGACGAGCGCGGCGAAGGTAAAAAC 1439

111111111111111111111111111111111111111111111111111111111111111111111111111 1111111111111111

1351 ACATTAAAAACAGGTGCAGTTATCGACCTTTCAGGTAAAGAAGGGGGAGAAACTTACCTTGGCGGTGACGAGCGCGGCGAAGGTAAAAAC 1440

14 4 0 GGCATTCAATTAGCAAAGAAAACCTCTTTAGAAAAAGGCTCAACCATCAATGTATCAGGCAAAGAAAAAGGCGGACGCGCTATTGTGTGG 1529 1441 GGCATTCAATTAGCAAAGAAAACCTCTTTAGAAAAAGGCTCAACCATCAATGTATCAGGCAAAGAAAAAGGCGGACGCGCTATTGTGTGG 1530

1530 GGCGATATTGCGTTAATTGACGGCAATATTAACGCTCAAGGTAGTGGTGATATCGCTAAAACCGGTGGTTTTGTGGAGACGTCGGGGCAT 1619 1531 GGCGATATTGCGTTAATTGACGGCAATATTAACGCTCAAGGTAGTGGTGATATCGCTAAAACCGGTGGTTTTGTGGAGACATCGGGGCAT 1620

1620 GATTTATTCATCAAAGACAATGCAATTGTTGACGCCAAAGAGTGGTTGTTAGACCCGGATAATGTATCTATTAATGCAGAAACAGCAGGA 1709 11111 III 1111111111111 111111111111 1111111 III 11111 III 11 III

1621 TATTTATCCATTGACAGCAATGCAATTGTTAAAACAAAAGAGTGGTTGCTAGACCCTGATGATGTAACAATTGAAGCCGAAGACCCCCTT 1710

1710 CGCAGCAATACTTCAGAAGACGATGAATACACGGGATCCGGGAATAGTGCCAGCACCCCAAAACGAAACAKAGAA.. AAGACAACATTA 1796

1111 11111 II III 111IIIII II 1 III 11 11111 if

1711 CGCAATAATACCGGTATAAATGATGAATTCCCAACAGGCACCGGTGAAGCAAGCGACCCTAAAAAAAATAGCGAACTCAAAACAACGCTA 1800

EcoRI

1797 ACAAACACAACTCTTGAGAGTATACTAAAAAAAGGTACCTTTGTTAACATCACTGCTAATCAACGCATCTATGTCAATAGCTCCATTAAT 1886 11 111111 11 I 11 111111 11 11 11 11 11 11111111 11 11

1801 ACCAATACAACTATTTCA ATTATCTGAAAACGCCTGGACAATGAATATAACGGCATCA.AGAAAACTTACCGTTAATAGCTCAATCAAC 1890

1887 TT... ATCCAATGGCAGCTTAACTCTTTGGAGTGAGGGTCGGAGCGGTGGCGGCGTTGAGATTAACAACGATATTACCACCGGTGATGAT 1973

11III 11111111111III IllllII lllllHIM 11111 11111111

1891 ATCGGAAGCAACTCCCACTTAATTCTCCATAGTAAAGGTCAGCGTGGCGGAGGCGTTCAGATTGATGGAGATATTAC.T 1968 1974 ACCAGAGGTGCAAACTTAACAATTTACTCAGGCGGCTGGGTTGATGTTCATAAAAATATCTCACTCGGGGCGCAAGGTAACATAAACATT 2063

11l1111 111 11111 11111 111111 1111111111111111111111 Ill 11

1969 TCTAAAGGCGGAAATTTAACCATTTATTCTGGCGGATGGGTTGATGTTCATAAAAATATTACGCTTGATCAGGGTTTTTTAAATATTACC 2058

2 0 6 4 ACAGCTAAACAAGATATCGCCTTTG. AGAAAGGAAGCAACCAAG. TCATTACAGGTCAAGGGACTATTACC...TCA 2135

11 II 11 1 l II 11 11 III 11 11IIII III II

2 059 GCCGCTTCCGTAGCTTTTGAAGGTGGAAATAACAAAGCACGCGACGCGGCATGCTAAAATTGTCGCCCAGGGCACTGTAACCATTACA 214 8

1307

Figure

Updating...

References